Open-Earth Monitor Global Workshop 2023

• Damiano Oldoni

Image by Tomislav Hengl (edited by Damiano Oldoni)

General

These notes are a DRAFT. Some talks have still to be added.

The well organised workshop was of great interest, especially in the context of the B-Cubed project.

Open Earth Monitoring, led by Tom Hengl, has succeeded in gathering the most important European initiatives dealing with Earth Observation (EO) data. The EO community is confirmed to be very active, and many initiatives are in the development phase. Some challenges and tentative solutions were discussed:

  • how to deal with big amounf of data
  • how to deal with data providers fragmentization
  • how to deal with the growing amount of analysis workflows
  • the reproducibility and interoperability of EO workflows
  • the FAIRness of EO data
  • how feasible is the upscaling of local workflows for use on global data
  • how to mobilize EO data and other data sources
  • development of standards for analysis-ready data

It was also clear how important the European Space Agency (in collaboration with NASA) is in building long-term solutions to such challenges. An example is the Copernicus Data Space Ecosystem, which facilitates access to both data and tools using open data and open science paradigms.

More details in the Talks session.

My talk about the spatial uncertainty aspects of biodiversity occurrence cubes (see pdf slides) was part of a session about biodiversity monitoring, which still is a niche application within the EO community. The session was held on Friday morning from 11:15 to 12:35 and very interesting project workshops were organised in parallel, with low attendance as result. However, the talk generated some interest and possible future collaborations with EURAC researchers involved in Birdwatch (Horizon project) and eco4Alps (Application project of the EO4ALPS Regional Initiative funded by esa).

I also enjoyed meeting Edzer Pebesma in person, an authority in the geospatial community and the maintainer of the well-known sf R package, among others.

Talks

Reprocudible and Reusable Remote Sensing Workflows

Authors: Edzer Pebesma, Anna Anghelea

Abstract

Reproducible remote sensing papers: Edzer is trying to launch a new open access and free (no publicatoin fee) journal about remote sensing.

What’s easy to do?

  • Package versioning
  • System requirements versioning,
  • Operating system (Virtual Machine or container)

On the other hand, what’s difficult to do?

  • Run workflows on large datasets. Possible solution: share smaller datasets and argue why that is sufficient
  • Using Google Eearth Engine, arcGIS, ERDAS, but also HPC dedicate models: to what extent are such workflows reproducbile? Are we still doing open science then?
  • Get someone to review your code and give feedback:
    • journals have no review requirements, neither for authores nor for reviewers
    • Code check: would you like to run my code, review it and give me feedback? It’s a nice initiative: Independent execution of computations underlying research articles. You can add a badge on GItHub in green if somebody run successfully your code.
    • When submitting a manuscript, report on successult reproduction / code review

a key motivation behind openEO was the ability to compare cloud based platforms with each other (tech validation) What’s hard: how do we describe, store, advertise, share, and find openEO process graphs? The json file you get it’s quite difficult to read/understand. Should we share process graphs or rather Jupyter / R-markdown / Quarto notebooks? These are easier to read and can be a mix of text and code.

ESA initiatives: -launching the Open Science persistent demonstrator co-sponsored with NASA and implemented by OGC to build interoperability across platforms and demonostrate reporducilibity of workflows - promoting the process graph to promote reproducibility - coordination of several communities working on related technologies including geo-datacubes and datacube APIs to facilitate reproducibility - expanding and advancing the Deep Earth System Data Lab Deep ESDL that is based fully on open source technology - new activity called earth-code (general link info, details) starting in November 2023 will have dedicated work stream: e.g. by “how to discover and execute workflows?” “How to maintain a healthy community that practices open Earth System science”, “how to persistently store science data AND code AND documentation from Earth Science projects following FAIR principles?”

Idea of Pebesma: make the reproducibility work a publication on its own.

To reproduce things, containerization and software distribution is absolutely needed: new initiative mentioned from audience: geonix, A cross-platform geospatial software distribution and development environment.

Copernicus Data Space Ecosystem: Immediately Available Data and Associated Services

Author: Jędrzej Bojanowski

Abstract

Unfortunately I could not follow it: I was in another parallel session.

Some key points from the abstract:

  • Copernicus Data Space Ecosystem (CDSE) represents a key milestone in access to Copernicus satellite data.
  • CDSE exposes data of other satellites as well, e.g. Landsat, SMOS and Envisat
  • CDSE will offer access to commercial satellite data

The user does not have to order and wait for the data anymore.

Direct access also allows bulk data processing and streaming, e.g. via OGC services (WMS/WFS)

Another novelty of CDSE are the various interfaces where the data are available:

  • old-fashioned download
  • OData, an ESA-adopted standard, based on RESTful API
  • SpatioTemporal Asset Catalog, STAC and STAC API that become a standard in the EO community and being onboarded to OGC
  • Jupyter Hub, very suitable tool for prototyping, developing, and testing applications open-source, online, interactive web application which gives access to computational environments and resources without burdening the users with installation and maintenance tasks

The vast majority of the described capabilities are available free-of-charge for the individual’s use.

European platforms to support the Green Deal

Authors: Benjamin Schumacher (EODC), Christian Briese

Abstract.

EODC: located in Vienna. Active in the C-GLOPS (Copernicus - global land), C3S2 (Copernicus Climate Change Service), Lot 4 (land hydrology and cryosphere) and Global Flood Monitoring Service (fast analysis and predictions for use of public authorities)

EODC is composed of:

  • cloud
  • storage
  • backup
  • HPC

EO platforms show:

  • fragmentation, redundancy and lacking coordination among each other
  • prevalence of old Virtual Machine

Idea: openEO API, which was part of the openEO Platform, is a combination of several backends creating a federation of data accesses via the SpatioTemporal Asset Catalog, STAC.

The end user consults the federated catalogue STAC and behind the screen STAC will do for you the data request to the right data provider such as cesnet, terra scope, eodc, SINERGISE

KEY MESSAGE: federation of data catalogue.

But the most important is a community that builds a framework for FAIR scientific computing upon data access.

storage nodes -> NPC / Cloud Compute (the compute nodes) -> Jupyter notebooks (users)

C-SCALE project helps to reach a federation of data catalogs. Notice that C-SCALE will be integrated with the European Open Science Cloud (EOSC) so that C-SCALE solutions can be seamlessly integrated in all the other EOSC-supported research and innovation processes and practices.

C-SCALE mission is clear: C-SCALE will enhance EOSC Portal with pan-European federated data and computing infrastructure services for Copernicus.

What’s the green deal data space: a federation of data ecosystems to jointly tackle climate change The European Commision laws concerning the green deal data space:

So, this is the road to follow:

  • Abstracting complexity by federated data catalogue
  • Provide transparency by open science (and FAIR I would add) principles
  • Pixel to contintental scalability by scalable buliding block processes

The most important observation from the Q&A: the federation of data is not really a problem, we will implement it with STAC. The challenge is federating the analysis code: implicitly it means that all partners will support openEO standards and its versions.

DestinE’s Core Service Platform

Author: Inés Sanz Morère

Abstract

Goal: DestinE develops highly accurate digital model of the Earth to assist users to monitor, simulate and predict the effect of adopted strategies and mitigation measures

Objectives: contribute to policy makers, researchers (provide access to data, allow modelling including Machine Learning) and general public (raising awareness)

We are in Phase 1 (2022-2024): progressive entering into operation of routine services. The DESOP Advanced services are planned to be ready in Dec 2023.

System implementation

Users interface the COre Service Platform of ESA wich is connected to the Digital Twin Engine of the European Centre for Medium-Range Weather Forecasts (ECMWF) and the Data Lake of EUMETSAT

Notice that it is one system developed by the 3 entities.

DestinE’s Core Service PLatform (DESP) is the user access point to DestinE system.

DESP is a comprehensive framework for the data analysis and management. It enables the development and exploitation of applications and services targeting DestinE data. Direct access to users is provided.

serco is the main contractor together with other software companies such as aliaspace, CGI ThalesAleniaSpace, deimos OVHcloud

Idea is to get validation for gradual opening in December 2023.

DESP Services aims to provide an open and inclusive access to DESP users.

Upcoming events:

Available datasets? Still to be defined.

Q&A: There are already two open ecosystems: how this extra one will work with the others? There will be collaboration as ESA is involved in the development.

Analysis Ready Data (ARD)

Author: Peter Strobl

Abstract

Once upon a time in EO: data must to become operable (data access, calibration, filtering, …) Interoperability: ability of exchange information and use the exchanged information.

Within the Committee on Earth Observation Satellites (CEOS) there are two groups where interoperability is involved.

CEOS interoperability factors:

  • Semantics: naming stuff
  • Architecture: organisational structure of concepts, process and assets
  • Interfacing: data exchange protocols and applications
  • Quality
  • Policy

CEOS has a key role to play in defining Analysis Ready Data (ARD) for the community. The CEOS-ARD specifications are one of the most important aspects.

OGC has approved to form a new Standards Working Group (SWG) on Analysis Ready Data on May 3, 2023, see presss release ISO has designated the ARD series of standards to be ISO 19176, where 19176-1 is the first part.

First of all, about semantics:

  • semantics is challenging and very important
  • examples: EO is not equal to remote sensing
  • downscaling vs upscaling
  • what’s the meaning of some terms like observations and in-situ?

There is a proposal for improving the situation. The proposal main aspects are:

  • consistency: based on a firm foundation of base well defined terms
  • hierarchy
  • authoritative (all majors stakeholers IEEE, CEOS, ISO TC211) commit to use the same vocabulary
  • sustainability: commitment to support the project for 5-10 years at least
  • openness

About architecture:

There should be a lineary sequence in which one level builds on the previous one:

  • FROM raw data TO level 1: reformatting calibation
  • FROM level 1 TO level 2: correction orthorectification
  • FROM level 2 TO level 3: harmonisation resampling (regridding)
  • FROM level 3 TO level 4: fusion

It’s important to keep track about all these steps otherwise we don’t know anymore what we are speaking about as the output could be quite different from the raw data.

Convention: numbers are used to identify measurant steps, while letters indicate spatio(temporal) geometry steps. So, we have:

level input data
LO raw data
L1A calibration ready data
L1B orthorectification ready data
L2C conflation/coimbination ready data
L3B fusion ready data
L3C/D analysis model ready data
L4CD/D inference ready information

A note about interoperability within architecture.

ARD are meant to build data cubes. OGC data cube community practice says: all layers in a data cube need to share the same grid to allow interoperability between layers.

BUT two data cubes (or ARD datasets) do not need to share the same grid to be considered inteoperable! If so, interoperability is restricted to a “one way” road, i.e. a cube can only be involved during one analysis workflow only. Or we need resmapling: is this still interoperability?

If we assume this hypothesis then we should go for a point cloud approach. Just an idea, yet, not a solution.

We need good practices in spatial discretisation of Earth surface. It should be based on the ellipsoidal Earth model. Notice also that the grid is intrinsically a projection.

The Web Map Tile Service (WMTS) standard base is EPSG 3857. It’s good but not applicable everywhere.

And what about the Military grid reference system (MGRS) standard (base UTM EPSG:326xx)?

Some interesting alternatives:

The hugest problem is how to discretise the earth’s surface? Do we use square? But then you don’t cover it all. Another issue, easier maybe to solve: how to scale? Choose a refinement ratio: divide by 2 or by 3? And define clearly the hierarchy, please.

So, we have two roads in front of us: continuous (point-clouds) or dicrete (gridded data). Which road will we take?

There will not be one ARD as there are many types of users, data and analysis. The way we structure ARD will impact how interoperability will emerge. And the succes of ARD depends on whether its architecutre is future proof.

There is an open discussion going on GitHub. Not links mentioned in slides, maybe we should contact the speaker: Peter.Strobl@ec.europa.eu.

Q&A and my personal conclusion: data cubes are actually created with a workflow in mind, so they are in some ways specific and users typically don’t want to deal with points or not gridded data. So, the choice about which road to take is probably will be taken sooner than expected or is already taken.

Aggregating multiple emissions data for actors at all levels through a common schema in OpenClimate

Author: Joaquin van Peborgh from OpenEarth Foundation.

Abstract.

Acronym: GHG = Greenhouse gas

The goal of OpenEarth foundation is to build open source digital products and infrastructure.

Check openclimate.network to see how the dashboard works. At a certain level you will find where data become not available anymore: that’s the “endpoint” of open data.

in the DIGS Data Explorer: you can see that there is no coordination between national and provincial data as the national emission is lower than the sum of the data provided by provinces. Notice that the DIGS Data Explorer has a Python wrapper to get the data directly in Python.

Open Climate data model is an open source data model to track climate action across non-state actors. It integrates 67 data sources with 360k+ emission records. There is an open API for all this data (check GitHub) and a python client.

Cities and regions lack tehcnical capacity for decarbonize even if thery are some of the most critical climate actors. Data collection and lack of internal capacity are the main pain points. It takes personnel and time to collect the data needed and to complete an inventory. Limited internal capacity: you need trained analysts and knowledge is lost when they leave.

Other pain points:

  • Data are heterogenous and/or messy
  • Lack of streamlined tools

To solve the initiative CityCatalyst started! It provides ready-to-use data, intuitive UI support for GHG inventories. It drastically reduces efforts and costs in data collection for local governments, e.g. via data templates and inventories.

Data are going to be accessible through an open API (work in progress).