data_access/README_era5.txt
This notebook briefly demonstrates how to download ERA5 climate data as a NetCDF file, regrid it from a regular longitude-latitude grid to a HEALPix grid, save it as Zarr, and compare the original data to the regridded data.


ERA5 is the fifth-generation ECMWF reanalysis of the global climate and weather for the past 8 decades (https://cds.climate.copernicus.eu/datasets/reanalysis-era5-single-levels?tab=overview). While a wide variety of meteorological variables is available, this notebook focuses on the specific humidity for a few days in early December 2024. While it would be possible to generalize the notebook to more variables, time constraints only allowed me to implement the specific humidity directly. Other variables would sadly break the notebook as it is right now.
To gain access to the data it is necessary to create an account at https://cds.climate.copernicus.eu/how-to-api. It is also important to create a local token file and to accept the terms of use on the website. The process of creating a token and quickly downloading some data is demonstrated there.
The download is implemented in a way that allows for quick processing of daily data, skipping already existing files during the download.
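The daily loop with skip-existing logic can be sketched roughly as follows. This is a minimal sketch, assuming the `cdsapi` package and a valid `~/.cdsapirc` token file; the dataset and parameter names follow the CDS request form for ERA5 pressure-level data (where specific humidity lives), and the file-naming helper is my own convention, not the notebook's:

```python
import datetime
import os

def target_path(outdir, day):
    # one NetCDF file per day; the naming scheme here is illustrative
    return os.path.join(outdir, f"era5_q_{day:%Y%m%d}.nc")

def download_day(day, outdir="era5"):
    path = target_path(outdir, day)
    if os.path.exists(path):      # skip files that already exist
        return path
    os.makedirs(outdir, exist_ok=True)
    import cdsapi                 # third-party: pip install cdsapi
    c = cdsapi.Client()           # reads the token from ~/.cdsapirc
    c.retrieve(
        "reanalysis-era5-pressure-levels",
        {
            "product_type": "reanalysis",
            "variable": "specific_humidity",
            "pressure_level": "1000",
            "year": f"{day:%Y}", "month": f"{day:%m}", "day": f"{day:%d}",
            "time": [f"{h:02d}:00" for h in range(24)],
            "format": "netcdf",
        },
        path,
    )
    return path
```

Rerunning the loop over a date range is then cheap, because already downloaded days return immediately.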

The original data is downloaded as NetCDF and then locally regridded onto a HEALPix grid (https://healpix.sourceforge.io/). The mathematics behind the transformation is quite involved, but the main draw of HEALPix is that all pixels cover the same area. The regridding algorithm works by transforming the geographical coordinates into pixels, with a resolution given by the parameter ndim. In a second step the original values of the variable are remapped to their corresponding pixels, creating a one-dimensional array that can be visualized on a global map. The pixel coordinates and the corresponding variable values are then stored as Zarr data.
This part was quite challenging for me and took several days of work to implement, as there are only very few examples online of people remapping NetCDF or similar file formats to HEALPix.
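The coordinate transformation and remapping steps can be sketched like this. A minimal sketch assuming the `healpy` package; averaging all source cells that fall into the same pixel is my own choice of remapping, and `nside` plays the role of the resolution parameter called ndim above:

```python
import numpy as np

def npix_for(nside):
    # total number of equal-area HEALPix pixels for a given resolution
    return 12 * nside * nside

def regrid_to_healpix(da, nside=32):
    """Remap a 2-D lat/lon xarray DataArray onto a 1-D HEALPix pixel array."""
    import healpy as hp  # third-party: pip install healpy
    lon2d, lat2d = np.meshgrid(da["longitude"].values, da["latitude"].values)
    # ang2pix expects co-latitude (theta) and longitude (phi) in radians
    theta = np.deg2rad(90.0 - lat2d)
    phi = np.deg2rad(lon2d % 360.0)
    pix = hp.ang2pix(nside, theta, phi)
    # average all source cells that map to the same pixel
    counts = np.bincount(pix.ravel(), minlength=npix_for(nside))
    sums = np.bincount(pix.ravel(), weights=da.values.ravel(),
                       minlength=npix_for(nside))
    hp_map = np.full(npix_for(nside), np.nan)
    nonzero = counts > 0
    hp_map[nonzero] = sums[nonzero] / counts[nonzero]
    return hp_map
```

The returned one-dimensional array has length 12 * nside^2 and can be plotted directly with healpy's map viewers.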

Saving from NetCDF or xarray datasets to Zarr is only demonstrated briefly, as my main focus was on HEALPix. The main idea is to append the new data to the already existing Zarr files, while chunking mainly the large pixel dimension and
keeping the other dimensions as they are. Unfortunately the chunked Zarr writing currently overwrites the old data instead of appending, because of incorrect use of the mode parameters when saving from xarray to Zarr.
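The intended append pattern can be sketched as follows. A minimal sketch; the dimension names `pixel` and `time` and the chunk size are assumptions, and the point is the mode/append_dim usage that the notebook currently gets wrong:

```python
import os

def write_mode(store_exists):
    # "w" (over)writes a fresh store; "a" with an append_dim extends it
    return ("a", "time") if store_exists else ("w", None)

def append_day_to_zarr(ds, store_path, pixel_chunk=49152):
    """Append one day of HEALPix data (an xarray Dataset) to a Zarr store."""
    # chunk mainly the large pixel dimension, leave the others as they are
    ds = ds.chunk({"pixel": pixel_chunk})
    mode, append_dim = write_mode(os.path.exists(store_path))
    if append_dim is None:
        ds.to_zarr(store_path, mode=mode)            # first write creates the store
    else:
        ds.to_zarr(store_path, mode=mode,
                   append_dim=append_dim)            # later writes append along time
```

Using mode="w" on every write is exactly the overwrite behaviour described above; mode="a" together with append_dim is what extends the store.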
data_access/README_load_swarm.md
Author: Maximilian Gregorius

A Jupyter notebook explaining briefly how to access and download data from ESA's Swarm mission, as part of the module Earth System Data Processing.

This readme serves more as a lab book of my findings during the work with the Swarm data than as a conventional, purely descriptive readme,
and mostly follows the layout given in the homework assignment sheet. But first a brief summary of the Swarm mission and of the data retrieval system VirES.


# The Swarm Mission and VirES
Swarm is ESA's first constellation mission to survey Earth's magnetic and electric fields and their temporal variations. It launched in 2013, consists of three satellites called Alpha (A), Bravo (B) and Charlie (C), and is planned to continue the survey until 2025. The satellites carry a wide range of instruments, with the main focus on their magnetometers. The primary objectives of the mission include a better understanding of core-mantle interactions and the investigation of the electric currents flowing in the iono- and magnetosphere. A basic overview of the mission can be found at https://earth.esa.int/eogateway/missions/swarm.

In this notebook we will use the data retrieval system VirES to access a vast amount of data, including many of ESA's Earth Explorer missions, Aeolus and, most importantly, Swarm. Data is available for many different collections of measurements, auxiliary measurements and associated models, including the magnetic field, electric field, ion temperature, satellite positions and more. A more experienced user might use this data to calculate and visualize Earth's magnetic field or to do simple space-weather forecasting, but in this notebook we will focus mainly on accessing the Swarm data and downloading some geomagnetic field data. The goal is to familiarize the user with the VirES environment and with working with Swarm data, so they can adapt it to their own needs in the future. Some basic understanding of geomagnetism is advised but not necessary, and anyone willing to learn can check out the resources below to deepen their understanding.

Understanding Earth's geomagnetic sources: https://link.springer.com/journal/11214/volumes-and-issues/206-1.


# Evaluation of VirES as a Data Portal
As described above, VirES (Virtual environments for Earth observation Scientists) is a data retrieval system that also includes a server system and a graphical web interface allowing easy visualization and manipulation of Swarm products.
It is designed to lower the barrier of entry for scientists who want to access Swarm data. The website (https://notebooks.vires.services/docs/vre-overview#) offers a lot of basic explanations and example code, making it possible to get started in
a matter of minutes.
To identify the magnetic data used in the notebook, the Swarm Product Data Handbook (https://swarmhandbook.earth.esa.int/catalogue/index), provided directly by ESA, is used. Using the appropriate filters for Swarm and magnetic data it is possible to identify a handful of products that could hold useful data. Searching the VirES support page "Available parameters for Swarm" (https://viresclient.readthedocs.io/en/latest/available_parameters.html) provides a long list of available collections and measurements, making it possible to identify "MAG" as the desired collection for the homework notebook. While the resources are plentiful and everything is meticulously inter-linked, it can still be quite difficult to find the data one wants, as the number of options can be overwhelming. It is very helpful to have a basic idea of what you are looking for and some experience with geomagnetic data, so that all the abbreviations used for the measurements become clear more quickly. In some cases the naming conventions differ slightly between ESA and VirES, which may cause headaches, but the differences are usually so small that it is still easy to find the right measurements.


# Accessing Swarm Data
To access the data using VirES, an access token is required. On the first request for data the user will automatically be prompted to create an account at https://vires.services/ and to set up a token. On subsequent uses of the notebook this token is
called upon automatically, enabling very fast log-ins. Alternatively it is also possible to include the user and token information directly in a few lines of code, which might be useful in some cases. Information on how VirES handles tokens can be found at https://notebooks.vires.services/notebooks/02a__intro-swarm-viresclient, and a more general overview of tokens and account creation at VirES can be found at https://viresclient.readthedocs.io/en/latest/access_token.html.
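The one-time token setup can be sketched in a few lines; `set_token` is viresclient's helper for exactly this, and after the first interactive prompt the stored token is reused automatically:

```python
def configure_token():
    # third-party: pip install viresclient
    from viresclient import set_token
    # prompts once for the token and stores it in the local viresclient
    # configuration, so later sessions log in without any prompt
    set_token("https://vires.services/ows")
```

Calling `configure_token()` once per machine is enough; subsequent `SwarmRequest` calls pick the token up silently.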


# Downloading Swarm Data
For the download of the data the Python package viresclient is used. The user sends a request for the data, specifying the collection, the products and the timeframe they need.
Under collection the user chooses which ESA mission and which instruments are desired; in this case this means the Swarm mission and the geomagnetic data. Each collection has a number of measurements associated with it, and the appropriate collection must be set in order to access them.
Under products the user specifies the measurements, like magnetic field intensity or the magnetic field vector, that they want to download. Here it is also possible to request model data, auxiliary data and the sampling step. Auxiliary data is not necessarily data collected by the requested mission, but data that can serve as useful support. For example one could request the auxiliary "Disturbance storm time index" (Dst) to flag geomagnetic data as unreliable at times of high solar activity. There are several different models available for a wide variety of objectives. The model evaluations are calculated at the same sample points as the requested data products. For most models these evaluations happen server-side at the time of the request; only for the most commonly requested and computationally expensive models does the server store and use a cache of some of the model values. How VirES handles models is described at https://viresclient.readthedocs.io/en/latest/geomagnetic_models.html. In this notebook a simple model of Earth's geomagnetic field, the "International Geomagnetic Reference Field" (IGRF), will be requested as a point of comparison for our satellite data. The sampling step can be set using the ISO 8601 standard, with a minimal interval depending on the collection used (usually one second).
The timeframe can be chosen depending on the collection used. The Swarm mission launched in 2013, so earlier times are obviously not available, but the data for some ground-based magnetic observatories may reach back more than a hundred years.
In this notebook the month of May 2024 will be used as an example.
The request returns a data object called ReturnedData, which is basically a wrapper around a temporary CDF file that can either be written to disk directly or first be transformed into a pandas DataFrame or an xarray object.
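The request described above can be sketched as follows, using viresclient's `SwarmRequest`. The collection `SW_OPER_MAGA_LR_1B` (Swarm Alpha, low-rate magnetic data) and the particular measurements and sampling step are illustrative choices, not the only ones the notebook could use:

```python
def fetch_swarm_mag(start="2024-05-01", end="2024-06-01"):
    # third-party: pip install viresclient
    from viresclient import SwarmRequest

    request = SwarmRequest()
    # collection: which mission/instrument, here Swarm Alpha MAG low rate
    request.set_collection("SW_OPER_MAGA_LR_1B")
    request.set_products(
        measurements=["F", "B_NEC"],   # field intensity and NEC field vector
        models=["IGRF"],               # model evaluated at the same sample points
        auxiliaries=["Dst"],           # storm-time index, useful for flagging
        sampling_step="PT10S",         # ISO 8601 duration: one sample per 10 s
    )
    data = request.get_between(start, end)  # the ReturnedData wrapper
    return data.as_xarray()                 # or .to_file(...) / .as_dataframe()
```

Writing the temporary CDF to disk directly (`data.to_file("swarm_may.cdf")`) avoids holding the whole month in memory.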


# Visualize Geomagnetic Data
The notebook provides a simple scatter plot to compare the downloaded satellite data to the IGRF model. This approach only works for smaller data sizes and is only meant to demonstrate how the data may look when mapped on the globe, and to show a few common problems the user might encounter when working with this kind of data. For this geomagnetic data these are mainly empty data columns and outliers beyond any physically reasonable level, which need to be dealt with before further processing.
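The cleaning and plotting steps can be sketched like this. A minimal sketch: the outlier bounds are illustrative values for total field intensity at Swarm altitude (in nT), not mission-specified thresholds, and the plotting helper is my own:

```python
import numpy as np

def clean_field(f, fmin=20000.0, fmax=70000.0):
    """Drop NaNs and values outside a plausible range; returns (kept values, mask).
    The bounds are illustrative, roughly bracketing |B| at Swarm altitude in nT."""
    f = np.asarray(f, dtype=float)
    mask = np.isfinite(f) & (f > fmin) & (f < fmax)
    return f[mask], mask

def plot_comparison(lon, lat, f_obs, f_igrf):
    # third-party: pip install matplotlib
    import matplotlib.pyplot as plt
    fig, ax = plt.subplots(figsize=(10, 4))
    # colour each sample point by its deviation from the IGRF prediction
    sc = ax.scatter(lon, lat, c=np.asarray(f_obs) - np.asarray(f_igrf),
                    s=2, cmap="coolwarm")
    fig.colorbar(sc, ax=ax, label="F - F_IGRF (nT)")
    ax.set_xlabel("Longitude")
    ax.set_ylabel("Latitude")
    return fig
```

Applying the mask from `clean_field` to the coordinate arrays as well keeps observations and positions aligned before plotting.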


# Scaling Data Access
It should be relatively simple to scale up the data requested by the user by changing the parameters of the request command. The user can add or remove collections, measurements, auxiliary measurements and models, or change the sampling step and time window, and regardless of the requested data size this results in a single CDF file. Requesting data for up to 100 years and many measurements could potentially lead to an abnormally large CDF file. The largest file downloaded during the testing of the notebook is around three GB; how VirES would react to much larger files is untested. It might be a good idea to split the request into smaller "subrequests" to not overburden the downloader and to avoid having to deal with a single NetCDF or xarray file of several hundred GB. The most obvious approach would be to split the requests by date, but depending on the project it might be more useful to split by collection or by measurement. It really depends on the use case.
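Splitting by date can be sketched with a small helper that cuts a long interval into calendar-month windows, each of which becomes one subrequest; the helper is my own, stdlib-only:

```python
from datetime import date

def month_windows(start, end):
    """Split the half-open interval [start, end) into calendar-month windows."""
    windows = []
    cur = date(start.year, start.month, 1)
    while cur < end:
        # first day of the following month (handles the December rollover)
        nxt = date(cur.year + (cur.month == 12), cur.month % 12 + 1, 1)
        windows.append((max(cur, start), min(nxt, end)))
        cur = nxt
    return windows
```

Each `(window_start, window_end)` pair can then be passed to a separate `get_between` call and the resulting files merged afterwards, keeping every single download at a manageable size.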