Download and convert satellite data for use in ML pipelines
Satellite data is a valuable resource for training machine learning models. Forecasting renewable generation requires knowledge of the weather conditions, and those weather conditions can be inferred and enriched using satellite data.
EUMETSAT provide a range of satellite data products, which are easily available in NAT image format. To make this data more accessible for training models, this consumer processes downloaded data into the Zarr format.
Note
This repo is in early development and so will undergo rapid changes. Breaking changes may occur in the CLI and the API without warning.
Install using the container image:
$ docker pull ghcr.io/openclimatefix/satellite-consumer
$ docker run \
-e SATCONS_COMMAND=consume \
-e SATCONS_SATELLITE=rss \
-e EUMETSAT_CONSUMER_KEY=<your-key> \
-e EUMETSAT_CONSUMER_SECRET=<your-secret> \
-v $(pwd)/work:/work \
ghcr.io/openclimatefix/satellite-consumer
For a description of all the possible configuration options, see Documentation.
The satellite consumer provides a number of commands, each processing raw data in a different way. These commands (and their options) can be seen when using the CLI entrypoint:
$ satellite-consumer-cli --help
When running the satellite consumer using the environment entrypoint (as in the docker container), the command is chosen via an environment variable. There are also a number of common configuration options that are shared between all commands:
Variable | Default | Description |
---|---|---|
SATCONS_COMMAND | | The command to run (consume/archive/merge). |
SATCONS_SATELLITE | | The satellite to consume data from. |
SATCONS_WORKDIR | /mnt/disks/sat | The working directory. In the container, this is set to /work for easy mounting. |
SATCONS_HRV | false | Whether to download the HRV channel. |
EUMETSAT_CONSUMER_KEY | | The EUMETSAT consumer key. |
EUMETSAT_CONSUMER_SECRET | | The EUMETSAT consumer secret. |
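As an illustration of how the shared options in the table above could be read from the environment, here is a minimal Python sketch. It is not the consumer's actual code: the `CommonConfig` class and the `load_common_config` and `parse_bool` helpers are hypothetical names, and treating every variable without a listed default as required is an assumption.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

# Assumption: variables with no default in the table are required.
REQUIRED = (
    "SATCONS_COMMAND",
    "SATCONS_SATELLITE",
    "EUMETSAT_CONSUMER_KEY",
    "EUMETSAT_CONSUMER_SECRET",
)
VALID_COMMANDS = ("consume", "archive", "merge")


@dataclass(frozen=True)
class CommonConfig:
    """Hypothetical container for the options shared by all commands."""

    command: str
    satellite: str
    workdir: str
    hrv: bool
    consumer_key: str
    consumer_secret: str


def parse_bool(value: str) -> bool:
    # Assumption: "true", "1" and "yes" (any case) are treated as true.
    return value.strip().lower() in {"true", "1", "yes"}


def load_common_config(env: Optional[Mapping[str, str]] = None) -> CommonConfig:
    """Build a CommonConfig from a mapping, defaulting to os.environ."""
    source = os.environ if env is None else env
    missing = [name for name in REQUIRED if not source.get(name)]
    if missing:
        raise ValueError("Missing required variables: " + ", ".join(missing))
    command = source["SATCONS_COMMAND"]
    if command not in VALID_COMMANDS:
        raise ValueError(f"Unknown command: {command!r}")
    return CommonConfig(
        command=command,
        satellite=source["SATCONS_SATELLITE"],
        # Defaults below are taken from the table.
        workdir=source.get("SATCONS_WORKDIR", "/mnt/disks/sat"),
        hrv=parse_bool(source.get("SATCONS_HRV", "false")),
        consumer_key=source["EUMETSAT_CONSUMER_KEY"],
        consumer_secret=source["EUMETSAT_CONSUMER_SECRET"],
    )
```

Passing a plain dictionary instead of relying on `os.environ` keeps a loader like this easy to exercise in unit tests.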
Each command then has its own set of configuration options:
Consume:
Downloads a single scan for a given time into its own store in the working directory.
Variable | Default | Description |
---|---|---|
SATCONS_TIME | | The time to consume data for (when using the consume command). Leave unset to download latest available. |
SATCONS_VALIDATE | false | Whether to validate the downloaded data. |
SATCONS_RESCALE | false | Whether to rescale the downloaded data to the unit interval. |
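This README does not describe how `SATCONS_RESCALE` maps values onto the unit interval. One common approach is min-max scaling against fixed bounds, sketched below purely as an illustration. The function name `rescale_to_unit_interval` and the bounds used in the example are assumptions, not the consumer's real values.

```python
from typing import List, Sequence


def rescale_to_unit_interval(
    values: Sequence[float], lower: float, upper: float
) -> List[float]:
    """Min-max scale values into [0, 1], clipping anything out of range.

    `lower` and `upper` are hypothetical per-channel bounds; the real
    consumer may use different bounds or a different method entirely.
    """
    if upper <= lower:
        raise ValueError("upper must be greater than lower")
    span = upper - lower
    scaled = ((v - lower) / span for v in values)
    # Clip so that outliers beyond the bounds still land inside [0, 1].
    return [min(1.0, max(0.0, s)) for s in scaled]


# Example with made-up bounds of 0 and 100:
print(rescale_to_unit_interval([0, 50, 100, 150], lower=0, upper=100))
# prints [0.0, 0.5, 1.0, 1.0]
```

Using fixed bounds rather than the minimum and maximum of each scan keeps the scaling consistent across scans, which generally matters when the output feeds a model.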
Archive:
Downloads all scans for a given month into a single store in the working directory.
Variable | Default | Description |
---|---|---|
SATCONS_MONTH | | The month to consume data for (when using the archive command). |
SATCONS_VALIDATE | false | Whether to validate the downloaded data. |
SATCONS_RESCALE | false | Whether to rescale the downloaded data to the unit interval. |
SATCONS_NUM_WORKERS | 1 | The number of workers to use for processing. |
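To give a feel for what archiving a month involves, the sketch below lists every scan time in a month and hands them to a pool of workers. It is a rough illustration only: the `YYYY-MM` month format, the 5-minute scan cadence, and the `month_scan_times`, `process_scan` and `archive_month` names are all assumptions, and `process_scan` is a placeholder that does no real downloading.

```python
import datetime as dt
from concurrent.futures import ThreadPoolExecutor
from typing import List


def month_scan_times(month: str, cadence_mins: int = 5) -> List[dt.datetime]:
    """List every scan time in a month (assumed "YYYY-MM" format, UTC)."""
    start = dt.datetime.strptime(month, "%Y-%m").replace(tzinfo=dt.timezone.utc)
    # Roll over to the first day of the following month.
    if start.month == 12:
        end = start.replace(year=start.year + 1, month=1)
    else:
        end = start.replace(month=start.month + 1)
    step = dt.timedelta(minutes=cadence_mins)
    count = (end - start) // step
    return [start + i * step for i in range(count)]


def process_scan(time: dt.datetime) -> str:
    """Placeholder for downloading and converting one scan."""
    return time.strftime("%Y%m%dT%H%M")


def archive_month(month: str, num_workers: int = 1) -> List[str]:
    """Process all scans in a month using `num_workers` threads."""
    times = month_scan_times(month)
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        # map() returns results in the same order as the input times.
        return list(pool.map(process_scan, times))
```

At the assumed cadence a 30-day month holds 8640 scans, which is why running more than one worker can be worthwhile for this command.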
Merge:
Merges consumed stores for a given time window into a single store in the working directory.
Variable | Default | Description |
---|---|---|
SATCONS_SATELLITE | | The satellite to consume data from. |
SATCONS_WINDOW_MINS | 210 | The time window to merge data for. |
SATCONS_CONSUME_MISSING | false | Whether to consume missing data. |
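The merge window can be pictured as a list of expected scan times, some of which may not have been consumed yet. The sketch below works under stated assumptions: a 5-minute scan cadence, a window that ends at (and includes) the most recent scan time, and the hypothetical helper names `merge_window_times` and `missing_times`. The consumer's real windowing logic may differ.

```python
import datetime as dt
from typing import Iterable, List


def merge_window_times(
    end: dt.datetime, window_mins: int = 210, cadence_mins: int = 5
) -> List[dt.datetime]:
    """List the scan times expected in the window leading up to `end`.

    Assumes the cadence divides 60, so scans fall on fixed minute marks.
    """
    # Floor `end` to the most recent scan time.
    floored = end - dt.timedelta(
        minutes=end.minute % cadence_mins,
        seconds=end.second,
        microseconds=end.microsecond,
    )
    step = dt.timedelta(minutes=cadence_mins)
    count = window_mins // cadence_mins
    # Oldest first, ending at the floored time.
    return [floored - i * step for i in range(count - 1, -1, -1)]


def missing_times(
    expected: Iterable[dt.datetime], available: Iterable[dt.datetime]
) -> List[dt.datetime]:
    """Return the expected times that have no consumed store yet."""
    have = set(available)
    return [t for t in expected if t not in have]
```

With the default 210-minute window and the assumed cadence, this gives 42 expected scans. Any times reported by `missing_times` are the ones an option like `SATCONS_CONSUME_MISSING` would presumably fetch before merging.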
Currently the consumer is built to the specific data requirements of Open Climate Fix. However, adding a new satellite from EUMETSAT shouldn't be too hard, provided it uses the same seviri_l1b_native format and sensor channels: just update the available satellites in config.py.
The Python package contains a CLI entrypoint for ease of use when developing, which is available to your shell via the sat-consumer-cli command, assuming you have built the project in a virtual environment and activated it.
This project uses MyPy for static type checking and Ruff for linting. Installing the development dependencies makes them available in your virtual environment.
Use them via:
$ python -m mypy .
$ python -m ruff check .
Be sure to do this periodically while developing to catch any errors early and prevent headaches with the CI pipeline. It may seem like a hassle at first, but it prevents accidental creation of a whole suite of bugs.
There are some additional dependencies to be installed for running the tests. Be sure to pass --extra=dev to the pip install -e . command when creating your virtualenv.
(Or use uv and let it do it for you!)
Run the unit tests with:
$ python -m unittest discover -s src/satellite_consumer -p "test_*.py"
On the directory structure:
- The official PyPA discussion on "source" and "flat" layouts.
- PRs are welcome! See the Organisation Profile for details on contributing
- Find out about our other projects here
- Check out the OCF blog for updates
- Follow OCF on LinkedIn
Part of the Open Climate Fix community.