[FEATURE] Optimize ERA-5 Met data fetching #72

@rohansaw

Description

Feature description

Currently, we send multiple requests to GEE to fetch ERA5 met data: for each region, the date range is split into chunks and a request is sent per chunk. Each chunk covers 5 years to stay within GEE's memory limits when using getInfo().
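To make the request count concrete, here is a minimal sketch of the 5-year chunking described above. The function name `five_year_chunks` and the half-open `[start, end)` convention are assumptions for illustration and are not taken from the VeRCYe code.

```python
from datetime import date


def five_year_chunks(start, end, years_per_chunk=5):
    """Split [start, end) into consecutive windows of at most `years_per_chunk` years."""
    chunks = []
    chunk_start = start
    while chunk_start < end:
        target_year = chunk_start.year + years_per_chunk
        try:
            candidate = chunk_start.replace(year=target_year)
        except ValueError:
            # Feb 29 has no counterpart in a non-leap target year; fall back to Feb 28.
            candidate = chunk_start.replace(year=target_year, day=28)
        chunk_end = min(candidate, end)
        chunks.append((chunk_start, chunk_end))
        chunk_start = chunk_end
    return chunks


# Hypothetical numbers: 20 regions over 2000-2011 give 20 * 3 separate requests.
chunks = five_year_chunks(date(2000, 1, 1), date(2012, 1, 1))
requests_needed = 20 * len(chunks)
```

The total number of requests grows as regions × chunks, which is why the limit on parallel GEE processes becomes the bottleneck.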

Because GEE limits the number of parallel processes, this leads to longer VeRCYe runtimes: we can only fetch met data for 5 regions at once.

Suggested solution

This could possibly be overcome by sending a single request to GEE that contains the centroids of all regions and the complete date range. GEE can then parallelize internally, and the results can be exported as a table. Using an approach similar to here, we can find the exported file in Google Drive and download it while staying within the pipeline. We should benchmark this before implementing it in the pipeline.
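A rough sketch of what the single-request variant could look like with the Earth Engine Python API. This is untested against the pipeline and rests on several assumptions: the helper names, the `region_id` property, the Drive folder name, and the choice of `ee.Reducer.first()` for centroid sampling are all illustrative. `dataset_id` and `scale_m` are left as parameters because the issue does not say which ERA5 collection is used (for example, `ECMWF/ERA5_LAND/DAILY_AGGR` would be one candidate).

```python
def centroid_features(centroids):
    """Turn {region_id: (lon, lat)} into a list of GeoJSON point features."""
    return [
        {
            "type": "Feature",
            "geometry": {"type": "Point", "coordinates": [lon, lat]},
            "properties": {"region_id": region_id},
        }
        for region_id, (lon, lat) in centroids.items()
    ]


def start_era5_export(centroids, start, end, dataset_id, scale_m,
                      description="era5_centroids", drive_folder="era5_exports"):
    """Start one GEE table export covering all centroids and the full date range."""
    import ee  # lazy import; assumes ee.Initialize() was already called

    points = ee.FeatureCollection([
        ee.Feature(ee.Geometry.Point(f["geometry"]["coordinates"]), f["properties"])
        for f in centroid_features(centroids)
    ])
    images = ee.ImageCollection(dataset_id).filterDate(start, end)

    def sample_image(img):
        # One output row per (image date, region centroid).
        date_str = img.date().format("YYYY-MM-dd")
        sampled = img.reduceRegions(
            collection=points, reducer=ee.Reducer.first(), scale=scale_m
        )
        return sampled.map(lambda f: f.set("date", date_str))

    table = ee.FeatureCollection(images.map(sample_image)).flatten()
    task = ee.batch.Export.table.toDrive(
        collection=table,
        description=description,
        folder=drive_folder,  # hypothetical folder name
        fileFormat="CSV",
    )
    task.start()
    return task
```

Because the export runs as a batch task instead of through getInfo(), the pipeline would need to poll the task status and then fetch the CSV from Drive. Whether this is actually faster is what the benchmark should show.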

An alternative would be to download the global daily ERA5 tiffs (as with CHIRPS data) and extract mean/centroid values locally from those. This would also reduce our reliance on GEE.
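For the local alternative, the core step is mapping a centroid's lon/lat to a pixel in the global raster. The sketch below uses plain Python lists and a hypothetical north-up grid layout (`west`, `north`, `pixel_deg`). In practice these values would come from the TIFF's geotransform and the array would be read with a raster library.

```python
import math


def lonlat_to_rowcol(lon, lat, west=-180.0, north=90.0, pixel_deg=0.25):
    """Map lon/lat to (row, col) in a north-up grid with upper-left corner (west, north)."""
    col = math.floor((lon - west) / pixel_deg)
    row = math.floor((north - lat) / pixel_deg)
    return row, col


def sample_centroid(grid, lon, lat, west, north, pixel_deg):
    """Return the value of the grid cell that contains the given point."""
    row, col = lonlat_to_rowcol(lon, lat, west, north, pixel_deg)
    if not (0 <= row < len(grid) and 0 <= col < len(grid[0])):
        raise ValueError("point lies outside the grid")
    return grid[row][col]


# Tiny 2x2 example grid covering lon 0..2, lat 0..2 with 1-degree pixels.
grid = [[1.0, 2.0],
        [3.0, 4.0]]
value = sample_centroid(grid, lon=1.5, lat=0.5, west=0.0, north=2.0, pixel_deg=1.0)
```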

Additional context

Additionally, we should implement mean aggregation over each region instead of only fetching the value at the centroid.
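To illustrate how the two can differ, here is a small sketch of a region mean over a raster, assuming the region has already been rasterized into a boolean mask with the same shape as the grid. The function name and the mask representation are hypothetical. On the GEE side, the rough equivalent would be reducing over the region polygons with `ee.Reducer.mean()` rather than sampling the centroid point.

```python
def region_mean(grid, mask):
    """Mean of all grid cells where mask is True, skipping missing (None) values."""
    values = [
        value
        for grid_row, mask_row in zip(grid, mask)
        for value, inside in zip(grid_row, mask_row)
        if inside and value is not None
    ]
    if not values:
        raise ValueError("region covers no valid cells")
    return sum(values) / len(values)


grid = [[1.0, 2.0],
        [3.0, 10.0]]
mask = [[True, True],
        [True, False]]  # region covers three of the four cells

mean_value = region_mean(grid, mask)  # averages 1.0, 2.0 and 3.0
centroid_value = grid[0][0]           # a centroid sample sees only one cell
```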

Metadata

Assignees

No one assigned

    Labels

    enhancement: New feature or request

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests
