Description
Feature description
Currently, we send multiple requests to GEE to fetch ERA5 met data: for each region, the date range is split into chunks of 5 years and one request is sent per chunk, to avoid GEE memory limitations when using getInfo().
Because GEE limits the number of parallel processes, this leads to longer VeRCYe runtimes: we can only fetch met data for 5 regions at once.
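To make the request load concrete, here is a minimal sketch of the chunking arithmetic described above. It is not the actual VeRCYe code: the function names `date_chunks` and `estimate_request_load` are hypothetical, and it assumes the end date is exclusive and that regions are processed in batches of at most 5.

```python
import math
from datetime import date


def date_chunks(start: date, end: date, years_per_chunk: int = 5):
    """Split [start, end) into consecutive chunks of at most `years_per_chunk` years."""
    chunks = []
    chunk_start = start
    while chunk_start < end:
        target_year = chunk_start.year + years_per_chunk
        try:
            candidate = chunk_start.replace(year=target_year)
        except ValueError:
            # Feb 29 has no counterpart in a non-leap target year; fall back to Feb 28.
            candidate = chunk_start.replace(year=target_year, day=28)
        chunk_end = min(candidate, end)
        chunks.append((chunk_start, chunk_end))
        chunk_start = chunk_end
    return chunks


def estimate_request_load(n_regions: int, start: date, end: date,
                          max_parallel_regions: int = 5):
    """Count getInfo() requests and the number of sequential batches of regions."""
    n_chunks = len(date_chunks(start, end))
    return {
        "requests": n_regions * n_chunks,
        "batches": math.ceil(n_regions / max_parallel_regions),
    }


load = estimate_request_load(40, date(2000, 1, 1), date(2024, 1, 1))
print(load)
```

Under these assumptions, 40 regions over 2000 to 2023 need 200 requests processed in 8 sequential batches, which is where the runtime goes.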
Suggested solution
This could possibly be overcome by sending GEE a single request containing the centroids of all regions and the complete date range. GEE can then parallelize internally, and the results can be exported as a table. Using an approach similar to the one here, we can find this file in Google Drive and download it while staying within the pipeline. We should benchmark this before implementing it in the pipeline.
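A rough sketch of what the single-request variant could look like with the Earth Engine Python API, intended as a starting point for the benchmark rather than a finished implementation. The dataset ID, band names, and scale are assumptions (the issue does not say which ERA5 product the pipeline uses), and `export_era5_for_centroids` is a hypothetical name. It assumes `ee.Initialize()` has already been called.

```python
# Assumed dataset, bands, and scale; replace with whatever the pipeline fetches today.
ERA5_COLLECTION = "ECMWF/ERA5_LAND/DAILY_AGGR"
ERA5_BANDS = ["temperature_2m", "total_precipitation_sum"]
ERA5_SCALE_M = 11132  # nominal pixel size of the assumed dataset, in meters


def export_era5_for_centroids(centroids, start_date, end_date,
                              description="era5_all_regions"):
    """Start one Drive table export covering all regions and the full date range.

    centroids: dict mapping region name -> (lon, lat)
    start_date, end_date: 'YYYY-MM-DD' strings; end_date is exclusive in filterDate.
    """
    import ee  # imported lazily so the module loads without Earth Engine installed

    points = ee.FeatureCollection([
        ee.Feature(ee.Geometry.Point([lon, lat]), {"region": name})
        for name, (lon, lat) in centroids.items()
    ])
    images = (
        ee.ImageCollection(ERA5_COLLECTION)
        .filterDate(start_date, end_date)
        .select(ERA5_BANDS)
    )

    def sample_image(image):
        day = image.date().format("YYYY-MM-dd")
        # With more than one band selected, output properties are named after the bands.
        sampled = image.reduceRegions(
            collection=points, reducer=ee.Reducer.first(), scale=ERA5_SCALE_M
        )
        return sampled.map(lambda feature: feature.set("date", day))

    # One row per (region, day); GEE evaluates this server-side in the export task.
    table = images.map(sample_image).flatten()
    task = ee.batch.Export.table.toDrive(
        collection=table,
        description=description,
        fileFormat="CSV",
        selectors=["region", "date"] + ERA5_BANDS,
    )
    task.start()
    return task
```

The returned task can be polled with `task.status()` until it completes; locating and downloading the resulting CSV from Drive is not shown here. Passing region polygons instead of points and swapping `ee.Reducer.first()` for `ee.Reducer.mean()` would give a per-region mean instead of the centroid value.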
An alternative would be to download the global daily ERA5 TIFFs (as with CHIRPS data) and extract the mean/centroid values from them locally. This would also reduce our reliance on GEE.
Additional context
Additionally, we should implement mean aggregation over each region instead of only fetching the value at the centroid.