
Caching pyaro datasets: Bug or feature? #1532

@jgriesfeller

Description


Problem:
Right now, pyaerocom also caches observation data read via pyaro. That is not a problem in itself. However, we always pass filters to the pyaro reading, and apart from the variable filter, none of these filters is checked for cache invalidation.
Therefore, if an obs-id used in one pyaro reading call is ever reused with different filters, the user will always get the data from the very first use of that obs-id (even months later).
Even worse: the pyaro interface reads the data early (at the first access to PROVIDES_VARIABLES), so it appears to the user that updated data has been read (the read shows up in the log file). Pyaerocom's caching mechanism, however, assumes that data is only read in the reader.read() call. The cache therefore kicks in only there and returns the cached dataset, discarding what pyaro has just read.
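The stale-data effect can be reproduced with a toy cache. This is a minimal sketch, not pyaerocom's actual cache code: `ToyCache`, `naive_key`, `filter_aware_key` and `fake_pyaro_read` are hypothetical names, and the filter dictionaries are illustrative rather than guaranteed-valid pyaro filter configurations. `naive_key` mimics the behavior described above (only obs-id and variable go into the key), while `filter_aware_key` hashes all filters into the key so that a changed filter set causes a cache miss.

```python
import hashlib
import json
from typing import Any, Callable


class ToyCache:
    """Hypothetical in-memory cache, NOT pyaerocom's real implementation."""

    def __init__(self) -> None:
        self._store: dict[str, Any] = {}

    def get_or_read(self, key: str, read: Callable[[], Any]) -> Any:
        # read() is only called on a cache miss.
        if key not in self._store:
            self._store[key] = read()
        return self._store[key]


def naive_key(obs_id: str, var_name: str, filters: dict) -> str:
    # Mirrors the issue: filters do not take part in the key.
    return f"{obs_id}:{var_name}"


def filter_aware_key(obs_id: str, var_name: str, filters: dict) -> str:
    # Serialise all filters deterministically and hash them into the key.
    blob = json.dumps(filters, sort_keys=True, default=str)
    digest = hashlib.sha256(blob.encode("utf-8")).hexdigest()[:16]
    return f"{obs_id}:{var_name}:{digest}"


def fake_pyaro_read(filters: dict) -> str:
    # Stand-in for a pyaro read; the "data" simply records the filter names.
    return f"data filtered by {sorted(filters)}"


def demo(key_func: Callable[[str, str, dict], str]) -> tuple[str, str]:
    cache = ToyCache()
    first = {"stations": {"include": ["A"]}}
    second = {"stations": {"include": ["A", "B"]}, "time_bounds": {"start": "2024"}}
    r1 = cache.get_or_read(key_func("myobs", "concpm10", first), lambda: fake_pyaro_read(first))
    r2 = cache.get_or_read(key_func("myobs", "concpm10", second), lambda: fake_pyaro_read(second))
    return r1, r2
```

With `demo(naive_key)` both calls return the result of the first read, which is the behavior reported here; with `demo(filter_aware_key)` the second call triggers a fresh read.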

Solution

  • Either disable pyaerocom's caching for pyaro datasets and, if needed, implement caching on the pyaro side instead
  • Or document the current behavior and call it a feature
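The first option could look roughly like the dispatch below. This is a hypothetical sketch: `read_obs`, the `is_pyaro` flag and the plain-dict cache are illustrative assumptions, not existing pyaerocom API. pyaro-backed obs networks always read fresh, and every other reader keeps its cache.

```python
from typing import Any, Callable


def read_obs(
    obs_id: str,
    var_name: str,
    is_pyaro: bool,
    cache: dict[tuple[str, str], Any],
    read: Callable[[], Any],
) -> Any:
    """Hypothetical dispatch: bypass the cache for pyaro-backed obs networks."""
    if is_pyaro:
        # Always read fresh: the filters are not part of the cache key,
        # so a cached result could silently be stale.
        return read()
    key = (obs_id, var_name)
    if key not in cache:
        cache[key] = read()
    return cache[key]
```

Any caching for pyaro data would then have to live on the pyaro side, where the full filter configuration is known.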

Minimal solution

  • Find a way for the pyaro interface to provide PROVIDES_VARIABLES without reading the entire dataset. Caveat: pyaro might rename the variables!
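One possible shape for this is a reader whose variable list is computed lazily from cheap metadata, with the full read deferred to `read()`. This is a sketch under a loudly stated assumption: it presumes the underlying source can list its variable names without loading the data, which may not hold for every pyaro reader. `LazyPyaroReader`, `list_source_variables`, `read_all` and the `rename` mapping are all hypothetical. Because pyaro might rename variables, the same renaming is applied to both the advertised names and the returned data.

```python
from functools import cached_property
from typing import Callable


class LazyPyaroReader:
    """Hypothetical reader: variable names come from cheap metadata,
    and the full dataset is only read in read()."""

    def __init__(
        self,
        list_source_variables: Callable[[], list[str]],
        read_all: Callable[[], dict[str, list[float]]],
        rename: dict[str, str],
    ) -> None:
        self._list_source_variables = list_source_variables  # assumed cheap
        self._read_all = read_all  # assumed expensive
        self._rename = rename
        self.n_full_reads = 0

    @cached_property
    def PROVIDES_VARIABLES(self) -> list[str]:
        # Apply the same renaming as read(), so the advertised names
        # match the keys of the returned data.
        return [self._rename.get(v, v) for v in self._list_source_variables()]

    def read(self) -> dict[str, list[float]]:
        # The only place where the full dataset is loaded, so a cache
        # check placed before read() would not be bypassed.
        self.n_full_reads += 1
        raw = self._read_all()
        return {self._rename.get(k, k): vals for k, vals in raw.items()}
```

For example, `LazyPyaroReader(lambda: ["PM10", "o3"], load_fn, {"PM10": "concpm10"})` would advertise `["concpm10", "o3"]` without calling `load_fn`.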
