Add support for saving the index produced with a full GRIB scan at open. #20

Closed
3 tasks done
alexamici opened this issue Oct 8, 2018 · 3 comments

Comments

@alexamici
Contributor

alexamici commented Oct 8, 2018

At the moment, every time a GRIB file is opened, cfgrib needs to scan all the messages in the file to build the index, which is then used to compute the coordinate values and build the hypercube representation of the variables.

Worse, when opening a GRIB file with the convenience function open_datasets, the index is discarded every time the recursive call fails and the expensive file scan is repeated.

Proposed implementation requirements for the feature are:

  • save the index to disk with path + .idx immediately after computation
    • a pickle of the in-memory structure is the simplest implementation
    • shall not fail if the index cannot be written (the file may be on a read-only filesystem)
  • when opening a file search for the path + .idx index file, test that it is in sync with the GRIB file and load it
    • timestamp ordering is enough for now
    • do not fail if the index is corrupt
  • use locking to avoid concurrent writes, or a read concurrent with a write
    • concurrent reads must be OK
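
The requirements above could be sketched roughly as follows. This is only an illustration under stated assumptions, not cfgrib's actual implementation: the names `read_index`, `write_index`, `open_index` and the `scan` callable are hypothetical, and an atomic rename of a temporary file stands in for the locking requirement (readers never see a half-written index, but concurrent writers may still duplicate work).

```python
import contextlib
import logging
import os
import pickle
import tempfile

log = logging.getLogger(__name__)


def read_index(grib_path):
    """Return the cached index, or None if it is missing, stale, or corrupt."""
    index_path = grib_path + ".idx"
    try:
        # Timestamp ordering: an index older than the GRIB file is out of sync.
        if os.path.getmtime(index_path) < os.path.getmtime(grib_path):
            return None
        with open(index_path, "rb") as f:
            return pickle.load(f)
    except Exception:
        # Missing, unreadable, or corrupt index: the caller falls back to a scan.
        return None


def write_index(grib_path, index):
    """Try to save the index next to the GRIB file; never raise on I/O errors."""
    index_path = grib_path + ".idx"
    directory = os.path.dirname(os.path.abspath(index_path))
    tmp_path = None
    try:
        # Write to a temporary file first, then rename it into place.
        fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
        with os.fdopen(fd, "wb") as f:
            pickle.dump(index, f)
        os.replace(tmp_path, index_path)  # atomic within one filesystem
        return True
    except OSError as exc:
        # For example a read-only directory: report briefly and carry on.
        log.warning("Could not write index %r: %s", index_path, exc)
        if tmp_path is not None:
            with contextlib.suppress(OSError):
                os.remove(tmp_path)
        return False


def open_index(grib_path, scan):
    """Load the index from disk if usable, otherwise run scan and cache it."""
    index = read_index(grib_path)
    if index is None:
        index = scan(grib_path)  # hypothetical stand-in for the full GRIB scan
        write_index(grib_path, index)
    return index
```

With this shape, a caller such as open_datasets would pay for the full scan at most once per file as long as the `.idx` file can be written, and would silently fall back to rescanning when it cannot.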
@alexamici alexamici added the enhancement New feature or request label Oct 8, 2018
@alexamici alexamici self-assigned this Oct 8, 2018
@iainrussell
Member

I think it's worth mentioning that it probably should not fail if it cannot write the index file - the GRIB file could be in a read-only directory, in which case it would not be possible to write the index to the same place.

@alexamici
Contributor Author

alexamici commented Oct 22, 2018

The implementation is still wasteful and very fragile, but the index is saved to and read from disk.

alexamici added a commit that referenced this issue Oct 28, 2018
@alexamici alexamici changed the title from “Add support for saving the index produced with a full GIRB scan at open.” to “Add support for saving the index produced with a full GRIB scan at open.” on Nov 12, 2018
@ejhyer

ejhyer commented Nov 26, 2021

I just want to bump @iainrussell's point above: at present, it does not fail when reading from a read-only directory, but it produces a message spew that certainly resembles an I/O error. Versions:
cfgrib 0.9.9.1 pyhd8ed1ab_1 conda-forge
eccodes 2.23.0 h7621a5c_0 conda-forge
python-eccodes 2021.03.0 py39hce5d2b2_2 conda-forge
xarray 0.18.2 pyhd8ed1ab_0 conda-forge
