Use h5netcdf to read and write netcdf data #786
Comments
I'm glad we're discussing this, but sadly I have the opposite opinion and think that netCDF4-python should be the default. Here are some reasons for my preference:
Some opinions:
Otherwise, could you enumerate the issues you had with the NetCDF4 and HDF5 C libraries both being used, and say whether you've experienced them any time recently?
For reference, here is the bug report for the netcdf/hdf5 interaction problem we had a few years ago: According to this conversation, the issue can't be solved before hdf5 1.10.x. We now have 1.8.12 in operations; however, I can't reproduce the issue. @sfinkens also had a concern about getting the two C libs to install, right?
@mraspaud You are getting HDF5 in your current operations from the system libraries, right? In the future you will be using conda, right (not that this makes this a non-issue, just double checking)?
Yes, it's system packages we have. In operations, we will use conda for satpy mostly, but I can't exclude having to use system packages.
A +1 for h5netcdf, in xarray it says that only scipy/h5netcdf backends can read byte streams and file like objects:
This is in |
From Ryan May of Unidata (not tagging so that he doesn't get a ton of notifications) when asked on the pangeo-data gitter channel:
Looks like (some) cloud storage is available with xarray and h5netcdf: https://gist.github.com/rsignell-usgs/cc2d2d4fe1930bd949119e543b56bce1
I have a slight tendency towards h5netcdf, because my experience is that linking the netCDF C library correctly against a compatible hdf5 C library is the main difficulty. And that does not only apply to expert users who compile the libraries themselves. Some time ago I had the problem that the netCDF4 and h5py wheels installed by pip were built with incompatible versions of the C libraries: Unidata/netcdf4-python#694. Issues like that could certainly be avoided if we only depended on hdf5. Maybe we should ask shoyer why netCDF4 is the default engine in xarray?
@djhoese netCDF4-1.2.8+ supports file-like objects, too. Maybe that hasn't been implemented in xarray yet.
To me this sounds like something that should be reported and coordinated between the two projects. From everything I'm gathering it sounds like h5netcdf is useful in a few key cases:
As for the API, in what cases are we using h5netcdf's new API directly where using xarray isn't an option?
The CF writer tests mostly use h5netcdf to read the generated files. But I guess that can be replaced with xarray. Unless there was a particular reason not to use xarray?
The test environments have netcdf4-python installed, so they could use that instead if needed. Reading the NetCDF file for verification with xarray may not be a good idea because of the way xarray handles coordinates (it ignores the coordinates attribute when determining per-variable coordinates).
I always try to use the legacy API of h5netcdf just to be able to switch to netCDF4 in case of trouble, so unfortunately I haven't really looked at the new features.
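To make the "switch in case of trouble" idea concrete, here is a minimal sketch of how a caller might pick whichever backend is installed. It relies on `h5netcdf.legacyapi` mirroring the `netCDF4` module's `Dataset` interface; the helper name `load_netcdf_module` and the default candidate order are assumptions made for illustration, not something satpy provides.

```python
import importlib
import importlib.util


def load_netcdf_module(candidates=("h5netcdf.legacyapi", "netCDF4")):
    """Return the first importable module among ``candidates``.

    Hypothetical helper: the name and the default order are illustrative only.
    """
    for name in candidates:
        # Only probe the top-level package, so a missing package is skipped
        # without raising while its submodule is being looked up.
        top_level = name.split(".")[0]
        if importlib.util.find_spec(top_level) is None:
            continue
        return importlib.import_module(name)
    raise ImportError(f"none of {candidates!r} is installed")
```

With either backend installed, reading code can then stay the same, for example `nc = load_netcdf_module()` followed by `with nc.Dataset(path, "r") as ds: print(list(ds.variables))`, where `path` is a placeholder for a real file. Swapping the order of the candidates is all it takes to prefer netCDF4.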
Feature Request
Is your feature request related to a problem? Please describe.
At the moment, satpy uses two engines for handling I/O on netcdf files: netCDF4, which is a Python interface to the netcdf4 C library, and h5netcdf, which uses h5py to read and write nc files. While both engines seem to be working, it is unnecessary to use both, and harmonising on one within satpy would be nice.
Describe the solution you'd like
Using only one engine for nc I/O would be best. netCDF4 is the official library from Unidata. However, it uses a C library in the background that is known for not interacting well with the hdf5 C library. h5netcdf uses h5py, which in turn uses the hdf5 C library, hence removing the need for the netcdf C library. h5netcdf has been reported to be faster in some cases, but might not be fully mature.
My opinion is that limiting the number of C libraries is a good thing, and relying on only one C library for reading both netcdf and hdf5 is to be preferred. The h5netcdf project seems to be active and responsive, so any problems we might encounter when reading data with it should be fixed rapidly.
Describe any changes to existing user workflow
Hopefully, having only one interface to the netcdf format will simplify the installation of satpy, and should be totally transparent to the user.
Additional context
The h5netcdf project: https://github.com/shoyer/h5netcdf
The netCDF4 project: http://unidata.github.io/netcdf4-python/netCDF4/index.html