Description
(This is on the list for V1)
HDF5 can read and write data using a scale-offset filter, which reduces the stored precision of the data as one stage in a filter/compression pipeline. From the h5py documentation it is not obvious that a reading library like pyfive has to do anything special to read such data: if all that has happened is that the precision has been cut before the data was compressed, then the decompression pipeline would not need to do anything new.
However, that doesn't sound like a scale-offset filter at all, and if we look a bit further under the hood (e.g. the hdf5 docs), we find that the algorithm is more complicated: the minimum number of bits and the minimum value must be stored alongside the compressed data, so that they are available for decompression and for reconstructing the values afterwards.
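To make the reconstruction step concrete, here is a minimal sketch of what the integer case looks like after the bit-unpacking stage: each stored value is an offset from the chunk minimum, `minbits` wide, and decoding just adds the minimum back. The function name and signature are illustrative, not part of any HDF5 or pyfive API, and the sketch ignores fill values and the floating-point (D-scaling) variant.

```python
import numpy as np

def decode_scaleoffset_int(unpacked, minimum, minbits):
    """Sketch of the post-decompression step for integer scale-offset data.

    `unpacked` holds the minbits-wide offsets already extracted from the
    bit-packed chunk; decoding adds the stored minimum back to each one.
    """
    if minbits == 0:
        # Special case: all values in the chunk were identical, so only the
        # minimum is stored and every element decodes to it.
        return np.full_like(unpacked, minimum)
    return unpacked + minimum

offsets = np.array([0, 3, 7, 2], dtype=np.int64)
print(decode_scaleoffset_int(offsets, minimum=100, minbits=3))
# -> [100 103 107 102]
```

The real filter also has to bit-unpack the offsets from the chunk bytes and handle fill values and floating-point scaling, which is where most of the implementation effort will go.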
Next steps would be to write some tests, with both integer and floating-point test data (definitely both) to compress. We'd need to see what appears in the filter pipeline message, and understand what the H5Z_FILTER_SCALEOFFSET filter entry looks like. Registered filter details can be found here.
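For generating that test data, h5py can apply the scale-offset filter directly via the `scaleoffset` argument to `create_dataset` (for integers, 0 asks HDF5 to pick the minimum number of bits itself; for floats, the value is the number of decimal digits to retain). A sketch (file and dataset names are arbitrary):

```python
import numpy as np
import h5py

with h5py.File("scaleoffset_test.h5", "w") as f:
    # Integer dataset: scaleoffset=0 lets HDF5 choose minbits, which is
    # lossless for integer data.
    f.create_dataset("ints", data=np.arange(100, dtype="i4"),
                     chunks=(20,), scaleoffset=0, compression="gzip")
    # Float dataset: scaleoffset=3 keeps 3 digits after the decimal point
    # (D-scaling), which is lossy.
    f.create_dataset("floats", data=np.linspace(0.0, 1.0, 100),
                     chunks=(20,), scaleoffset=3, compression="gzip")
```

Dumping such a file (e.g. with `h5dump -p`) should then show the scale-offset entry, with its client data, in the filter pipeline message.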
Note: Should we consider the n-bit filter (H5Z_FILTER_NBIT) at the same time? Will we get it for free during the implementation?
Note: In the implementation, also consider to what extent we want to support hdf5plugin (or something similar), including its Blosc extras, so we can deal with any Blosc filtering operation.
Note: We don't think NetCDF4 supports any of this.