-
Notifications
You must be signed in to change notification settings - Fork 98
Zstd: Don't persist the checksum param if false #681
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## main #681 +/- ##
=======================================
Coverage 99.92% 99.92%
=======================================
Files 62 62
Lines 2752 2756 +4
=======================================
+ Hits 2750 2754 +4
Misses 2 2
🚀 New features to boost your workflow:
|
@dstansby Do you know why the macos tests are failing? I don't have that issue on my local mac. |
It's the same failure at #679, so not related to either PR as far as I can see. |
re. this PR, I'm 👎 to changing the way this parameter is serialised, because it makes the serialisation logic more complex and potentially unituitive. I think the right fix to zarr-developers/zarr-python#2647 is special casing how zarr chooses to seralise the zstd configuration when reading/writing zarr v2 data. |
I don't understand how special-casing in the client library (zarr-python) is any less complex than this small fix? I don't know what numcodecs' policy for backwards compatibility is, but conditionally adding parameters seems like a good way of dealing with that. |
Yes, complexity of the fix would be the same, but |
Is it really? |
2900 repostories and 136 packages according to GitHub: https://github.com/zarr-developers/numcodecs/network/dependents |
I suppose most of them have a transitive dependency through zarr. Anyways. I don't care too strongly about this issue. |
In that case shall we close this, and I can add a note to zarr-developers/zarr-python#2647 that a fix/change to the codec serialisation should go in |
Closing in favor of zarr-developers/zarr-python#2655 |
Not having this change in numcodecs makes things more complicated for other Zarr v2 implementations. Ref: JuliaIO/Zarr.jl#193. IMO, not being able to use numcodecs as a source of truth for this makes compatibility testing trickier than it has to be. Also, since zarr v2 data with |
the current behavior of numcodecs should not be a source of truth for the broader zarr community. that might have been sustainable several years ago when there were fewer zarr implementations, but today is different and we need to approach this problem outside numcodecs. to take this particular example, I don't think it's reasonable to expect the implementers of zarr v2 in different programming languages to watch for breaking changes lurking in numcodecs releases. instead of relying on the behavior of numcodecs, we should use specification documents, even (or especially) written retroactively. We have a repo for this: https://github.com/zarr-developers/zarr-extensions/ it doesn't have any zarr v2 stuff there, but IMO it should. |
Yes, having more explicit documentation is great. Also, I'm confused why adding a new optional config property can be considered a breaking change. Zarr.jl will ignore any compressor properties it doesn't know about, so it would not be broken by this kind of change. |
Numcodecs does not define a data model that distinguishes optional from non-optional keys in JSON documents. When your software consumes a Suppose numcodecs adds a new, required field to the blosc JSON metadata. Your software will silently ignore this field, and this will very likely result in an obscure error or undefined behavior when a user tries to read or write data to that array. Thinking beyond this particular issue, it's worth remembering that the "blosc codec" is little more than 2 separate functions -- an encoding function, and a decoding function. We have chosen to glue these two functions together into a single object, which we call a "codec", but in reality the read and write functions might take different sets of parameters, and this could be reflected in the metadata explicitly, via something like this: {
"name": "blosc",
"configuration": {"encode": {...}, "decode": {...}}
} If your application was accessing data in a read-only mode, then it could safely ignore entirely the contents of |
Changes the zstd codec to not persist the
checksum
param, if it is set toFalse
(the default). This is aimed at improving backwards compatibility.See zarr-developers/zarr-python#2647 (comment)