Specifying how fill_value is handled (if unspecified) #133
Comments
What about defining the default fill value for a datatype to be the instance of that type with a binary representation of all 0s? Are there any types where this would be counter-intuitive or bad somehow?
What does that mean for
I don't know, I never work with those types :) But this "all 0s in its binary representation" idea doesn't seem to make any sense for variable-length types, so it's probably a non-starter. Another option would be to just require a
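To make the "all 0s in its binary representation" idea concrete, here is a minimal sketch using only Python's `struct` module. The type names and the `FIXED_SIZE_FORMATS` table are hypothetical and not taken from any Zarr spec; the point is only that the rule is well defined for fixed-size types and has no obvious meaning for anything without a fixed width.

```python
import struct

# Hypothetical mapping from a type name to a struct format string.
# These names are illustrative only, not Zarr data type identifiers.
FIXED_SIZE_FORMATS = {
    "bool": "<?",
    "uint8": "<B",
    "int32": "<i",
    "float64": "<d",
}


def zero_bits_fill_value(type_name):
    """Return the value whose binary representation is all zero bytes."""
    fmt = FIXED_SIZE_FORMATS.get(type_name)
    if fmt is None:
        # Variable-length types have no fixed width to fill with zeros.
        raise ValueError(f"no fixed-size representation for {type_name!r}")
    size = struct.calcsize(fmt)
    # bytes(size) is a buffer of `size` zero bytes.
    return struct.unpack(fmt, bytes(size))[0]
```

Under this sketch, integers and floats come out as zero and booleans as `False`, while a variable-length type such as a string raises, which matches the concern raised above.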
I agree --- fill_value should be required. There are a few meanings you could potentially assign to an unspecified fill_value:
Since 1 is the only reasonable option I don't see any advantage in allowing an unspecified fill value.
@WardF how does NetCDF handle this?
NetCDF has a default fill value for each of the types, at least for the classic (netCDF3) format. See the end of the grammar in the file format spec. It's unclear what the defaults are for the rest of the types added for the netCDF4 format, so I'll leave that to @WardF.
Related is how we handle encoding of some more atypical fill values. We discussed this briefly during the community meeting. Tried to summarize below (though please feel free to correct me). Currently the fill value has a default value concept, which applies both during construction (it can be
For 1, this can either be required or we add some kind of lookup table for determining this. Sounds like NetCDF has the latter. That all being said, this is a question of the API and not the storage format itself, so perhaps this can be left up to the implementations at present.
For 2, it sounds like other storage implementations (like NetCDF) don't do this and instead always store the fill value. Perhaps a good first step to resolving this issue would be to do the same in v3.
Separately there was some discussion around how to handle encoding fill values of more unusual types. Issue ( zarr-developers/zarr-python#216 ) came up. In particular @d-v-b brought up
Trying to address in PR ( #145 )
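As a rough illustration of the two points summarized above (a per-type lookup for the default at the API level, and always storing a concrete fill value in the metadata), something like the following could work. This is only a sketch: `DEFAULT_FILL_VALUES`, `resolve_fill_value`, and `array_metadata` are hypothetical names, and the defaults shown are placeholders an implementation might pick, not values fixed by this discussion.

```python
import json

# Sentinel so that "not given" can be told apart from any real value.
_UNSET = object()

# Hypothetical per-type defaults chosen by an implementation.
DEFAULT_FILL_VALUES = {
    "bool": False,
    "int32": 0,
    "float64": 0.0,
}


def resolve_fill_value(data_type, fill_value=_UNSET):
    """API-level step: pick a concrete fill value for the given type."""
    if fill_value is not _UNSET:
        return fill_value
    if data_type not in DEFAULT_FILL_VALUES:
        raise ValueError(f"fill_value is required for {data_type!r}")
    return DEFAULT_FILL_VALUES[data_type]


def array_metadata(data_type, fill_value=_UNSET):
    """Storage-level step: the metadata always carries a fill value."""
    return {
        "data_type": data_type,
        "fill_value": resolve_fill_value(data_type, fill_value),
    }


# The stored document never omits the fill value.
print(json.dumps(array_metadata("int32")))
```

Keeping the lookup on the API side means readers of the stored metadata never have to guess, whichever implementation wrote it.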
Also worth noting that https://github.com/fsspec/kerchunk just uses a
Not sure where this convention comes from, but it's a simpler alternative to a full data-uri (
Just a heads up that there may be some interesting dangling conversations here.
The Zarr v2 spec leaves undefined fill_values as ambiguous:
However, if a user writes to a portion of a chunk with different implementations, each implementation will now potentially have different chunks depending on how they handle the fill_value.
This can also cause confusion in other contexts ( for example zarr-developers/zarr-python#966 ).
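A toy sketch of the partial-write problem described above, using plain Python lists in place of real chunks. The helper `write_partial_chunk` and the two fill values are hypothetical; they stand in for two implementations that each made a different choice for an unspecified fill value.

```python
def write_partial_chunk(chunk_len, start, values, fill_value):
    """Materialize a chunk, filling untouched elements with fill_value."""
    chunk = [fill_value] * chunk_len
    chunk[start:start + len(values)] = values
    return chunk


# The same partial write, performed by two implementations that
# disagree on what an unspecified fill value means.
chunk_a = write_partial_chunk(4, 1, [7, 8], fill_value=0)
chunk_b = write_partial_chunk(4, 1, [7, 8], fill_value=-1)

# The written region agrees, but the stored chunks differ elsewhere.
print(chunk_a)
print(chunk_b)
```

Once the chunk has been written out, the elements outside the written region are real stored data, so the disagreement persists for every later reader.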
Also this gets mentioned in James' overview in issue ( #53 ).
Of course there are some advantages of leaving this unspecified. Namely, one can use uninitialized memory to allocate each chunk for writing into, which would be a bit faster. Also this can more easily handle new or complicated types where an appropriate default fill_value is not entirely obvious.
That said, given some of the issues above, wonder if we should take a different tack and specify fill_value for types in v3. Thoughts? 🙂