Encode/Decode complex fill values #363
Implement a simple test to try constructing Zarr Arrays with complex data to see what issues crop up so as to fix them.
Provides a simple test to see if Zarr Arrays can handle storing a variety of complex typed fill values.
Simply places the real and imaginary values into a list with each one being encoded/decoded using the floating type used to represent them independently. This keeps the encoded complex fill values human readable, follows the existing floating value encoding/decoding rules, and represents the data in a way that JSON can easily represent it.
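The list-based encoding described above can be sketched with the standard library alone. This is a minimal illustration, not the code in this PR: the helper names (`encode_float`, `decode_float`, `encode_complex_fill`, `decode_complex_fill`) are hypothetical, and the assumption is that non-finite floats are written as the strings "NaN", "Infinity" and "-Infinity" so that the metadata stays valid JSON.

```python
import json
import math


def encode_float(v):
    """Encode one float so that strict JSON can carry it (hypothetical helper)."""
    if math.isnan(v):
        return "NaN"
    if math.isinf(v):
        return "Infinity" if v > 0 else "-Infinity"
    return float(v)


def decode_float(v):
    """Invert encode_float (hypothetical helper)."""
    # float() parses "NaN", "Infinity" and "-Infinity" as well as plain numbers.
    return float(v)


def encode_complex_fill(v):
    """Encode a complex fill value as a two-element [real, imag] list."""
    return [encode_float(v.real), encode_float(v.imag)]


def decode_complex_fill(pair):
    """Rebuild the complex fill value from its [real, imag] list."""
    real, imag = pair
    return complex(decode_float(real), decode_float(imag))


# Round trip through JSON, with a NaN imaginary part as the awkward case.
meta = json.dumps({"fill_value": encode_complex_fill(complex(1.5, float("nan")))})
restored = decode_complex_fill(json.loads(meta)["fill_value"])
print(meta)  # {"fill_value": [1.5, "NaN"]}
```

Each component goes through the same float rules independently, so the encoded value stays readable in the metadata file.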
Adding to the v2.3 milestone to make it easier to keep track of this. However if we opt for a different approach or decide to pass on complex numbers for now, we can always remove it from the milestone.
It will be great to be able to encode complex data in zarr. I have immediate use cases for this.
I have a few minor suggestions about testing, but I don't consider them to be blockers for merging this.
    elif dtype.kind in 'c':
        v = (encode_fill_value(v.real, dtype.type().real.dtype),
             encode_fill_value(v.imag, dtype.type().imag.dtype))
        return v
I take it this encoding / decoding of fill values is necessary because JSON does not support serialization of complex data types?
That's right.
This will encode to a JSON list, which is nice for readability. Though we could certainly imagine other ways of handling this (e.g. base64 encoded string).
Any preferences?
That seems totally reasonable.
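For comparison, the two options discussed in this thread can be put side by side. This is only a sketch: the base64 layout (two little-endian float64 values packed with `struct`) is an assumption made for illustration, since the thread does not specify one.

```python
import base64
import json
import struct

fill = complex(0.5, -2.0)

# Option A: a [real, imag] JSON list, the approach taken in this PR.
as_list = json.dumps([fill.real, fill.imag])

# Option B: the raw bytes of the two components, base64 encoded.
# The "<dd" layout (little-endian float64 pair) is an assumed convention.
packed = struct.pack("<dd", fill.real, fill.imag)
as_b64 = json.dumps(base64.b64encode(packed).decode("ascii"))

# Both forms round-trip to the original value.
from_list = complex(*json.loads(as_list))
real, imag = struct.unpack("<dd", base64.b64decode(json.loads(as_b64)))
from_b64 = complex(real, imag)

print(as_list)  # [0.5, -2.0]
print(as_b64)   # an opaque string that must be decoded before it can be read
```

The list form can be read directly in the metadata, while the base64 form preserves exact bytes at the cost of readability.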
    a = np.linspace(0, 1, z.shape[0], dtype=dtype)
    a -= 1j * a
    z[:] = a
    assert_array_almost_equal(a, z[:])
Given the special treatment required for fill values, would it make sense to add an explicit check that complex data with missing values is handled correctly? I see that the metadata encoding is checked below, but not the actual filling procedure (as far as I can tell).
Could you please elaborate a bit on what kind of representation of missing data we are discussing? We handle NaN currently, but maybe this is not what you are thinking about.
What I mean is to make sure that missing data (the kind that triggers the fill_value machinery to be activated) is present in the array. For example, by adding
a[0] = np.nan
However, I might be misunderstanding what fill_value does in zarr. Is fill_value only used to represent uninitialized portions of the array? Or are all NaNs explicitly translated to fill_value before serialization?
AFAIK fill_value is only used to handle uninitialized chunks. Am not aware of any special behavior around NaNs in Zarr. Does that help?
Edit: IOW this assert_array_almost_equal line is testing that the uninitialized chunks are constructed with the fill_value.
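The distinction drawn in this exchange can be illustrated with a toy model. `ToyChunkedArray` is hypothetical and is not zarr's implementation: it assumes only what is stated above, namely that fill_value stands in for chunks that were never written, and that NaN values written by the user are stored unchanged. Pre-filling a chunk on its first partial write is a further assumption of this sketch.

```python
import math


class ToyChunkedArray:
    """Toy 1-D chunked array; a hypothetical model, not zarr's code."""

    def __init__(self, chunk_len, fill_value):
        self.chunk_len = chunk_len
        self.fill_value = fill_value
        # Maps chunk index -> list of values; a missing key is an uninitialized chunk.
        self.store = {}

    def __setitem__(self, i, value):
        idx, offset = divmod(i, self.chunk_len)
        if idx not in self.store:
            # Assumption: the first write materializes the chunk, pre-filled
            # with fill_value.
            self.store[idx] = [self.fill_value] * self.chunk_len
        self.store[idx][offset] = value

    def __getitem__(self, i):
        idx, offset = divmod(i, self.chunk_len)
        chunk = self.store.get(idx)
        # Reading from a chunk that was never written yields fill_value.
        return self.fill_value if chunk is None else chunk[offset]


z = ToyChunkedArray(chunk_len=4, fill_value=complex(1, 2))
z[0] = complex(float("nan"), 0.0)  # an explicitly stored NaN stays a NaN

first = z[0]     # stored value with a NaN real part, not replaced by fill_value
neighbor = z[1]  # same chunk, never written individually: fill_value
untouched = z[5] # chunk 1 was never written: fill_value
```

In this model a NaN is ordinary data, and only the absence of a chunk triggers the fill value.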
Thanks for the feedback @rabernat. Responded inline above. Happy to work with you to address these concerns.
Any other thoughts on this? Currently thinking it would be good to merge this by end of week.
Please let me know if anything else occurs to you, @rabernat. Happy to address in a follow-up.
As this followed PR (#309), it effectively started testing
Fixes #244
Provides a simple implementation of encoding/decoding complex fill values and tests that it works in a basic case.
TODO:
tox -e docs
)