-
Notifications
You must be signed in to change notification settings - Fork 58
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix strides issues over FITS arrays and base arrays with negative strides #930
Fix strides issues over FITS arrays and base arrays with negative strides #930
Conversation
2270cbe
to
c2498f9
Compare
…s over base arrays with negative strides.
c2498f9
to
ab5f6e9
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good. I didn't follow everything, but the tests are clear. And they pass. And the jwst
regtests that use numpy_dtype_to_asdf_datatype()
in datamodels pass too. So 👍
Just some minor comments and questions below.
def override_byteorder(self, byteorder): | ||
# FITS data is always stored in big-endian byte order. | ||
# The data array may not report big-endian, but we want | ||
# the value written to the tree to match the actual | ||
# byte order on disk. | ||
return 'big' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why removed, and what was this ever supposed to do?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FITS always writes arrays in big-endian byte order, so we needed a mechanism to tell NDArrayType to ignore the byte order reported by the base array. This logic has been moved inside NDArrayType itself.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Glad to see the logic moved inside the class itself.
if fd is None or array.nbytes >= OSX_WRITE_LIMIT and array.nbytes % 4096 == 0: | ||
return _array_tofile_chunked(write, array, OSX_WRITE_LIMIT) | ||
return _array_tofile_simple(fd, write, array) | ||
elif sys.platform.startswith('win'): # pragma: no cover | ||
elif sys.platform.startswith('win'): | ||
def _array_tofile(fd, write, array): | ||
WIN_WRITE_LIMIT = 2 ** 30 | ||
return _array_tofile_chunked(write, array, WIN_WRITE_LIMIT) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OSX_WRITE_LIMIT
and WIN_WRITE_LIMIT
are not really global variables here, so should probably be lower case. Or you could make them global variables. They're identical.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That is a bit strange. I'll move those outside the functions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, I'm going to tackle this in a separate PR. These workarounds seem to no longer be necessary, but this PR will be included in an asdf 2.7.3 and I'd rather drop them in 2.8.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@@ -101,8 +101,7 @@ def _array_tofile_chunked(write, array, chunksize): # pragma: no cover | |||
def _array_tofile_simple(fd, write, array): | |||
return write(array.data) | |||
|
|||
|
|||
if sys.platform == 'darwin': # pragma: no cover | |||
if sys.platform == 'darwin': |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Odd to have this logic at the module level. Wouldn't it be clearer to have the logic in the calling function? Or are there more than one calling functions?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't know for sure, but I suspect whoever wrote this was trying to be efficient by doing the check once at import time instead of on every call.
if len(array.shape) != 1 or not array.flags.contiguous: | ||
raise ValueError("Requires 1D contiguous array.") | ||
|
||
_array_tofile(None, self.write, array) | ||
|
||
def seek(self, offset, whence=0): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the first time I've seen whence
as a kwarg. I like. Though it's not really clear from the docstring what it does.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's funny, I didn't notice that. Seems to be a bit of a tradition.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
😅 Shows you how much I paid attention in my C programming!
# The ndarray-1.0.0 schema does not permit 0 valued strides. | ||
# Perhaps we'll want to allow this someday, to efficiently | ||
# represent an array of all the same value. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Isn't the ndarray-1.0.0
invalid? What about ndarray-1.1.0
? I.e. test_example2
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We haven't created ndarray-1.1.0
yet, have we? I don't see it in asdf-standard...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Doh! Nevermind.
I guess this would be a candidate feature of 1.1.0. =)
asdf/tags/core/ndarray.py
Outdated
block.data.dtype.itemsize != data.dtype.itemsize or | ||
block.data.ctypes.data != data.ctypes.data or | ||
block.data.strides != data.strides): | ||
raise ValueError("Views over FITS arrays must have identical memory layout") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For a user who doesn't know what memory layout
means, it might be good to say something like "Quit trying to serialize a numpy view!" in the error message, with a link to the numpy docs about the differences between arrays and views of an array.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point, that message is a bit cryptic. I'll rework it.
This PR fixes bugs in serializing views over FITS arrays and base arrays with negative strides.
The astropy.io.fits library always writes arrays in C-contiguous order, but this library has been serializing views as though the arrays were to be written in the same order that they are stored in memory. This works fine for arrays that are already C-contiguous but corrupts views over F-contiguous or non-contiguous arrays. The fix here will handle the most common case where the view and the array have the same memory layout. It raises an error when they do not, which is better than writing unreadable files. I ran the jwst regression tests on this branch and all tests passed except for 5 known and unrelated failures.
In the case of a base array with negative strides, we've been serializing views relative to the buffer that the base array is striding over, but writing the data in the memory order of the array itself. The fix here is to write the array in the same order as the underlying buffer which keeps the views valid. I don't have a test for this case because I haven't been able to create a base array with negative strides using numpy alone. I did run the script from #916 with scipy 1.6.1 and the array read back in correctly. If anyone knows how to create such an array do please let me know!
Resolves #916 and spacetelescope/jwst#5699