Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix strides issues over FITS arrays and base arrays with negative strides #930

Merged
merged 2 commits into from
Feb 25, 2021

Conversation

eslavich
Copy link
Contributor

@eslavich eslavich commented Feb 22, 2021

This PR fixes bugs in serializing views over FITS arrays and base arrays with negative strides.

The astropy.io.fits library always writes arrays in C-contiguous order, but this library has been serializing views as though the arrays were to be written in the same order that they are stored in memory. This works fine for arrays that are already C-contiguous but corrupts views over F-contiguous or non-contiguous arrays. The fix here will handle the most common case where the view and the array have the same memory layout. It raises an error when they do not, which is better than writing unreadable files. I ran the jwst regression tests on this branch and all tests passed except for 5 known and unrelated failures.

In the case of a base array with negative strides, we've been serializing views relative to the buffer that the base array is striding over, but writing the data in the memory order of the array itself. The fix here is to write the array in the same order as the underlying buffer which keeps the views valid. I don't have a test for this case because I haven't been able to create a base array with negative strides using numpy alone. I did run the script from #916 with scipy 1.6.1 and the array read back in correctly. If anyone knows how to create such an array do please let me know!

Resolves #916 and spacetelescope/jwst#5699

Copy link
Contributor

@jdavies-st jdavies-st left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good. I didn't follow everything, but the tests are clear. And they pass. And the jwst regtests that use numpy_dtype_to_asdf_datatype() in datamodels pass too. So 👍

Just some minor comments and questions below.

Comment on lines -49 to -54
def override_byteorder(self, byteorder):
# FITS data is always stored in big-endian byte order.
# The data array may not report big-endian, but we want
# the value written to the tree to match the actual
# byte order on disk.
return 'big'
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why removed, and what was this ever supposed to do?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FITS always writes arrays in big-endian byte order, so we needed a mechanism to tell NDArrayType to ignore the byte order reported by the base array. This logic has been moved inside NDArrayType itself.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Glad to see the logic moved inside the class itself.

Comment on lines 111 to 117
if fd is None or array.nbytes >= OSX_WRITE_LIMIT and array.nbytes % 4096 == 0:
return _array_tofile_chunked(write, array, OSX_WRITE_LIMIT)
return _array_tofile_simple(fd, write, array)
elif sys.platform.startswith('win'): # pragma: no cover
elif sys.platform.startswith('win'):
def _array_tofile(fd, write, array):
WIN_WRITE_LIMIT = 2 ** 30
return _array_tofile_chunked(write, array, WIN_WRITE_LIMIT)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OSX_WRITE_LIMIT and WIN_WRITE_LIMIT are not really global variables here, so should probably be lower case. Or you could make them global variables. They're identical.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That is a bit strange. I'll move those outside the functions.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, I'm going to tackle this in a separate PR. These workarounds seem to no longer be necessary, but this PR will be included in an asdf 2.7.3 and I'd rather drop them in 2.8.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@@ -101,8 +101,7 @@ def _array_tofile_chunked(write, array, chunksize): # pragma: no cover
def _array_tofile_simple(fd, write, array):
return write(array.data)


if sys.platform == 'darwin': # pragma: no cover
if sys.platform == 'darwin':
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Odd to have this logic at the module level. Wouldn't it be clearer to have the logic in the calling function? Or are there more than one calling functions?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know for sure, but I suspect whoever wrote this was trying to be efficient by doing the check once at import time instead of on every call.

if len(array.shape) != 1 or not array.flags.contiguous:
raise ValueError("Requires 1D contiguous array.")

_array_tofile(None, self.write, array)

def seek(self, offset, whence=0):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the first time I've seen whence as a kwarg. I like. Though it's not really clear from the docstring what it does.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's funny, I didn't notice that. Seems to be a bit of a tradition.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

😅 Shows you how much I paid attention in my C programming!

Comment on lines +414 to +416
# The ndarray-1.0.0 schema does not permit 0 valued strides.
# Perhaps we'll want to allow this someday, to efficiently
# represent an array of all the same value.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't the ndarray-1.0.0 invalid? What about ndarray-1.1.0? I.e. test_example2.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We haven't created ndarray-1.1.0 yet, have we? I don't see it in asdf-standard...

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doh! Nevermind.

I guess this would be a candidate feature of 1.1.0. =)

block.data.dtype.itemsize != data.dtype.itemsize or
block.data.ctypes.data != data.ctypes.data or
block.data.strides != data.strides):
raise ValueError("Views over FITS arrays must have identical memory layout")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For a user who doesn't know what memory layout means, it might be good to say something like "Quit trying to serialize a numpy view!" in the error message, with a link to the numpy docs about the differences between arrays and views of an array.

https://numpy.org/doc/stable/glossary.html#term-view

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point, that message is a bit cryptic. I'll rework it.

@eslavich eslavich merged commit d2e7a1a into asdf-format:master Feb 25, 2021
@eslavich eslavich deleted the JP-1911-fix-array-views branch February 25, 2021 15:18
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Scipy 1.6.0 generated ndarray can not be restored
3 participants