Fix strides issues over FITS arrays and base arrays with negative strides #930

eslavich · 2021-02-22T21:34:28Z

This PR fixes bugs in serializing views over FITS arrays and base arrays with negative strides.

The astropy.io.fits library always writes arrays in C-contiguous order, but this library has been serializing views as though the arrays were to be written in the same order that they are stored in memory. This works fine for arrays that are already C-contiguous but corrupts views over F-contiguous or non-contiguous arrays. The fix here will handle the most common case where the view and the array have the same memory layout. It raises an error when they do not, which is better than writing unreadable files. I ran the jwst regression tests on this branch and all tests passed except for 5 known and unrelated failures.
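To make the failure mode concrete, here is a minimal NumPy-only sketch. It is not the asdf code; the variable names are illustrative and the "file" is simulated with an in-memory buffer. It records a view's offset and strides against an F-contiguous base, writes the base in C order as astropy.io.fits does, and then reads the written bytes back with the stale offset and strides.

```python
import numpy as np
from numpy.lib.stride_tricks import as_strided

# An F-contiguous base array and a view over part of it.
base = np.asfortranarray(np.arange(6, dtype=np.int64).reshape(2, 3))
view = base[:, 1:]

# Byte offset of the view relative to the start of the base's memory.
offset = view.ctypes.data - base.ctypes.data

# Simulate the file contents: the base written out in C-contiguous order.
on_disk = np.frombuffer(base.tobytes(order="C"), dtype=np.int64)

# Reinterpret the written data using the in-memory offset and strides.
start = offset // base.itemsize
misread = as_strided(on_disk[start:], shape=view.shape, strides=view.strides)

print(view)     # what the caller intended to store
print(misread)  # what a reader would reconstruct from the stale strides
```

The two printed arrays differ, which is the corruption described above. When the base is already C-contiguous, the memory order and the written order coincide and the same procedure round-trips correctly.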

In the case of a base array with negative strides, we've been serializing views relative to the buffer that the base array is striding over, but writing the data in the memory order of the array itself. The fix here is to write the array in the same order as the underlying buffer, which keeps the views valid. I don't have a test for this case because I haven't been able to create a base array with negative strides using numpy alone. I did run the script from #916 with scipy 1.6.1 and the array read back in correctly. If anyone knows how to create such an array, please let me know!
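For context, a small sketch of why the obvious NumPy routes do not produce such an array. It only covers ordinary slicing and copying, so it is an illustration rather than a proof that no NumPy-only construction exists.

```python
import numpy as np

a = np.arange(6, dtype=np.int64)

# Reversing by slicing gives negative strides, but the result is a view:
# its .base is the original array, so it is not itself a base array.
r = a[::-1]
print(r.strides)    # (-8,)
print(r.base is a)  # True

# Copying the view yields a base array, but the copy is laid out
# contiguously, so the strides are positive again.
c = r.copy()
print(c.strides)       # (8,)
print(c.base is None)  # True
```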

Resolves #916 and spacetelescope/jwst#5699

jdavies-st

Looks good. I didn't follow everything, but the tests are clear. And they pass. And the jwst regtests that use numpy_dtype_to_asdf_datatype() in datamodels pass too. So 👍

Just some minor comments and questions below.

jdavies-st · 2021-02-23T14:18:35Z

asdf/fits_embed.py

-    def override_byteorder(self, byteorder):
-        # FITS data is always stored in big-endian byte order.
-        # The data array may not report big-endian, but we want
-        # the value written to the tree to match the actual
-        # byte order on disk.
-        return 'big'


Why removed, and what was this ever supposed to do?

FITS always writes arrays in big-endian byte order, so we needed a mechanism to tell NDArrayType to ignore the byte order reported by the base array. This logic has been moved inside NDArrayType itself.

Glad to see the logic moved inside the class itself.
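As a rough illustration of the mismatch being discussed (a NumPy-only sketch, not the NDArrayType logic; `byteorder_name` is a hypothetical helper), the byte order an in-memory array reports can differ from the big-endian layout FITS puts on disk:

```python
import struct
import sys

import numpy as np


def byteorder_name(dtype):
    """Hypothetical helper: map a NumPy byte order code to 'big' or 'little'."""
    code = dtype.byteorder
    if code in ("=", "|"):
        # '=' is native order; '|' means byte order does not apply.
        return sys.byteorder
    return "big" if code == ">" else "little"


native = np.array([1.0, 2.0, 3.0], dtype=np.float64)
print(byteorder_name(native.dtype))  # whatever this machine uses

# Convert to big-endian storage: same values, different bytes in memory
# on a little-endian machine.
big = native.astype(native.dtype.newbyteorder(">"))
print(byteorder_name(big.dtype))  # big

# The big-endian bytes match what struct packs with an explicit '>' format.
print(big.tobytes() == struct.pack(">3d", 1.0, 2.0, 3.0))
```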

jdavies-st · 2021-02-24T03:37:49Z

asdf/generic_io.py

        if fd is None or array.nbytes >= OSX_WRITE_LIMIT and array.nbytes % 4096 == 0:
            return _array_tofile_chunked(write, array, OSX_WRITE_LIMIT)
        return _array_tofile_simple(fd, write, array)
-elif sys.platform.startswith('win'):  # pragma: no cover
+elif sys.platform.startswith('win'):
    def _array_tofile(fd, write, array):
        WIN_WRITE_LIMIT = 2 ** 30
        return _array_tofile_chunked(write, array, WIN_WRITE_LIMIT)


OSX_WRITE_LIMIT and WIN_WRITE_LIMIT are not really global variables here, so should probably be lower case. Or you could make them global variables. They're identical.

That is a bit strange. I'll move those outside the functions.

Actually, I'm going to tackle this in a separate PR. These workarounds seem to no longer be necessary, but this PR will be included in an asdf 2.7.3 and I'd rather drop them in 2.8.
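For readers unfamiliar with the workaround, here is a minimal sketch of writing an array in bounded chunks. `write_in_chunks` is a hypothetical stand-in written for this note, not the body of `_array_tofile_chunked`, which is not shown in this diff.

```python
import io

import numpy as np


def write_in_chunks(write, array, chunksize):
    """Hypothetical helper: pass the array's bytes to write() in pieces
    of at most chunksize bytes."""
    # Flatten to a 1D byte view so slices are contiguous and need no copy.
    flat = np.ascontiguousarray(array).reshape(-1).view(np.uint8)
    for start in range(0, flat.nbytes, chunksize):
        write(memoryview(flat[start:start + chunksize]))


data = np.arange(10, dtype=np.int32)  # 40 bytes
buf = io.BytesIO()
write_in_chunks(buf.write, data, chunksize=16)
print(buf.getvalue() == data.tobytes())
```

With a 16-byte limit, the 40-byte array goes out as writes of 16, 16, and 8 bytes, and the concatenated output is identical to a single write.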

jdavies-st · 2021-02-24T03:40:15Z

asdf/generic_io.py

@@ -101,8 +101,7 @@ def _array_tofile_chunked(write, array, chunksize):  # pragma: no cover
 def _array_tofile_simple(fd, write, array):
    return write(array.data)

-
-if sys.platform == 'darwin':  # pragma: no cover
+if sys.platform == 'darwin':


Odd to have this logic at the module level. Wouldn't it be clearer to have the logic in the calling function? Or is there more than one calling function?

I don't know for sure, but I suspect whoever wrote this was trying to be efficient by doing the check once at import time instead of on every call.

jdavies-st · 2021-02-24T03:43:19Z

asdf/generic_io.py

+        if len(array.shape) != 1 or not array.flags.contiguous:
+            raise ValueError("Requires 1D contiguous array.")
+
+        _array_tofile(None, self.write, array)

    def seek(self, offset, whence=0):


This is the first time I've seen whence as a kwarg. I like. Though it's not really clear from the docstring what it does.

That's funny, I didn't notice that. Seems to be a bit of a tradition.

😅 Shows you how much I paid attention in my C programming!
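For anyone else who skipped that part of C: `whence` selects the reference point that `offset` is measured from, following the convention of C's `fseek`. A short sketch using the standard library's `io.BytesIO`, which takes the same arguments:

```python
import io
import os

buf = io.BytesIO(b"0123456789")

# whence=0 (os.SEEK_SET): offset is measured from the start of the stream.
print(buf.seek(2, os.SEEK_SET))  # 2

# whence=1 (os.SEEK_CUR): offset is relative to the current position.
print(buf.seek(3, os.SEEK_CUR))  # 5

# whence=2 (os.SEEK_END): offset is relative to the end of the stream.
print(buf.seek(-1, os.SEEK_END))  # 9
print(buf.read())                 # b'9'
```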

jdavies-st · 2021-02-24T03:46:30Z

asdf/tags/core/ndarray.py

+        # The ndarray-1.0.0 schema does not permit 0 valued strides.
+        # Perhaps we'll want to allow this someday, to efficiently
+        # represent an array of all the same value.


Isn't the ndarray-1.0.0 invalid? What about ndarray-1.1.0? I.e. test_example2.

We haven't created ndarray-1.1.0 yet, have we? I don't see it in asdf-standard...

Doh! Nevermind.

I guess this would be a candidate feature of 1.1.0. =)
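A quick NumPy illustration of what zero-valued strides look like in memory. This shows only the in-memory side; how a future schema version might encode it is not decided here.

```python
import numpy as np

# Broadcasting a scalar produces an array whose strides are all zero:
# every index maps to the same single element in memory.
constant = np.broadcast_to(np.float64(7.0), (1000, 1000))

print(constant.shape)    # (1000, 1000)
print(constant.strides)  # (0, 0)
print(bool((constant == 7.0).all()))  # True
```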

jdavies-st · 2021-02-24T03:54:10Z

asdf/tags/core/ndarray.py

+                block.data.dtype.itemsize != data.dtype.itemsize or
+                block.data.ctypes.data != data.ctypes.data or
+                block.data.strides != data.strides):
+                raise ValueError("Views over FITS arrays must have identical memory layout")


For a user who doesn't know what memory layout means, it might be good to say something like "Quit trying to serialize a numpy view!" in the error message, with a link to the numpy docs about the differences between arrays and views of an array.

https://numpy.org/doc/stable/glossary.html#term-view

Good point, that message is a bit cryptic. I'll rework it.
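To show which inputs trip this error, here is a sketch built only from the three conditions visible in the hunk above. `has_identical_layout` is a hypothetical name, and the real check may include further conditions that the excerpt cuts off.

```python
import numpy as np


def has_identical_layout(base, view):
    """Hypothetical helper mirroring the three visible conditions:
    same item size, same starting address, same strides."""
    return (
        base.dtype.itemsize == view.dtype.itemsize
        and base.ctypes.data == view.ctypes.data
        and base.strides == view.strides
    )


a = np.arange(12, dtype=np.int64).reshape(3, 4)

print(has_identical_layout(a, a.view()))    # True: same memory, same strides
print(has_identical_layout(a, a.T))         # False: strides differ
print(has_identical_layout(a, a[1:]))       # False: starts at a later address
print(has_identical_layout(a, a[:, ::2]))   # False: strides differ
```

Slices, transposes, and strided selections are the kinds of views the NumPy glossary entry linked above describes, and the last three lines show them failing the comparison.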

eslavich force-pushed the JP-1911-fix-array-views branch from 2270cbe to c2498f9 on February 22, 2021 21:55

Fix bugs resulting in invalid strides values for FITS arrays and views over base arrays with negative strides.

ab5f6e9

eslavich force-pushed the JP-1911-fix-array-views branch from c2498f9 to ab5f6e9 on February 23, 2021 02:01

eslavich requested review from jdavies-st and perrygreenfield February 23, 2021 04:21

eslavich added this to the 2.7.3 milestone Feb 23, 2021

jdavies-st approved these changes Feb 24, 2021

nden requested a review from kmacdonald-stsci February 24, 2021 13:20

Improve error message

4eccbff

kmacdonald-stsci approved these changes Feb 25, 2021

eslavich merged commit d2e7a1a into asdf-format:master Feb 25, 2021

eslavich deleted the JP-1911-fix-array-views branch February 25, 2021 15:18

braingram mentioned this pull request Nov 16, 2022

Unnecessary duplicate arrays when using asdf embedded in FITS files #1232

Closed
