Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Scipy 1.6.0 generated ndarray can not be restored #916

Closed
vhirtham opened this issue Jan 21, 2021 · 7 comments · Fixed by #930 or #949
Closed

Scipy 1.6.0 generated ndarray can not be restored #916

vhirtham opened this issue Jan 21, 2021 · 7 comments · Fixed by #930 or #949
Labels
Milestone

Comments

@vhirtham
Copy link

Take a look at the following script:

import numpy as np
from asdf import AsdfFile
from scipy.spatial.transform import Rotation

original_array = np.array([0.2, 1.3, 3.14])
modified_array = Rotation.from_euler(
    seq="xyz", angles=original_array, degrees=False
).as_euler("xyz", degrees=False)

# putting this one into the tree works
copy_of_modified_array = np.array(modified_array)

tree_write = {"array": modified_array}
file_write = AsdfFile(tree=tree_write)
file_write.write_to("test.yaml")

file_read = AsdfFile.open("test.yaml")
tree_read = file_read.tree

np.array(tree_read["array"])

The last line causes the error:

Traceback (most recent call last):
  File ".../bug.py", line 19, in <module>
    np.array(tree_read["array"])
  File "...\lib\site-packages\asdf\tags\core\ndarray.py", line 291, in __array__
    return self._make_array()
  File "...\lib\site-packages\asdf\tags\core\ndarray.py", line 269, in _make_array
    self._offset, self._strides, self._order)
ValueError: strides is incompatible with shape of requested array and size of buffer

This happens with the scipy version 1.6.0. Previous releases do not trigger the error.

I can create a new array from the scipy generated one without problems and putting the copy into the tree removes the error. So modified_array must have some special property that lets the asdf deserialization fail, but I couldn't figure out what exactly is causing the error. I initially thought the negative stride (see file content below) is the problem, but that was already the case in previous scipy versions.

I am not sure if this is a scipy problem or if there is an error in the deserialization routine of asdf. A related scipy issue might be this one.

Here is the file content produced by the script:

#ASDF 1.0.0
#ASDF_STANDARD 1.5.0
%YAML 1.1
%TAG ! tag:stsci.edu:asdf/
--- !core/asdf-1.1.0
asdf_library: !core/software-1.0.0 {author: Space Telescope Science Institute, homepage: 'http://github.com/spacetelescope/asdf',
  name: asdf, version: 2.7.2}
history:
  extensions:
  - !core/extension_metadata-1.0.0
    extension_class: asdf.extension.BuiltinExtension
    software: !core/software-1.0.0 {name: asdf, version: 2.7.2}
array: !core/ndarray-1.0.0
  source: 0
  datatype: float64
  byteorder: little
  shape: [3]
  strides: [-8]
...
�BLK 0               �       �       ����]����i�+��8 ��������?�������?���Q��	@#ASDF BLOCK INDEX
%YAML 1.1
---
- 538
...
@jdavies-st jdavies-st added the bug label Jan 21, 2021
@jdavies-st
Copy link
Contributor

I can reproduce this in asdf 2.7.2, but see no error on master.

@eslavich
Copy link
Contributor

eslavich commented Jan 21, 2021

That's a good clue, it seems to succeed in master due to the change in auto_inline default. When I change this line:

file_write.write_to("test.yaml")

to this:

file_write.write_to("test.yaml", auto_inline=2)

the error comes back in master.

@CagtayFabry
Copy link
Contributor

for what it's worth I played around with the arrays/memmap a little, this somehow works:

np.frombuffer(tree_read["array"].block.data.tobytes())
>>> array([0.2 , 1.3 , 3.14])
np.array(np.frombuffer(tree_read["array"].block.data.tobytes()))
>>> array([0.2 , 1.3 , 3.14])

I also stumbled upon this (without knowing what to make of it)

[s for s in original_array.tobytes()] == tree_read["array"].block.data
>>> array([False,  True,  True,  True,  True,  True,  True,  True, False,
        True,  True,  True,  True,  True,  True,  True,  True,  True,
        True,  True,  True,  True,  True,  True])

@eslavich
Copy link
Contributor

eslavich commented Jan 21, 2021

With scipy 1.5.4, the ASDF array metadata ends up like this:

array: !core/ndarray-1.0.0
  source: 0
  datatype: float64
  byteorder: little
  shape: [3]
  offset: 16
  strides: [-8]

with the array's block laid out like this:

In [71]: block = b'\x1f\x85\xebQ\xb8\x1e\t@\xce\xcc\xcc\xcc\xcc\xcc\xf4?\x9d\x99\x99\x99\x99\x99\xc9?'

In [72]: struct.unpack('3d', block)
Out[72]: (3.14, 1.3000000000000003, 0.2000000000000001)

This is reasonable. The array is a view over the block that strides backwards, reversing the order of the elements. It starts at a 16-byte offset, where the value 0.2 resides.

This reflects the relationship between modified_array and its base array:

In [39]: modified_array
Out[39]: array([0.2 , 1.3 , 3.14])

In [40]: modified_array.strides
Out[40]: (-8,)

In [41]: modified_array.ctypes.data
Out[41]: 140644258473712

In [42]: modified_array.data.tobytes()
Out[42]: b'\x9d\x99\x99\x99\x99\x99\xc9?\xce\xcc\xcc\xcc\xcc\xcc\xf4?\x1f\x85\xebQ\xb8\x1e\t@'

In [43]: modified_array.base
Out[43]: array([[3.14, 1.3 , 0.2 ]])

In [44]: modified_array.base.strides
Out[44]: (24, 8)

In [45]: modified_array.base.ctypes.data
Out[45]: 140644258473696

In [46]: modified_array.base.data.tobytes()
Out[46]: b'\x1f\x85\xebQ\xb8\x1e\t@\xce\xcc\xcc\xcc\xcc\xcc\xf4?\x9d\x99\x99\x99\x99\x99\xc9?'

Here the 16-byte offset is seen when comparing ctypes.data.

With scipy 1.6.0, we get ASDF metadata like this:

array: !core/ndarray-1.0.0
  source: 0
  datatype: float64
  byteorder: little
  shape: [3]
  strides: [-8]

That already doesn't make any sense, since you can't stride backwards starting at offset 0, there's nowhere to go. The block is laid out like this:

In [156]: block = b'\x9d\x99\x99\x99\x99\x99\xc9?\xce\xcc\xcc\xcc\xcc\xcc\xf4?\x1f\x85\xebQ\xb8\x1e\t@'

In [157]: struct.unpack('3d', block)
Out[157]: (0.2000000000000001, 1.3000000000000003, 3.14)

The elements are already in the correct order, so we don't want to stride backwards anyway. This is what the modified_array and its base looks like:

In [141]: modified_array
Out[141]: array([0.2 , 1.3 , 3.14])

In [142]: modified_array.data
Out[142]: <memory at 0x7f827a133b80>

In [143]: modified_array
Out[143]: array([0.2 , 1.3 , 3.14])

In [144]: modified_array.strides
Out[144]: (-8,)

In [145]: modified_array.ctypes.data
Out[145]: 140198189526960

In [146]: modified_array.data.tobytes()
Out[146]: b'\x9d\x99\x99\x99\x99\x99\xc9?\xce\xcc\xcc\xcc\xcc\xcc\xf4?\x1f\x85\xebQ\xb8\x1e\t@'

In [147]: modified_array.base
Out[147]: array([[0.2 , 1.3 , 3.14]])

In [148]: modified_array.base.strides
Out[148]: (24, -8)

In [149]: modified_array.base.ctypes.data
Out[149]: 140198189526960

In [150]: modified_array.base.data.tobytes()
Out[150]: b'\x9d\x99\x99\x99\x99\x99\xc9?\xce\xcc\xcc\xcc\xcc\xcc\xf4?\x1f\x85\xebQ\xb8\x1e\t@'

The base array also has a negative value in strides! I suspect asdf doesn't know how to handle that situation, and I'm not even sure how to interpret it myself.

This seems to have changed in scipy 1.6.0 when the scipy.spatial.transform module was rewritten in cython.

@marscher
Copy link

The from_euler method does a reversing of the matrix in dependence of intrinsic/extrinsic (e.g. "xyz" or "XYZ") argument. This reversing is done like this array[:, ::-1] and causes the stride to be negative. Then the memory is being read "from behind" and the output looks reversed. So I guess asdf should deal with negative strides correctly.

@eslavich
Copy link
Contributor

eslavich commented Feb 22, 2021

I noticed that numpy throws an error when you specify an array buffer with negative strides. Continuing from @vhirtham's script above:

buffer = modified_array.base.base
np.ndarray((3,), buffer=buffer, dtype=np.float64)

BufferError: memoryview: underlying buffer is not contiguous

It seems that scipy is somehow bypassing this check when it constructs the array returned by the as_euler function. If that array is indeed an "illegal" object, then I don't think we should handle it as a special case here. ASDF is capable of serializing a view with negative strides but it makes the assumption that the base array is contiguous has only positive strides.

Even so, we should detect this problem and raise an error from the array serializer. That would be far better than allowing the library to write unreadable files...

EDIT: Just noticed this example in ASDF's test suite:

def test_non_contiguous_base_array(tmpdir):
    base = np.arange(60).reshape(5, 4, 3).transpose(2, 0, 1) * 1
    contiguous = base.transpose(1, 2, 0)
    tree = {'contiguous': contiguous}
    helpers.assert_roundtrip_tree(tree, tmpdir)

So we can handle non-contiguous base arrays, just not a base array with negative strides. I'm going to have to stare at this more tomorrow.

@eslavich
Copy link
Contributor

Reopening this issue because my attempted fix was only part of the puzzle.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
5 participants