
Datamodel I/O efficiencies causing error in copy() #5699

Closed
PaddyKavanagh opened this issue Feb 5, 2021 · 7 comments · Fixed by #5787

Comments

@PaddyKavanagh

There seems to be a datamodel issue when running the ramp output of the development version of MIRISim through any pipeline build >= 7.6, which we can't pin down. It occurs when the dq_init step tries to make a copy of the ramp datamodel. The error can be recreated using the following code snippet with the ramp file available here (the traceback is at the end of this report):

https://www.dropbox.com/s/nxzgahfveejzhqy/det_image_seq1_MIRIMAGE_F1000Wexp1.fits?dl=0

from jwst import datamodels
dm = datamodels.RampModel('det_image_seq1_MIRIMAGE_F1000Wexp1.fits')
dmc = dm.copy()

I found an issue with a similar error, though not identical, in the following:

#5527

I tried the fix of setting the SKIP_FITS_UPDATE environment variable to false to disable some of the I/O efficiencies, and this fixes the problem described above, so the two appear to be related.
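For reference, a minimal sketch of how that workaround can be applied, assuming the environment variable is read when the model is opened (the jwst calls are commented out since they need the ramp file from this report):

```python
import os

# Workaround referenced in #5527: disable the FITS-update I/O shortcut.
# Set this before opening the model so it is seen when the file is read.
os.environ["SKIP_FITS_UPDATE"] = "false"

# from jwst import datamodels
# dm = datamodels.RampModel('det_image_seq1_MIRIMAGE_F1000Wexp1.fits')
# dmc = dm.copy()  # no longer raises ValueError with the workaround
```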

==========================================
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/patrickkavanagh/anaconda3/anaconda3/envs/miricle.devel/lib/python3.8/site-packages/jwst/datamodels/model_base.py", line 352, in copy
    self.clone(result, self, deepcopy=True, memo=memo)
  File "/Users/patrickkavanagh/anaconda3/anaconda3/envs/miricle.devel/lib/python3.8/site-packages/jwst/datamodels/model_base.py", line 331, in clone
    instance = copy.deepcopy(source._instance, memo=memo)
  File "/Users/patrickkavanagh/anaconda3/anaconda3/envs/miricle.devel/lib/python3.8/copy.py", line 172, in deepcopy
    y = _reconstruct(x, memo, *rv)
  File "/Users/patrickkavanagh/anaconda3/anaconda3/envs/miricle.devel/lib/python3.8/copy.py", line 296, in _reconstruct
    value = deepcopy(value, memo)
  File "/Users/patrickkavanagh/anaconda3/anaconda3/envs/miricle.devel/lib/python3.8/copy.py", line 151, in deepcopy
    copier = getattr(x, "__deepcopy__", None)
  File "/Users/patrickkavanagh/anaconda3/anaconda3/envs/miricle.devel/lib/python3.8/site-packages/asdf/tags/core/ndarray.py", line 362, in __getattr__
    return getattr(self._make_array(), attr)
  File "/Users/patrickkavanagh/anaconda3/anaconda3/envs/miricle.devel/lib/python3.8/site-packages/asdf/tags/core/ndarray.py", line 267, in _make_array
    self._array = np.ndarray(
ValueError: strides is incompatible with shape of requested array and size of buffer
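For context, the final ValueError can be reproduced in isolation with plain numpy, assuming a buffer sized exactly for the shape above but paired with the strides from the bad metadata:

```python
import numpy as np

shape = (1, 10, 1024, 1032)
buf = bytearray(4 * 1 * 10 * 1024 * 1032)   # buffer sized exactly for this shape
bad_strides = (52838400, 5283840, 4128, 4)  # strides recorded in the file

error = None
try:
    # These strides address bytes beyond the end of the buffer, so
    # numpy refuses to construct the array.
    np.ndarray(shape, dtype=">f4", buffer=buf, strides=bad_strides)
except ValueError as exc:
    error = exc
print(error)  # strides is incompatible with shape of requested array and size of buffer
```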

@stscijgbot-jp
Collaborator

This issue is tracked on JIRA as JP-1911.

@stscijgbot-jp
Collaborator

Comment by Jonathan Eisenhamer on JIRA:

Of course I cannot find it now, but there was some issue where the error was something like "not an ndarray". This is symptomatic of a case where, due to lazy loading, an "array" that hasn't been loaded yet is passed directly to a numpy function, which numpy has no idea what to do with. The solution there was to force-load the array, by referencing it, before the call.
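A toy illustration of that failure mode (LazyArray below is a stand-in for asdf's NDArrayType, not the real class): the real ndarray is only built on first access, so simply referencing it forces the load before anything else sees the object.

```python
import numpy as np

class LazyArray:
    """Toy stand-in for a lazily loaded array: the real ndarray is
    only built on first access, mimicking asdf's deferred loading."""

    def __init__(self, shape):
        self._shape = shape
        self._array = None

    def _make_array(self):
        if self._array is None:
            self._array = np.zeros(self._shape, dtype=np.float32)
        return self._array

    def __getattr__(self, attr):
        # Any attribute lookup (e.g. .shape) materializes the array
        return getattr(self._make_array(), attr)

lazy = LazyArray((2, 3))
forced = lazy._make_array()  # "force-load by referencing" before the numpy call
print(forced.shape)          # (2, 3)
```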

@eslavich
Collaborator

The file on dropbox seems to have a bad strides property in its embedded ASDF tree:

data: !core/ndarray-1.0.0
  source: fits:SCI,1
  datatype: float32
  byteorder: big
  shape: [1, 10, 1024, 1032]
  strides: [52838400, 5283840, 4128, 4]

float32 is a 4-byte type, so the strides in the first dimension suggest 52838400 / 4 = 13209600 elements in the array. But by the shape (and the actual number of elements of the array) the number of elements is 10 * 1024 * 1032 = 10567680.
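The arithmetic checks out directly in numpy; as a side observation (not stated in the file itself), the bad strides are exactly those of a contiguous parent array with 1280 rows, which already hints that the array was a view over something larger:

```python
import numpy as np

shape = (1, 10, 1024, 1032)
bad_strides = (52838400, 5283840, 4128, 4)

# Strides numpy assigns to a C-contiguous float32 array of this shape
good_strides = np.zeros(shape, dtype=np.float32).strides
print(good_strides)         # (42270720, 4227072, 4128, 4)

print(bad_strides[0] // 4)  # 13209600 elements implied by the first stride
print(10 * 1024 * 1032)     # 10567680 elements actually in the array

# The bad strides match a contiguous parent of shape (1, 10, 1280, 1032):
print(1280 * 1032 * 4)      # 5283840 == bad_strides[1]
```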

@PaddyKavanagh are you able to give me instructions for creating this file? This smells like a bug in datamodels or asdf, but I haven't been able to reproduce it by creating simple files with the same array.

@eslavich
Collaborator

eslavich commented Feb 19, 2021

Never mind, I was able to reproduce without datamodels:

from astropy.io import fits
import asdf
import numpy as np

data = np.zeros((10, 10))
data_view = data[:, :5]

hdul = fits.HDUList([fits.PrimaryHDU(), fits.ImageHDU(data_view)])
with asdf.fits_embed.AsdfInFits(hdulist=hdul) as af:
    af["data"] = hdul[-1].data
    af.write_to("test.fits", overwrite=True)

with asdf.open("test.fits") as af:
    np.array(af["data"])

The asdf library isn't properly serializing an array referenced in the surrounding .fits file that is a view over some larger array.
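Until a fix lands, one possible workaround on the writing side (a sketch, not the eventual fix in asdf) would be to embed a contiguous copy of the view, so that the strides recorded in the tree match the buffer that actually gets serialized:

```python
import numpy as np

data = np.zeros((10, 10), dtype=np.float32)
data_view = data[:, :5]                  # non-contiguous view: strides (40, 4)
safe = np.ascontiguousarray(data_view)   # contiguous copy: strides (20, 4)

print(data_view.strides, safe.strides)   # (40, 4) (20, 4)
# af["data"] = safe  # embedding the copy avoids bad strides in the ASDF tree
```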

@stscijgbot-jp
Collaborator

Comment by James Davies on JIRA:

So basically this is caused by writing out a Numpy view to a datamodel, and asdf doesn't serialize it correctly?

Is there a reason we should allow writing out views? That seems ripe for inadvertent mistakes.
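If views were to be rejected at write time, a guard along these lines could work (a hypothetical helper, not an actual datamodels or asdf API):

```python
import numpy as np

def is_unsafe_to_embed(arr):
    """Hypothetical check: True for arrays that are views on another
    array or are not C-contiguous, i.e. whose strides may not match
    the buffer that actually gets serialized."""
    return arr.base is not None or not arr.flags["C_CONTIGUOUS"]

a = np.zeros((10, 10))
print(is_unsafe_to_embed(a))          # False: owns its contiguous buffer
print(is_unsafe_to_embed(a[:, :5]))   # True: non-contiguous view
```

Note this check is conservative: some harmless contiguous views (e.g. the result of a reshape) would also be flagged, so a real guard might copy rather than reject.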

@eslavich
Collaborator

@PaddyKavanagh I have a PR open on the asdf repo with a proposed fix. Can you try regenerating your file with my branch installed?

pip install git+https://github.com/eslavich/asdf.git@JP-1911-fix-array-views

@stscijgbot-jp
Collaborator

Comment by Howard Bushouse on JIRA:

Fixed by #5787
