trouble loading netcdf4 files with xarray on s3 #168

@scottyhq

I'm working on allowing direct access to netCDF4/HDF5 file-like objects in xarray (pydata/xarray#2782). This seems to work fine with gcsfs, but not with s3fs (versions 0.2 from conda-forge). Here is a gist with the relevant code and error traceback:

https://gist.github.com/scottyhq/304a3c4b4e198776b8d82fb3a9f300e3

and an abbreviated traceback here:

---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
~/Documents/GitHub/xarray/xarray/backends/file_manager.py in acquire(self, needs_lock)
    166             try:
--> 167                 file = self._cache[self._key]
    168             except KeyError:

~/Documents/GitHub/xarray/xarray/backends/lru_cache.py in __getitem__(self, key)
     40         with self._lock:
---> 41             value = self._cache[key]
     42             self._cache.move_to_end(key)

KeyError: [<function _open_h5netcdf_group at 0x11d8b0ae8>, (<S3File grfn-content-prod/S1-GUNW-A-R-137-tops-20181129_20181123-020010-43220N_41518N-PP-e2c7-v2_0_0.nc>,), 'r', (('group', '/science/grids/data'),)]

During handling of the above exception, another exception occurred:

ValueError                                Traceback (most recent call last)
h5py/h5fd.pyx in h5py.h5fd.H5FD_fileobj_read()

~/miniconda3/envs/test_env/lib/python3.6/site-packages/s3fs/core.py in readinto(self, b)
   1498         data = self.read()
-> 1499         b[:len(data)] = data
   1500         return len(data)

~/miniconda3/envs/test_env/lib/python3.6/site-packages/h5py/h5fd.cpython-36m-darwin.so in View.MemoryView.memoryview.__setitem__()

~/miniconda3/envs/test_env/lib/python3.6/site-packages/h5py/h5fd.cpython-36m-darwin.so in View.MemoryView.memoryview.setitem_slice_assignment()

~/miniconda3/envs/test_env/lib/python3.6/site-packages/h5py/h5fd.cpython-36m-darwin.so in View.MemoryView.memoryview_copy_contents()

~/miniconda3/envs/test_env/lib/python3.6/site-packages/h5py/h5fd.cpython-36m-darwin.so in View.MemoryView._err_extents()

ValueError: got differing extents in dimension 0 (got 8 and 59941567)

The above exception was the direct cause of the following exception:

SystemError                               Traceback (most recent call last)
h5py/h5fd.pyx in h5py.h5fd.H5FD_fileobj_read()

~/miniconda3/envs/test_env/lib/python3.6/site-packages/s3fs/core.py in seek(self, loc, whence)
   1235         """
-> 1236         if not self.readable():
   1237             raise ValueError('Seek only available in read mode')

SystemError: PyEval_EvalFrameEx returned a result with an error set
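
One possible reading of the ValueError above: h5py hands `readinto` a small buffer (8 bytes here), while the three s3fs lines shown call `self.read()` with no size and then try to copy the whole result (59941567 bytes) into that buffer. The sketch below reproduces that class of failure without S3 or h5py. The class names and the `try_readinto` helper are hypothetical, made up for illustration, and the bounded variant is only an assumption about what a conforming `readinto` could look like, not the s3fs implementation.

```python
import io


class WholeFileReadinto(io.BytesIO):
    """Hypothetical stand-in: readinto() ignores the size of the buffer."""

    def readinto(self, b):
        data = self.read()        # reads everything that remains
        b[:len(data)] = data      # fails when len(data) > len(b)
        return len(data)


class BoundedReadinto(io.BytesIO):
    """Hypothetical variant: reads at most len(b) bytes."""

    def readinto(self, b):
        data = self.read(len(b))  # never more than the buffer can hold
        b[:len(data)] = data
        return len(data)


def try_readinto(file_obj, size=8):
    """Call readinto() with a fixed-size memoryview, as a C caller might."""
    buf = memoryview(bytearray(size))
    try:
        n = file_obj.readinto(buf)
        return n, bytes(buf[:n])
    except ValueError as exc:
        # A fixed-size memoryview cannot grow, so an oversized copy raises.
        return None, str(exc)


payload = bytes(range(32))
print(try_readinto(WholeFileReadinto(payload)))  # (None, <error message>)
print(try_readinto(BoundedReadinto(payload)))    # 8 bytes copied
```

If this reading is right, the later SystemError in `seek` may just be a follow-on effect of the first exception rather than a separate problem, but I have not verified that.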

Any guidance as to what might be going on here would be appreciated!
