Closed
Description
This is probably a problem with s3fs, but for now it's worth noting here. In practice, we cannot read contiguous data from a file opened via s3fs. Here's some toy code to show the problem:
import s3fs
import pyfive as p5
from pathlib import Path
import numpy as np

S3_URL = "redacted"
S3_BUCKET = 'redacted'

def local(fname, v):
    mypath = Path(__file__).parent
    f = p5.File(mypath/fname)
    d = f[v]
    dd = d[:]
    return dd

def s3(fname, v):
    fs = s3fs.S3FileSystem(anon=True, client_kwargs={'endpoint_url': S3_URL})
    uri = S3_BUCKET + '/' + fname
    with fs.open(uri, 'rb') as s3file2:
        f2 = p5.File(s3file2)
        d = f2[v]
        dd = d[:]
    return dd

if __name__ == "__main__":
    fn, v = 'common_cl_a.nc', 'cl'
    d1 = local(fn, v)
    d2 = s3(fn, v)
    np.testing.assert_array_equal(d1, d2)

which gives:
Traceback (most recent call last):
  File "/Users/bnl28/Repositories/pyfive/bnl/mytest_s3_etc.py", line 32, in <module>
    d2 = s3(fn,v)
  File "/Users/bnl28/Repositories/pyfive/bnl/mytest_s3_etc.py", line 25, in s3
    dd = d[:]
  File "/Users/bnl28/Repositories/pyfive/pyfive/high_level.py", line 279, in __getitem__
    data = self._dataobjects.get_data(args)
  File "/Users/bnl28/Repositories/pyfive/pyfive/dataobjects.py", line 630, in get_data
    return self._get_contiguous_data(self.property_offset)[args]
  File "/Users/bnl28/Repositories/pyfive/pyfive/dataobjects.py", line 671, in _get_contiguous_data
    return np.memmap(self.fh, dtype=self.dtype, mode='c',
  File "/Users/bnl28/mambaforge/envs/fesom/lib/python3.10/site-packages/numpy/core/memmap.py", line 267, in __new__
    mm = mmap.mmap(fid.fileno(), bytes, access=acc, offset=start)
io.UnsupportedOperation: fileno
This appears to arise because the object returned by s3fs.S3FileSystem.open does not fully respect the requirement to be "file like": it has no usable fileno(), which np.memmap needs in order to call mmap.mmap. To be fair, s3fs doesn't claim feature completeness.
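One possible way around this, sketched below, is to try fileno() first and fall back to an ordinary seek/read plus np.frombuffer when the file object has no OS-level descriptor. This is only an illustration, not pyfive's actual code: the helper name read_contiguous and its arguments are hypothetical, and io.BytesIO stands in for the s3fs file object because it also raises io.UnsupportedOperation on fileno().

```python
import io

import numpy as np


def read_contiguous(fh, dtype, shape, offset):
    """Hypothetical helper: read a contiguous array from a file-like object.

    Uses np.memmap when fh has a real file descriptor, otherwise reads
    the bytes explicitly (as needed for remote objects such as S3 files).
    """
    dtype = np.dtype(dtype)
    try:
        fh.fileno()
    except (AttributeError, io.UnsupportedOperation):
        # No OS-level descriptor, so mmap is impossible: read the bytes instead.
        count = int(np.prod(shape))
        fh.seek(offset)
        raw = fh.read(count * dtype.itemsize)
        return np.frombuffer(raw, dtype=dtype, count=count).reshape(shape)
    # A real local file: memory-map it copy-on-write, as in the traceback above.
    return np.memmap(fh, dtype=dtype, mode='c', offset=offset, shape=shape)


# Stand-in for a remote file: 4 header bytes followed by a 3x4 float32 array.
expected = np.arange(12, dtype='<f4').reshape(3, 4)
remote_like = io.BytesIO(b'HEAD' + expected.tobytes())
result = read_contiguous(remote_like, '<f4', (3, 4), offset=4)
np.testing.assert_array_equal(result, expected)
```

Note that the fallback returns a read-only array backed by the bytes that were read, and it pulls the whole dataset into memory at once, whereas the memmap path is lazy. Whether that trade-off is acceptable for large remote variables is a separate question.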