pyfive doesn't play nicely with s3fs #60

@bnlawrence

This is probably a problem with s3fs, but for now it's worth noting here. In practice, we cannot read contiguous data from a file opened via s3fs. Here's some toy code to show the problem:

import s3fs
import pyfive as p5
from pathlib import Path
import numpy as np

S3_URL = "redacted"
S3_BUCKET = 'redacted'

def local(fname,v):

    mypath = Path(__file__).parent
    f = p5.File(mypath/fname)
    d = f[v]
    dd = d[:]
    return dd

def s3(fname, v):
        
    fs = s3fs.S3FileSystem(anon=True, client_kwargs={'endpoint_url': S3_URL})
    uri = S3_BUCKET + '/' + fname

    with fs.open(uri,'rb') as s3file2:
        f2 = p5.File(s3file2)
        d = f2[v]
        dd = d[:]
        return dd

if __name__=="__main__":
    fn, v = 'common_cl_a.nc', 'cl'
    d1 = local(fn,v)
    d2 = s3(fn,v)
    np.testing.assert_array_equal(d1,d2)

which gives:

Traceback (most recent call last):
  File "/Users/bnl28/Repositories/pyfive/bnl/mytest_s3_etc.py", line 32, in <module>
    d2 = s3(fn,v)
  File "/Users/bnl28/Repositories/pyfive/bnl/mytest_s3_etc.py", line 25, in s3
    dd = d[:]
  File "/Users/bnl28/Repositories/pyfive/pyfive/high_level.py", line 279, in __getitem__
    data = self._dataobjects.get_data(args)
  File "/Users/bnl28/Repositories/pyfive/pyfive/dataobjects.py", line 630, in get_data
    return self._get_contiguous_data(self.property_offset)[args]
  File "/Users/bnl28/Repositories/pyfive/pyfive/dataobjects.py", line 671, in _get_contiguous_data
    return np.memmap(self.fh, dtype=self.dtype, mode='c',
  File "/Users/bnl28/mambaforge/envs/fesom/lib/python3.10/site-packages/numpy/core/memmap.py", line 267, in __new__
    mm = mmap.mmap(fid.fileno(), bytes, access=acc, offset=start)
io.UnsupportedOperation: fileno

This appears to arise because the object returned by s3fs.S3FileSystem.open is not fully "file-like": as the traceback shows, its fileno() raises io.UnsupportedOperation, and np.memmap needs a real file descriptor to map the data. To be fair, s3fs doesn't claim feature completeness.
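One possible workaround, sketched here rather than proposed as the actual fix, is to fall back to a plain seek-and-read when the file handle cannot supply a descriptor. The helper name `read_contiguous` and its arguments are hypothetical and not part of pyfive; `io.BytesIO` stands in for the s3fs file object because it also raises `io.UnsupportedOperation` on `fileno()`. The sketch assumes the handle's `read(n)` returns all `n` requested bytes.

```python
import io

import numpy as np


def read_contiguous(fh, offset, shape, dtype):
    """Hypothetical helper: memory-map if possible, otherwise seek and read."""
    dtype = np.dtype(dtype)
    try:
        fh.fileno()
    except (AttributeError, OSError):
        # io.UnsupportedOperation is a subclass of OSError, so handles
        # without an OS-level descriptor end up here.
        nbytes = dtype.itemsize * int(np.prod(shape))
        fh.seek(offset)
        buf = fh.read(nbytes)
        # Note: np.frombuffer on bytes gives a read-only array,
        # unlike the copy-on-write memmap in the other branch.
        return np.frombuffer(buf, dtype=dtype).reshape(shape)
    # A real descriptor is available, so memory-mapping works as before.
    return np.memmap(fh, dtype=dtype, mode='c', offset=offset, shape=shape)


if __name__ == "__main__":
    # Fake "file": 16 bytes of header followed by a 3x4 float64 array.
    expected = np.arange(12, dtype='<f8').reshape(3, 4)
    fake_file = io.BytesIO(b'\x00' * 16 + expected.tobytes())
    result = read_contiguous(fake_file, 16, (3, 4), '<f8')
    np.testing.assert_array_equal(result, expected)
```

The trade-off is that the fallback reads the whole contiguous block into memory before any slicing is applied, which may be costly for large variables on remote storage.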
