### Is your feature request related to a problem?

Currently the `h5netcdf` engine supports opening remote files, but only as already-open file-like objects (e.g. `s3fs.open(...)`), not as string paths like `s3://...`. There are situations where I'd like to use string paths instead of open file-like objects:

- Opening files can sometimes be slow (xref "Opening lots of files can be slow" fsspec/s3fs#816)
- When using `parallel=True` to open lots of files, serializing open file-like objects back and forth from a remote cluster can be slow
- Some systems (e.g. NASA Earthdata) only hand out credentials that are valid when run in the same region as the data. Being able to use `parallel=True` + `storage_options` would be convenient/performant in that case.
### Describe the solution you'd like

It would be nice if I could do something like the following:

```python
ds = xr.open_mfdataset(
    files,  # A bunch of files like `s3://bucket/file`
    engine="h5netcdf",
    ...
    parallel=True,
    storage_options={...},  # fsspec-compatible options
)
```

and have my files opened prior to handing off to `h5netcdf`. `storage_options` is already supported for Zarr, so hopefully extending it to `h5netcdf` feels natural.
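For reference, a minimal sketch of what this could look like internally: resolving string paths plus `storage_options` into open file-like objects via fsspec before handing them to the engine. This is an illustration, not xarray's actual implementation; the `memory://` protocol stands in for `s3://` so the snippet runs without cloud credentials.

```python
import fsspec

# Hypothetical paths; in practice these would be e.g. s3://bucket/file.nc
paths = ["memory://bucket/a.nc", "memory://bucket/b.nc"]
storage_options = {}  # for s3:// this might be {"anon": True}, etc.

# Create placeholder files in the in-memory filesystem for the demo
for p in paths:
    with fsspec.open(p, "wb", **storage_options) as f:
        f.write(b"placeholder bytes")

# Roughly what open_mfdataset(..., storage_options=...) could do under
# the hood: open each path as a file-like object, then pass those on
# to the h5netcdf engine instead of the raw string paths.
open_files = [fsspec.open(p, "rb", **storage_options).open() for p in paths]
assert all(hasattr(f, "read") for f in open_files)
```

Because only strings and `storage_options` dicts would cross the cluster boundary, this also sidesteps the serialization cost of shipping open file handles to remote workers under `parallel=True`.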
### Describe alternatives you've considered

No response

### Additional context

No response