-
Notifications
You must be signed in to change notification settings - Fork 22
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Adding reader_options
kwargs to open_virtual_dataset.
#67
Conversation
reader_options
kwargs to open_virtual_dataset.reader_options
kwargs to open_virtual_dataset.
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #67 +/- ##
===========================================
- Coverage 90.18% 74.31% -15.87%
===========================================
Files 14 14
Lines 998 946 -52
===========================================
- Hits 900 703 -197
- Misses 98 243 +145
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. |
@norlandrhagen I've merged |
…o allows for reading of cloud storage
for more information, see https://pre-commit.ci
for more information, see https://pre-commit.ci
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @norlandrhagen ! I think this is a nice minimal addition to support reading from s3. I like how the fsspec stuff is generally kept separate too.
virtualizarr/xarray.py
Outdated
@@ -27,6 +28,7 @@ def open_virtual_dataset( | |||
loadable_variables: Optional[Iterable[str]] = None, | |||
indexes: Optional[Mapping[str, Index]] = None, | |||
virtual_array_class=ManifestArray, | |||
reader_options: Optional[dict] = {'storage_options':{'key':'', 'secret':'', 'anon':True}}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When you normally point xr.open_dataset
at an S3 url, you don't need to pass reader_options
do you? Can we try to follow the signature of xr.open_dataset
as closely as possible? (Maybe this already is as close as we can get)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I could be wrong, but I thought you had to pass in some sort of fsspec/s3fs mapper.
For me this fails:
ds = xr.open_dataset('s3://carbonplan-share/virtualizarr/local.nc')
Looks like the docs + CI builds are failing with:
@TomNicholas should we pin another version of Xarray? |
pydata/xarray#8872 was merged yesterday so now we should be able to release xarray, remove the xarray pin in virtualizarr, then release the first version of virtualizarr! EDIT: Tracking the xarray release pydata/xarray#9018 |
reader_options
kwargs to open_virtual_dataset.reader_options
kwargs to open_virtual_dataset.
This seems very close now @norlandrhagen ? |
CI is passing now @TomNicholas! |
Thanks @norlandrhagen ! One final request: Can we add a quick explanatory line to the docs? Something like
This would go on the usage page, either as a quick entry underneath the first |
Thank you so much @norlandrhagen ! Will merge this now |
Fantastic. Thanks so much. Ill refactor my code in the coming days. |
In a step to start reading remote files #61, this PR adds in
reader_options
to the open_virtual_dataset function.These
reader_options
are passed into each Kerchunk file reader (SingleHdf5ToZarr, NetCDF3ToZarr, etc..) inread_kerchunk_references_from_file
. Onceopen_virtual_dataset
is replaced with the Xarray backend, we could pass them through:ds = xr.open_dataset(fp, engine='virtualizarr', backend_kwargs={'reader_options': {'storage_options': {'anon': True}}})
.This approach relies on the user knowing what options are available in each Kerchunk file reader.
This example should work off of this PR pointing to a public s3 bucket.
edit: Tests are passing. Index generation and filetype guessing are now working.