Add S3 (with Minio) netCDF open/read and processing and exploratory tests #89
valeriupredoi merged 61 commits into main
Conversation
Codecov Report
Patch coverage:

Additional details and impacted files

```diff
@@            Coverage Diff             @@
##             main      #89      +/-   ##
==========================================
- Coverage   87.21%   86.04%   -1.17%
==========================================
  Files           8        8
  Lines         485      516      +31
==========================================
+ Hits          423      444      +21
- Misses         62       72      +10
```

☔ View full report in Codecov by Sentry.
gonna merge this since it's getting a bit too big for a regular PR and commit fluff keeps accruing - the only outstanding issue about it is the
First instance of a functional, end-to-end test of running PyActiveStorage on netCDF files on S3: I used Minio as S3-type storage, with @sd109 having confirmed that it is identical to AWS S3 (cheers!). The workflow does this:
- `Active` now needs a new kwarg `storage_type`, which can be `None` or `s3` at the mo (see the sketch after this list);
- if `storage_type` is `s3`, the two file loads that are currently done in its bellows (ie opening the file to get metadata/headers, NOT data) go via a dedicated S3 mechanism that uses `s3fs` (note that fsspec does that too - fsspec calls s3fs when it recognizes the FS to be S3 - so no point using fsspec); it then uses `h5netcdf` to put the open file (which is nothing more than a memory view of the netCDF file) into an hdf5/netCDF-like object format;
- the actual chunk reduction goes through the `s3_reduce_chunk()` func that @markgoddard put in in the previous PR, and from there on, Active works as per normal.
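A minimal sketch of what this looks like from the user side - the bucket/file URI, variable name, and reduction method below are hypothetical placeholders; only the `storage_type` kwarg itself is from this PR:

```python
from activestorage.active import Active

# Hypothetical S3 URI and variable name; storage_type="s3" is the new
# kwarg introduced here (it defaults to None for local POSIX files).
active = Active("s3://pyactivestorage/test_data.nc", "data", storage_type="s3")

# Assumed reduction setup: request a storage-side reduction method,
# then slicing triggers the per-chunk reduction as per normal.
active.method = "min"
result = active[0:2, 4:6, 7:9]
```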
Lazy transfer from s3fs

@bnlawrence has investigated the interplay between fsspec and s3fs via h5netcdf and found out that when pulling the file from S3 via s3fs and convert-opening it with h5netcdf, things get done lazily, ie there is no actual data transfer, only the file's metadata/headers are read. Cheers, Bryan!
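A sketch of that lazy open, assuming a local Minio endpoint with hypothetical credentials and bucket/variable names; the `s3fs` open plus `h5netcdf` wrap is the mechanism described above:

```python
import s3fs
import h5netcdf

# Point s3fs at the Minio server standing in for AWS S3 (endpoint URL
# and credentials are hypothetical test values).
fs = s3fs.S3FileSystem(
    key="minioadmin",
    secret="minioadmin",
    client_kwargs={"endpoint_url": "http://localhost:9000"},
)

# fs.open() gives back a file-like object; nothing is downloaded yet.
with fs.open("pyactivestorage/test_data.nc", "rb") as s3file:
    # h5netcdf wraps the open file into an hdf5/netCDF-like object,
    # reading only the file's metadata/headers.
    with h5netcdf.File(s3file, "r") as ncfile:
        var = ncfile.variables["data"]
        # Metadata access only - still no actual data transfer.
        print(var.shape, var.dtype)
```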