Add option to stream data from cloud instead of downloading locally #172

Open
mdtanker opened this issue Feb 18, 2024 · 1 comment
mdtanker commented Feb 18, 2024

Currently, all of the datasets available within the fetch module are downloaded and stored on the user's local computer using Pooch. As some of these datasets are large, and as polartoolkit begins to be incorporated into cloud-computing services such as CryoCloud, it would be ideal for users to be able to stream cloud-optimized datasets instead of having to download entire datasets.

For now, this is intended just for raster datasets, which are typically supplied as NetCDF (.nc) or GeoTIFF (.tif) files.

It seems that the .zarr file format may be the best file type to work with cloud storage (https://matthewrocklin.com/blog/work/2018/02/06/hdf-in-the-cloud).
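To make the appeal of a chunked format concrete: when a raster is stored as independent chunks, a reader only has to fetch the chunks that overlap the requested window. The sketch below is pure Python and does not use the zarr library; the function name, grid size, and chunk size are all hypothetical and chosen only for illustration.

```python
from itertools import product


def chunks_for_window(row_range, col_range, chunk_shape):
    """Return (row, col) indices of the chunks overlapping a pixel window.

    row_range and col_range are half-open (start, stop) pixel ranges.
    """
    row_start, row_stop = row_range
    col_start, col_stop = col_range
    chunk_rows, chunk_cols = chunk_shape

    # An empty window touches no chunks.
    if row_stop <= row_start or col_stop <= col_start:
        return []

    # Integer division maps a pixel index to the chunk that contains it.
    first_row = row_start // chunk_rows
    last_row = (row_stop - 1) // chunk_rows
    first_col = col_start // chunk_cols
    last_col = (col_stop - 1) // chunk_cols

    return list(
        product(range(first_row, last_row + 1), range(first_col, last_col + 1))
    )


# Hypothetical 6000 x 6000 grid stored in 1000 x 1000 chunks.
grid_shape = (6000, 6000)
chunk_shape = (1000, 1000)
total_chunks = (grid_shape[0] // chunk_shape[0]) * (grid_shape[1] // chunk_shape[1])

needed = chunks_for_window((1500, 2500), (200, 900), chunk_shape)
print(f"{len(needed)} of {total_chunks} chunks needed: {needed}")
```

With these made-up numbers, a small regional subset touches only a couple of chunks rather than the whole grid, which is the behaviour that would make streaming cheaper than a full download.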

Pangeo-Forge looks well suited to this, if I understand it correctly.

I will experiment with creating a Pangeo-Forge recipe for Bedmap2 and report back here with how it went.

Note: This extension seems to allow access to EarthData.

Links:

@mdtanker

REMA offers access to their data via AWS:
https://registry.opendata.aws/pgc-rema/

This would be a good dataset to test streaming of gridded data.
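For context, streaming part of a remote GeoTIFF generally relies on HTTP range requests, where the client asks for only a span of bytes. Below is a minimal standard-library sketch of building such a request. The URL is a placeholder rather than a real REMA file, and `build_range_request` is a hypothetical helper, not part of polartoolkit or Pooch.

```python
from urllib.request import Request


def build_range_request(url, start, length):
    """Build (but do not send) a request for `length` bytes starting at `start`."""
    if start < 0 or length <= 0:
        raise ValueError("start must be >= 0 and length must be > 0")
    # The HTTP Range header uses an inclusive end offset.
    end = start + length - 1
    return Request(url, headers={"Range": f"bytes={start}-{end}"})


# Placeholder URL, assumed for illustration only.
EXAMPLE_URL = "https://example.com/placeholder.tif"

# Ask for the first 16 KiB, roughly where a reader would look for file metadata.
request = build_range_request(EXAMPLE_URL, 0, 16384)
print(request.get_header("Range"))
```

Passing the request to `urllib.request.urlopen` would send it; a server that supports ranges replies with `206 Partial Content` and only the requested bytes. In practice a raster library would handle this internally.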
