cache rasterio example files #4102
Merged
58 commits (all by keewis):

- a8b0022 add a open_rasterio function to tutorial
- 3997c27 put the cache directory into .cache if that exists
- b45e28d raise an error if the status code is not 200
- e972d84 use the cached file if possible
- 7b5e6d5 add a test to check that the caching does not affect the result
- e435103 use the new tutorial function in the visualization gallery
- 75dac85 Merge branch 'master' into refactor-tutorial
- 8fe0c92 fix the temporary directory creation
- 783baeb rewrite open_dataset to use the same functions as open_rasterio
- d6623ff make sure the context manager on a pathlib object always works
- f834883 require requests
- 5ddadab add requests to most CI
- 21f090d split into two context managers
- 068b660 use is_dir instead of exists to check for .cache
- c5d1490 reword a few comments
- b457c19 properly credit the SO answer
- c9faac1 add a pseudo-atomic open function
- f16a15f add a random part to the file so concurrent calls are not an issue
- f9abf31 vendor appdirs.user_cache_dir and use it to determine the default cache
- c276876 properly vendor appdirs
- fdb9b11 suppress FileNotFoundErrors while removing a file
- 427b4cf silence mypy
- bd2cafc make sure to convert string paths to pathlib.Path
- 9d4bce3 convert the result of appdirs.user_cache_dir to pathlib.Path
- 4297920 add the comment about switching to unlink(missing_ok=True)
- 9fde5d6 Merge branch 'master' into refactor-tutorial
- adfce27 use requests.codes.ok instead of the numeric value
- ea9d4dc remove the md5 checking code
- d828720 Merge branch 'master' into refactor-tutorial
- 745de23 try to make the comment clearer
- c1229ae Merge branch 'master' into refactor-tutorial
- 29b78ed typo
- 8d04bba isort
- dd08972 Merge branch 'master' into refactor-tutorial
- 77e2487 remove all code related to the detection of the application directory
- e8b4a00 Merge branch 'master' into refactor-tutorial
- f730a85 Merge branch 'master' into refactor-tutorial
- 39e7d77 use pooch for caching and fetching the files
- 9134d7a remove requests from the CI environments
- fa89822 add pooch to the environment used by the py38-flaky CI
- 56d7e49 remove the install_requires on requests and the vendor mypy ignore [s…
- b848c66 add pooch to the doc environment [skip-ci]
- dab96de Merge branch 'master' into refactor-tutorial
- 405a90e ignore missing type hints for pooch
- cc6cade Merge branch 'master' into refactor-tutorial
- 817a6ba add a mapping of external urls
- 0b60eb0 remove tutorial.open_rasterio
- d82b816 remove the github_url and branch PRs
- 719765c allow opening rasterio files using open_dataset
- 0a489c4 remove the reference to xarray.tutorial.open_dataset
- 0335473 rename engine_overrides to overrides
- cc3eb3c update the docstring
- 04e15fb update the rasterio test
- 6f02db4 use explicitly passed values for engine
- 653e5fd use open_dataset instead of open_rasterio
- b250b38 convert back to a data array [skip-ci]
- 78c11f5 write the files to a temporary cache directory
- b532879 typo
```diff
@@ -27,6 +27,7 @@ dependencies:
   - pandas
   - pint
   - pip=20.2
+  - pooch
   - pre-commit
   - pseudonetcdf
   - pydap
```
```diff
@@ -5,33 +5,45 @@
 * building tutorials in the documentation.
 
 """
-import hashlib
-import os as _os
-from urllib.request import urlretrieve
+import os
+import pathlib
 
 import numpy as np
 
 from .backends.api import open_dataset as _open_dataset
+from .backends.rasterio_ import open_rasterio
 from .core.dataarray import DataArray
 from .core.dataset import Dataset
 
-_default_cache_dir = _os.sep.join(("~", ".xarray_tutorial_data"))
 
+def _open_rasterio(path, engine=None, **kwargs):
+    data = open_rasterio(path, **kwargs)
+    name = data.name if data.name is not None else "data"
+    return data.to_dataset(name=name)
 
-def file_md5_checksum(fname):
-    hash_md5 = hashlib.md5()
-    with open(fname, "rb") as f:
-        hash_md5.update(f.read())
-    return hash_md5.hexdigest()
+
+_default_cache_dir_name = "xarray_tutorial_data"
+base_url = "https://github.com/pydata/xarray-data"
+version = "master"
+
+
+external_urls = {
+    "RGB.byte": (
+        "rasterio",
+        "https://github.com/mapbox/rasterio/raw/master/tests/data/RGB.byte.tif",
+    ),
+}
+overrides = {
+    "rasterio": _open_rasterio,
+}
 
 
 # idea borrowed from Seaborn
 def open_dataset(
     name,
+    engine=None,
     cache=True,
-    cache_dir=_default_cache_dir,
-    github_url="https://github.com/pydata/xarray-data",
-    branch="master",
+    cache_dir=None,
     **kws,
 ):
     """
@@ -42,61 +54,62 @@ def open_dataset(
     Parameters
     ----------
     name : str
-        Name of the file containing the dataset. If no suffix is given, assumed
-        to be netCDF ('.nc' is appended)
+        Name of the file containing the dataset.
         e.g. 'air_temperature'
-    cache_dir : str, optional
+    engine : str, optional
+        The engine to use.
+    cache_dir : path-like, optional
         The directory in which to search for and write cached data.
     cache : bool, optional
         If True, then cache data locally for use on subsequent calls
-    github_url : str
-        Github repository where the data is stored
-    branch : str
-        The git branch to download from
     kws : dict, optional
         Passed to xarray.open_dataset
+
+    Notes
+    -----
+    Available datasets:
+
+    * ``"air_temperature"``
+    * ``"rasm"``
+    * ``"ROMS_example"``
+    * ``"tiny"``
+    * ``"era5-2mt-2019-03-uk.grib"``
+    * ``"RGB.byte"``: example rasterio file from https://github.com/mapbox/rasterio
```
Comment on lines +72 to +77:

not sure if … It would also be good to add a small description of the data.
```diff
 
     See Also
     --------
     xarray.open_dataset
     """
-    root, ext = _os.path.splitext(name)
-    if not ext:
-        ext = ".nc"
-    fullname = root + ext
-    longdir = _os.path.expanduser(cache_dir)
-    localfile = _os.sep.join((longdir, fullname))
-    md5name = fullname + ".md5"
-    md5file = _os.sep.join((longdir, md5name))
-
-    if not _os.path.exists(localfile):
-
-        # This will always leave this directory on disk.
-        # May want to add an option to remove it.
-        if not _os.path.isdir(longdir):
-            _os.mkdir(longdir)
-
-        url = "/".join((github_url, "raw", branch, fullname))
-        urlretrieve(url, localfile)
-        url = "/".join((github_url, "raw", branch, md5name))
-        urlretrieve(url, md5file)
-
-        localmd5 = file_md5_checksum(localfile)
-        with open(md5file) as f:
-            remotemd5 = f.read()
-        if localmd5 != remotemd5:
-            _os.remove(localfile)
-            msg = """
-            MD5 checksum does not match, try downloading dataset again.
-            """
-            raise OSError(msg)
-
-    ds = _open_dataset(localfile, **kws)
-
+    try:
+        import pooch
+    except ImportError:
+        raise ImportError("using the tutorial data requires pooch")
+
+    if isinstance(cache_dir, pathlib.Path):
+        cache_dir = os.fspath(cache_dir)
+    elif cache_dir is None:
+        cache_dir = pooch.os_cache(_default_cache_dir_name)
+
+    if name in external_urls:
+        engine_, url = external_urls[name]
+        if engine is None:
+            engine = engine_
+    else:
+        # process the name
+        default_extension = ".nc"
+        path = pathlib.Path(name)
+        if not path.suffix:
+            path = path.with_suffix(default_extension)
+
+        url = f"{base_url}/raw/{version}/{path.name}"
+
+    _open = overrides.get(engine, _open_dataset)
+    # retrieve the file
+    filepath = pooch.retrieve(url=url, known_hash=None, path=cache_dir)
+    ds = _open(filepath, engine=engine, **kws)
     if not cache:
         ds = ds.load()
-        _os.remove(localfile)
+        pathlib.Path(filepath).unlink()
 
     return ds
 
```
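The name-to-URL and engine resolution in the new `open_dataset` can be exercised without a network connection or pooch. The sketch below copies the constants from the diff and reproduces only that lookup; `resolve` is a hypothetical helper for illustration, not part of xarray.

```python
import pathlib

# Constants copied from the diff above.
base_url = "https://github.com/pydata/xarray-data"
version = "master"
external_urls = {
    "RGB.byte": (
        "rasterio",
        "https://github.com/mapbox/rasterio/raw/master/tests/data/RGB.byte.tif",
    ),
}


def resolve(name, engine=None):
    """Hypothetical helper: return (url, engine) the way the new open_dataset picks them."""
    if name in external_urls:
        # External files carry a default engine; an explicitly passed engine wins.
        engine_, url = external_urls[name]
        if engine is None:
            engine = engine_
    else:
        # Names without a suffix are assumed to be netCDF files.
        path = pathlib.Path(name)
        if not path.suffix:
            path = path.with_suffix(".nc")
        url = f"{base_url}/raw/{version}/{path.name}"
    return url, engine
```

For example, `resolve("tiny")` yields the `.../raw/master/tiny.nc` URL with `engine` left as `None`, so `overrides.get(None, _open_dataset)` in the diff falls back to the regular `open_dataset`, while `"RGB.byte"` resolves to the `"rasterio"` engine and therefore to `_open_rasterio`.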
Comment:

Once we switch to versioned data (i.e. set `version` to a tag), we might want to do that for this url, too. Then, we should also be able to specify hash values and have `pooch` check them automatically.
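The diff passes `known_hash=None` to `pooch.retrieve`, so downloads are currently not verified (the removed `file_md5_checksum` used to compare an MD5 against a downloaded `.md5` file). With pinned data, a hash could be passed instead and pooch would perform the check itself. The standard-library sketch below only illustrates roughly what such a check involves; `file_sha256` and `verify_sha256` are hypothetical names, and the `"sha256:<hex>"` format is an assumption modelled on the `algorithm:digest` strings pooch accepts.

```python
import hashlib


def file_sha256(path, chunk_size=65536):
    """Return the SHA256 hex digest of a file, read in fixed-size chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # iter() with a sentinel stops once read() returns b"" at end of file
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_sha256(path, known_hash):
    """Hypothetical helper: raise ValueError unless the file matches known_hash.

    Accepts a bare hex digest or one prefixed with "sha256:".
    """
    algorithm, _, expected = known_hash.rpartition(":")
    if algorithm not in ("", "sha256"):
        # Only SHA256 is handled in this sketch.
        raise ValueError(f"unsupported hash algorithm: {algorithm}")
    actual = file_sha256(path)  # hexdigest() is always lowercase
    if actual != expected.lower():
        raise ValueError(f"hash mismatch for {path}: expected {expected}, got {actual}")
    return path
```

Reading in chunks keeps memory use flat for large files, unlike the removed `file_md5_checksum`, which read the whole file at once.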