building the visualization gallery is slow #3986

Closed
keewis opened this issue Apr 20, 2020 · 2 comments · Fixed by #4102

Comments

@keewis
Collaborator

keewis commented Apr 20, 2020

When running sphinx to build the documentation, it frequently times out when trying to build the visualization gallery. Running

/usr/bin/time -v python -c 'import xarray as xr; xr.open_rasterio("https://github.com/mapbox/rasterio/raw/master/tests/data/RGB.byte.tif")'

reports that it takes at least 5 minutes (or times out after 10 minutes) if opened from the url. Subsequent calls use the cache, so the second rasterio example is fast.

If instead I download the file manually and then load from disk, the whole notebook completes in about 10 seconds. Also, directly calling rasterio.open completes in a few seconds, so the bug should be in open_rasterio.
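To put numbers on the comparison above, a small timing helper along these lines could be used. This is only a sketch: `time_call` and `compare_openers` are hypothetical names, and `compare_openers` assumes that rasterio is installed and that the installed xarray still provides `open_rasterio`.

```python
import time


def time_call(func, *args, **kwargs):
    """Call func once and return (result, elapsed wall-clock seconds)."""
    start = time.perf_counter()
    result = func(*args, **kwargs)
    return result, time.perf_counter() - start


def compare_openers(url):
    """Time rasterio.open and xarray.open_rasterio on the same url (hypothetical helper)."""
    # imported here so that time_call stays usable without these packages
    import rasterio
    import xarray as xr

    openers = {
        "rasterio.open": rasterio.open,
        "xarray.open_rasterio": xr.open_rasterio,
    }
    timings = {}
    for name, opener in openers.items():
        result, timings[name] = time_call(opener, url)
        result.close()  # release the file handle before the next measurement
    return timings
```

Since subsequent calls use the cache, whichever opener runs second may look faster than it really is, so each opener is best measured in a fresh process.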

I do think we should try to fix this in the backend, but maybe we could also cache RGB.byte.tif in the same directory as the xarray.tutorial data and open the cached file in the gallery?

Edit: this is really flaky, I can't reliably reproduce this.

Edit2: for now, I'm using an extra cell containing

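import pathlib
import shutil
import requests

cache_dir = pathlib.Path.home() / ".xarray_tutorial_data"
cache_dir.mkdir(parents=True, exist_ok=True)  # make sure the cache directory exists
path = cache_dir / "RGB.byte.tif"
url = "https://github.com/mapbox/rasterio/raw/master/tests/data/RGB.byte.tif"

# download only if there is no usable (non-empty) cached copy
if not path.exists() or path.stat().st_size == 0:
    # stream=True leaves the body unread so that r.raw can be copied to disk
    with requests.get(url, stream=True) as r, path.open(mode="wb") as f:
        if r.status_code == requests.codes.ok:
            shutil.copyfileobj(r.raw, f)
        else:
            print(f"download failed: {r.status_code}")
            r.raise_for_status()

# point the examples at the cached file
url = path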

and modifying both examples to use the new url

@dcherian
Contributor

> maybe we could also cache RGB.byte.tif in the same directory as the xarray.tutorial data and open the cached file in the gallery?

Sounds OK to me especially if it reduces time

@keewis
Collaborator Author

keewis commented May 15, 2020

Should I extend open_dataset to also open rasterio files, or would it be better to add an open_rasterio function that does that? To match the normal API, I'd go with the latter and change it when / if we switch to open_dataset(..., format="rasterio").
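For illustration, the second option might look roughly like the sketch below. It is not the implementation that was eventually merged: `fetch_cached`, its parameters, and the signature of this `open_rasterio` wrapper are hypothetical. It uses the standard library's `urllib.request` instead of `requests` to keep the example dependency-free, and it assumes an xarray version that still provides `xr.open_rasterio`.

```python
import pathlib
import shutil
import urllib.request

# cache location and file url taken from the workaround earlier in this issue
DEFAULT_CACHE_DIR = pathlib.Path.home() / ".xarray_tutorial_data"
RGB_URL = "https://github.com/mapbox/rasterio/raw/master/tests/data/RGB.byte.tif"


def fetch_cached(url, name, cache_dir=DEFAULT_CACHE_DIR):
    """Return the path of cache_dir/name, downloading url only if no non-empty copy exists."""
    cache_dir = pathlib.Path(cache_dir)
    cache_dir.mkdir(parents=True, exist_ok=True)
    path = cache_dir / name
    if path.exists() and path.stat().st_size > 0:
        return path

    # write to a temporary name first so an interrupted download never
    # leaves a truncated file under the final name
    tmp = path.with_name(path.name + ".part")
    with urllib.request.urlopen(url) as response, tmp.open("wb") as f:
        shutil.copyfileobj(response, f)
    tmp.replace(path)
    return path


def open_rasterio(url=RGB_URL, name="RGB.byte.tif", cache_dir=DEFAULT_CACHE_DIR, **kwargs):
    """Open the cached copy of a rasterio file (hypothetical tutorial helper)."""
    import xarray as xr  # imported lazily; assumes xr.open_rasterio is available

    path = fetch_cached(url, name, cache_dir=cache_dir)
    return xr.open_rasterio(str(path), **kwargs)
```

Keeping the download step in its own function means the gallery notebooks would only touch the network on the first build, and a later switch to open_dataset(..., format="rasterio") would only need to replace the thin wrapper.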
