
building the visualization gallery is slow #3986

Closed
@keewis

When running sphinx to build the documentation, it frequently times out when trying to build the visualization gallery. Running

/usr/bin/time -v python -c 'import xarray as xr; xr.open_rasterio("https://github.com/mapbox/rasterio/raw/master/tests/data/RGB.byte.tif")'

reports that it takes at least 5 minutes (or times out after 10 minutes) when the file is opened from the URL. Subsequent calls use the cache, so the second rasterio example is fast.

If I instead download the file manually and then load it from disk, the whole notebook completes in about 10 seconds. Calling rasterio.open directly also completes in a few seconds, so the bug is most likely in open_rasterio.
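
To make the comparison between the different ways of opening the file easier to repeat, a small timing helper could be used. This is only a sketch: `time_call` is a hypothetical name, not part of xarray or rasterio, and the calls in the trailing comments assume both libraries are installed and the network is reachable.

```python
import time


def time_call(func, *args, **kwargs):
    """Call func(*args, **kwargs) and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = func(*args, **kwargs)
    elapsed = time.perf_counter() - start
    return result, elapsed


# Hypothetical usage, left as comments because it needs the network:
#
#   import rasterio
#   import xarray as xr
#
#   url = "https://github.com/mapbox/rasterio/raw/master/tests/data/RGB.byte.tif"
#   _, t_rasterio = time_call(rasterio.open, url)
#   _, t_xarray = time_call(xr.open_rasterio, url)
#   print(f"rasterio.open: {t_rasterio:.1f}s, open_rasterio: {t_xarray:.1f}s")
```

Because the slowdown is flaky, a single measurement probably says little; running each call several times would give a better picture.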

I do think we should try to fix this in the backend, but maybe we could also cache RGB.byte.tif in the same directory as the xarray.tutorial data and open the cached file in the gallery?

Edit: this is really flaky, I can't reliably reproduce this.

Edit2: for now, I'm using an extra cell containing

import pathlib
import shutil
import requests

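cache_dir = pathlib.Path.home() / ".xarray_tutorial_data"
cache_dir.mkdir(parents=True, exist_ok=True)  # make sure the cache directory exists
path = cache_dir / "RGB.byte.tif"
url = "https://github.com/mapbox/rasterio/raw/master/tests/data/RGB.byte.tif"

# download only if the file is missing or a previous attempt left it empty
if not path.exists() or path.stat().st_size == 0:
    # stream=True leaves the body unread so it can be copied from r.raw
    with requests.get(url, stream=True) as r, path.open(mode="wb") as f:
        if r.status_code == requests.codes.ok:
            shutil.copyfileobj(r.raw, f)
        else:
            print(f"download failed: {r.status_code}")
            r.raise_for_status()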

url = path

and have modified both examples to use the new url.
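
If the gallery ends up caching the file, the download step could be wrapped in a helper that never leaves an empty or partial file behind, which would make the `st_size == 0` check unnecessary. The following is a minimal sketch under some assumptions: `fetch_to_cache` is a hypothetical name, and it uses the standard library's `urllib.request` in place of `requests` so that it has no extra dependencies.

```python
import os
import pathlib
import shutil
import tempfile
import urllib.request


def fetch_to_cache(url, cache_dir, filename):
    """Return a local path for url, downloading it only if not cached yet."""
    cache_dir = pathlib.Path(cache_dir)
    cache_dir.mkdir(parents=True, exist_ok=True)
    path = cache_dir / filename
    if path.exists() and path.stat().st_size > 0:
        return path  # already cached

    # Write to a temporary file in the same directory and rename it at the
    # end, so a failed download never leaves a file at the final path.
    fd, tmp_name = tempfile.mkstemp(dir=cache_dir, suffix=".part")
    try:
        # urlopen raises an exception for HTTP error statuses
        with os.fdopen(fd, "wb") as f, urllib.request.urlopen(url) as r:
            shutil.copyfileobj(r, f)
        os.replace(tmp_name, path)
    except BaseException:
        os.unlink(tmp_name)  # clean up the partial download
        raise
    return path
```

In the gallery notebook this might be called as `url = fetch_to_cache("https://github.com/mapbox/rasterio/raw/master/tests/data/RGB.byte.tif", pathlib.Path.home() / ".xarray_tutorial_data", "RGB.byte.tif")`, with both examples then opening `url` as before.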
