If a NetCDF file is chunked on disk, open it with compatible dask chunks #1440

Closed
@Zac-HD

Description

NetCDF4 data can be saved as chunks on disk, which has several benefits, including efficient reads when using a compatible chunk shape. This is particularly important for files with chunk-based compression (i.e. all NetCDF4 files with compression) and on HPC and parallel file systems, where IO is typically dominated by the number of reads and chunks read from disk are often cached. Caches are also common in network data backends such as THREDDS OPeNDAP, in which case using disk-compatible chunks will reduce cache pressure as well as latency.

Xarray can use chunks, of course, but as of v0.9 the chunk size has to be specified manually, and the easiest way to discover it is to open the file and look at the _Chunksizes attribute for each variable. I propose that xr.open_dataset (along with xr.open_dataarray and xr.open_mfdataset) change its default behaviour.
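For reference, here is a rough sketch of how the on-disk chunking can be discovered today. The helper name `on_disk_chunks` is made up for illustration; it assumes an object shaped like an open `netCDF4.Dataset`, whose variables expose `.dimensions` and a `.chunking()` method that returns either `"contiguous"` or a list of chunk lengths.

```python
def on_disk_chunks(dataset):
    """Map each variable name to {dimension: chunk length}, or None if unchunked.

    `dataset` is assumed to behave like an open netCDF4.Dataset.
    """
    result = {}
    for name, var in dataset.variables.items():
        chunking = var.chunking()  # "contiguous" or a list of ints
        if chunking is None or chunking == "contiguous":
            result[name] = None
        else:
            result[name] = dict(zip(var.dimensions, chunking))
    return result


# Possible usage (the file name is a placeholder):
#
#     import netCDF4
#     with netCDF4.Dataset("example.nc") as nc:
#         print(on_disk_chunks(nc))
```

This reports the same information as the _Chunksizes attribute, one entry per variable.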

If Dask is available and chunks=None (the default), chunks should be taken from the file on disk. This may lead to a chunked or unchunked dataset. To force an unchunked load, users can specify chunks={}, or simply .load() the dataset after opening it.
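A minimal sketch of what this could look like from user code, not a proposed implementation. Both helper names are hypothetical. It assumes the netCDF4 backend records on-disk chunk shapes in each variable's `encoding["chunksizes"]`, and it resolves disagreements between variables by taking the largest chunk length per dimension, which is a simple heuristic and may not align with every variable's chunks.

```python
def merge_disk_chunks(per_variable):
    """Combine {variable: {dim: size} or None} into one {dim: size} dict.

    Heuristic (an assumption, not xarray behaviour): where variables
    disagree on a dimension, keep the largest chunk length.
    """
    merged = {}
    for chunks in per_variable.values():
        if not chunks:
            continue  # contiguous (unchunked) variable
        for dim, size in chunks.items():
            merged[dim] = max(size, merged.get(dim, 0))
    return merged


def open_with_disk_chunks(path):
    """Open `path` with dask chunks derived from the on-disk chunking."""
    import xarray as xr  # imported lazily; reading chunked data also needs dask

    # First pass: read only metadata to find the stored chunk shapes.
    with xr.open_dataset(path) as probe:
        per_variable = {
            name: dict(zip(var.dims, var.encoding["chunksizes"]))
            if var.encoding.get("chunksizes")
            else None
            for name, var in probe.variables.items()
        }
    chunks = merge_disk_chunks(per_variable)
    # If nothing is chunked on disk, pass None for a plain (non-dask) load.
    return xr.open_dataset(path, chunks=chunks or None)
```

With something like this, `open_with_disk_chunks("example.nc")` (a placeholder file name) would return a dask-backed dataset when the file is chunked on disk and an ordinary one otherwise.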
