Wrapping a kerchunk.Array object directly with xarray #8699

Closed
@TomNicholas

Description

What is your issue?

In fsspec/kerchunk#377 the idea came up of using the xarray API to concatenate arrays which represent parts of a zarr store - i.e. using xarray to kerchunk a large set of netCDF files instead of using kerchunk.combine.MultiZarrToZarr.

The idea is to make something like this work for kerchunking sets of netCDF files into zarr stores:

ds = xr.open_mfdataset(
    '/my/files*.nc',
    engine='kerchunk',  # kerchunk registers an xarray IO backend that returns zarr.Array objects
    combine='nested',  # 'by_coords' would require actually reading coordinate data
    parallel=True,  # would use dask.delayed to generate reference dicts for each file in parallel
)

ds  # now wraps a bunch of zarr.Array / kerchunk.Array objects, no need for dask arrays

ds.kerchunk.to_zarr(store='out.zarr')  # kerchunk defines an xarray accessor that extracts the zarr arrays and serializes them (which could also be done in parallel if writing to parquet)

I had a go at doing this in this notebook, and in doing so discovered a few potential issues with xarray's internals.

For this to work xarray has to:

  • Wrap a kerchunk.Array object that defines barely any array API methods and essentially does not support indexing at all,
  • Store all the information present in a kerchunked Zarr store without ever loading any data,
  • Not create any indexes by default during dataset construction or during xr.concat,
  • Not try to do anything else that can't be defined for a kerchunk.Array,
  • Possibly have the lazy indexing classes support concatenation (see "Lazy concatenation of arrays", #4628).
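To make the requirements above concrete, here is a minimal sketch of what "concatenating without loading data" could look like. It is not kerchunk's or xarray's actual implementation: `ChunkRef`, `RefArray` and `concat_refs` are hypothetical names invented for illustration, and the sketch assumes every input has the same chunk shape and a whole number of chunks along the concatenation axis. The array holds only metadata plus a mapping from chunk index to byte range, refuses indexing, and concatenation just shifts chunk indices.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ChunkRef:
    """Hypothetical pointer to one chunk's bytes inside a source file."""
    path: str
    offset: int
    length: int


@dataclass
class RefArray:
    """Hypothetical metadata-only array: shape, chunking, dtype, chunk references."""
    shape: tuple
    chunks: tuple
    dtype: str
    refs: dict  # maps chunk index tuple -> ChunkRef

    def __getitem__(self, key):
        # No numerical values are available, so indexing is deliberately unsupported.
        raise NotImplementedError("RefArray holds references only; no data to index")


def concat_refs(arrays, axis=0):
    """Concatenate RefArrays along `axis` by re-keying chunk references."""
    first = arrays[0]
    new_refs = {}
    chunk_offset = 0  # number of chunks already placed along `axis`
    total = 0         # number of elements already placed along `axis`
    for arr in arrays:
        if arr.chunks != first.chunks or arr.dtype != first.dtype:
            raise ValueError("chunk shape and dtype must match")
        if arr.shape[:axis] + arr.shape[axis + 1:] != first.shape[:axis] + first.shape[axis + 1:]:
            raise ValueError("shapes must match on all axes except the concat axis")
        if arr.shape[axis] % arr.chunks[axis] != 0:
            raise ValueError("this sketch requires whole chunks along the concat axis")
        for idx, ref in arr.refs.items():
            # Shift the chunk index along `axis`; the byte range itself is untouched.
            shifted = idx[:axis] + (idx[axis] + chunk_offset,) + idx[axis + 1:]
            new_refs[shifted] = ref
        chunk_offset += arr.shape[axis] // arr.chunks[axis]
        total += arr.shape[axis]
    new_shape = first.shape[:axis] + (total,) + first.shape[axis + 1:]
    return RefArray(new_shape, first.chunks, first.dtype, new_refs)


# Two made-up files, each contributing two chunks along axis 0.
a = RefArray((2, 4), (1, 4), "float64",
             {(0, 0): ChunkRef("a.nc", 0, 32), (1, 0): ChunkRef("a.nc", 32, 32)})
b = RefArray((2, 4), (1, 4), "float64",
             {(0, 0): ChunkRef("b.nc", 0, 32), (1, 0): ChunkRef("b.nc", 32, 32)})
combined = concat_refs([a, b], axis=0)
```

Nothing here reads any file: the combined object is just a larger shape with a re-keyed reference table, which is the kind of object xarray would need to wrap without attempting to index it or build indexes from it.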

It's an interesting exercise in using xarray as an abstraction, with no access to real numerical values at all.
