Lazy indexing arrays as a stand-alone package #5081

Open
@shoyer

From @rabernat on Twitter:

"Xarray has some secret private classes for lazily indexing / wrapping arrays that are so useful I think they should be broken out into a standalone package. https://github.com/pydata/xarray/blob/master/xarray/core/indexing.py#L516"

The idea here is to create a first-class "duck array" library for lazy indexing that could replace xarray's internal classes for lazy indexing. This would be in some ways similar to dask.array, but much simpler, because it doesn't have to worry about parallel computing.
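To make the idea concrete, here is a minimal sketch of what a lazily indexed wrapper could look like. It is not xarray's actual implementation: `LazySlice1D` and `RecordingBackend` are hypothetical names, and the sketch assumes a 1-D array-like and slice-only indexing. Slices are composed symbolically (via `range`) and only applied to the wrapped array on conversion with `np.asarray`.

```python
import numpy as np


class RecordingBackend:
    """Hypothetical stand-in for an on-disk array; records every read."""

    def __init__(self, data):
        self.data = np.asarray(data)
        self.reads = []

    def __len__(self):
        return len(self.data)

    def __getitem__(self, key):
        self.reads.append(key)
        return self.data[key]


class LazySlice1D:
    """Hypothetical wrapper: composes slices without touching the data."""

    def __init__(self, array, index=None):
        self.array = array
        # A range describes which positions of the backend are selected.
        self._index = range(len(array)) if index is None else index

    @property
    def shape(self):
        return (len(self._index),)

    def __getitem__(self, key):
        if not isinstance(key, slice):
            raise TypeError("this sketch only supports slices")
        # Slicing a range composes the new slice with the pending one.
        return LazySlice1D(self.array, self._index[key])

    def __array__(self, dtype=None, copy=None):
        r = self._index
        if len(r) == 0:
            return np.asarray(self.array[0:0], dtype=dtype)
        # A negative stop in a range means "past the start", i.e. None in a slice.
        stop = r.stop if r.stop >= 0 else None
        return np.asarray(self.array[slice(r.start, stop, r.step)], dtype=dtype)


backend = RecordingBackend(np.arange(100))
lazy = LazySlice1D(backend)[10:50][::2][:3]
reads_before = list(backend.reads)   # nothing has been read yet
values = np.asarray(lazy)            # a single read of the composed slice
```

Three chained indexing operations result in one read of three elements from the backend. A real library would also need to handle integer, array, and multi-dimensional indexers.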

Desired features:

A common feature of these operations is that they can (and almost always should) be fused with indexing: if N elements are selected via indexing, only O(N) compute and memory is required to produce them, regardless of the size of the original arrays, as long as the number of applied operations can be treated as a constant. Memory access is significantly slower than compute on modern hardware, so recomputing these operations on the fly is almost always a good idea.
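A rough sketch of what "fused with indexing" could mean for an elementwise operation, assuming a NumPy-backed array: the wrapper indexes the underlying array first and only then applies the function, so the work scales with the number of selected elements. `LazyElementwise` and `decode` are hypothetical names for illustration, not part of xarray.

```python
import numpy as np


class LazyElementwise:
    """Hypothetical wrapper: applies an elementwise function after indexing."""

    def __init__(self, func, array):
        self.func = func
        self.array = array

    @property
    def shape(self):
        return self.array.shape

    def __getitem__(self, key):
        # Index first, compute second: only the selected elements are processed.
        return self.func(np.asarray(self.array[key]))


sizes_seen = []  # how many elements each call to decode() processed


def decode(x):
    """Illustrative cheap decoding step (scale and offset)."""
    sizes_seen.append(x.size)
    return x * 0.1 + 273.15


big = np.arange(1_000_000)
lazy = LazyElementwise(decode, big)
preview = lazy[:5]  # decode() runs on 5 elements, not 1,000,000
```

Because the function is recomputed on each access rather than cached, repeated indexing repeats the work, which is the trade-off the paragraph above argues is usually worthwhile.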

Out of scope: lazy computation when indexing could require access to many more elements to compute the desired value than are returned. For example, mean() probably should not be lazy, because that could involve computation of a very large number of elements that one might want to cache.

This is valuable functionality for Xarray for two reasons:

  1. It allows for "previewing" small bits of data loaded from disk or remote storage, even if that data needs some form of cheap "decoding" from its form on disk.
  2. It allows for xarray to decode data in a lazy fashion that is compatible with full-featured systems for lazy computation (e.g., Dask), without requiring the user to choose dask when reading the data.

Related issues:
