Handling materialization of lazy arrays #748

Open
@hameerabbasi

Background

Some colleagues and I were working on sparse when we ran into a limitation of the current Array API Standard. @kgryte was kind enough to point out that it might have wider implications than just sparse, so it would be prudent to discuss it with other relevant parties in the community before settling on an API design, to avoid fragmentation.

Problem Statement

Two notable things are missing from the Array API standard today that sparse, and potentially Dask, JAX and other libraries, might need:

  • Support for storage formats.
    • In Dask, this might be the array metadata, such as the type of the inner array.
    • In sparse, this would be the format of the sparse array (CRS, CCS, COO, ...).
  • Support for lazy arrays/materialization.
    • sparse/JAX might use this to build up kernels before running a computation.
    • Dask might use this for un-computed arrays stored as a task graph.

Potential solutions

Overload the Array.device attribute and the Array.to_device method.

One option is to overload the objects returned/accepted by these to contain a device + storage object. Something like the following:

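from __future__ import annotations  # Device and Format are left undefined in this sketch


class Storage:
    """Pairs an execution device with a storage format."""

    @property
    def device(self) -> Device:
        ...

    @property
    def format(self) -> Format:
        ...

    def __eq__(self, other: "Storage") -> bool:
        """ Compatible if combined? """

    def __ne__(self, other: "Storage") -> bool:
        """ Incompatible if combined? """


class Array:
    @property
    def device(self) -> Storage:
        ...

    # Remaining parameters as in the current standard's to_device.
    def to_device(self, device: Storage, /, *, stream=None) -> "Array":
        ...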

To materialize an array, one could use to_device(default_device()) (possible after #689 is merged).
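To make the materialization idea concrete, here is a minimal sketch of how to_device(default_device()) could force a lazy array. Everything in it is an assumption for illustration: ToyArray, the string-valued device and format fields, and the choice of a dense CPU array as the default are all hypothetical and not part of the standard or of any library.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class Storage:
    """Hypothetical device + format pair, as in the proposal above."""
    device: str  # e.g. "cpu"
    format: str  # e.g. "lazy" or "dense" (illustrative names)


def default_device() -> Storage:
    # Assumption: the default storage is an eagerly computed dense array on CPU.
    return Storage(device="cpu", format="dense")


class ToyArray:
    """Toy 1-D array that is either a deferred computation or concrete data."""

    def __init__(self, compute: Callable[[], List[float]], storage: Storage):
        self._compute = compute
        self._storage = storage

    @property
    def device(self) -> Storage:
        return self._storage

    def to_device(self, device: Storage, /, *, stream=None) -> "ToyArray":
        if device == self._storage:
            return self
        if device.format == "dense":
            data = self._compute()  # run the deferred computation exactly once
            return ToyArray(lambda: data, device)
        raise NotImplementedError(f"unsupported target: {device}")

    def tolist(self) -> List[float]:
        return self._compute()


calls = []  # records how often the deferred computation actually runs


def expensive() -> List[float]:
    calls.append(1)
    return [1.0, 2.0, 3.0]


lazy = ToyArray(expensive, Storage(device="cpu", format="lazy"))
dense = lazy.to_device(default_device())  # materialization happens here
```

In this sketch nothing is computed when lazy is created; the work happens inside to_device, and the returned array holds the concrete data.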

Advantages

As far as I can see, it's compatible with how the Array API standard works today.

Disadvantages

We're mixing the concepts of an execution context and storage format, and in particular overloading operators in a rather weird way.
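A small sketch of why the operator overloading feels odd. The compatibility rule used here (same device means "compatible", regardless of format) is purely an assumption for illustration; the proposal does not define one.

```python
class Storage:
    """Hypothetical Storage where == means "compatible if combined"."""

    def __init__(self, device: str, format: str):
        self.device = device
        self.format = format

    def __eq__(self, other):
        if not isinstance(other, Storage):
            return NotImplemented
        # Assumed rule: storages on the same device can be combined.
        return self.device == other.device

    def __hash__(self):
        # Must agree with __eq__, so the format cannot take part in the hash.
        return hash(self.device)


coo = Storage("cpu", "COO")
csr = Storage("cpu", "CSR")

same = coo == csr        # True, although the formats differ
distinct = {coo, csr}    # collapses to a single element
```

With compatibility standing in for equality, sets, dict keys and ordinary == checks can no longer tell two different formats apart.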

Introduce an Array.format attribute and Array.to_format method.

Advantages

We can get the API right, maybe even introduce xp.can_mix_formats(...).
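For comparison, a rough sketch of what a separate format API might look like. ToyVector, the "dense"/"coo" format strings, and the rule inside can_mix_formats are all hypothetical; only the names Array.format, Array.to_format and xp.can_mix_formats come from the proposal itself.

```python
class ToyVector:
    """Toy 1-D vector stored either as a dense list or as a COO-style dict."""

    def __init__(self, payload, fmt: str, size: int):
        self._payload = payload  # list for "dense", {index: value} for "coo"
        self._format = fmt
        self._size = size

    @property
    def format(self) -> str:
        return self._format

    def to_format(self, fmt: str, /) -> "ToyVector":
        if fmt == self._format:
            return self
        if fmt == "coo":
            # dense -> coo: keep only the non-zero entries
            coo = {i: v for i, v in enumerate(self._payload) if v != 0}
            return ToyVector(coo, "coo", self._size)
        if fmt == "dense":
            # coo -> dense: fill the missing entries with zeros
            dense = [self._payload.get(i, 0) for i in range(self._size)]
            return ToyVector(dense, "dense", self._size)
        raise ValueError(f"unknown format: {fmt!r}")

    def tolist(self) -> list:
        return self.to_format("dense")._payload


def can_mix_formats(*formats: str) -> bool:
    # Assumed rule: only identical formats combine without an explicit conversion.
    return len(set(formats)) <= 1


v = ToyVector([0, 5, 0, 7], "dense", 4)
s = v.to_format("coo")
```

Here device stays a pure execution-context concept, and compatibility is an explicit query rather than an overloaded ==.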

Disadvantages

This would need to wait until at least the 2024 revision of the standard.

Tagging potentially interested parties:

Labels: topic: Lazy/Graph (Lazy and graph-based array implementations.)