Description
Background
Some colleagues and I were working on `sparse` when we stumbled onto a limitation of the current Array API Standard. @kgryte was kind enough to point out that it might have wider implications than just `sparse`, so it would be prudent to discuss it with other relevant parties in the community before settling on an API design, to avoid fragmentation.
Problem Statement
There are two notable things missing from the Array API standard today, which `sparse`, and potentially Dask, JAX and other relevant libraries, might also need.
- Support for storage formats.
  - In Dask, this might be the array metadata, such as the type of the inner array.
  - In `sparse`, this would be the format of the sparse array (CRS, CCS, COO, ...).
- Support for lazy arrays/materialization.
  - `sparse`/JAX might use this to build up kernels before running a computation.
  - Dask might use this for un-computed arrays stored as a task graph.
Potential solutions
Overload the `Array.device` attribute and the `Array.to_device` method.
One option is to overload the objects returned/accepted by these so that they carry both a device and a storage format. Something like the following:
    from __future__ import annotations  # Device and Format are placeholder types

    class Storage:
        @property
        def device(self) -> Device:
            ...

        @property
        def format(self) -> Format:
            ...

        def __eq__(self, other: "Storage") -> bool:
            """ Compatible if combined? """

        def __ne__(self, other: "Storage") -> bool:
            """ Incompatible if combined? """

    class Array:
        @property
        def device(self) -> Storage:
            ...

        # Remaining parameters as in the current standard's to_device.
        def to_device(self, device: Storage, /, *, stream=None) -> "Array":
            ...
To materialize an array, one could use `to_device(default_device())` (possible after #689 is merged).
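To make the proposal a bit more concrete, here is a toy, pure-Python sketch of how it could behave. Everything in it is an assumption made for illustration: the string-valued `device`/`format` fields, the rule that two `Storage` objects compare equal when they share a device, and the `default_device()` helper (which stands in for whatever #689 ends up providing). It is not how `sparse`, Dask or JAX implement anything.

```python
from __future__ import annotations


class Storage:
    """Hypothetical device + format pair (strings stand in for real objects)."""

    def __init__(self, device: str, format: str) -> None:
        self._device = device
        self._format = format

    @property
    def device(self) -> str:
        return self._device

    @property
    def format(self) -> str:
        return self._format

    def __eq__(self, other: object) -> bool:
        # Assumed rule for this sketch: "compatible if combined" means
        # "same device", regardless of format.
        if not isinstance(other, Storage):
            return NotImplemented
        return self.device == other.device

    def __hash__(self) -> int:
        # Keep hashing consistent with the equality rule above.
        return hash(self.device)


class Array:
    """Toy array whose `device` attribute returns a Storage."""

    def __init__(self, data: list[float], storage: Storage) -> None:
        self._data = list(data)
        self._storage = storage

    @property
    def device(self) -> Storage:
        return self._storage

    def to_device(self, device: Storage, /) -> Array:
        # A real library would convert or compute here; the toy only retags.
        return Array(self._data, device)


def default_device() -> Storage:
    # Hypothetical stand-in for the helper discussed in #689.
    return Storage("cpu", "dense")


lazy = Array([1.0, 2.0], Storage("cpu", "lazy"))
dense = lazy.to_device(default_device())  # "materialize" the lazy array
print(dense.device.format)
print(lazy.device == dense.device)
```

Under the assumed equality rule, the last line prints `True` even though the two formats differ, which is the kind of surprising operator behaviour that comes from folding format into the device object.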
Advantages
As far as I can see, it's compatible with how the Array API standard works today.
Disadvantages
We're mixing the concepts of an execution context and storage format, and in particular overloading operators in a rather weird way.
Introduce an `Array.format` attribute and `Array.to_format` method.
Advantages
We can get the API right, maybe even introduce `xp.can_mix_formats(...)`.
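As a rough illustration of this second option, the sketch below keeps `device` and `format` as separate attributes. The string-valued formats, the `to_format` signature, and in particular the rule inside `can_mix_formats` (formats mix if, ignoring `"dense"`, at most one distinct format remains) are all hypothetical and chosen only to keep the example small; the actual semantics would be up to the standard.

```python
from __future__ import annotations


class Array:
    """Toy array that tracks its format separately from its device."""

    def __init__(self, data: list[float], device: str = "cpu",
                 format: str = "dense") -> None:
        self._data = list(data)
        self._device = device
        self._format = format

    @property
    def device(self) -> str:
        return self._device

    @property
    def format(self) -> str:
        return self._format

    def to_format(self, format: str, /) -> Array:
        # A real library would convert the underlying storage here;
        # the toy only changes the tag and leaves the device alone.
        return Array(self._data, self._device, format)


def can_mix_formats(*formats: str) -> bool:
    # Assumed rule for this sketch only: any format mixes with "dense",
    # but two different non-dense formats do not mix.
    return len(set(formats) - {"dense"}) <= 1


a = Array([1.0, 0.0, 2.0], format="coo")
b = a.to_format("csr")
print(a.device == b.device)                  # device comparison is unaffected
print(can_mix_formats(a.format, b.format))   # False under the toy rule
print(can_mix_formats(a.format, "dense"))    # True under the toy rule
```

Here `device` keeps its current meaning as an execution context, and compatibility questions go through an explicit function rather than through `==`.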
Disadvantages
This would need to wait until at least the 2024 revision of the standard.
Tagging potentially interested parties:
- @jakirkham @tomwhite for Dask
- @jakevdp for JAX
- Please add anyone I missed