Alignment with xarray

I'm opening this issue to track and discuss how our data structure differs from xarray's. Ideally, I would close it once AnnData could easily be implemented on top of xarray.

*Some previous discussion: #308*

# The idea

I often think of AnnData as a kind of "special case" of xarray Datasets. We improve convenience by specializing on the 2d case and adding a few other features. It would be nice if this were more than a mental model, and we could actually use their code here.

[sgkit](https://pystatgen.github.io/sgkit/latest/) basically accomplishes this. It uses a very "anndata shaped"[^1] xarray Dataset[^2] for representing genomics data. These data structures and our goals with them are so similar that searching for issues opened by the sgkit devs on the xarray repository is a great way to find compatibility issues for anndata.
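
To make "anndata shaped" concrete, here is a minimal sketch of how the core AnnData fields might map onto a plain `xarray.Dataset`. The dimension names (`obs`, `var`, `pca`) and variable names are assumptions chosen for illustration; this is not how anndata or sgkit actually lay things out.

```python
import numpy as np
import xarray as xr

n_obs, n_var = 4, 3
rng = np.random.default_rng(0)
counts = rng.poisson(1.0, size=(n_obs, n_var))

# Hypothetical layout: every array declares which named dimensions it spans.
ds = xr.Dataset(
    data_vars={
        "X": (("obs", "var"), counts),                              # main matrix
        "total_counts": (("obs",), counts.sum(axis=1)),             # like an obs column
        "highly_variable": (("var",), np.array([True, False, True])),  # like a var column
        "X_pca": (("obs", "pca"), rng.normal(size=(n_obs, 2))),     # like an obsm entry
    },
    coords={
        "obs": [f"cell_{i}" for i in range(n_obs)],
        "var": [f"gene_{j}" for j in range(n_var)],
    },
)

# Subsetting along obs/var propagates to every array sharing those dimensions,
# which is the behavior AnnData provides for the 2d case.
subset = ds.isel(obs=slice(0, 2), var=[0, 2])
```

Here `subset["X"]`, `subset["total_counts"]`, `subset["highly_variable"]`, and `subset["X_pca"]` are all sliced consistently from a single indexing call.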

Additionally, zarr and OME-zarr are quite aligned with xarray.

# What's missing

Some things we need, which xarray does not currently provide:

- [ ] We support a fast sparse array library, `scipy.sparse` (ideally we can get pydata/sparse to become fast)
- [x] We support categorical variables
- [ ] We support repeated dimensions (e.g. `obsp`, `varp`) https://github.com/pydata/xarray/issues/3731
- [ ] We have a nested structure (though [it's on the roadmap](https://xarray.pydata.org/en/stable/roadmap.html#tree-like-data-structure) with [datatree](https://github.com/TomNicholas/datatree) being implemented)
- [ ] We are actively working on support for awkward arrays (https://github.com/theislab/anndata/pull/647, https://github.com/pydata/xarray/issues/4285, https://github.com/pystatgen/sgkit/issues/643)
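
To illustrate the repeated-dimensions item, here is a hedged sketch of one possible workaround for an `obsp`-like pairwise matrix: give the two axes distinct dimension names that carry the same labels. The names `obs_0` and `obs_1` are hypothetical, and this is a sketch of the limitation rather than a recommended design.

```python
import numpy as np
import xarray as xr

names = ["cell_0", "cell_1", "cell_2"]
matrix = np.array(
    [
        [0.0, 0.5, 0.2],
        [0.5, 0.0, 0.9],
        [0.2, 0.9, 0.0],
    ]
)

# Both axes index observations, but they need two distinct dimension names
# (hypothetical: obs_0 / obs_1) that happen to share the same labels.
connectivities = xr.DataArray(
    matrix,
    dims=("obs_0", "obs_1"),
    coords={"obs_0": names, "obs_1": names},
)

# Subsetting observations now means indexing both dimensions explicitly,
# whereas AnnData subsets both axes of obsp from a single obs selection.
keep = ["cell_0", "cell_2"]
sub = connectivities.sel(obs_0=keep, obs_1=keep)
```

The cost of this workaround is that nothing ties `obs_0` and `obs_1` together, so keeping them in sync is left to the caller.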

[^1]: Since we're in the same language, working with biological data, and using many of the same technologies it would make a lot of sense for us to have greater alignment with sgkit.
[^2]: More context: https://github.com/single-cell-data/matrix-api/issues/11#issuecomment-1072533371

Alignment with xarray #744

ivirshup opened on Mar 23, 2022
