Investigate ragged arrays to represent alleles #634
Tricky one. I guess an option we should also consider is a home-grown ragged array encoding for alleles (although it would be pretty icky to expose users to this). Could we (in principle) do an encoding ourselves and expose the …
Yes, I think it would be worth trying this out to see if it's possible, and how it works in Xarray.
FWIW we've been doing this in tskit like this. I think the underlying encoding (a data and offset array) is basically the same as what Arrow uses, and it works well in practice.
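The data-and-offsets encoding mentioned above can be sketched with plain NumPy. This is a minimal illustration of the idea, not tskit's or Arrow's actual implementation:

```python
import numpy as np

# Flattened ragged array: three rows of lengths 2, 3, and 1.
# data holds all values concatenated; offsets[i]:offsets[i+1] delimits row i.
data = np.array([10, 11, 20, 21, 22, 30])
offsets = np.array([0, 2, 5, 6])

def row(i):
    """Return the i-th variable-length row as a view into data."""
    return data[offsets[i]:offsets[i + 1]]

print(row(1))            # -> [20 21 22]
print(len(offsets) - 1)  # number of rows -> 3
```

Both arrays are fixed-dtype and one-dimensional, which is what makes this encoding attractive for chunked storage.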
FYI @tomwhite. Following today's "standup", "open sourcing" some of our internal comments about awkward array. These are likely outdated (they were made about a year ago) and might not fully apply in the sgkit context, but they might still be useful.

From @eric-czech: There aren't any docs for awkward-1.0 yet, just empty stubs at https://awkward-array.org, so I looked at the original project instead. Here are a few questions I wanted to answer:
It looks like dask.Bag (or even just delayed) wrapping awkward would make the most sense. I was hoping initially that it would be possible for awkward to act on dask arrays, but it just converts everything to numpy afaict:

```python
import pyarrow, awkward
import dask.array as da

arrow_buffer = pyarrow.ipc.open_file(open("tests/samples/exoplanets.arrow", "rb")).get_batch(0)
stars = awkward.fromarrow(arrow_buffer)
stars['id'] = da.arange(len(stars), chunks=3)
type(stars['id'])  # -> numpy.ndarray
```

That would then mean that we'd have to use the awkward API within partitions, which would be like using Pandas without Dask.DataFrame. Have you found a deeper way to integrate the two @ravwojdyla? This looks promising in 1.0: https://awkward-array.org/how-to-create-lazy-dask.html but I'm not sure what to expect once they add anything there.
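The "use the API within partitions" pattern described above can be sketched without awkward or dask at all. Here plain NumPy stands in for both: the ragged (data, offsets) pair is split into chunks, each chunk is processed independently (what `dask.bag.map_partitions` would do), and the results are combined. The data and chunk boundaries are made up for illustration:

```python
import numpy as np

data = np.arange(12)
offsets = np.array([0, 3, 5, 9, 12])  # four ragged rows

def partition_row_sums(data, offsets):
    # Per-partition work: sum each variable-length row.
    return [data[offsets[i]:offsets[i + 1]].sum() for i in range(len(offsets) - 1)]

# Split rows 0-1 and 2-3 into two "partitions" (offsets re-based per chunk).
part1 = (data[0:5], np.array([0, 3, 5]))
part2 = (data[5:12], np.array([0, 4, 7]))

results = []
for d, o in (part1, part2):
    results.extend(partition_row_sums(d, o))

print(results)  # row sums of the four ragged rows
```

The key constraint is that partition boundaries must fall on row boundaries, so each chunk's offsets can be re-based independently.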
I'm not sure on this one -- it looks like we can get a …

```python
import pyarrow, awkward

arrow_buffer = pyarrow.ipc.open_file(open("tests/samples/exoplanets.arrow", "rb")).get_batch(0)
stars = awkward.fromarrow(arrow_buffer)
stars.layout
```

```
layout
[    ()] Table(dec=layout[0], dist=layout[1], mass=layout[2], name=layout[3], planets=layout[4], ra=layout[5], radius=layout[6])
[     0] ndarray(shape=2935, dtype=dtype('float64'))
[     1] ndarray(shape=2935, dtype=dtype('float64'))
[     2] ndarray(shape=2935, dtype=dtype('float64'))
[     3] StringArray(content=layout[3, 0], generator=<function tostring at 0x11510bd30>, args=(<function decode at 0x1011ae9d0>,), kwargs={})
[  3, 0] JaggedArray(starts=layout[3, 0, 0], stops=layout[3, 0, 1], content=layout[3, 0, 2])
[3, 0, 0] ndarray(shape=2935, dtype=dtype('int32'))
[3, 0, 1] ndarray(shape=2935, dtype=dtype('int32'))
...
```

I imagine there's a function somewhere in the API that would let us get the schema like the one that is saved in HDF5 metadata, but I can't find it.

```python
import h5py, json

f = h5py.File("stars.hdf5", "w")
g = awkward.hdf5(f)
g['stars'] = stars
f.close()

f = h5py.File("stars.hdf5", "r")
json.loads(f["stars"]["schema.json"][:].tostring())['schema']
```

```
{'call': ['awkward', 'Table', 'frompairs'],
 'args': [{'pairs': [['dec',
     {'call': ['awkward', 'numpy', 'frombuffer'],
      'args': [{'read': '1'}, {'dtype': 'float64'}, {'json': 2935, 'id': 2}],
      'id': 1}],
    ['dist',
     {'call': ['awkward', 'numpy', 'frombuffer'],
      'args': [{'read': '3'}, {'dtype': 'float64'}, {'json': 2935, 'id': 4}],
      'id': 3}],
    ['mass',
     {'call': ['awkward', 'numpy', 'frombuffer'],
      'args': [{'read': '5'}, {'dtype': 'float64'}, {'json': 2935, 'id': 6}],
      'id': 5}],
 ...
```
I think this too only appears to be possible (like joins) by using awkward within partitions to either directly create lists of Python objects, or use Pandas as an intermediary, for generating individual dict objects or pandas dataframes that we then process with dask.Bag or dask.DataFrame.

From me: I found a bit more documentation in the jupyter notebooks here. Regarding lazy arrays and Dask integration, I believe they are referring to the … Regarding groupby/join: looking at how reducers were implemented in awkward, I am not sure we would be able to use awkward as a first-class citizen in Dask without, as you are saying, converting down to list/dict/etc.
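The "Pandas as an intermediary" route for groupby/join amounts to exploding the ragged structure into a flat long-format table, which dask.DataFrame can then group and join natively. A hedged sketch with NumPy (the allele values and field names here are illustrative):

```python
import numpy as np

# Ragged alleles for three variants, flattened with offsets.
alleles = np.array(["A", "T", "G", "C", "AT", "T"], dtype=object)
offsets = np.array([0, 2, 5, 6])

# Explode to long format: one (variant_id, allele) pair per row.
counts = np.diff(offsets)  # alleles per variant -> [2 3 1]
variant_id = np.repeat(np.arange(len(counts)), counts)

long_table = list(zip(variant_id.tolist(), alleles.tolist()))
print(long_table)
```

A long-format table like this can be fed straight into `pandas.DataFrame`, at the cost of duplicating the variant id for every allele.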
A different way to approach this problem is to export the fields that need ragged string arrays to a different storage backend, such as Parquet, then use tools to query that dataset to produce a list of IDs that can be used to restrict the array data in Zarr. (I suggested this in #1059.) I've explored this a bit more in this notebook: https://github.com/pystatgen/sgkit/blob/17a24cf8ad755bb499b3a4a0caeca15390ef1a7e/ragged.ipynb

And I've also written a small prototype to export a couple of VCF fields to Parquet in pystatgen/sgkit@517a67e. This could be easily extended to export all VCF fixed and INFO fields, which are the ones that would be useful for filtering. Not sure what to do with this yet, just sharing as it might be useful.
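The query-then-restrict flow described above can be sketched with NumPy standing in for both backends: an array stands in for a field in the exported Parquet dataset, and another for the array data in the Zarr store. The field names and threshold are illustrative, not sgkit's actual schema:

```python
import numpy as np

# Stand-ins: "qual" would live in the exported Parquet dataset,
# "genotypes" in the Zarr store.
qual = np.array([50.0, 3.0, 99.0, 10.0, 77.0])  # per-variant field
genotypes = np.arange(5 * 4).reshape(5, 4)      # variants x samples

# 1. Query the exported fields to get the passing variant indices.
variant_ids = np.flatnonzero(qual >= 20)        # -> [0, 2, 4]

# 2. Use those indices to restrict the array data.
subset = genotypes[variant_ids]
print(subset.shape)  # (3, 4)
```

In the real workflow step 1 would be a Parquet query (e.g. via a SQL engine or pandas) and step 2 an indexed read from Zarr; only the index list crosses between the two backends.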
I've generalised the Parquet export code in #1083
Alleles are a challenge to represent efficiently in fixed-length arrays. There are a couple of problems: the number of alleles varies from variant to variant, and the allele strings themselves are of variable length.
Both these problems could be solved by using ragged arrays.
Zarr has support for ragged arrays, but these don't currently work with variable-length strings (needed for alleles), and they don't fit the Xarray data model, which assumes fixed-size dimensions. There is a good discussion of the problem in pydata/xarray#4285, in the context of Awkward Array.
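One workaround that does fit Xarray's fixed-size dimension model is to pad each variant's alleles out to the maximum count with a fill value, at the cost of wasted space on short rows. A minimal sketch with NumPy (the example alleles and the empty-string fill value are assumptions for illustration):

```python
import numpy as np

# Ragged alleles per variant.
ragged = [["A", "T"], ["G", "C", "AT"], ["T"]]

# Pad to the maximum number of alleles with a fill value so the result
# is a rectangular (variants x alleles) array.
max_alleles = max(len(r) for r in ragged)
padded = np.full((len(ragged), max_alleles), "", dtype=object)
for i, row in enumerate(ragged):
    padded[i, :len(row)] = row

print(padded.shape)  # (3, 3)
```

The padding approach trades storage for compatibility: downstream code must treat the fill value as "no allele", and a dataset with one highly multi-allelic site pads every other site to match.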