Genotype call array to dosage #21

Raising this issue to discuss implementing a function to convert a [genotype call array](https://discourse.smadstatgen.org/t/n-d-array-conventions-for-variation-data/16/4?u=alimanfoo) to a dosage array.

In general, the output array should have at least two dimensions, the first two being (variants, samples). Each element gives the dosage of an allele, i.e., how many copies of that allele the individual carries.
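To make the simplest case concrete, here is a minimal sketch of a biallelic conversion. It assumes the input is an integer array of shape (variants, samples, ploidy) in which 0 is the reference allele and positive values are alternate alleles; that encoding and the name `alt_dosage` are illustrative assumptions, not a proposed API.

```python
import numpy as np


def alt_dosage(gt):
    """Count alternate alleles per call (hypothetical helper).

    Assumes gt has shape (variants, samples, ploidy), with 0 = reference
    and values > 0 = alternate alleles. Missing calls are NOT handled here.
    """
    gt = np.asarray(gt)
    # Sum over the ploidy axis, leaving shape (variants, samples).
    return (gt > 0).sum(axis=-1)


# Two variants, three diploid samples.
gt = np.array([
    [[0, 0], [0, 1], [1, 1]],
    [[0, 1], [1, 1], [0, 0]],
])
dosage = alt_dosage(gt)
```

Here `dosage` has shape (2, 3) and holds one integer count per call. Any value that is not a positive allele index would count as zero in this sketch, so it would silently bias results toward the reference allele if missing calls were present.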

Some questions for discussion:
* How do we handle biallelic and multiallelic variants?
* How do we handle missing genotype calls and avoid creating a bias towards a particular allele (e.g., the reference allele)?
* What should the output dtype be: int, float, or either? (Some programs treat dosage as a continuous variable.)
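As one possible way to frame these questions, the sketch below returns a per-allele dosage array of shape (variants, samples, alleles), which covers multiallelic variants, and marks missing calls as NaN rather than counting them as reference. The name `allele_dosage`, the use of negative values for missing alleles, the float output, and the choice to treat a partially missing call as wholly missing are all assumptions made for illustration.

```python
import numpy as np


def allele_dosage(gt, n_alleles):
    """Per-allele dosage with missing calls set to NaN (hypothetical helper).

    Assumes gt has shape (variants, samples, ploidy), with allele indices
    0..n_alleles-1 and negative values marking missing alleles.
    """
    gt = np.asarray(gt)
    # Compare every allele call against each allele index, then sum over
    # the ploidy axis: result has shape (variants, samples, n_alleles).
    counts = (gt[..., np.newaxis] == np.arange(n_alleles)).sum(axis=2)
    # Float output so that NaN can represent a missing call.
    counts = counts.astype(np.float32)
    # Assumption: a call with any missing allele is treated as missing.
    missing = (gt < 0).any(axis=-1)
    counts[missing] = np.nan
    return counts


# One triallelic variant, three diploid samples; the last call is missing.
gt = np.array([[[0, 1], [2, 2], [-1, -1]]])
counts = allele_dosage(gt, n_alleles=3)
```

Using NaN forces a float dtype. An integer output would need a different sentinel (for example -1) or a separate mask array, which is part of the dtype question above.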

Some related functions (not necessarily what we want to copy, but for reference):
* scikit-allel has [to_n_ref](https://scikit-allel.readthedocs.io/en/stable/model/ndarray.html#allel.Genotypes.to_n_ref), [to_n_alt](https://scikit-allel.readthedocs.io/en/stable/model/ndarray.html#allel.Genotypes.to_n_alt), [to_allele_counts](https://scikit-allel.readthedocs.io/en/stable/model/ndarray.html#allel.Genotypes.to_allele_counts)
* skallel prototype has [genotypes_3d_to_allele_counts](https://github.com/scikit-allel/skallel-tensor/blob/eddd4b1760d9249b60b996730381318dd262c048/src/skallel_tensor/numpy_backend.py#L292), [genotypes_3d_to_allele_counts_melt](https://github.com/scikit-allel/skallel-tensor/blob/eddd4b1760d9249b60b996730381318dd262c048/src/skallel_tensor/numpy_backend.py#L313), [genotypes_3d_to_major_allele_counts](https://github.com/scikit-allel/skallel-tensor/blob/eddd4b1760d9249b60b996730381318dd262c048/src/skallel_tensor/numpy_backend.py#L334)
