Genotype call array to dosage #21

Open
@alimanfoo

Raising this issue to discuss implementing a function to convert a genotype call array to a dosage array.

In general, the output array should have at least two dimensions, with the first two dimensions being (variants, samples). The array elements give the dosage of each allele, i.e., how many copies of an allele are carried by the individual.
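To make the shapes concrete, here is a minimal NumPy sketch of one possible conversion, not a proposed API. It assumes the input is an integer array of shape (variants, samples, ploidy) with alleles coded 0, 1, 2, ... and missing calls coded as negative values; the function name `genotype_to_allele_counts` and the `n_alleles` parameter are hypothetical.

```python
import numpy as np


def genotype_to_allele_counts(gt, n_alleles):
    """Count copies of each allele per call (hypothetical helper).

    gt: int array, shape (variants, samples, ploidy); negative = missing (assumed).
    Returns an int8 array of shape (variants, samples, n_alleles).
    """
    gt = np.asarray(gt)
    alleles = np.arange(n_alleles)
    # Broadcast to (variants, samples, ploidy, n_alleles): True where a call matches an allele.
    matches = gt[..., np.newaxis] == alleles
    # Sum over the ploidy axis to get the number of copies of each allele.
    return matches.sum(axis=2).astype(np.int8)


gt = np.array([
    [[0, 0], [0, 1], [1, 1]],
    [[0, 2], [2, 2], [-1, -1]],
])
counts = genotype_to_allele_counts(gt, n_alleles=3)
```

In this sketch a missing call contributes to no allele, so its counts are all zero and cannot be told apart from real data without a separate mask.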

Some questions for discussion:

  • How do we handle biallelic and multiallelic variants?
  • How do we handle missing genotype calls and avoid creating a bias towards a particular allele (e.g., the reference allele)?
  • What should the output dtype be: int, float, or either? (Some programs treat dosage as a continuous variable.)
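As one illustrative answer to the questions above (an option for discussion, not a decision), the sketch below returns a float array of non-reference allele dosage and marks missing calls as NaN rather than 0, so they are not silently counted as reference. The function name `alt_dosage`, the negative-value coding for missing alleles, and the choice to treat a partially missing call as fully missing are all assumptions.

```python
import numpy as np


def alt_dosage(gt):
    """Non-reference allele dosage per call (hypothetical helper).

    gt: int array, shape (variants, samples, ploidy); negative = missing (assumed).
    Returns a float32 array of shape (variants, samples), NaN where the call is missing.
    """
    gt = np.asarray(gt)
    # Assumption: a call with any missing allele is treated as missing overall.
    missing = (gt < 0).any(axis=-1)
    # Any allele index > 0 counts as non-reference, so multiallelic sites collapse to ref vs. alt.
    dosage = (gt > 0).sum(axis=-1).astype(np.float32)
    dosage[missing] = np.nan
    return dosage


gt = np.array([
    [[0, 0], [0, 1], [1, 1]],
    [[0, 2], [-1, -1], [0, -1]],
])
d = alt_dosage(gt)
```

A float dtype is used here only because NaN needs it; an integer output would need a sentinel value or a separate mask for missing calls.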

Some related functions (not necessarily what we want to copy, but for reference):

Labels

  • core operations: Issues related to domain-specific functionality such as LD pruning, PCA, association testing, etc.
  • data representation: Issues related to how data is represented: data types, data structures, indexes, access methods, etc.
