Requirements for UKB GWAS #67

To run a basic GWAS on UKB data, here are some of the operations we'll need support for:

- [x] bgen reader (https://github.com/pystatgen/sgkit-bgen/pull/1)
- [x] plink reader (https://github.com/pystatgen/sgkit-plink/pull/1, https://github.com/pystatgen/sgkit-plink/issues/6)
- [x] Variant allele frequency/count (https://github.com/pystatgen/sgkit/issues/29)
- [x] Variant call rate/count (https://github.com/pystatgen/sgkit/issues/29)
- [x] Variant HWE test (https://github.com/pystatgen/sgkit/issues/28)
- [x] Sample call rate/count (https://github.com/pystatgen/sgkit/issues/29)
- [ ] An `is_autosome` function for filtering variants
- [x] A function to convert genotype probabilities to hard calls (https://github.com/pystatgen/sgkit/issues/346)
- [x] A linear regression function (https://github.com/pystatgen/sgkit/pull/52)
- [ ] A variant annotation function like [vep](https://hail.is/docs/0.2/methods/genetics.html#hail.methods.vep). There are plenty of other ways to get this, but an internal function would be great.
- [ ] A phenotype normalization pipeline. I don't expect much of this to become part of sgkit, but there might be some generalizable phenotype-specific functions that are worth considering for inclusion.
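
As a rough illustration of the unchecked `is_autosome` item, here is a minimal sketch of what such a predicate might look like. It assumes human data (autosomes 1 to 22) and contig names written as `"1"`, `"01"`, or `"chr1"`; the signature and the name handling are hypothetical, not an existing sgkit API.

```python
def is_autosome(contig: str) -> bool:
    """Return True if a contig name denotes a human autosome (1-22).

    Hypothetical helper: assumes names like "1", "01", or "chr1".
    """
    name = contig.strip()
    # Drop an optional "chr" prefix (case-insensitive).
    if name.lower().startswith("chr"):
        name = name[3:]
    # Anything non-numeric (X, Y, MT, ...) is not an autosome.
    return name.isdecimal() and 1 <= int(name) <= 22


# Build a boolean mask over contig names, one entry per variant.
contigs = ["1", "chr22", "X", "chrY", "MT", "23", "01"]
mask = [is_autosome(c) for c in contigs]
```

A mask like this could then be used to subset variants before running the association test.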

There may be a few more beyond that, but I think anything remaining should be straightforward to do with Xarray/Dask alone.
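
To give a sense of what "array operations alone" can look like, the sketch below computes call rates and an alternate allele frequency with plain axis reductions. It uses NumPy as a stand-in, on the assumption that the same reductions carry over to Xarray/Dask arrays. The array layout (variants × samples × ploidy) and the use of `-1` for missing calls are assumptions made for this example.

```python
import numpy as np

# Toy genotype calls: 2 variants x 3 samples x 2 alleles, -1 = missing (assumed).
calls = np.array([
    [[0, 0], [0, 1], [-1, -1]],
    [[1, 1], [0, 1], [0, 0]],
])

# A genotype counts as called only if every allele is non-missing.
called = (calls >= 0).all(axis=-1)          # shape (variants, samples)

variant_call_rate = called.mean(axis=1)     # fraction of samples called per variant
sample_call_rate = called.mean(axis=0)      # fraction of variants called per sample

# Alternate allele frequency among non-missing alleles, per variant.
alt_count = (calls == 1).sum(axis=(1, 2))
allele_count = (calls >= 0).sum(axis=(1, 2))
alt_freq = alt_count / allele_count
```

Each statistic is a single reduction over one or two axes, which is the kind of operation that should chunk cleanly over large arrays.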
