UK Biobank data loader

This repository provides a library and set of utilities for the efficient loading of phenotype and genotype data from the UK Biobank.

Features include:

Loading quantitative and categorical phenotypes, including self-reported phenotypes and phenotypes based on ICD-10 disease codes.
Fast parallelized loading that leverages chunked and compressed Zarr arrays.
Utilities for splitting the dataset samples randomly, or based on a predefined structure.
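
The two splitting strategies mentioned above can be illustrated with a minimal sketch. It is not the ukb_loader API: the function names `random_split` and `predefined_split`, the split names, and the default fractions are hypothetical and chosen only for illustration. It uses the Python standard library and works on a plain list of sample IDs.

```python
import random
from typing import Dict, List, Mapping, Sequence, Tuple


def random_split(
    sample_ids: Sequence[str],
    fractions: Tuple[float, float, float] = (0.8, 0.1, 0.1),
    seed: int = 0,
) -> Dict[str, List[str]]:
    """Shuffle sample IDs and cut them into train/val/test subsets.

    Hypothetical helper for illustration, not part of ukb_loader.
    """
    if abs(sum(fractions) - 1.0) > 1e-9:
        raise ValueError("fractions must sum to 1")
    ids = list(sample_ids)
    # A seeded generator makes the split reproducible across runs.
    random.Random(seed).shuffle(ids)
    n_train = int(len(ids) * fractions[0])
    n_val = int(len(ids) * fractions[1])
    return {
        "train": ids[:n_train],
        "val": ids[n_train:n_train + n_val],
        # The test subset takes whatever remains after rounding down.
        "test": ids[n_train + n_val:],
    }


def predefined_split(
    sample_ids: Sequence[str],
    assignment: Mapping[str, str],
) -> Dict[str, List[str]]:
    """Group sample IDs by a precomputed mapping of sample ID to split name.

    Hypothetical helper for illustration, not part of ukb_loader.
    """
    splits: Dict[str, List[str]] = {}
    for sid in sample_ids:
        splits.setdefault(assignment[sid], []).append(sid)
    return splits


if __name__ == "__main__":
    ids = [f"sample_{i}" for i in range(10)]
    parts = random_split(ids, seed=42)
    print({name: len(members) for name, members in parts.items()})
```

A predefined split is useful when the assignment must respect external structure, for example keeping related individuals in the same subset; in that case the mapping would be computed beforehand and passed to `predefined_split`.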

Usage

First, convert the UKB dataset to the Zarr format with the desired train/validation/test split, using the provided conversion script.

For examples of loading various types of phenotypes, see the example notebook.

alex-medvedev-msc/ukb_loader
