Precursor step: local ancestry deconvolution

Elizabeth Atkinson edited this page Dec 16, 2020 · 1 revision

Tractor reads local ancestry calls in RFmix (or XGmix) output format and exports ancestry-specific dosage files for minor alleles, haplotypes, and genotypes. The RFmix v2 implementation, flags, and required input file formats are described in detail on the GitHub page (https://github.com/slowkoni/rfmix/blob/master/MANUAL.md) and in the RFmix manuscript.

An example of a command to run RFmix for a single chromosome using a subset of the Thousand Genomes project as reference and the HapMap combined recombination map:

rfmix -f cohort.vcf -r AFR_EUR.phase3_shapeit2_mvncall_integrated_v5a.20130502.chr${i}.vcf.gz --chromosome=${i} -m AFR_EUR.1kg_order.indivs.pops.txt -g HapMapcomb_genmap_chr${i}.txt -e 1 -n 5 -o cohort.rfmix.chr${i}
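The chromosome variable in the command above implies a per-chromosome loop. A minimal sketch of such a loop follows, assuming autosomes 1 to 22 and the same file naming pattern as the example. The helper name rfmix_cmd is hypothetical, and it only prints each command (a dry run) so the commands can be inspected before being piped to a shell or submitted to a scheduler.

```shell
#!/bin/sh
# Hypothetical helper: print the RFmix command for one chromosome.
# File names and flags are copied from the example above; adjust to your data.
rfmix_cmd() {
  printf 'rfmix -f cohort.vcf -r AFR_EUR.phase3_shapeit2_mvncall_integrated_v5a.20130502.chr%s.vcf.gz --chromosome=%s -m AFR_EUR.1kg_order.indivs.pops.txt -g HapMapcomb_genmap_chr%s.txt -e 1 -n 5 -o cohort.rfmix.chr%s\n' "$1" "$1" "$1" "$1"
}

# Assumption: autosomes 1-22; one command is printed per chromosome.
for i in $(seq 1 22); do
  rfmix_cmd "$i"
done
```

To actually run the jobs rather than print them, the output of the loop can be piped to `sh`, or each printed line can be wrapped in a job submission command.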

In testing on simulated admixed populations of the Americas (African American and Latinx models), we found that running one EM iteration (-e 1) improved results, but that additional iterations yielded only marginal gains that were not worth the additional runtime. We used the -n flag to account for any sample size imbalance.

RFmix outputs several files, as described in its manual; the .msp.tsv file contains the most likely ancestry call for each person in each genomic window and is the file that Tractor reads directly.
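Before passing a .msp.tsv file to Tractor, a quick sanity check of its contents can be useful. The sketch below makes an assumption about the layout that should be checked against the RFmix manual: a first line listing the subpopulation codes, a second header line beginning with #chm, six leading columns describing each window, and one column per haplotype after that. The function name msp_summary is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: summarize an RFmix .msp.tsv file.
# Assumed layout (verify against the RFmix manual):
#   line 1: "#Subpopulation order/codes: AFR=0<TAB>EUR=1"
#   line 2: header; columns 1-6 describe the window, columns 7+ are haplotypes
#   line 3+: one row per genomic window
msp_summary() {
  awk -F'\t' '
    NR == 1 { sub(/^#Subpopulation order\/codes:[ \t]*/, ""); print "codes: " $0; next }
    NR == 2 { nhap = NF - 6; next }   # haplotype columns follow the 6 window columns
    { nwin++ }                        # every remaining row is one window
    END { print "haplotypes: " nhap + 0; print "windows: " nwin + 0 }
  ' "$1"
}
```

For example, `msp_summary cohort.rfmix.chr22.msp.tsv` would print the population codes, the number of haplotype columns (two per diploid sample), and the number of windows on that chromosome.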