A package for windowed PCA. WinPCA performs principal component analyses (PCAs) in sliding windows along chromosomes. Both hard-called genotypes (input: VCF or TSV) and genotype likelihoods (input: VCF, TSV or BEAGLE) are accepted. WinPCA uses scikit-allel to perform PCAs on genotype data and PCAngsd methods for genotype likelihood (GL, PL) data.
WinPCA can aid the initial exploration of new datasets, since no prior grouping of input samples is necessary to visualize genetic structure. It has also been used to identify chromosome-scale inversions in cichlids, to visualize the recombination landscape in a species cross (Fig. 2) and to identify ancestry tracts in a hybrid mouse (Fig. 6).
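To illustrate the sliding-window idea, here is a minimal sketch in plain NumPy. It is not WinPCA's implementation (WinPCA relies on scikit-allel and PCAngsd); the function name `windowed_pc1`, its parameters and the choice of a mean-centered SVD are assumptions made for illustration only. It expects a variants-by-samples matrix of alternate allele counts (0, 1, 2) and returns one PC 1 value per sample for each window.

```python
import numpy as np


def windowed_pc1(genotypes, positions, start, stop, window_size, step):
    """Hypothetical helper: PC 1 per sample in sliding windows.

    genotypes: (n_variants, n_samples) array of alt allele counts (0/1/2)
    positions: (n_variants,) array of 1-based variant positions
    Returns (window midpoints, array of shape (n_windows, n_samples)).
    """
    genotypes = np.asarray(genotypes, dtype=float)
    positions = np.asarray(positions)
    n_samples = genotypes.shape[1]
    midpoints, pc1 = [], []

    for w_start in range(start, stop - window_size + 2, step):
        w_stop = w_start + window_size - 1
        in_window = (positions >= w_start) & (positions <= w_stop)
        block = genotypes[in_window]
        midpoints.append(w_start + window_size // 2)

        # Too few variants for a meaningful PCA: emit NaN for this window.
        if block.shape[0] < 2:
            pc1.append(np.full(n_samples, np.nan))
            continue

        # Center each variant across samples, then take the SVD.
        centered = block - block.mean(axis=1, keepdims=True)
        _, s, vt = np.linalg.svd(centered, full_matrices=False)

        # Sample scores on PC 1; the sign is arbitrary in each window.
        pc1.append(s[0] * vt[0])

    return np.array(midpoints), np.array(pc1)


if __name__ == "__main__":
    # Toy data: 100 variants, 6 samples in two fixed groups.
    geno = np.tile(np.array([0, 0, 0, 2, 2, 2]), (100, 1))
    pos = np.arange(1, 101)
    mids, scores = windowed_pc1(geno, pos, 1, 100, window_size=50, step=50)
    print(mids)
    print(scores.round(2))
```

Because the sign of a principal component is arbitrary, PC 1 can flip between adjacent windows in a sketch like this; any plot built on it would need to account for that.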
Please ensure that these dependencies are installed and accessible from your current shell environment. Python packages: numpy, pandas, numba, scikit-allel, plotly:
mamba install numpy pandas numba scikit-allel plotly
Additionally, to run WinPCA on genotype likelihood (GL/PL) data: PCAngsd (installation instructions included).
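One way to confirm that the Python dependencies are visible from the current environment is a short import check. This is a sketch, not part of WinPCA: the helper name `missing_packages` is hypothetical. It relies on the fact that scikit-allel is imported under the module name `allel`. PCAngsd is not checked here.

```python
from importlib.util import find_spec

# Package name (as installed) -> module name (as imported).
REQUIRED = {
    "numpy": "numpy",
    "pandas": "pandas",
    "numba": "numba",
    "scikit-allel": "allel",
    "plotly": "plotly",
}


def missing_packages(required=REQUIRED):
    """Return the package names whose modules cannot be found."""
    return [pkg for pkg, module in required.items() if find_spec(module) is None]


if __name__ == "__main__":
    missing = missing_packages()
    print("missing:", ", ".join(missing) if missing else "none")
```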
git clone https://github.com/MoritzBlumer/winpca.git # clone GitHub repository
chmod +x winpca/winpca # make executable
Minimal command line to visualize PC 1 along a chromosome (using GT data from a VCF):
# windowed PCA with default settings
winpca pca VCF_PATH CHROM_NAME:1-CHROM_SIZE PREFIX
# plot principal component 1 and color samples by a metadata column (e.g. inversion state)
winpca chromplot PREFIX CHROM_NAME:1-CHROM_SIZE -m METADATA_PATH -g METADATA_COLUMN_NAME
Please refer to the help messages (winpca {method} -h) or to the wiki for the full documentation, file format specifications, more use cases and a tutorial to produce the above plot.
Moritz Blumer: lmb215@cam.ac.uk