Principal Components Analysis

Arun Durvasula edited this page Sep 17, 2015 · 4 revisions

Principal components analysis (PCA) calculation using NGS PopGen. See NGSPopGen for full details on this method.

To use this method, run:

bash ./scripts/PCA.sh ./scripts/PCA.conf

with the taxon name set in the configuration file.

Input files

Scripts

  • scripts/PCA.sh
  • scripts/PCA_TAXON.conf

Necessary input files

  • data/TAXON_samples.txt bam list
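
As a rough sketch, the bam list could be built as shown here, assuming it is a plain text file with one BAM path per line. The `BAM_DIR` location and the `mytaxon` name are hypothetical placeholders, not part of the pipeline.

```shell
# Hypothetical placeholders: adjust both to your own project.
TAXON="mytaxon"          # taxon name, must match TAXON in the conf file
BAM_DIR="data/bams"      # assumed directory holding the BAM files

# Write one BAM path per line, sorted for a stable sample order.
mkdir -p data
find "$BAM_DIR" -name '*.bam' | sort > "data/${TAXON}_samples.txt"
```

The sample order in this file is likely the order of individuals in the downstream output, so it is worth keeping it stable between runs.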

Output files

  • results/TAXON_test_PCA.covar (covariance matrix; this is the file to graph)
  • results/TAXON_test_PCA.geno

Mandatory PCA.conf Variables

  • TAXON

Optional PCA.conf Variables

  • UNIQUE_ONLY use only uniquely mapped reads (default=0)
  • MIN_BASEQUAL minimum base quality (default=20)
  • BAQ adjust qscores around indels (as SAMtools) (default=1)
  • MIN_IND minimum number of individuals needed to use site (default=1)
  • MIN_MAPQ minimum read mapping quality to use (default=30)
  • N_CORES number of cores to use (default=32)
  • DO_COUNTS count bases at each site after filtering (default=1)
  • DO_ABBABABA sample a random base at each position (default=1)
  • CHECK_BAM_HEADERS check bam headers (default=0)
  • BLOCKSIZE size of each block; choose a value larger than the extent of LD in the populations (default=1000)
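
A minimal sketch of what a filled-in configuration might look like, assuming the conf file is a set of shell-style `NAME=value` assignments read by `PCA.sh` (the file format is an assumption, and `mytaxon` is a hypothetical placeholder). The optional values shown are the defaults listed above.

```shell
# Mandatory: taxon name (hypothetical placeholder).
TAXON=mytaxon

# Optional: values below are the documented defaults.
UNIQUE_ONLY=0          # uniquely mapped reads
MIN_BASEQUAL=20        # minimum base quality
BAQ=1                  # adjust qscores around indels
MIN_IND=1              # minimum individuals needed to use a site
MIN_MAPQ=30            # minimum mapping quality
N_CORES=32             # number of cores to use
DO_COUNTS=1            # count bases at each site after filtering
DO_ABBABABA=1          # sample a random base at each position
CHECK_BAM_HEADERS=0    # check bam headers
BLOCKSIZE=1000         # block size, larger than LD in the populations
```

Only `TAXON` has to be set; any optional variable left out should fall back to its default.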