This directory contains supporting analysis code for our 2018 microsatellite genotyping paper:
CHIIMP: An automated high‐throughput microsatellite genotyping platform reveals greater allelic diversity in wild chimpanzees.
Barbian, HJ, Connell, AJ, Avitto, AN, et al.
Ecol Evol. 2018; 8: 7946–7963.
https://doi.org/10.1002/ece3.4302
This code repository as stored on GitHub does not contain data files or the microsatellite analysis program, CHIIMP, itself. These will be downloaded automatically at run time from the SRA BioProject entry and the program's GitHub page.
The version archived on Dryad includes a snapshot of all code and data as published. In that case, the scripts here use the same input data and program version as the published paper to generate a selection of its figures. The version of CHIIMP included with the Dryad archive is a public domain (CC0) equivalent of version 0.1.0 as available on GitHub.
This code requires a Linux environment with these programs available:
- Common Utilities:
- R (libraries will be installed automatically)
- Pandoc
- SRA Toolkit, if data files are not already present
- Cutadapt, if data files are not already present
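Before running anything, it can help to confirm that the required programs are on the PATH. The sketch below is a minimal check and is not part of this repository. The function name check_tools is hypothetical, and the executable names in the usage note (Rscript, pandoc, fastq-dump for the SRA Toolkit, cutadapt) are assumptions about a typical installation; the Makefile may invoke these tools under different names.

```shell
# Hypothetical helper, not part of this repository: report which of the
# given programs cannot be found on PATH. Returns non-zero if any are missing.
check_tools() {
    missing=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "missing: $tool" >&2
            missing=1
        fi
    done
    return "$missing"
}
```

A call such as `check_tools make Rscript pandoc fastq-dump cutadapt` would then list any program that still needs to be installed. The last two only matter if the data files are not already present.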
If the requirements are satisfied, running the make command from within this directory will run all steps to generate output in the "results" directory. If make reports "Nothing to be done for 'all'," run make clean first to remove any existing output files.
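The workflow described above can be sketched as a small shell wrapper. This assumes only the two targets named in this README (the default target and clean); the function name rebuild_all is hypothetical and not something the repository provides.

```shell
# First run, from within this directory:
#   make
#
# Hypothetical convenience wrapper for a full rerun: remove any existing
# output files, then run every step again. Stops if "make clean" fails.
rebuild_all() {
    make clean && make
}
```

Calling `rebuild_all` is equivalent to typing the two commands by hand. Note that if the downloaded data and software are also removed by the clean target, a rerun may need to fetch them again.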
Makefile
: All processing rules to download data and software, run the analysis, and generate figures

fetch_data.sh
: Script to download raw data from the SRA entry

data/
: Sequence files

    raw/
    : Raw read files as created by the associated MiSeq runs

    prepared/
    : Processed version of the files in data/raw for use in the analysis

metadata/
: Supporting spreadsheets describing the data files and analysis

    locus_attrs.csv
    : Microsatellite locus attributes

    known_alleles.csv
    : Short allele names presented in summaries

    known_genotypes.csv
    : Genotypes of known individuals used in analysis

    sample_attrs.csv
    : A combined sample attributes table, listing all columns submitted to the SRA as well as our own metadata. This is used to prepare dataset spreadsheets during analysis.

chiimp/
: The microsatellite analysis program, CHIIMP

results/
: The output of all analysis here

figures.Rmd
: Post-processing R Markdown script using output of CHIIMP to generate some of the published figures

config/
: CHIIMP configuration files for each dataset
The numbering of the datasets in the metadata spreadsheets corresponds to:
1. Known Gombe samples
2. New Gombe samples
3. GME samples, PCR singleplex
4. GME samples, PCR multiplex
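When working with the metadata spreadsheets from the shell, a small lookup can translate a dataset number into its description. This is only a sketch: the function name dataset_label is hypothetical, and it assumes the datasets are numbered 1 through 4 in the order listed above.

```shell
# Hypothetical lookup, not part of this repository: map a dataset number
# to its description. Assumes numbering 1-4 in the order listed above.
dataset_label() {
    case "$1" in
        1) echo "Known Gombe samples" ;;
        2) echo "New Gombe samples" ;;
        3) echo "GME samples, PCR singleplex" ;;
        4) echo "GME samples, PCR multiplex" ;;
        *) echo "unknown dataset: $1" >&2; return 1 ;;
    esac
}
```

Any other argument produces an error message on standard error and a non-zero exit status, so the function can be used safely in scripts.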