
Gathering barcoding reference sequences for the Kruger National Park


maxfarrell/KNP_refLib


KNP_refLib

Gathering barcoding reference sequences for the Kruger National Park

Scripts

Two scripts gather sequences from GenBank and BOLD. "GB_BOLD_seq_download.R" takes a list of Latin binomials and downloads sequences using the rentrez and bold R packages; it can be modified to download reference sequences for other marker genes. "CO1_from_GB_mito_genomes.R" follows the same process, but uses rentrez and modified scripts from the PrimerMiner R package to download whole mitochondrial genomes and extract COI sequences.
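As a rough illustration of the download step, the sketch below builds an Entrez query for one binomial and fetches matching records as FASTA with rentrez. It is not the code from "GB_BOLD_seq_download.R": the function names `build_coi_query` and `fetch_genbank_fasta`, the gene synonyms, and the sequence-length filter are all assumptions made for the example.

```r
# Build an NCBI Entrez query for one species and the COI marker.
# [Organism], [Gene] and [Sequence Length] are standard Entrez search fields;
# the gene synonyms and the length cap are illustrative assumptions.
build_coi_query <- function(binomial, max_len = 2000) {
  sprintf(
    '"%s"[Organism] AND (COI[Gene] OR CO1[Gene] OR COX1[Gene]) AND 1:%d[Sequence Length]',
    binomial, max_len
  )
}

# Search GenBank and return the hits as one FASTA-formatted string.
# Needs the rentrez package and network access, so it is only defined here.
fetch_genbank_fasta <- function(binomial, retmax = 50) {
  hits <- rentrez::entrez_search(
    db = "nucleotide",
    term = build_coi_query(binomial),
    retmax = retmax
  )
  if (length(hits$ids) == 0) {
    return(character(0))  # no records found for this species
  }
  rentrez::entrez_fetch(db = "nucleotide", id = hits$ids, rettype = "fasta")
}

# Example query string for a single species (no network call made here).
query <- build_coi_query("Panthera leo")
```

Swapping the gene names inside `build_coi_query` is one way such a query could be pointed at a different marker gene.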

After downloading the sequences, "format_GB_BOLD_refLib.sh", "generate_taxonomy.R", and "format_refLib_dada_2.R" are used to clean up the downloaded FASTA files, generate the taxonomy file, and format the library for use with dada2's built-in RDP classifier.
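For orientation, dada2's taxonomy training files use a FASTA header made of semicolon-separated ranks, while its species-level files use an identifier followed by genus and species. The base-R sketch below shows that header layout only; the helper names, the column names of the taxonomy table, and the accession "ACC0001" are hypothetical and are not taken from the repository's scripts.

```r
# Header for a dada2 taxonomy training file: ">Rank1;Rank2;...;RankN;"
# `tax` is assumed to be a data frame with one column per rank.
format_dada2_header <- function(tax,
                                ranks = c("Kingdom", "Phylum", "Class",
                                          "Order", "Family", "Genus")) {
  lineage <- do.call(paste, c(unname(as.list(tax[ranks])), sep = ";"))
  paste0(">", lineage, ";")
}

# Header for a dada2 species-assignment file: ">ID Genus species"
format_dada2_species_header <- function(id, genus, epithet) {
  paste0(">", id, " ", genus, " ", epithet)
}

# Hypothetical single-record taxonomy table, for illustration only.
tax <- data.frame(
  Kingdom = "Animalia", Phylum = "Chordata", Class = "Mammalia",
  Order = "Carnivora", Family = "Felidae", Genus = "Panthera",
  Epithet = "leo", stringsAsFactors = FALSE
)

genus_header   <- format_dada2_header(tax)
species_header <- format_dada2_species_header("ACC0001", tax$Genus, tax$Epithet)
```

Passing a different `ranks` vector (for example starting at Phylum and ending at a species column) would produce a header with a different rank span.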

Beyond the custom library, additional scripts are included to format the MIDORI and terrimporter COI reference databases for use with dada2.

Data

Contains downloaded FASTA files, whole mitochondrial genomes, and species lists generated by the Kruger National Park.

Output

The "output" folder contains intermediate files, plus the final dada2-formatted reference sequences:

  • Kingdom to Genus: "Kruger_Vertebrates_refLib_dada2.fasta"
  • Species: "Kruger_Vertebrates_refLib_dada2_species.fasta"
  • Phylum to Species: "Kruger_Vertebrates_refLib_dada2_phy2species.fasta"
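The first two files match the inputs that dada2's `assignTaxonomy()` and `addSpecies()` expect, so one plausible way to use them is sketched below. This is an assumption about intended usage rather than a documented workflow: `ref_paths`, `classify_asvs`, and the `ref_dir` location of a local clone are names invented for the example, and `seqs` stands for a character vector of ASV sequences (or a dada2 sequence table).

```r
# Paths to the genus-level and species-level reference files,
# relative to an assumed local copy of the "output" folder.
ref_paths <- function(ref_dir = "output") {
  c(
    genus   = file.path(ref_dir, "Kruger_Vertebrates_refLib_dada2.fasta"),
    species = file.path(ref_dir, "Kruger_Vertebrates_refLib_dada2_species.fasta")
  )
}

# Assign Kingdom-to-Genus taxonomy, then add exact-match species labels.
# Needs the dada2 package and the reference files, so it is only defined here.
classify_asvs <- function(seqs, ref_dir = "output") {
  refs <- ref_paths(ref_dir)
  taxa <- dada2::assignTaxonomy(seqs, refs[["genus"]], multithread = FALSE)
  dada2::addSpecies(taxa, refs[["species"]])
}

paths <- ref_paths()
```

The "phy2species" file carries a different rank span, so using it with `assignTaxonomy()` would likely require setting the `taxLevels` argument to match its ranks.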
