This repository contains a complete R-based bioinformatics pipeline for collecting, filtering, aligning, and analyzing acetylcholinesterase (AChE) transcripts across locusts, crickets, and mosquitoes. The workflow identifies the longest isoforms, builds multi-species FASTA files, prepares alignments, constructs phylogenetic trees, and generates figures for research poster or manuscript preparation.
Locusts possess unusually high AChE gene copy numbers yet fail to evolve organophosphate pesticide resistance, unlike mosquitoes. This project explores:
- AChE copy-number differences across key insects
- Whether locust AChE expansion is ancestral or derived
- How gene family size affects resistance evolution
- Whether AChE copies contain known resistance-associated mutations
This repository provides the complete computational workflow for that analysis.
File: scripts/01_download_AChE_sequences.R
Downloads all AChE CDS/mRNA isoforms from NCBI into data/raw/.
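As a rough illustration of this step, the sketch below searches NCBI nucleotide records with rentrez and saves the hits as FASTA. The search term, the `retmax` cap, the output file name, and the helper names `build_ache_query` and `fetch_ache_fasta` are assumptions for illustration; the actual script may query NCBI differently.

```r
# Hypothetical helper: build an NCBI nucleotide query for one species.
# The search term used by the real script is not documented in this README.
build_ache_query <- function(species) {
  sprintf('"%s"[Organism] AND acetylcholinesterase[Title] AND biomol_mrna[PROP]',
          species)
}

# Hypothetical helper: search nuccore and save matching records as FASTA.
# Requires the rentrez package and network access when called.
fetch_ache_fasta <- function(species, out_dir = "data/raw", retmax = 100) {
  hits <- rentrez::entrez_search(db = "nuccore",
                                 term = build_ache_query(species),
                                 retmax = retmax)
  if (length(hits$ids) == 0) return(invisible(NULL))
  fasta <- rentrez::entrez_fetch(db = "nuccore", id = hits$ids, rettype = "fasta")
  dir.create(out_dir, recursive = TRUE, showWarnings = FALSE)
  out_file <- file.path(out_dir, paste0(gsub(" ", "_", species), "_AChE.fasta"))
  writeLines(fasta, out_file)
  invisible(out_file)
}

# Example call (not run here): fetch_ache_fasta("Schistocerca gregaria")
```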
File: scripts/02_extract_longest_isoforms.R
Identifies the longest CDS per AChE gene and writes results to data/longest/.
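The selection logic can be sketched in base R as follows. The sketch assumes sequences are held in a named character vector whose names follow a `geneID|transcriptID` layout; that header format and the helper name `pick_longest` are illustrative assumptions, since real NCBI headers need their own parsing.

```r
# Keep the longest sequence per gene (first one wins on ties).
# Assumes names look like "geneID|transcriptID" (illustrative layout only).
pick_longest <- function(seqs) {
  gene <- sub("\\|.*$", "", names(seqs))   # text before the first "|"
  len  <- nchar(seqs)
  keep <- vapply(split(seq_along(seqs), gene),
                 function(i) i[which.max(len[i])],
                 integer(1))
  seqs[sort(unname(keep))]                 # preserve original input order
}

toy <- c("ace1|t1" = "ATGAAA", "ace1|t2" = "ATGAAACCC", "ace2|t1" = "ATGC")
names(pick_longest(toy))
# [1] "ace1|t2" "ace2|t1"
```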
File: scripts/03_merge_longest_fastas.R
Builds a unified multi-species FASTA in data/combined/.
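One way to picture the merge is plain concatenation with a species tag added to each header so that sequences stay traceable in the combined file. In the sketch below, the helper name `merge_fastas`, the tag taken from the input file name, and the `|` separator are all assumptions, not the script's documented behavior.

```r
# Hypothetical helper: concatenate per-species FASTA files into one file,
# prefixing each header with a tag derived from the input file name.
merge_fastas <- function(in_files, out_file) {
  merged <- unlist(lapply(in_files, function(f) {
    tag   <- tools::file_path_sans_ext(basename(f))
    lines <- readLines(f, warn = FALSE)
    is_header <- startsWith(lines, ">")
    lines[is_header] <- paste0(">", tag, "|", substring(lines[is_header], 2))
    lines[nzchar(lines)]                   # drop blank lines
  }))
  dir.create(dirname(out_file), recursive = TRUE, showWarnings = FALSE)
  writeLines(merged, out_file)
  invisible(merged)
}

# Example call (not run here; the output file name is hypothetical):
# merge_fastas(list.files("data/longest", full.names = TRUE),
#              "data/combined/AChE_all_species.fasta")
```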
File: scripts/04_run_alignment.R
Prepares or runs MAFFT/Clustal alignment and saves files to data/alignment/.
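The "prepares or runs" behavior could look roughly like the wrapper below: it runs MAFFT when the binary is on the PATH and otherwise prints the command to run by hand. The helper name `run_mafft`, the `--auto` strategy, and the binary name `mafft` are assumptions; the real script may use different options.

```r
# Hypothetical wrapper around the MAFFT command line ("mafft --auto in > out").
run_mafft <- function(in_fasta, out_fasta, mafft_bin = "mafft") {
  if (!nzchar(Sys.which(mafft_bin))) {
    message("MAFFT not found on PATH; run manually: ",
            mafft_bin, " --auto ", in_fasta, " > ", out_fasta)
    return(invisible(FALSE))
  }
  dir.create(dirname(out_fasta), recursive = TRUE, showWarnings = FALSE)
  # MAFFT prints the alignment to stdout, so redirect stdout to the output file.
  status <- system2(mafft_bin, args = c("--auto", shQuote(in_fasta)),
                    stdout = out_fasta)
  invisible(status == 0)
}
```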
File: scripts/05_build_tree_fasttree.R
Constructs a phylogenetic tree using FastTree or IQ-TREE, saved to data/tree/.
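To make the two tool options concrete, the sketch below assembles (but does not run) a command for either FastTree or IQ-TREE. The binary names `FastTree` and `iqtree2`, the model choices, the bootstrap count, and the output file names are assumptions that vary by installation and analysis.

```r
# Hypothetical helper: build the tree command as a list for system2().
tree_command <- function(aln, out_dir = "data/tree",
                         tool = c("fasttree", "iqtree")) {
  tool <- match.arg(tool)
  if (tool == "fasttree") {
    # FastTree: -nt for nucleotide input, -gtr for the GTR model;
    # the Newick tree is printed to stdout.
    list(bin = "FastTree", args = c("-nt", "-gtr", aln),
         stdout = file.path(out_dir, "AChE_fasttree.nwk"))
  } else {
    # IQ-TREE 2: ModelFinder (-m MFP) with 1000 ultrafast bootstraps (-B);
    # it writes its own output files under the given prefix.
    list(bin = "iqtree2",
         args = c("-s", aln, "-m", "MFP", "-B", "1000",
                  "--prefix", file.path(out_dir, "AChE_iqtree")),
         stdout = "")
  }
}

cmd <- tree_command("data/alignment/AChE_aligned.fasta")
# To run it: system2(cmd$bin, cmd$args, stdout = cmd$stdout)
```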
File: scripts/06_plot_tree.R
Generates polished tree visualizations.
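A minimal plotting sketch with ape and ggtree might look like this. The helper name `plot_ache_tree`, the label size, and the figure dimensions are illustrative assumptions rather than the styling the script actually applies.

```r
# Hypothetical plotting helper; requires the ape, ggtree, and ggplot2 packages.
plot_ache_tree <- function(tree_file, out_file) {
  tree <- ape::read.tree(tree_file)        # Newick tree from the previous step
  p <- ggtree::ggtree(tree) +
    ggtree::geom_tiplab(size = 3) +        # sequence labels at the tips
    ggtree::geom_treescale()               # branch-length scale bar
  ggplot2::ggsave(out_file, plot = p, width = 8, height = 10)
  invisible(p)
}

# Example call (not run here; file names are hypothetical):
# plot_ache_tree("data/tree/AChE_fasttree.nwk", "data/tree/AChE_tree.pdf")
```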
Top-level files
- AChE_Project.Rproj: RStudio project file
- README.md: documentation
- LICENSE: MIT license
- CITATION.cff: citation metadata
- .gitignore: ignored files
Folder: scripts/
- 01_download_AChE_sequences.R
- 02_extract_longest_isoforms.R
- 03_merge_longest_fastas.R
- 04_run_alignment.R
- 05_build_tree_fasttree.R
- 06_plot_tree.R
Folder: data/
- raw/: raw NCBI downloads (ignored by Git)
- longest/: longest isoform FASTAs (ignored by Git)
- combined/: merged multi-species dataset (ignored by Git)
- alignment/: alignment files (ignored by Git)
- tree/: final tree outputs (tracked)
Locusts
- Schistocerca gregaria
- Schistocerca cancellata
- Schistocerca piceifrons
Mormon cricket (a katydid)
- Anabrus simplex
Mosquitoes
- Anopheles gambiae
- Aedes aegypti
- Culex quinquefasciatus
R packages:
- rentrez
- seqinr
- dplyr
- stringr
- ape
- ggtree
- ggplot2
- readr
External tools:
- MAFFT — alignment
- FastTree / IQ-TREE — phylogenetics
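Before running anything, it can help to check which of these dependencies are present. The snippet below is a sketch: the executable names passed to `Sys.which()` are assumptions and differ between installations (for example `fasttree` or `iqtree`). Note that ggtree is distributed through Bioconductor rather than CRAN.

```r
required_pkgs <- c("rentrez", "seqinr", "dplyr", "stringr",
                   "ape", "ggtree", "ggplot2", "readr")

# requireNamespace() returns FALSE for packages that are not installed.
installed <- vapply(required_pkgs, requireNamespace, logical(1), quietly = TRUE)
missing_pkgs <- required_pkgs[!installed]
if (length(missing_pkgs) > 0) {
  message("Missing R packages: ", paste(missing_pkgs, collapse = ", "))
}

# Sys.which() returns "" for tools that are not on the PATH.
tool_paths <- Sys.which(c("mafft", "FastTree", "iqtree2"))
print(tool_paths)
```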
1. Clone the repository:
   git clone https://github.com/Taylortxtt/AChE_Insect_Transcriptome.git
2. Open the project
- Open AChE_Project.Rproj in RStudio
3. Run the pipeline
- Run the scripts in numerical order: 1 → 2 → 3 → 4 → 5 → 6
Each script outputs files into the corresponding data/ subfolder.
4. Add new species
- Drop FASTA files into data/raw/
- Re-run the pipeline starting at script 02 or 03
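Running the scripts in order, or restarting from a later one after adding a species, can be sketched with a small driver. It assumes the scripts run non-interactively from the project root and that their file names start with a two-digit number; `pipeline_scripts` and `run_pipeline` are hypothetical helpers, not part of the repository.

```r
# List the numbered scripts in run order.
pipeline_scripts <- function(script_dir = "scripts") {
  sort(list.files(script_dir, pattern = "^[0-9]{2}_.*\\.R$", full.names = TRUE))
}

# Source each script in turn, starting at position `from`. An error in any
# script stops the loop, so later steps never run on incomplete inputs.
run_pipeline <- function(script_dir = "scripts", from = 1) {
  scripts <- pipeline_scripts(script_dir)
  for (s in scripts[seq_along(scripts) >= from]) {
    message("Running ", basename(s))
    source(s, local = new.env())
  }
  invisible(scripts)
}

# Full run:             run_pipeline()
# After adding species: run_pipeline(from = 2)
```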
Taylor M. Johnson
Department of Biochemistry
Mississippi State University
MIT License — see the LICENSE file.
Please cite using the included CITATION.cff file.