This repository contains a standardized workflow for running BUSTED-PH on high-performance computing clusters (specifically optimized for SLURM and Snakemake).
.
├── alignments/ # Input: .fasta or .nex files
├── data/
│ ├── global_tree.nwk # Input: Master species tree
│ └── foreground.txt # Input: List of phenotype-positive species
├── nexus_files/ # Intermediate: Prepared NEXUS files (alignment + tree)
├── results/ # Output: BUSTED-PH JSON results
├── scripts/
│ ├── prepare_data.sh # Step 1: Data preparation script
│ ├── run_busted.sh # Step 2a: SLURM job array script
│ └── parse_results.py # Step 3: SQLite parsing script
├── Snakefile # Step 2b: Snakemake workflow definition
├── logs/ # Cluster log files
└── busted_ph_results.db # Final: SQLite database of results
- Place your alignments in alignments/.
- Place your master tree and foreground list in data/.
- Follow Step 1 below to prepare data.
Step 1: scripts/prepare_data.sh reconciles your alignments with the master tree, trims missing species, and labels foreground branches using the Conjunctive (All Descendants) strategy.
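As a rough illustration of the Conjunctive (All Descendants) rule, the sketch below marks an internal branch as foreground only when every branch beneath it is foreground. It is not the code in prepare_data.sh: the nested-tuple tree representation, the `label_tree` function, and the `Foreground` tag name are all assumptions made for the example. Only the `{label}` annotation style follows HyPhy's Newick convention.

```python
FOREGROUND_TAG = "{Foreground}"  # assumed label name, written in HyPhy's {label} style


def label_tree(node, foreground):
    """Return (newick_text, is_foreground) for a nested-tuple tree.

    A leaf is a species name (str); an internal node is a tuple of children.
    """
    if isinstance(node, str):
        is_fg = node in foreground
        return node + (FOREGROUND_TAG if is_fg else ""), is_fg

    labelled = [label_tree(child, foreground) for child in node]
    # Conjunctive rule: an internal branch is foreground only if every child is.
    is_fg = all(fg for _, fg in labelled)
    newick = "(" + ",".join(text for text, _ in labelled) + ")"
    return newick + (FOREGROUND_TAG if is_fg else ""), is_fg


# Hypothetical four-species tree with two phenotype-positive species.
tree = ((("human", "chimp"), "mouse"), "dog")
newick, _ = label_tree(tree, {"human", "chimp"})
print(newick + ";")
```

Here the human/chimp ancestor is labelled because both of its descendants are foreground, while the branch that also leads to mouse is left as background.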
Note: The script automatically replaces any * characters in your FASTA files with - to ensure compatibility with HyPhy.
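The replacement described in the note can be sketched in a few lines of Python. This is only an illustration, not the shell script's actual implementation: the function name `mask_stops` is hypothetical, and leaving the `>` header lines untouched is an assumption of this sketch.

```python
def mask_stops(fasta_text):
    """Replace '*' with '-' in sequence lines; leave '>' header lines untouched."""
    cleaned = []
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            cleaned.append(line)  # header line: kept verbatim (assumption)
        else:
            cleaned.append(line.replace("*", "-"))
    return "\n".join(cleaned) + "\n"


# Made-up two-record FASTA text for demonstration.
example = ">seq1\nATGAAA*\n>seq2\nATG*AAT\n"
print(mask_stops(example), end="")
```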
bash scripts/prepare_data.sh
(Optional: You can edit the paths and thread count at the top of the script.)
Step 2a, SLURM job array:
# Update --array index in scripts/run_busted.sh first
sbatch scripts/run_busted.sh
Step 2b, Snakemake (alternative to Step 2a):
snakemake --cluster "sbatch --cpus-per-task=4 --mem=4G --time=48:00:00 --output=logs/snakemake_%j.out" -j 50 --delete-incomplete
Step 3, parse the results into SQLite:
python scripts/parse_results.py
Query the SQLite database:
sqlite3 busted_ph_results.db "SELECT gene FROM results WHERE status = 'SIGNIFICANT_ASSOCIATION' ORDER BY p_diff ASC"
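The same query can be run from Python with the standard-library sqlite3 module. The sketch below assumes only what the query above implies, namely a `results` table with `gene`, `status`, and `p_diff` columns. The in-memory rows, the `NO_SIGNAL` status value, and the `significant_genes` helper are made up for demonstration.

```python
import sqlite3


def significant_genes(connection):
    """Return genes flagged SIGNIFICANT_ASSOCIATION, smallest p_diff first."""
    rows = connection.execute(
        "SELECT gene FROM results "
        "WHERE status = ? ORDER BY p_diff ASC",
        ("SIGNIFICANT_ASSOCIATION",),
    ).fetchall()
    return [gene for (gene,) in rows]


# Demonstration on an in-memory database with placeholder rows.
demo = sqlite3.connect(":memory:")
demo.execute("CREATE TABLE results (gene TEXT, status TEXT, p_diff REAL)")
demo.executemany(
    "INSERT INTO results VALUES (?, ?, ?)",
    [
        ("geneA", "SIGNIFICANT_ASSOCIATION", 0.03),
        ("geneB", "NO_SIGNAL", 0.60),  # placeholder status, not from the pipeline
        ("geneC", "SIGNIFICANT_ASSOCIATION", 0.001),
    ],
)
print(significant_genes(demo))
```

To query real output, replace the in-memory connection with `sqlite3.connect("busted_ph_results.db")`.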