veg/busted-ph-workflow

BUSTED-PH Cluster Workflow

This repository contains a standardized workflow for running BUSTED-PH on high-performance computing clusters. It targets SLURM, either through a plain job array or through Snakemake.

Directory Structure

.
├── alignments/          # Input: .fasta or .nex files
├── data/
│   ├── global_tree.nwk  # Input: Master species tree
│   └── foreground.txt   # Input: List of phenotype-positive species
├── nexus_files/         # Intermediate: Prepared NEXUS files (alignment + tree)
├── results/             # Output: BUSTED-PH JSON results
├── scripts/             
│   ├── prepare_data.sh  # Step 1: Data preparation script
│   ├── run_busted.sh    # Step 2a: SLURM job array script
│   └── parse_results.py # Step 3: SQLite parsing script
├── Snakefile            # Step 2b: Snakemake workflow definition
├── logs/                # Cluster log files
└── busted_ph_results.db # Final: SQLite database of results

Setup

  1. Place your alignments in alignments/.
  2. Place your master tree and foreground list in data/.
  3. Follow Step 1 below to prepare data.
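Before running Step 1, it can help to confirm that the inputs are where the workflow expects them. The following is a minimal sketch, not part of the repository: the function name `check_inputs` is hypothetical, and the paths are taken from the directory structure above.

```python
from pathlib import Path

# Alignment extensions named in the directory structure above.
ALIGNMENT_SUFFIXES = {".fasta", ".nex"}


def check_inputs(root="."):
    """Return a list of problems; an empty list means the layout looks ready.

    Hypothetical helper: it only checks that files exist, not that they parse.
    """
    root = Path(root)
    problems = []

    align_dir = root / "alignments"
    alignments = []
    if align_dir.is_dir():
        alignments = [
            p for p in align_dir.iterdir()
            if p.suffix.lower() in ALIGNMENT_SUFFIXES
        ]
    if not alignments:
        problems.append("no .fasta or .nex files found in alignments/")

    for rel in ("data/global_tree.nwk", "data/foreground.txt"):
        if not (root / rel).is_file():
            problems.append(f"missing {rel}")

    return problems


if __name__ == "__main__":
    for problem in check_inputs():
        print("WARNING:", problem)
```

Run from the repository root, it prints one warning per missing input and nothing when all three inputs are present.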

Step 1: Data Preparation

This script reconciles your alignments with the master tree, trims missing species, and labels foreground branches using the Conjunctive (All Descendants) strategy.
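To illustrate the Conjunctive (All Descendants) rule: an internal branch is labeled foreground only when every tip below it is foreground. The sketch below is not the code in `prepare_data.sh`; it assumes a tree given as nested tuples and uses `{Foreground}` as the label, which is an assumption about the tag name.

```python
def label_conjunctive(tree, foreground, tag="{Foreground}"):
    """Return (newick_text, is_foreground) for a nested-tuple tree.

    Leaves are species names (str); internal nodes are tuples of children.
    A node is foreground only if ALL of its descendants are foreground.
    The tag name is an assumption made for this illustration.
    """
    if isinstance(tree, str):
        is_fg = tree in foreground
        return tree + (tag if is_fg else ""), is_fg

    labelled = [label_conjunctive(child, foreground, tag) for child in tree]
    is_fg = all(fg for _, fg in labelled)
    newick = "(" + ",".join(text for text, _ in labelled) + ")"
    return newick + (tag if is_fg else ""), is_fg


# Toy example: human, chimp, and mouse carry the phenotype; rat does not.
tree = (("human", "chimp"), ("mouse", "rat"))
newick, _ = label_conjunctive(tree, {"human", "chimp", "mouse"})
print(newick + ";")
# ((human{Foreground},chimp{Foreground}){Foreground},(mouse{Foreground},rat));
```

The human/chimp ancestor is labeled because both of its tips are foreground, while the mouse/rat ancestor is left as background because rat is not.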

Note: The script automatically replaces any * characters in your FASTA files with - to ensure compatibility with HyPhy.
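The replacement described in the note can be sketched as follows. This is an illustration rather than the script's actual implementation; the function name `mask_stop_symbols` is hypothetical, and leaving header lines untouched is an assumption.

```python
def mask_stop_symbols(fasta_text):
    """Replace '*' with '-' in sequence lines of FASTA text.

    Assumption: header lines (starting with '>') are left unchanged so that
    sequence names still match the tree; the real script may differ.
    """
    cleaned = []
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            cleaned.append(line)
        else:
            cleaned.append(line.replace("*", "-"))
    return "\n".join(cleaned) + "\n"


print(mask_stop_symbols(">seq1\nATGAAA***\n"), end="")
# >seq1
# ATGAAA---
```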

bash scripts/prepare_data.sh

Optionally, edit the paths and thread count at the top of the script before running it.

Step 2a: Run with SLURM Job Array

# Set the --array range in scripts/run_busted.sh first
sbatch scripts/run_busted.sh
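A job array runs one task per input file, and SLURM passes each task its index in the `SLURM_ARRAY_TASK_ID` environment variable. The sketch below shows one way to derive the `--array` range and to map a task index to a file. How `run_busted.sh` actually does this is not shown here; the helper names, the sorted ordering, and the 1-based indexing are all assumptions.

```python
import os
from pathlib import Path


def list_inputs(nexus_dir="nexus_files"):
    """Prepared NEXUS files in a stable (sorted) order."""
    return sorted(p for p in Path(nexus_dir).iterdir() if p.is_file())


def array_spec(n_files):
    """Value for sbatch --array covering n_files tasks (assumes 1-based indices)."""
    if n_files < 1:
        raise ValueError("no input files to process")
    return f"1-{n_files}"


def input_for_task(files, task_id):
    """Pick the file for a 1-based array task index."""
    return files[task_id - 1]


# Inside a running array task, the index would come from the environment:
#   task_id = int(os.environ["SLURM_ARRAY_TASK_ID"])
#   nexus_file = input_for_task(list_inputs(), task_id)
```

For example, `array_spec(len(list_inputs()))` gives the range to use for `--array` when every prepared file should be analyzed.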

Step 2b: Run with Snakemake (Recommended)

snakemake --cluster "sbatch --cpus-per-task=4 --mem=4G --time=48:00:00 --output=logs/snakemake_%j.out" -j 50 --delete-incomplete
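After a run, it is useful to know which genes still lack a result. The following sketch compares the two directories; it assumes each prepared file `nexus_files/<gene>.<ext>` produces `results/<gene>.json`, which is a guess at the naming scheme and should be checked against the Snakefile.

```python
from pathlib import Path


def pending_genes(nexus_dir="nexus_files", results_dir="results"):
    """List genes that have a prepared NEXUS file but no JSON result yet.

    Assumption: input and output share the same file stem.
    """
    nexus_dir, results_dir = Path(nexus_dir), Path(results_dir)
    prepared = set()
    if nexus_dir.is_dir():
        prepared = {p.stem for p in nexus_dir.iterdir() if p.is_file()}
    done = set()
    if results_dir.is_dir():
        done = {p.stem for p in results_dir.glob("*.json")}
    return sorted(prepared - done)


if __name__ == "__main__":
    for gene in pending_genes():
        print(gene)
```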

Step 3: Parsing Results

Collect the JSON files in results/ into the SQLite database busted_ph_results.db:

python scripts/parse_results.py

Query the SQLite database for genes with a significant association, ordered by p_diff:

sqlite3 busted_ph_results.db "SELECT gene FROM results WHERE status = 'SIGNIFICANT_ASSOCIATION' ORDER BY p_diff ASC"
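The same query can be run from Python with the standard `sqlite3` module, which is convenient for downstream analysis. This is a minimal sketch: the table and column names (`results`, `gene`, `status`, `p_diff`) come from the query above, and the function name `significant_genes` is hypothetical.

```python
import sqlite3
from contextlib import closing

QUERY = """
    SELECT gene
    FROM results
    WHERE status = ?
    ORDER BY p_diff ASC
"""


def significant_genes(conn):
    """Return gene names flagged SIGNIFICANT_ASSOCIATION, smallest p_diff first."""
    rows = conn.execute(QUERY, ("SIGNIFICANT_ASSOCIATION",)).fetchall()
    return [row[0] for row in rows]


# Usage, once parse_results.py has created the database:
#   with closing(sqlite3.connect("busted_ph_results.db")) as conn:
#       for gene in significant_genes(conn):
#           print(gene)
```

Passing the status as a bound parameter rather than formatting it into the SQL string keeps the query reusable for other status values.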
