Welcome to the official repository of ROADIES, a novel pipeline for inferring phylogenetic species trees directly from raw genomic assemblies. ROADIES offers a fully automated, scalable, and easy-to-use solution, eliminating manual steps and allowing flexible control over the trade-off between accuracy and runtime.
For a detailed overview of ROADIES' features and configuration options, please visit our Wiki.
Please follow any of the options below to install ROADIES on your system.
- Install Conda (if not installed):
wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
chmod +x Miniconda3-latest-Linux-x86_64.sh
./Miniconda3-latest-Linux-x86_64.sh
export PATH="$HOME/miniconda3/bin:$PATH"
source ~/.bashrc
- Configure Conda channels:
conda config --add channels defaults
conda config --add channels bioconda
conda config --add channels conda-forge
Verify the installation by running conda in your terminal.
- Create and activate a custom environment:
conda create -n roadies_env python=3.9 ete3 seaborn
conda activate roadies_env
- Install ROADIES:
conda install roadies
- Locate the installed files:
cd $CONDA_PREFIX/envs/roadies_env/ROADIES
Now you are ready to follow the Quick Start section to run the pipeline.
If you would like to install ROADIES using DockerHub, follow these steps:
- Pull the ROADIES image from DockerHub:
docker pull ang037/roadies:latest
- Launch a container:
docker run -it ang037/roadies:latest
Once you are able to access the ROADIES repository, refer to the Quick Start to run the pipeline.
If you would like to build the Docker image locally instead, follow these steps:
- Clone the ROADIES repository:
git clone https://github.com/TurakhiaLab/ROADIES.git
cd ROADIES
- Build and run the Docker container:
docker build -t roadies_image .
docker run -it roadies_image
Once you are able to access the ROADIES repository, refer to Quick Start instructions to run the pipeline.
If you would like to install ROADIES from source, follow these steps:
- Install the following dependencies (requires sudo access):
- Java Runtime Environment (Version 1.7 or higher)
- Python (Version 3.9 or higher)
- wget and unzip commands
- GCC (Version 11.4 or higher)
- cmake (Download here: https://cmake.org/download/)
- Boost library (Download here: https://boostorg.jfrog.io/artifactory/main/release/1.82.0/source/)
- zlib (Download here: http://www.zlib.net/)
For Ubuntu, you can install these dependencies with:
sudo apt-get install -y wget unzip make g++ python3 python3-pip python3-setuptools git default-jre libgomp1 libboost-all-dev cmake
- Clone the repository:
git clone https://github.com/TurakhiaLab/ROADIES.git
cd ROADIES
- Run the installation script:
chmod +x roadies_env.sh
source roadies_env.sh
After a successful setup (indicated by the Setup complete message), the roadies_env environment will be activated. Proceed to Quick Start.
Note: If you encounter issues with the Boost library, add its path to $CPLUS_LIBRARY_PATH and save it in ~/.bashrc.
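Before running the installation script, it can help to confirm that the command-line dependencies listed above are reachable. The following is a minimal sketch, not part of ROADIES: the helper name missing_tools and the exact command names in REQUIRED_TOOLS are assumptions based on the dependency list, and the check only looks for executables on PATH. It does not verify versions, and it does not detect the Boost or zlib libraries.

```python
import shutil

# Command names inferred from the dependency list above (an assumption);
# adjust them to match your system.
REQUIRED_TOOLS = ["java", "python3", "wget", "unzip", "gcc", "g++", "cmake", "make", "git"]


def missing_tools(tools):
    """Return the subset of `tools` that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


missing = missing_tools(REQUIRED_TOOLS)
if missing:
    print("Not found on PATH:", ", ".join(missing))
else:
    print("All listed tools were found on PATH.")
```

Any tool reported as missing can be installed with the apt-get command shown above (on Ubuntu) before running roadies_env.sh.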
After installing using one of the options mentioned in Quick Install, you're ready to run ROADIES! To get started:
- Download the test dataset (11 Drosophila genomes):
mkdir -p test/test_data && cat test/input_genome_links.txt | xargs -I {} sh -c 'wget -O test/test_data/$(basename {}) {}'
This saves the datasets in a separate test/test_data folder within the repository.
- Run the pipeline:
IMPORTANT: By default, ROADIES runs multiple iterations to generate highly accurate trees. For quick testing, use --noconverge to run a single iteration.
python run_roadies.py --cores 16 # Full run (multiple iterations)
python run_roadies.py --cores 16 --noconverge # Quick test run (one iteration)
- Output:
- Final UNROOTED newick tree saved as roadies.nwk in a separate output_files folder.
- Intermediate files (if --noconverge not used) saved in a separate converge_files folder.
NOTE: ROADIES outputs unrooted trees by default. You can reroot trees on your own or use the provided reroot.py script in workflow/scripts/ (given a reference rooted species tree as input).
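As a quick sanity check on the test run, one can confirm that the output tree contains one leaf per input genome. The sketch below is not part of ROADIES: newick_leaf_names is a hypothetical helper that uses a simple regular expression, and it assumes unquoted leaf labels without spaces or Newick special characters.

```python
import re


def newick_leaf_names(newick):
    """Extract leaf labels from a Newick string with unquoted labels.

    A leaf label directly follows "(" or ","; internal-node labels and
    support values follow ")" and are therefore skipped.
    """
    return re.findall(r"[(,]\s*([^(),:;\s]+)", newick)


# Small in-memory example (not real ROADIES output).
example = "((A:0.1,B:0.2)0.95:0.3,(C:0.1,D:0.4)0.80:0.2,E:0.5);"
print(newick_leaf_names(example))  # ['A', 'B', 'C', 'D', 'E']

# To inspect a real run, read the file produced by the pipeline, e.g.:
# with open("output_files/roadies.nwk") as handle:
#     print(len(newick_leaf_names(handle.read())))
```

For the 11-genome Drosophila test dataset, the leaf count would be expected to be 11 if every genome made it into the final tree.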
If you want to run ROADIES with your own datasets, follow these steps:
- Specify Input Dataset:
- Edit the config.yaml file (found in the config folder of the ROADIES directory).
- Update the GENOMES field with paths to your .fa or .fa.gz genome assemblies. Ensure all input genomic assemblies are in .fa or .fa.gz format and named according to the species' name (e.g., Aardvark.fa).
IMPORTANT: Each file must contain only one species. If needed, split multi-species files with:
faSplit byname <input_dir> <output_dir>
- Configure Other Parameters:
- Modify other parameters in config.yaml as needed.
- Refer to detailed settings on the Wiki.
- Run the Pipeline:
python run_roadies.py --cores 16
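The input requirements from the first step above (a .fa or .fa.gz extension and a file name matching the species name) can be checked before launching a run. The sketch below is illustrative only: species_name and check_genome_dir are hypothetical helpers, and it assumes all genome assemblies sit in a single directory. It does not inspect file contents, so it cannot detect files that mix several species.

```python
from pathlib import Path

# Accepted extensions, as stated in the input requirements above.
VALID_SUFFIXES = (".fa.gz", ".fa")


def species_name(filename):
    """Return the species name implied by a genome file name.

    Returns None when the extension is neither .fa nor .fa.gz.
    """
    for suffix in VALID_SUFFIXES:
        if filename.endswith(suffix) and len(filename) > len(suffix):
            return filename[: -len(suffix)]
    return None


def check_genome_dir(genome_dir):
    """Split the files in genome_dir into (species names, unexpected file names)."""
    species, unexpected = [], []
    for path in sorted(Path(genome_dir).iterdir()):
        if not path.is_file():
            continue
        name = species_name(path.name)
        if name is None:
            unexpected.append(path.name)
        else:
            species.append(name)
    return species, unexpected


print(species_name("Aardvark.fa"))     # Aardvark
print(species_name("Aardvark.fa.gz"))  # Aardvark
print(species_name("notes.txt"))       # None
```

Calling check_genome_dir on the directory that holds your assemblies returns the species names that would label the tree, plus any files whose extension does not match the expected formats.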
Modes of operation: ROADIES supports multiple modes of operation (fast, balanced, accurate) by controlling the accuracy-runtime tradeoff. Use any one of the following commands to select a mode (accurate mode is the default):
python run_roadies.py --cores 16 --mode accurate
python run_roadies.py --cores 16 --mode balanced
python run_roadies.py --cores 16 --mode fast
The output species tree (unrooted) in Newick format will be saved as roadies.nwk in the output_files folder.
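When ROADIES is launched from another script (for example, to compare modes on the same dataset), the command line can be assembled programmatically. This is a minimal sketch under stated assumptions: roadies_command is a hypothetical helper, it uses only the flags documented above (--cores, --mode, --noconverge), and whether --mode and --noconverge can be combined is not stated here, so treat that combination as untested.

```python
import sys

# Mode names as listed above; accurate is the documented default.
VALID_MODES = ("accurate", "balanced", "fast")


def roadies_command(cores, mode=None, noconverge=False):
    """Build the argument list for run_roadies.py from the documented flags."""
    if mode is not None and mode not in VALID_MODES:
        raise ValueError(f"mode must be one of {VALID_MODES}, got {mode!r}")
    cmd = [sys.executable, "run_roadies.py", "--cores", str(cores)]
    if mode is not None:
        cmd += ["--mode", mode]
    if noconverge:
        cmd.append("--noconverge")
    return cmd


# Show the command without the interpreter path.
print(" ".join(roadies_command(16, mode="fast")[1:]))
# run_roadies.py --cores 16 --mode fast

# To launch it from the ROADIES directory:
# import subprocess
# subprocess.run(roadies_command(16, mode="fast"), check=True)
```

Passing the arguments as a list rather than a single shell string avoids quoting problems, and the mode check fails early on a misspelled mode name.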
For troubleshooting, contributing, or SLURM cluster usage, refer to the Wiki.
If you use ROADIES in your research or publications, please cite the following paper:
A. Gupta, S. Mirarab, & Y. Turakhia, Accurate, scalable, and fully automated inference of species trees from raw genome assemblies using ROADIES, Proc. Natl. Acad. Sci. U.S.A. 122 (19) e2500553122, https://doi.org/10.1073/pnas.2500553122 (2025).
The output files with the gene trees and species trees generated by ROADIES for the manuscript are deposited in Dryad. To access them, please refer to the following:
Gupta, Anshu; Mirarab, Siavash; Turakhia, Yatish (2024). Accurate, scalable, and fully automated inference of species trees from raw genome assemblies using ROADIES [Dataset]. Dryad. https://doi.org/10.5061/dryad.tht76hf73.