This repository contains a reproducible pipeline for detecting viral sequences from sequencing data.
It can be run with Docker (recommended for full reproducibility) or, if Docker is unavailable, with Conda.
- Overview
- Option 1: Run with Docker (Recommended)
- Option 2: Run with Conda
- Input Configuration
- Output
- Updating or Removing Environments
This pipeline performs the following key steps:
- Quality control and alignment of sequencing reads
- Viral sequence identification using `bowtie2` and `STAR`
- Variant calling with `bcftools`
- Optional downstream analysis and summary statistics
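As a rough illustration of what the alignment and variant-calling steps involve, the sketch below chains `bowtie2`, `samtools`, and `bcftools` in a conventional way. It is not taken from `bin/viral_detection.sh`: the function names (`align_reads`, `call_variants`) and their arguments are hypothetical, and it assumes paired-end reads, a `bowtie2` index built with `bowtie2-build`, and a reference FASTA. The `STAR` step is not shown.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of the align -> sort -> call steps; NOT the code in
# bin/viral_detection.sh. Function names and paths are placeholders.

# Align paired-end reads with bowtie2 (SAM on stdout), then write a
# coordinate-sorted BAM with samtools and index it.
align_reads() {
  local index=$1 r1=$2 r2=$3 out_bam=$4 threads=${5:-4}
  bowtie2 -p "$threads" -x "$index" -1 "$r1" -2 "$r2" \
    | samtools sort -@ "$threads" -o "$out_bam" -
  samtools index "$out_bam"
}

# Pile up reads against the reference FASTA, call variant sites only (-v)
# with the multiallelic caller (-m), and write a bgzipped VCF (-Oz).
call_variants() {
  local ref_fa=$1 bam=$2 out_vcf=$3
  bcftools mpileup -Ou -f "$ref_fa" "$bam" \
    | bcftools call -mv -Oz -o "$out_vcf"
  bcftools index "$out_vcf"
}
```

For example, `align_reads ref/virus_index sample_R1.fastq.gz sample_R2.fastq.gz sample.bam 8` followed by `call_variants ref/virus.fa sample.bam sample.vcf.gz`, where every path is a placeholder.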
The codebase supports:
- Docker for fully containerized execution
- Conda for systems where Docker cannot be used
Docker provides full reproducibility because the image is built with all dependencies pre-installed.

Prerequisites:
- Docker installed
- An internet connection to build the image (first time only)
Clone the repository:

```bash
git clone https://github.com/nickcjacobs/ViralDetection.git
cd ViralDetection
```
From the repository root, build the image and run the pipeline:

```bash
docker build -t viral_detection .
docker run --rm -v "$(pwd)":/app -w /app viral_detection \
  bash bin/viral_detection.sh config/pipeline_input.txt
```
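Explanation:
- `--rm` removes the container once the run finishes
- `-v $(pwd):/app` mounts your current directory into the container at `/app`, so files written under `/app` also appear in your local directory
- `-w /app` sets the working directory inside the container
- The pipeline reads its parameters from `config/pipeline_input.txt`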
If Docker is not available, you can run the same pipeline in a Conda environment.

Prerequisites:
- Miniconda or Mambaforge
Clone the repository, then create and activate the environment:

```bash
git clone https://github.com/nickcjacobs/ViralDetection.git
cd ViralDetection
conda env create -f environment.yml
conda activate viral_detection
```
This installs `samtools`, `bcftools`, `bowtie2`, `STAR`, `seqtk`, `parallel`, `pysam`, and the other required tools.
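After `conda activate viral_detection`, a quick way to confirm that the tools listed above are on your `PATH` is a small check like the one below. `missing_tools` is a hypothetical helper, not part of the repository, and the `pysam` check assumes the environment provides a `python` executable.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of this repository): print each command
# name from "$@" that cannot be found on PATH. Returns 1 if any is missing.
missing_tools() {
  local tool status=0
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "$tool"
      status=1
    fi
  done
  return "$status"
}

# Check the command-line tools that environment.yml installs.
if missing=$(missing_tools samtools bcftools bowtie2 STAR seqtk parallel); then
  echo "All command-line tools found."
else
  # $missing is left unquoted on purpose so the names print on one line.
  echo "Missing tools:" $missing
fi

# pysam is a Python library, so check it with an import instead.
if python -c "import pysam" >/dev/null 2>&1; then
  echo "pysam is importable."
else
  echo "pysam could not be imported."
fi
```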
Then run the pipeline:

```bash
bash bin/viral_detection.sh config/pipeline_input.txt
```
All input file paths and settings are specified in `config/pipeline_input.txt`. Before running the pipeline, make sure this file contains the correct paths to your FASTQ files, reference genome, and any other required inputs.
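Before filling in the configuration file, it can save a failed run to check that each FASTQ path is readable and actually looks like FASTQ. The sketch below is a hypothetical helper (`fastq_read_count`, not part of the repository). It assumes standard four-line FASTQ records, plain or gzip-compressed (`.gz`), and it does not parse the configuration file itself, whose format is defined by the pipeline.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of this repository): print the number of
# reads in a FASTQ file, assuming four lines per record. Files ending in
# .gz are decompressed on the fly. Returns 1 if the file looks wrong.
fastq_read_count() {
  local fq=$1 first lines
  if [ ! -r "$fq" ]; then
    echo "not readable: $fq" >&2
    return 1
  fi
  # Choose how to stream the file: gzip -cd for .gz, cat otherwise.
  local -a reader=(cat)
  case "$fq" in
    *.gz) reader=(gzip -cd) ;;
  esac
  first=$("${reader[@]}" "$fq" | head -n 1)
  case "$first" in
    @*) ;;  # FASTQ record headers start with '@'
    *) echo "first line does not start with '@': $fq" >&2; return 1 ;;
  esac
  # Arithmetic expansion strips any padding that wc adds on some systems.
  lines=$(( $("${reader[@]}" "$fq" | wc -l) ))
  if [ $((lines % 4)) -ne 0 ]; then
    echo "line count $lines is not a multiple of 4: $fq" >&2
    return 1
  fi
  echo $((lines / 4))
}
```

For example, `fastq_read_count sample_R1.fastq.gz` (a placeholder path) prints the read count, or an error message on stderr if the file is missing or malformed.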
The pipeline produces:
- Processed and aligned reads
- Detected viral sequences
- Variant calls (`.vcf` files)
- Summary and log files in the designated output folder
Output locations and naming conventions are controlled by your configuration file.
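To get a quick sense of how many variants a run produced, you can count the non-header lines of a VCF. The helper below is a hypothetical sketch (`count_vcf_records` is not part of the repository) that handles plain `.vcf` and bgzip-compressed `.vcf.gz` files; with `bcftools` available, `bcftools view -H <file> | wc -l` gives the same count. The actual output file names depend on your configuration.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of this repository): print the number of
# variant records (non-header lines) in a VCF. bgzip output is
# gzip-compatible, so .gz files are read with gzip -cd.
count_vcf_records() {
  local vcf=$1
  if [ ! -r "$vcf" ]; then
    echo "not readable: $vcf" >&2
    return 1
  fi
  local -a reader=(cat)
  case "$vcf" in
    *.gz) reader=(gzip -cd) ;;
  esac
  # grep -c prints 0 but exits 1 when no line matches; ignore that status.
  "${reader[@]}" "$vcf" | grep -vc '^#' || true
}
```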
If you modify `environment.yml`, apply the changes to the existing environment with:

```bash
conda env update -f environment.yml --prune
```

To remove the Conda environment entirely:

```bash
conda remove --name viral_detection --all
```
Pull requests are welcome! To add a new feature or improve an existing one:
- Fork this repository
- Create a feature branch
- Submit a pull request describing your changes

For major changes, please open an issue first to discuss what you would like to modify.
- Maintainer: Nick Jacobs
- Repository: github.com/KlugerLab/ViralDetection