A comprehensive, reproducible, and user-friendly pipeline for analyzing ATAC-seq (Assay for Transposase-Accessible Chromatin using sequencing) data from raw FASTQ files to publication-ready results.
- Overview
- Features
- Quick Start
- Requirements
- Installation
- Usage
- Output Files
- Quality Control Metrics
- Test Data
- Citation
- Contributing
- License
- Support
ATAC-seq is a method for mapping chromatin accessibility genome-wide. This pipeline automates the entire analysis workflow, from quality control of raw sequencing reads through peak calling, annotation, and visualization.
What this pipeline does:
- Quality control of raw sequencing data
- Adapter trimming and quality filtering
- Alignment to reference genome
- Removal of duplicates and mitochondrial reads
- Peak calling (accessible chromatin regions)
- Blacklist filtering
- Peak annotation (genes, promoters, enhancers)
- Generation of visualization tracks (BigWig)
- Comprehensive quality metrics (FRiP scores)
- Statistical summaries and publication-ready plots
- Resume Capability: Pipeline automatically resumes from the last completed step if interrupted
- Comprehensive QC: Generates MultiQC reports at multiple stages
- Publication-Ready Plots: Automatically generates 15+ visualizations
- Full Reproducibility: Logs all software versions, parameters, and checksums
- Error Handling: Robust error checking and informative error messages
- Optimized Performance: Multi-threaded processing where possible
- Genome Agnostic: Works with any reference genome (configured for hg19 by default)
- Quality Metrics: Calculates FRiP scores, TSS enrichment, fragment size distributions
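To illustrate the reproducibility feature listed above, here is a minimal sketch of how a run log with checksums could be written. It is not the pipeline's own logging code: the function name `log_run_info`, the log layout, and the choice of SHA-256 are assumptions made for illustration, and it relies on `sha256sum` from GNU coreutils.

```shell
# Hypothetical helper, not the pipeline's own logging code.
# Usage: log_run_info LOG_FILE INPUT_FILE...
# Appends a UTC timestamp, the Bash version, and one SHA-256 checksum
# line per input file to LOG_FILE.
log_run_info() {
    local log_file="$1" input
    shift
    {
        echo "# run_started_utc: $(date -u +%Y-%m-%dT%H:%M:%SZ)"
        echo "# bash_version: ${BASH_VERSION:-unknown}"
        for input in "$@"; do
            sha256sum "$input"
        done
    } >> "$log_file"
}
```

Because the checksum lines use the standard `sha256sum` format, inputs logged this way could later be re-verified with `grep -v '^#' LOG_FILE | sha256sum -c -`.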
# 1. Clone the repository
git clone https://github.com/Adeel3Dgenomics/ATAC-seq-pipeline.git
cd ATAC-seq-pipeline
# 2. Install dependencies (see INSTALL.md for details)
bash scripts/install_dependencies.sh
# 3. Configure your analysis
cp config.example.sh config.sh
nano config.sh # Edit with your paths
# 4. Run the pipeline
bash ATAC_main.sh
# Or submit to SLURM cluster
sbatch submit_ATAC.sh

| Tool | Minimum Version | Purpose |
|---|---|---|
| Bash | 4.0+ | Pipeline execution |
| FastQC | 0.11.9+ | Quality control |
| Trim Galore | 0.6.0+ | Adapter trimming |
| Bowtie2 | 2.3.0+ | Read alignment |
| SAMtools | 1.10+ | BAM file processing |
| Picard | 2.20.0+ | Duplicate removal |
| Genrich | 0.6+ | Peak calling |
| BEDTools | 2.29.0+ | Genomic interval operations |
| deepTools | 3.3.0+ | BigWig generation and plotting |
| HOMER | 4.11+ | Peak annotation |
| featureCounts | 2.0.0+ | Read counting |
| MultiQC | 1.9+ | Report aggregation |
| R | 4.0.0+ | Statistical analysis and plotting |
R packages: ggplot2, dplyr, tidyr
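The minimum versions in the table can be checked by hand with a small comparison helper. The sketch below is an illustration only: `version_ge` is a hypothetical function, not part of the pipeline, and it assumes GNU `sort` with the `-V` (version sort) option is available.

```shell
# Hypothetical helper: succeeds when version $1 is at least version $2.
# Relies on GNU sort's version ordering (sort -V).
version_ge() {
    # If the required version sorts first (or both are equal),
    # the installed version meets the minimum.
    [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

# Example (commented out because it needs SAMtools on PATH):
# installed="$(samtools --version | awk 'NR==1 {print $2}')"
# version_ge "$installed" 1.10 || echo "SAMtools is older than 1.10"
```

Version sort matters here because a plain string comparison would rank `2.3.0` above `2.20.0`.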
- CPU: 8+ cores recommended
- RAM: 32 GB minimum, 64 GB recommended
- Storage: ~500 GB per analysis (depends on data size)
- OS: Linux/Unix (tested on Ubuntu 20.04, CentOS 7)
See INSTALL.md for detailed installation instructions.
# Install via conda (recommended)
conda env create -f environment.yml
conda activate atac-seq
# Or use the automated installer
bash scripts/install_dependencies.sh

# Edit configuration
nano config.sh
# Run pipeline
bash ATAC_main.sh

# Run on SLURM cluster
sbatch submit_ATAC.sh
# Resume interrupted run
bash ATAC_main.sh # Automatically detects and resumes
# Clean previous run and start fresh
rm .pipeline_progress
bash ATAC_main.sh

For detailed usage instructions, see USAGE.md.
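The resume behavior described above can be pictured as a simple checkpoint file. The following is a sketch under stated assumptions, not the actual implementation in `ATAC_main.sh`: the helper names `step_done` and `run_step` are hypothetical, and the layout of `.pipeline_progress` (one completed step name per line) is assumed for illustration.

```shell
# Assumption: the progress file holds one completed step name per line.
# The real .pipeline_progress format may differ.
PROGRESS_FILE="${PROGRESS_FILE:-.pipeline_progress}"

# step_done STEP -> succeeds if STEP is already recorded as completed
step_done() {
    [ -f "$PROGRESS_FILE" ] && grep -Fqx -- "$1" "$PROGRESS_FILE"
}

# run_step STEP COMMAND [ARGS...]
# Skips STEP if it is recorded; otherwise runs COMMAND and records STEP
# only when COMMAND succeeds, so a failed step is retried on the next run.
run_step() {
    local step_name="$1"
    shift
    if step_done "$step_name"; then
        echo "Skipping $step_name (already completed)"
        return 0
    fi
    echo "Running $step_name"
    "$@" || return 1
    echo "$step_name" >> "$PROGRESS_FILE"
}
```

With this design, deleting the progress file (as in the "start fresh" commands above) makes every step run again.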
The pipeline generates the following directory structure:
ATAC-seq-analysis/
├── QC/                          # Quality control reports
│   ├── raw/                     # FastQC on raw reads
│   └── trimmed/                 # FastQC on trimmed reads
├── trimmed_data/                # Adapter-trimmed FASTQ files
├── alignment/                   # Aligned BAM files
│   └── dedup/                   # Deduplicated BAM files
├── peaks/                       # Called peaks
├── blacklist_removed/           # Filtered peaks
├── bigwig_tracks/               # Visualization tracks for IGV/UCSC
├── final_results/               # Main results
│   ├── pipeline_summary_stats.tsv
│   ├── ATAC_annotated_homer.txt
│   ├── counts.txt
│   └── frip/
├── plots/                       # All visualizations
│   ├── fragment_size_distribution.pdf
│   ├── TSS_enrichment_profile.pdf
│   ├── correlation_heatmap.pdf
│   └── ... (15+ plots)
├── multiqc_report/              # Integrated QC report
├── reproducibility_log.txt      # Software versions & parameters
└── command_log.txt              # All executed commands
| File | Description |
|---|---|
| pipeline_summary_stats.tsv | Main QC metrics for all samples |
| ATAC_peaks.filtered.narrowPeak | Final peak calls |
| ATAC_annotated_homer.txt | Peak annotations |
| *.bw | BigWig tracks for genome browser |
| multiqc_report.html | Interactive QC report |
| frip_scores.txt | FRiP scores for each sample |
See USAGE.md for detailed descriptions of all output files.
| Metric | Good | Acceptable | Poor |
|---|---|---|---|
| Mapping Rate | >95% | 85-95% | <85% |
| Duplication Rate | <20% | 20-40% | >40% |
| FRiP Score | >0.3 | 0.2-0.3 | <0.2 |
| Unique Peaks | >40,000 | 20,000-40,000 | <20,000 |
| TSS Enrichment | >7 | 5-7 | <5 |
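As a worked example of reading the table, the sketch below computes a FRiP score (the fraction of reads that fall within called peaks) from two read counts and maps it onto the Good / Acceptable / Poor bands above. The function names `frip` and `classify_frip` are hypothetical, and how the pipeline itself counts reads for FRiP is not shown here.

```shell
# Hypothetical helpers for illustration; not part of the pipeline.

# frip READS_IN_PEAKS TOTAL_READS -> prints the fraction with 3 decimals
frip() {
    awk -v p="$1" -v t="$2" 'BEGIN {
        if (t <= 0) { print "NA"; exit 1 }
        printf "%.3f\n", p / t
    }'
}

# classify_frip SCORE -> Good / Acceptable / Poor, using the table thresholds
classify_frip() {
    awk -v s="$1" 'BEGIN {
        if (s > 0.3)       print "Good"
        else if (s >= 0.2) print "Acceptable"
        else               print "Poor"
    }'
}
```

For instance, 3,500,000 reads in peaks out of 10,000,000 total gives `frip 3500000 10000000` = `0.350`, which `classify_frip` reports as `Good`.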
The pipeline generates publication-ready visualizations. Example results from GM12878 cells are available in the test_output/ directory.
Test your installation with the provided test dataset:
cd test_data/
bash run_test.sh

This will download a small ATAC-seq dataset and run the complete pipeline. Compare your results to test_output/ to verify correct installation.
Common issues and solutions:
- "Command not found" errors: Ensure all dependencies are installed and in PATH
- Memory errors: Reduce thread count or increase available RAM
- Alignment failures: Check genome index paths in config.sh
- No peaks called: Check FRiP scores and library complexity
See TROUBLESHOOTING.md for detailed solutions.
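For the "Command not found" case, a quick way to see which dependencies are missing from PATH is a loop over `command -v`. This is a minimal sketch: `check_tools` is a hypothetical helper, and the executable names in the example call are assumptions about a typical installation that may differ on your system.

```shell
# Hypothetical helper: prints each tool that is not found on PATH and
# returns non-zero if at least one is missing.
check_tools() {
    local missing=0 tool
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "MISSING: $tool"
            missing=1
        fi
    done
    return "$missing"
}

# Example call (executable names are assumptions; adjust to your install):
# check_tools fastqc trim_galore bowtie2 samtools bedtools multiqc Rscript
```

If nothing is printed and the function returns zero, every listed executable was found.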
- INSTALL.md - Detailed installation instructions
- USAGE.md - Comprehensive usage guide
- TROUBLESHOOTING.md - Common problems and solutions
- CHANGELOG.md - Version history and updates
If you use this pipeline in your research, please cite:
@software{atac_seq_pipeline,
author = {M.Adeel},
title = {ATAC-seq Analysis Pipeline},
year = {2026},
url = {https://github.com/Adeel3Dgenomics/ATAC-seq-pipeline},
version = {1.0.0}
}

And the tools used by this pipeline:
- ATAC-seq method: Buenrostro et al. (2013) Nature Methods
- Genrich: GitHub
- HOMER: Heinz et al. (2010) Molecular Cell
- See CITATIONS.md for complete list
Contributions are welcome! Please see CONTRIBUTING.md for guidelines.
This project is licensed under the MIT License - see the LICENSE file for details.
- Issues: GitHub Issues
- Discussions: GitHub Discussions
- Email: Muhammad-Adeel@omrf.org
This pipeline was developed at OMRF. We thank the developers of all the tools integrated into this pipeline.
Keywords: ATAC-seq, chromatin accessibility, epigenomics, NGS, bioinformatics, peak calling, genomics



