ATAC-Seq Analysis Pipeline

A comprehensive, reproducible, and user-friendly pipeline for analyzing ATAC-seq (Assay for Transposase-Accessible Chromatin using sequencing) data from raw FASTQ files to publication-ready results.

📋 Table of Contents

Overview
Features
Quick Start
Requirements
Installation
Usage
Output Files
Quality Control Metrics
Test Data
Troubleshooting
Documentation
Citation
Contributing
License
Support
Acknowledgments

🔬 Overview

ATAC-seq is a method for mapping chromatin accessibility genome-wide. This pipeline automates the entire analysis workflow, from quality control of raw sequencing reads through peak calling, annotation, and visualization.

What this pipeline does:

✅ Quality control of raw sequencing data
✅ Adapter trimming and quality filtering
✅ Alignment to reference genome
✅ Removal of duplicates and mitochondrial reads
✅ Peak calling (accessible chromatin regions)
✅ Blacklist filtering
✅ Peak annotation (genes, promoters, enhancers)
✅ Generation of visualization tracks (BigWig)
✅ Comprehensive quality metrics (FRiP scores)
✅ Statistical summaries and publication-ready plots

✨ Features

🔄 Resume Capability: Pipeline automatically resumes from the last completed step if interrupted
📊 Comprehensive QC: Generates MultiQC reports at multiple stages
🎨 Publication-Ready Plots: Automatically generates 15+ visualizations
📝 Full Reproducibility: Logs all software versions, parameters, and checksums
🛡️ Error Handling: Robust error checking and informative error messages
⚡ Optimized Performance: Multi-threaded processing where possible
🧬 Genome Agnostic: Works with any reference genome (configured for hg19 by default)
📈 Quality Metrics: Calculates FRiP scores, TSS enrichment, fragment size distributions
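As a rough illustration of the reproducibility logging described above, the sketch below appends file checksums and tool version strings to a log file. The function names `log_checksum` and `log_version` are hypothetical and are not taken from the pipeline's scripts; the real log format may differ.

```shell
#!/usr/bin/env bash
# Illustrative sketch only; not the pipeline's actual logging code.

log_checksum() {
    # Usage: log_checksum <file> <log_file>
    # Appends "<sha256>  <file>" to the log.
    sha256sum "$1" >> "$2"
}

log_version() {
    # Usage: log_version <log_file> <command> [args...]
    # Appends "<command>: <first line of its output>" to the log.
    local log="$1"
    shift
    printf '%s: %s\n' "$1" "$("$@" 2>&1 | head -n 1)" >> "$log"
}

# Example (file names are placeholders):
#   log_version  reproducibility_log.txt samtools --version
#   log_checksum sample_R1.fastq.gz reproducibility_log.txt
```

Recording the checksum of each input next to the tool versions makes it possible to confirm later that a rerun used the same data and software.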

🚀 Quick Start

# 1. Clone the repository
git clone https://github.com/Adeel3Dgenomics/ATAC-seq-pipeline.git
cd ATAC-seq-pipeline

# 2. Install dependencies (see INSTALL.md for details)
bash scripts/install_dependencies.sh

# 3. Configure your analysis
cp config.example.sh config.sh
nano config.sh  # Edit with your paths

# 4. Run the pipeline
bash ATAC_main.sh

# Or submit to SLURM cluster
sbatch submit_ATAC.sh

📦 Requirements

Software Requirements

Tool	Minimum Version	Purpose
Bash	4.0+	Pipeline execution
FastQC	0.11.9+	Quality control
Trim Galore	0.6.0+	Adapter trimming
Bowtie2	2.3.0+	Read alignment
SAMtools	1.10+	BAM file processing
Picard	2.20.0+	Duplicate removal
Genrich	0.6+	Peak calling
BEDTools	2.29.0+	Genomic interval operations
deepTools	3.3.0+	BigWig generation and plotting
HOMER	4.11+	Peak annotation
featureCounts	2.0.0+	Read counting
MultiQC	1.9+	Report aggregation
R	4.0.0+	Statistical analysis and plotting

R packages: ggplot2, dplyr, tidyr

System Requirements

CPU: 8+ cores recommended
RAM: 32 GB minimum, 64 GB recommended
Storage: ~500 GB per analysis (depends on data size)
OS: Linux/Unix (tested on Ubuntu 20.04, CentOS 7)

See INSTALL.md for detailed installation instructions.
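Before running the pipeline, it can help to confirm that the required executables are on your PATH. The following is a minimal sketch; `check_tools` is a hypothetical helper, and the executable names in the usage comment are assumptions about how each tool is typically installed (for example, `picard` as a wrapper script), so adjust them to your environment.

```shell
#!/usr/bin/env bash
# Illustrative sketch: report which executables are missing from PATH.

check_tools() {
    # Usage: check_tools <tool> [<tool> ...]
    # Prints one "MISSING: <tool>" line per missing tool.
    # Returns 1 if any tool is missing, 0 otherwise.
    local missing=0 tool
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "MISSING: $tool"
            missing=1
        fi
    done
    return "$missing"
}

# Example (assumed executable names):
#   check_tools fastqc trim_galore bowtie2 samtools picard Genrich bedtools \
#       bamCoverage annotatePeaks.pl featureCounts multiqc Rscript
```

This only checks that each command can be found; it does not verify the minimum versions listed in the table.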

📥 Installation

For detailed installation instructions, see INSTALL.md.

Quick Installation (Ubuntu/Debian)

# Install via conda (recommended)
conda env create -f environment.yml
conda activate atac-seq

# Or use the automated installer
bash scripts/install_dependencies.sh

🔧 Usage

Basic Usage

# Edit configuration
nano config.sh

# Run pipeline
bash ATAC_main.sh

Advanced Usage

# Run on SLURM cluster
sbatch submit_ATAC.sh

# Resume interrupted run
bash ATAC_main.sh  # Automatically detects and resumes

# Clean previous run and start fresh
rm -f .pipeline_progress
bash ATAC_main.sh

For detailed usage instructions, see USAGE.md.
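To show how step-level resuming can work in principle, here is a minimal sketch built around the `.pipeline_progress` file mentioned above. It assumes the file holds one completed step name per line, which may not match the pipeline's real format, and `run_step` is a hypothetical helper rather than a function from ATAC_main.sh.

```shell
#!/usr/bin/env bash
# Illustrative sketch of checkpointing; not the pipeline's actual code.

PROGRESS_FILE="${PROGRESS_FILE:-.pipeline_progress}"

run_step() {
    # Usage: run_step <step_name> <command> [args...]
    local name="$1"
    shift
    # Skip the step if its name is already recorded as completed.
    if [ -f "$PROGRESS_FILE" ] && grep -qxF -- "$name" "$PROGRESS_FILE"; then
        echo "Skipping $name (already completed)"
        return 0
    fi
    echo "Running $name"
    # Record the step only if the command succeeds.
    "$@" && echo "$name" >> "$PROGRESS_FILE"
}

# Example (step names and commands are placeholders):
#   run_step trimming  bash scripts/trim.sh
#   run_step alignment bash scripts/align.sh
```

Under this scheme, deleting the progress file forces every step to run again, which is consistent with the "start fresh" command shown above.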

📂 Output Files

The pipeline generates the following directory structure:

ATAC-seq-analysis/
├── QC/                          # Quality control reports
│   ├── raw/                     # FastQC on raw reads
│   └── trimmed/                 # FastQC on trimmed reads
├── trimmed_data/                # Adapter-trimmed FASTQ files
├── alignment/                   # Aligned BAM files
│   └── dedup/                   # Deduplicated BAM files
├── peaks/                       # Called peaks
├── blacklist_removed/           # Filtered peaks
├── bigwig_tracks/               # Visualization tracks for IGV/UCSC
├── final_results/               # Main results
│   ├── pipeline_summary_stats.tsv
│   ├── ATAC_annotated_homer.txt
│   ├── counts.txt
│   └── frip/
├── plots/                       # All visualizations
│   ├── fragment_size_distribution.pdf
│   ├── TSS_enrichment_profile.pdf
│   ├── correlation_heatmap.pdf
│   └── ... (15+ plots)
├── multiqc_report/              # Integrated QC report
├── reproducibility_log.txt      # Software versions & parameters
└── command_log.txt              # All executed commands

Key Output Files

File	Description
`pipeline_summary_stats.tsv`	Main QC metrics for all samples
`ATAC_peaks.filtered.narrowPeak`	Final peak calls
`ATAC_annotated_homer.txt`	Peak annotations
`*.bw`	BigWig tracks for genome browser
`multiqc_report.html`	Interactive QC report
`frip_scores.txt`	FRiP scores for each sample

See USAGE.md for detailed descriptions of all output files.

📊 Quality Control Metrics

Expected Quality Metrics

Metric	Good	Acceptable	Poor
Mapping Rate	>95%	85-95%	<85%
Duplication Rate	<20%	20-40%	>40%
FRiP Score	>0.3	0.2-0.3	<0.2
Unique Peaks	>40,000	20,000-40,000	<20,000
TSS Enrichment	>7	5-7	<5
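FRiP (fraction of reads in peaks) is the number of reads overlapping called peaks divided by the total number of reads. The sketch below computes that ratio from two counts and labels it using the FRiP thresholds in the table above. The function names are hypothetical, and the commands in the comments are only one possible way to obtain the counts; the pipeline may count reads differently.

```shell
#!/usr/bin/env bash
# Illustrative sketch: compute and classify a FRiP score from read counts.

frip_score() {
    # Usage: frip_score <reads_in_peaks> <total_reads>
    # Prints the ratio with four decimals, or "NA" if total_reads <= 0.
    awk -v p="$1" -v t="$2" 'BEGIN {
        if (t <= 0) { print "NA"; exit 1 }
        printf "%.4f\n", p / t
    }'
}

classify_frip() {
    # Usage: classify_frip <frip>
    # Thresholds from the table: >0.3 Good, 0.2-0.3 Acceptable, <0.2 Poor.
    awk -v f="$1" 'BEGIN {
        if (f > 0.3)       print "Good"
        else if (f >= 0.2) print "Acceptable"
        else               print "Poor"
    }'
}

# One possible way to get the counts (file names are placeholders):
#   total=$(samtools view -c sample.dedup.bam)
#   in_peaks=$(samtools view -c -L peaks.bed sample.dedup.bam)
#   classify_frip "$(frip_score "$in_peaks" "$total")"
```

For example, 3,500,000 reads in peaks out of 10,000,000 total reads gives a FRiP of 0.35, which falls in the "Good" range.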

Example Output Plots

The pipeline generates publication-ready visualizations. For complete example results from GM12878 cells, see the test_output/ directory.

🧪 Test Data

Test your installation with the provided test dataset:

cd test_data/
bash run_test.sh

This will download a small ATAC-seq dataset and run the complete pipeline. Compare your results to test_output/ to verify correct installation.
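One simple way to do that comparison is a byte-level check of individual result files, sketched below. `compare_file` is a hypothetical helper, and the paths in the usage comment are assumptions about where your results and the reference results live. Small differences between tool versions can change outputs, so a "DIFFERS" result is a prompt to inspect the file rather than proof of a broken installation.

```shell
#!/usr/bin/env bash
# Illustrative sketch: compare one output file against a reference copy.

compare_file() {
    # Usage: compare_file <relative_path> <your_results_dir> <reference_dir>
    local rel="$1" mine="$2" ref="$3"
    if [ ! -f "$mine/$rel" ]; then
        echo "MISSING: $rel"
        return 1
    fi
    # cmp -s is silent and returns 0 only if the files are identical.
    if cmp -s "$mine/$rel" "$ref/$rel"; then
        echo "SAME: $rel"
    else
        echo "DIFFERS: $rel"
        return 1
    fi
}

# Example (directory names are placeholders):
#   compare_file final_results/pipeline_summary_stats.tsv my_run ../test_output
```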

🔍 Troubleshooting

Common issues and solutions:

"Command not found" errors: Ensure all dependencies are installed and in PATH
Memory errors: Reduce thread count or increase available RAM
Alignment failures: Check genome index paths in config.sh
No peaks called: Check FRiP scores and library complexity

See TROUBLESHOOTING.md for detailed solutions.

📖 Documentation

INSTALL.md - Detailed installation instructions
USAGE.md - Comprehensive usage guide
TROUBLESHOOTING.md - Common problems and solutions
CHANGELOG.md - Version history and updates

📚 Citation

If you use this pipeline in your research, please cite:

@software{atac_seq_pipeline,
  author = {M. Adeel},
  title = {ATAC-seq Analysis Pipeline},
  year = {2026},
  url = {https://github.com/Adeel3Dgenomics/ATAC-seq-pipeline},
  version = {1.0.0}
}

And the tools used by this pipeline:

ATAC-seq method: Buenrostro et al. (2013) Nature Methods
Genrich: GitHub
HOMER: Heinz et al. (2010) Molecular Cell
See CITATIONS.md for complete list

🤝 Contributing

Contributions are welcome! Please see CONTRIBUTING.md for guidelines.

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

💬 Support

Issues: GitHub Issues
Discussions: GitHub Discussions
Email: Muhammad-Adeel@omrf.org

🙏 Acknowledgments

This pipeline was developed at OMRF. We thank the developers of all the tools integrated into this pipeline.

Keywords: ATAC-seq, chromatin accessibility, epigenomics, NGS, bioinformatics, peak calling, genomics
