Adeel3Dgenomics/ATAC-Seq-Analysis-Smooth-Ride

ATAC-Seq Analysis Pipeline

A comprehensive, reproducible, and user-friendly pipeline for analyzing ATAC-seq (Assay for Transposase-Accessible Chromatin using sequencing) data from raw FASTQ files to publication-ready results.

🔬 Overview

ATAC-seq is a method for mapping chromatin accessibility genome-wide. This pipeline automates the entire analysis workflow, from quality control of raw sequencing reads through peak calling, annotation, and visualization.

What this pipeline does:

  1. ✅ Quality control of raw sequencing data
  2. ✅ Adapter trimming and quality filtering
  3. ✅ Alignment to reference genome
  4. ✅ Removal of duplicates and mitochondrial reads
  5. ✅ Peak calling (accessible chromatin regions)
  6. ✅ Blacklist filtering
  7. ✅ Peak annotation (genes, promoters, enhancers)
  8. ✅ Generation of visualization tracks (BigWig)
  9. ✅ Comprehensive quality metrics (FRiP scores)
  10. ✅ Statistical summaries and publication-ready plots
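
Step 4 includes removing mitochondrial reads. As a minimal sketch of that idea (not necessarily how ATAC_main.sh implements it), the filter below drops SAM records aligned to the mitochondrial contig while keeping header lines. The function name `filter_mito` and the default contig name `chrM` are assumptions; some assemblies name the contig `MT`.

```shell
#!/usr/bin/env bash
# Sketch: drop alignments on the mitochondrial contig from a SAM text stream.
# filter_mito and the default contig name "chrM" are illustrative assumptions.
filter_mito() {
    local contig="${1:-chrM}"
    # Keep header lines (starting with "@") and records whose reference
    # name (column 3) is not the mitochondrial contig.
    awk -v mt="$contig" 'BEGIN { FS = "\t" } /^@/ || $3 != mt'
}
```

With SAMtools installed, this could be used as `samtools view -h sample.bam | filter_mito chrM | samtools view -b -o sample.noMT.bam -`, where the BAM file names are placeholders.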

✨ Features

  • 🔄 Resume Capability: Pipeline automatically resumes from the last completed step if interrupted
  • 📊 Comprehensive QC: Generates MultiQC reports at multiple stages
  • 🎨 Publication-Ready Plots: Automatically generates 15+ visualizations
  • 📝 Full Reproducibility: Logs all software versions, parameters, and checksums
  • 🛡️ Error Handling: Robust error checking and informative error messages
  • ⚡ Optimized Performance: Multi-threaded processing where possible
  • 🧬 Genome Agnostic: Works with any reference genome (configured for hg19 by default)
  • 📈 Quality Metrics: Calculates FRiP scores, TSS enrichment, fragment size distributions
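
To illustrate the reproducibility feature, here is a rough sketch of how tool versions and input checksums could be appended to a log file. The helper names `log_tool_version` and `log_checksum` and the tab-separated layout are assumptions made for this example; the pipeline's own log format may differ.

```shell
#!/usr/bin/env bash
# Sketch: append tool versions and input checksums to a reproducibility log.
# Function names and the log layout are illustrative assumptions.
LOG_FILE="${LOG_FILE:-reproducibility_log.txt}"

# Record the first line a tool prints for its version flag (default: --version).
log_tool_version() {
    local tool="$1" flag="${2:---version}"
    if command -v "$tool" >/dev/null 2>&1; then
        printf '%s\t%s\n' "$tool" "$("$tool" "$flag" 2>&1 | head -n 1)" >> "$LOG_FILE"
    else
        printf '%s\tNOT FOUND\n' "$tool" >> "$LOG_FILE"
    fi
}

# Record the SHA-256 checksum of an input file (uses coreutils sha256sum).
log_checksum() {
    local file="$1"
    printf 'sha256\t%s\n' "$(sha256sum "$file")" >> "$LOG_FILE"
}
```

A run might call `log_tool_version samtools` and `log_checksum sample_R1.fastq.gz` before processing starts; the FASTQ name is a placeholder.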

🚀 Quick Start

# 1. Clone the repository
git clone https://github.com/Adeel3Dgenomics/ATAC-seq-pipeline.git
cd ATAC-seq-pipeline

# 2. Install dependencies (see INSTALL.md for details)
bash scripts/install_dependencies.sh

# 3. Configure your analysis
cp config.example.sh config.sh
nano config.sh  # Edit with your paths

# 4. Run the pipeline
bash ATAC_main.sh

# Or submit to SLURM cluster
sbatch submit_ATAC.sh

📦 Requirements

Software Requirements

| Tool | Minimum Version | Purpose |
|---|---|---|
| Bash | 4.0+ | Pipeline execution |
| FastQC | 0.11.9+ | Quality control |
| Trim Galore | 0.6.0+ | Adapter trimming |
| Bowtie2 | 2.3.0+ | Read alignment |
| SAMtools | 1.10+ | BAM file processing |
| Picard | 2.20.0+ | Duplicate removal |
| Genrich | 0.6+ | Peak calling |
| BEDTools | 2.29.0+ | Genomic interval operations |
| deepTools | 3.3.0+ | BigWig generation and plotting |
| HOMER | 4.11+ | Peak annotation |
| featureCounts | 2.0.0+ | Read counting |
| MultiQC | 1.9+ | Report aggregation |
| R | 4.0.0+ | Statistical analysis and plotting |

R packages: ggplot2, dplyr, tidyr
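
Checking an installed version against these minimums can be scripted. The sketch below assumes GNU `sort -V` is available (true on the tested Linux systems); `version_ge` is a hypothetical helper name, not part of the pipeline.

```shell
#!/usr/bin/env bash
# Sketch: test whether an installed version meets a minimum requirement.
# version_ge is an illustrative helper name; it relies on GNU sort -V.
version_ge() {
    local have="$1" need="$2"
    # sort -V orders version strings; if the smallest of the pair is the
    # required version, the installed one is at least as new.
    [ "$(printf '%s\n%s\n' "$need" "$have" | sort -V | head -n 1)" = "$need" ]
}
```

For example, `version_ge "$(samtools --version | awk 'NR==1 {print $2}')" 1.10 || echo "SAMtools is older than 1.10" >&2` would warn about an outdated SAMtools. A plain string comparison would get this wrong, since it ranks 1.9 above 1.10.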

System Requirements

  • CPU: 8+ cores recommended
  • RAM: 32 GB minimum, 64 GB recommended
  • Storage: ~500 GB per analysis (depends on data size)
  • OS: Linux/Unix (tested on Ubuntu 20.04, CentOS 7)

See INSTALL.md for detailed installation instructions.

📥 Installation

For detailed installation instructions, see INSTALL.md.

Quick Installation (Ubuntu/Debian)

# Install via conda (recommended)
conda env create -f environment.yml
conda activate atac-seq

# Or use the automated installer
bash scripts/install_dependencies.sh

🔧 Usage

Basic Usage

# Edit configuration
nano config.sh

# Run pipeline
bash ATAC_main.sh

Advanced Usage

# Run on SLURM cluster
sbatch submit_ATAC.sh

# Resume interrupted run
bash ATAC_main.sh  # Automatically detects and resumes

# Clean previous run and start fresh
rm .pipeline_progress
bash ATAC_main.sh

For detailed usage instructions, see USAGE.md.
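
The resume behaviour can be pictured as a checkpoint file that records each finished step. The following is a minimal sketch under that assumption: `run_step` and the one-step-name-per-line layout of `.pipeline_progress` are illustrative and may not match what ATAC_main.sh actually writes.

```shell
#!/usr/bin/env bash
# Sketch: checkpoint-based resume. run_step and the one-name-per-line layout
# of the progress file are illustrative assumptions.
PROGRESS_FILE="${PROGRESS_FILE:-.pipeline_progress}"

run_step() {
    local name="$1"; shift
    # Skip steps already recorded as complete.
    if [ -f "$PROGRESS_FILE" ] && grep -Fxq -- "$name" "$PROGRESS_FILE"; then
        echo "[skip] $name"
        return 0
    fi
    echo "[run ] $name"
    # Record the step only if its command succeeds.
    if "$@"; then
        echo "$name" >> "$PROGRESS_FILE"
    else
        echo "[fail] $name" >&2
        return 1
    fi
}
```

Each stage would then be wrapped as `run_step trimming <command ...>`. Because a failed step is never recorded, rerunning the script retries it, and deleting the progress file forces every step to run again.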

📂 Output Files

The pipeline generates the following directory structure:

ATAC-seq-analysis/
├── QC/                          # Quality control reports
│   ├── raw/                     # FastQC on raw reads
│   └── trimmed/                 # FastQC on trimmed reads
├── trimmed_data/                # Adapter-trimmed FASTQ files
├── alignment/                   # Aligned BAM files
│   └── dedup/                   # Deduplicated BAM files
├── peaks/                       # Called peaks
├── blacklist_removed/           # Filtered peaks
├── bigwig_tracks/               # Visualization tracks for IGV/UCSC
├── final_results/               # Main results
│   ├── pipeline_summary_stats.tsv
│   ├── ATAC_annotated_homer.txt
│   ├── counts.txt
│   └── frip/
├── plots/                       # All visualizations
│   ├── fragment_size_distribution.pdf
│   ├── TSS_enrichment_profile.pdf
│   ├── correlation_heatmap.pdf
│   └── ... (15+ plots)
├── multiqc_report/              # Integrated QC report
├── reproducibility_log.txt      # Software versions & parameters
└── command_log.txt              # All executed commands

Key Output Files

| File | Description |
|---|---|
| pipeline_summary_stats.tsv | Main QC metrics for all samples |
| ATAC_peaks.filtered.narrowPeak | Final peak calls |
| ATAC_annotated_homer.txt | Peak annotations |
| *.bw | BigWig tracks for genome browser |
| multiqc_report.html | Interactive QC report |
| frip_scores.txt | FRiP scores for each sample |

See USAGE.md for detailed descriptions of all output files.
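
For readers unfamiliar with the FRiP score reported in frip_scores.txt: it is the fraction of mapped reads that fall inside called peaks. The sketch below shows only the arithmetic; `frip_score` is a hypothetical helper, and how the pipeline obtains the two counts is not shown here.

```shell
#!/usr/bin/env bash
# Sketch: FRiP = reads overlapping peaks / total mapped reads.
# frip_score is an illustrative helper; how the two counts are obtained
# depends on the pipeline and is not shown here.
frip_score() {
    local in_peaks="$1" total="$2"
    awk -v p="$in_peaks" -v t="$total" 'BEGIN {
        if (t <= 0) { print "NA"; exit }   # avoid division by zero
        printf "%.4f\n", p / t
    }'
}
```

The two counts could come from, for instance, `samtools view -c` on the deduplicated BAM and a `bedtools intersect` against the filtered peak file.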

📊 Quality Control Metrics

Expected Quality Metrics

| Metric | Good | Acceptable | Poor |
|---|---|---|---|
| Mapping Rate | >95% | 85-95% | <85% |
| Duplication Rate | <20% | 20-40% | >40% |
| FRiP Score | >0.3 | 0.2-0.3 | <0.2 |
| Unique Peaks | >40,000 | 20,000-40,000 | <20,000 |
| TSS Enrichment | >7 | 5-7 | <5 |
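
These thresholds are easy to apply programmatically. The sketch below grades a single value against the table; `qc_grade` and its metric keys (`mapping`, `duplication`, `frip`, `peaks`, `tss`) are names invented for this example, and boundary values are treated as "acceptable".

```shell
#!/usr/bin/env bash
# Sketch: grade a metric against the thresholds in the table above.
# qc_grade and its metric keys are illustrative names, not pipeline options.
qc_grade() {
    local metric="$1" value="$2"
    awk -v m="$metric" -v v="$value" 'BEGIN {
        # good/poor cut-offs per metric; duplication is "lower is better".
        if      (m == "mapping")     { good = 95;    poor = 85;    lower = 0 }
        else if (m == "duplication") { good = 20;    poor = 40;    lower = 1 }
        else if (m == "frip")        { good = 0.3;   poor = 0.2;   lower = 0 }
        else if (m == "peaks")       { good = 40000; poor = 20000; lower = 0 }
        else if (m == "tss")         { good = 7;     poor = 5;     lower = 0 }
        else { print "unknown"; exit }
        # Negate everything for lower-is-better metrics so one comparison fits all.
        if (lower) { v = -v; good = -good; poor = -poor }
        if      (v > good)  print "good"
        else if (v >= poor) print "acceptable"
        else                print "poor"
    }'
}
```

For example, `qc_grade frip 0.25` prints `acceptable` and `qc_grade duplication 15` prints `good`.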

Example Output Plots

The pipeline generates publication-ready visualizations. Here are examples from GM12878 cells:

[Figures: Fragment Size Distribution, TSS Enrichment, Sample Correlation, QC Metrics]

See the test_output/ directory for complete example results.

🧪 Test Data

Test your installation with the provided test dataset:

cd test_data/
bash run_test.sh

This will download a small ATAC-seq dataset and run the complete pipeline. Compare your results to test_output/ to verify correct installation.

πŸ” Troubleshooting

Common issues and solutions:

  • "Command not found" errors: Ensure all dependencies are installed and in PATH
  • Memory errors: Reduce thread count or increase available RAM
  • Alignment failures: Check genome index paths in config.sh
  • No peaks called: Check FRiP scores and library complexity

See TROUBLESHOOTING.md for detailed solutions.
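
A quick way to diagnose "Command not found" errors is to check every required executable up front. This is a sketch only: `check_deps` is a hypothetical helper, and the executable names in the usage line are the usual ones for these tools, so they may differ on your system.

```shell
#!/usr/bin/env bash
# Sketch: report which required executables are missing from PATH.
# check_deps is an illustrative helper; the executable names passed to it
# are assumptions and should be adjusted to your install.
check_deps() {
    local missing=0 tool
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "MISSING: $tool" >&2
            missing=1
        fi
    done
    # Non-zero exit status if at least one tool was not found.
    return "$missing"
}
```

A typical call might be `check_deps fastqc trim_galore bowtie2 samtools picard Genrich bedtools bamCoverage annotatePeaks.pl featureCounts multiqc Rscript`, run after activating the conda environment.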

📚 Citation

If you use this pipeline in your research, please cite:

@software{atac_seq_pipeline,
  author = {M.Adeel},
  title = {ATAC-seq Analysis Pipeline},
  year = {2026},
  url = {https://github.com/Adeel3Dgenomics/ATAC-seq-pipeline},
  version = {1.0.0}
}

And the tools used by this pipeline:

  • ATAC-seq method: Buenrostro et al. (2013) Nature Methods
  • Genrich: see the Genrich GitHub repository
  • HOMER: Heinz et al. (2010) Molecular Cell
  • See CITATIONS.md for complete list

🤝 Contributing

Contributions are welcome! Please see CONTRIBUTING.md for guidelines.

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

πŸ™ Acknowledgments

This pipeline was developed at OMRF. We thank the developers of all the tools integrated into this pipeline.


Keywords: ATAC-seq, chromatin accessibility, epigenomics, NGS, bioinformatics, peak calling, genomics
