Skip to content

Noble-Lab/2025_gmurtaza_loop-qc

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

3 Commits
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

CTCF Chromatin Loop Analysis Pipeline

A Python-based pipeline for analyzing chromatin loops from BEDPE files, identifying CTCF binding motifs at loop anchors, and classifying loops by their orientation patterns.

Overview

This pipeline processes chromatin loop data to:

  1. Extract and analyze CTCF binding motifs at loop anchor regions using FIMO motif scanning
  2. Classify loops into 7 categories based on CTCF anchor presence and orientation patterns
  3. Analyze loop length distributions with statistical significance testing across categories
  4. Generate publication-quality visualizations comparing loop metrics across classifications

Quick Start

Setup

# Create environment
mamba env create -f environment.yml
mamba activate loops-qc

Run Analysis

# Execute pipeline
python src/main.py configs/my_analysis.yaml

Analysis Features

Loop Classification

Loops are classified into 7 categories based on CTCF anchor presence and orientation:

Category Description CTCF Orientation
Zero anchors No CTCF binding motifs -
One anchor CTCF at only one anchor -
All two-anchor All loops with 2 anchors -
Convergent CTCF facing each other F-R
Tandem forward Both CTCF forward F-F
Tandem reverse Both CTCF reverse R-R
Divergent CTCF facing away R-F

Statistical Analysis

  • Loop length distributions compared across all categories
  • Significance testing (t-tests) with markers:
    • * = p < 0.05
    • ** = p < 0.01
    • *** = p < 0.001

APA Analysis

When a Hi-C cooler file is provided:

  • Aggregate Peak Analysis aggregates Hi-C contact frequencies around loop anchor regions for each category
  • Symmetrization corrects for Hi-C matrix storage format bias
  • Comparison heatmaps visualize loop strength and interaction patterns by category
  • Enrichment statistics quantify contact enrichment at anchor regions

Output Files

Analysis produces organized outputs:

Figures

  • loop_length_violin_by_category.png - Violin plot comparing loop length distributions across all 7 categories with statistical significance markers
  • apa_comparison_heatmaps.png - APA (Aggregate Peak Analysis) heatmaps showing Hi-C contact enrichment around loop anchors for each category (generated if Hi-C file is provided)

Data Files

  • generated_files/ - 7 category-specific BEDPE files + statistics CSVs
  • temp/ - Intermediate files for debugging (BED, FASTA, FIMO results)
  • apa_results/ - Per-category APA matrices and statistics (if APA analysis enabled)

Sample Configuration

# Input BEDPE file
input:
  bedpe: "/net/noble/vol4/user/gmurtaza/doulatov_hic_data/shared/hicfoundation_outputs/Bcell/10000/low_res_lc/HiCFoundation_loop_0.1.bedpe"
  has_header: true  # Set to false if BEDPE file has no header row

# Output configuration
output:
  directory: "outputs/Bcell_res:10000_threshold:0.1/"

# Loop filtering options
loops:
  include_interchromosomal: false  # false = intra-chromosomal only, true = all loops

# Motif analysis (optional - omit or set to null to skip)
motif_analysis:
  fasta: /net/noble/vol4/user/gmurtaza/doulatov_hic_data/4DN_supporting_files/GRCh38_no_alt_analysis_set_GCA_000001405.15.fa  # Path to reference genome FASTA file
  motif: /net/noble/vol1/home/gmurtaza/Documents/2025_gmurtaza_loops-qc/motifs/MA0139.2.meme # Path to motif file in MEME format
  fdr_threshold: 0.25
  fimo_threshold: "1e-5"
  optimize_convergent: false

# APA analysis (optional - omit or set to null to skip)
apa_analysis:
  hic_file: /net/noble/vol4/user/gmurtaza/doulatov_hic_data/processed/Bcell/Bcell.multires.cool  # Path to .cool or .h5 Hi-C contact matrix file
  window_size: 25000  # Window size in bp around anchors (±25kb)
  resolution: 10000  # Resolution of Hi-C matrix in bp
  nproc: 8  # Number of threads for pileup computation
  min_loop_length: 200000  # Minimum loop length in bp for APA analysis (e.g., 10000 for 10kb). Set to null to disable

About

All the code to gather QC metrics for chromatin loop data

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages