A next-generation gravitational-wave analysis system that detects, decomposes, and characterizes overlapping signals in real time using neural posterior estimation and adaptive signal subtraction.
PosteriFlow is a cutting-edge machine learning pipeline for gravitational-wave astronomy that solves a critical problem: how to extract multiple overlapping signals from noisy gravitational-wave detector data.
Modern gravitational-wave detectors (LIGO, Virgo) detect weak signals buried in noise. When multiple sources merge simultaneously, their signals overlap, creating a complex mixture that traditional methods cannot easily separate. PosteriFlow uses hierarchical neural networks to:
- Prioritize signals - Determine which sources to extract first
- Estimate parameters - Rapidly infer masses, distances, spins using neural inference
- Subtract adaptively - Remove extracted signals while preserving fainter ones
- Quantify uncertainty - Provide calibrated confidence intervals for all estimates
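The four capabilities above form one loop: rank the signals, estimate the loudest, subtract it, repeat. A minimal sketch of that loop on toy data (the chirp model and helper names here are illustrative stand-ins, not the PosteriFlow API):

```python
import numpy as np

def make_chirp(t, f0, amp):
    """Toy stand-in for a GW chirp: linearly sweeping sinusoid."""
    return amp * np.sin(2 * np.pi * (f0 + 5.0 * t) * t)

rng = np.random.default_rng(0)
t = np.linspace(0, 4, 4096)                       # one 4 s segment
signals = [make_chirp(t, 30, 1.0), make_chirp(t, 50, 0.4)]
data = sum(signals) + 0.05 * rng.standard_normal(t.size)

# 1. Prioritize: loudest (highest-energy) signal first
order = np.argsort([-np.sum(s**2) for s in signals])

# 2-3. Estimate + subtract iteratively in priority order
residual = data.copy()
for idx in order:
    residual = residual - signals[idx]            # idealized subtraction

# With perfect templates only the noise floor remains
print(np.std(residual))                           # ~0.05
```

In the real pipeline each subtraction uses estimated (not true) templates, which is why the uncertainty quantification and bias correction steps matter.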
- Multi-messenger astronomy: Early warnings for neutron star mergers enable electromagnetic follow-up
- Population statistics: Extracting overlapping events improves population constraints on compact object formation
- Real-time decision-making: LIGO alert system can trigger faster with overlapping signals disentangled
- Scientific discovery: Overlaps may reveal unexpected binary characteristics (precession, eccentricity)
```
┌──────────────────────────────────────────────────────────────┐
│         RAW GRAVITATIONAL-WAVE DATA (H1, L1, V1)             │
│    Detector noise + overlapping GW signals + glitches        │
└─────────────────────────┬────────────────────────────────────┘
                          │
                          ▼
         ┌───────────────────────────────────┐
         │  PHASE 1: NEURAL POSTERIOR        │
         │  ESTIMATION (Neural PE)           │
         │  ───────────────────────────      │
         │  • Likelihood-free inference      │
         │  • Multi-detector coherence       │
         │  • Uncertainty quantification     │
         └────────────────┬──────────────────┘
                          │
          Parameter estimates + uncertainties
     (mass_1, mass_2, distance, sky position, spins)
                          │
                          ▼
         ┌───────────────────────────────────┐
         │  PHASE 2: PRIORITY NET            │
         │  Signal Ranking & Selection       │
         │  ───────────────────────────      │
         │  • Temporal encoding (CNN+BiLSTM) │
         │  • Cross-signal feature analysis  │
         │  • Uncertainty-aware ranking      │
         │  • Predicts extraction order      │
         └────────────────┬──────────────────┘
                          │
               Ordered list of signals
               (which to remove first)
                          │
                          ▼
         ┌─────────────────────────────────────┐
         │  PHASE 3: ADAPTIVE SUBTRACTOR       │
         │  Iterative Signal Removal           │
         │  ─────────────────────────────      │
         │  • Uncertainty-weighted subtraction │
         │  • Cross-detector coherence         │
         │  • Bias correction                  │
         │  • Residual quality monitoring      │
         └────────────────┬────────────────────┘
                          │
                          ▼
    ┌──────────────────────────────────────────┐
    │   EXTRACTED SIGNALS & RESIDUAL NOISE     │
    │   • Individual source parameters         │
    │   • Parameter uncertainties              │
    │   • Signal-to-noise metrics              │
    │   • Residual quality assessment          │
    └──────────────────────────────────────────┘
```
- Likelihood-free inference using normalizing flows
- Simultaneous estimation of ~15 binary parameters
- Fast inference: <100ms for 4-second segment
- Uncertainty quantification via posterior ensemble
- Handles contamination via data augmentation
- Temporal CNN encoder: Multi-scale time-frequency features
- BiLSTM encoder: Temporal dependencies in strain data
- Cross-signal analyzer: Quantifies signal overlap and interaction
- Output: Ranking of signals + confidence in order
- Enables optimal extraction strategy
- Uses Neural PE uncertainties to weight residuals
- Subtracts strongest signal first (per PriorityNet)
- Bias correction: Accounts for parameter estimation errors
- Iterative: Updates estimates after each subtraction
- Quality monitoring: Validates residual Gaussianity
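The last point, validating residual Gaussianity, can be sketched as a simple moment check: a well-subtracted whitened residual should have excess kurtosis near zero. This is a hypothetical illustration of the idea, not the pipeline's actual test (real implementations typically use stronger per-detector statistics such as Anderson-Darling):

```python
import numpy as np

def excess_kurtosis(x):
    x = (x - x.mean()) / x.std()
    return np.mean(x**4) - 3.0

def residual_is_gaussian(residual, tol=0.5):
    """Crude quality gate: Gaussian noise has excess kurtosis ~ 0."""
    return abs(excess_kurtosis(residual)) < tol

rng = np.random.default_rng(1)
t = np.linspace(0, 4, 4096)
clean_residual = rng.standard_normal(4096)               # noise only
bad_residual = clean_residual + 5 * np.sin(2 * np.pi * 30 * t)  # leftover signal

print(residual_is_gaussian(clean_residual))  # True
print(residual_is_gaussian(bad_residual))    # False
```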
PosteriFlow generates realistic synthetic gravitational-wave data for training:
```
REAL LIGO/VIRGO CHARACTERISTICS
├─ Detector network (H1, L1, V1)
├─ Realistic PSDs from O4 sensitivity
├─ Real glitches & contamination
├─ Physics-accurate waveforms (IMRPhenomXAS)
└─ Realistic source populations
        ▼
PARAMETERS SAMPLED (Physics-Constrained)
├─ Masses (BBH: 5-100 M☉, BNS: 1-2.5 M☉)
├─ Spins (aligned & precessing)
├─ Distance (~log-uniform, Malmquist bias)
├─ Sky position (uniform on sphere)
└─ Binary merger epoch
        ▼
SIGNAL GENERATION
├─ GW waveform synthesis (PyCBC)
├─ Detector response (antenna patterns)
├─ SNR-dependent distance scaling
└─ Parameter-distance correlation (physics-validated)
        ▼
CONTAMINATION INJECTION
├─ Real LIGO noise (GWOSC, 10-25× speedup via caching)
├─ Neural synthetic noise (10,000× faster than GWOSC)
├─ Line glitches (60 Hz, harmonics)
├─ Transient glitches (blips, scattered light)
├─ PSD drift (multiple epochs)
└─ Detector dropout scenarios
        ▼
OVERLAP CREATION (45% realistic rate)
├─ 2-signal overlaps (direct mergers)
├─ Multi-signal overlaps (up to 8 signals)
├─ Partial overlaps (different durations)
└─ Subtle ranking (important for prioritization)
        ▼
EDGE CASE SAMPLING (8% of dataset)
├─ Physical extremes (high mass-ratio, spins)
├─ Observational extremes (strong glitches)
├─ Statistical extremes (multimodal posteriors)
└─ Overlapping extremes (subtle ranking)
        ▼
FINAL DATASET (25,000+ samples)
├─ Detector strain (H1, L1, V1) + preprocessing
├─ Ground-truth parameters
├─ Network SNR & quality metrics
├─ Metadata for analysis
└─ Train/val/test splits (80/10/10)
```
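The "parameters sampled" stage above can be sketched in a few lines. The ranges mirror the README (BBH masses 5-100 M☉, ~log-uniform distance, sky position uniform on the sphere); this is an illustrative sampler, not the actual `parameter_sampler.py` implementation:

```python
import numpy as np

rng = np.random.default_rng(42)

def sample_bbh(n):
    """Draw n physics-constrained BBH parameter sets (illustrative)."""
    m1 = rng.uniform(5.0, 100.0, n)
    m2 = rng.uniform(5.0, m1)                 # enforce m2 <= m1
    # ~log-uniform luminosity distance in [10, 18000] Mpc
    distance = 10 ** rng.uniform(np.log10(10), np.log10(18000), n)
    # uniform on the sphere: ra uniform, dec from uniform sin(dec)
    ra = rng.uniform(0, 2 * np.pi, n)
    dec = np.arcsin(rng.uniform(-1, 1, n))
    return {"mass_1": m1, "mass_2": m2,
            "luminosity_distance": distance, "ra": ra, "dec": dec}

params = sample_bbh(1000)
```

Drawing `dec` via `arcsin` of a uniform variable (rather than uniformly in angle) is what makes the sky distribution uniform per unit solid angle.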
```
SIGNAL TYPE DISTRIBUTION:
├─ Binary Black Hole (BBH): 46%    ← Loudest, most common
├─ Binary Neutron Star (BNS): 32%  ← Rare, long duration, crucial for EW
├─ NS-BH (NSBH): 17%               ← Intermediate
└─ Noise only: 5%                  ← Background characterization

OVERLAP STATISTICS:
├─ Single signals: 55% of samples
├─ Overlapping: 45% of samples
│  ├─ 2-3 signals: 35%
│  ├─ 4-5 signals: 8%
│  └─ 6+ signals: 2%
└─ Average: 2.25 signals per sample

SNR DISTRIBUTION (O4 REALISTIC):
├─ Weak (10-15): 5%
├─ Low (15-25): 35%    ← Most detections
├─ Medium (25-40): 45%
├─ High (40-60): 12%
└─ Loud (60-80): 3%

PARAMETER RANGES:
├─ Masses: 3-200 M☉ (detector frame)
├─ Distances: 10-18,000 Mpc
├─ Spins: 0-0.99
└─ SNR: 3-100
```
Real Noise Integration (10-25× speedup)
- Pre-downloaded GWOSC segments (133 cached files)
- Three-level fallback: cache β on-demand β synthetic
- 10% real noise mixing for enhanced realism
Neural Noise Generation (10,000× speedup)
- FMPE pre-trained models (Gaussian_network.pickle)
- Colored Gaussian & non-Gaussian variants
- Falls back gracefully if models unavailable
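The graceful-fallback behaviour described above might look like the sketch below: try the pre-trained pickle model, and drop back to cheap colored Gaussian noise when it is missing. File names match the README; the `model.sample(...)` interface and the 1/f noise-shaping recipe are assumptions for illustration, not the actual `neural_noise_generator.py` code:

```python
import pickle
import numpy as np

def colored_gaussian_noise(n, sample_rate=4096, rng=None):
    """Synthetic fallback: shape white noise with a 1/f low-frequency
    rise, loosely mimicking a detector noise spectrum."""
    rng = rng or np.random.default_rng()
    white = rng.standard_normal(n)
    freqs = np.fft.rfftfreq(n, d=1.0 / sample_rate)
    shape = 1.0 / np.maximum(freqs, 10.0)   # flatten the shaping below 10 Hz
    return np.fft.irfft(np.fft.rfft(white) * shape, n)

def load_noise(n=16384, model_path="Gaussian_network.pickle"):
    try:
        with open(model_path, "rb") as f:
            model = pickle.load(f)          # pre-trained FMPE model
        return model.sample(n)              # hypothetical interface
    except (OSError, AttributeError, pickle.PickleError):
        return colored_gaussian_noise(n)    # graceful fallback

noise = load_noise()
```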
TransformerStrainEncoder Enhancement
- State-of-the-art strain encoding
- Attention-based temporal modeling
- Outperforms CNN+BiLSTM baselines
```bash
# Clone repository
git clone https://github.com/bibinthomas123/PosteriFlow.git
cd PosteriFlow

# Initialize conda (first time only)
conda init

# Activate environment
conda activate ahsd

# Install package in development mode
pip install -e . --no-deps
```

Important: The conda environment `ahsd` already exists and contains all dependencies. Never recreate it.
```bash
# Generate 25,000 samples (default, ~1.5-2 hours)
python src/ahsd/data/scripts/generate_dataset.py \
    --config configs/data_config.yaml \
    --num-samples 25000

# Custom parameters
python src/ahsd/data/scripts/generate_dataset.py \
    --config configs/data_config.yaml \
    --num-samples 50000 \
    --output-dir data/dataset_custom
```

```bash
# Train neural parameter estimation network
python experiments/phase3a_neural_pe.py \
    --config configs/enhanced_training.yaml \
    --batch-size 32 \
    --epochs 100

# Monitor training
tensorboard --logdir outputs/
```

```bash
# Train signal prioritization network
python experiments/train_priority_net.py \
    --config configs/priority_net.yaml \
    --create-overlaps \
    --batch-size 16

# Resume from checkpoint
python experiments/train_priority_net.py \
    --resume outputs/prioritynet_checkpoint.pth \
    --create-overlaps
```

```bash
# Full validation suite
python experiments/phase3c_validation.py \
    --phase3a_output outputs/phase3a_output_X/ \
    --phase3b_output outputs/phase3b_production/ \
    --n_samples 2000 \
    --seeds 5

# Expected output:
# ✅ System Success Rate: 82.1%
# ✅ Neural PE Accuracy: 0.582 ± 0.087
# ✅ Subtraction Efficiency: 81.1%
```

| Metric | Value | Notes |
|---|---|---|
| System Success Rate | 82.1% | End-to-end detection of all signals |
| Average Efficiency (η) | 81.1% | Residual energy reduction |
| Latency per 4 s segment | 156 ms | Dual-channel (H1, L1) |
| Throughput | 25.6 seg/s | Real-time capable |
| Memory (8 GB VRAM) | Fits | Batch inference supported |
| Dataset | APE (mean) | APE (std) | Comments |
|---|---|---|---|
| Clean (training) | 0.802 | 0.012 | Physics-perfect data |
| Contaminated (validation) | 0.582 | 0.087 | Realistic noise |
| After subtraction | 0.645 | 0.074 | Improved residuals |
| Metric | Value | Target |
|---|---|---|
| Top-K Precision@1 | 96.6% | >95% |
| Ranking Correlation | 0.605 | >0.50 |
| Priority Accuracy | 94.6% | >90% |
| Calibration Error | <0.05 | <0.10 |
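Two of the table's metrics are easy to make concrete. Top-K Precision@1 asks whether the top-scored signal is the true extraction target; the ranking correlation is Spearman's rank correlation between predicted scores and the true ordering. A small self-contained example with made-up scores (not PosteriFlow's evaluation code):

```python
import numpy as np

def precision_at_1(pred_scores, true_order):
    """1.0 if the top-scored signal is the true first extraction target."""
    return float(np.argmax(pred_scores) == true_order[0])

def spearman_corr(pred_scores, true_scores):
    """Spearman rank correlation (no tie handling, for illustration)."""
    def ranks(x):
        r = np.empty(len(x))
        r[np.argsort(x)] = np.arange(len(x))
        return r
    return np.corrcoef(ranks(pred_scores), ranks(true_scores))[0, 1]

pred = np.array([0.9, 0.3, 0.6])    # model priority scores
true = np.array([10.0, 2.0, 7.0])   # e.g. true per-signal SNRs

p1 = precision_at_1(pred, np.argsort(-true))
rho = spearman_corr(pred, true)
print(p1, rho)   # 1.0 1.0 — the predicted ordering matches the truth
```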
```
METRIC STABILITY ACROSS 5 SEEDS (200 samples each):
├─ Neural PE Accuracy: 0.582 ± 0.004 (variation: 0.1%)
├─ Subtraction η: 0.811 ± 0.001 (variation: <0.1%)
├─ System Success: 0.821 ± 0.008 (variation: 1.0%)
└─ Statistical significance: Cohen's d > 2.0
```
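Cohen's d, quoted above, is the difference in means expressed in units of pooled standard deviation. A toy check using the README's contaminated vs. post-subtraction APE numbers as synthetic distributions (the samples are simulated here, not pipeline output):

```python
import numpy as np

def cohens_d(a, b):
    """Effect size: mean difference over pooled standard deviation."""
    na, nb = len(a), len(b)
    pooled = np.sqrt(((na - 1) * np.var(a, ddof=1) +
                      (nb - 1) * np.var(b, ddof=1)) / (na + nb - 2))
    return (np.mean(b) - np.mean(a)) / pooled

rng = np.random.default_rng(0)
before = rng.normal(0.582, 0.087, 200)   # contaminated APE
after = rng.normal(0.645, 0.074, 200)    # APE after subtraction

d = cohens_d(before, after)
print(d)   # a large effect, well above the conventional 0.5 threshold
```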
```
PosteriFlow/
├── 📁 src/ahsd/                      # Main package
│   ├── 📁 core/                      # Core algorithms
│   │   ├── priority_net.py           # Signal prioritization (PriorityNet)
│   │   ├── adaptive_subtractor.py    # Adaptive subtraction + NeuralPE
│   │   ├── ahsd_pipeline.py          # Full end-to-end pipeline
│   │   └── bias_corrector.py         # Parameter bias correction
│   ├── 📁 data/                      # Data generation & preprocessing
│   │   ├── dataset_generator.py      # Main dataset generator
│   │   ├── waveform_generator.py     # GW waveform synthesis (PyCBC)
│   │   ├── noise_generator.py        # Synthetic noise + glitches
│   │   ├── neural_noise_generator.py # FMPE neural noise (10k× speedup)
│   │   ├── parameter_sampler.py      # Physics-constrained sampling
│   │   ├── psd_manager.py            # Power spectral density management
│   │   ├── gwtc_loader.py            # Real GWOSC data loading
│   │   ├── injection.py              # Signal injection into noise
│   │   ├── preprocessing.py          # Whitening, normalization
│   │   └── config.py                 # Config loading & validation
│   ├── 📁 models/                    # Neural network architectures
│   │   ├── neural_pe.py              # Neural PE normalizing flow
│   │   ├── overlap_neuralpe.py       # Multi-signal PE variant
│   │   ├── transformer_encoder.py    # TransformerStrainEncoder
│   │   ├── flows.py                  # Flow architectures
│   │   └── rl_controller.py          # RL-based control (future)
│   ├── 📁 evaluation/                # Metrics & analysis
│   │   └── metrics.py                # APE, efficiency, ranking metrics
│   └── 📁 utils/                     # Utilities
│       ├── config.py                 # Configuration classes
│       ├── logging.py                # Logging setup
│       └── data_format.py            # Data standardization
├── 📁 experiments/                   # Training & evaluation scripts
│   ├── phase3a_neural_pe.py          # Neural PE training
│   ├── train_priority_net.py         # PriorityNet training
│   ├── data_generation.py            # Dataset generation wrapper
│   └── phase3c_validation.py         # Multi-seed validation
├── 📁 configs/                       # Configuration files (YAML)
│   ├── data_config.yaml              # Data generation parameters
│   ├── enhanced_training.yaml        # Training hyperparameters
│   ├── priority_net.yaml             # PriorityNet config
│   └── inference.yaml                # Inference settings
├── 📁 tests/                         # Unit & integration tests
│   ├── test_dataset_generation.py
│   ├── test_neural_pe.py
│   ├── test_priority_net.py
│   └── test_integration.py
├── 📁 models/                        # Trained model checkpoints
│   ├── neural_pe_best.pth
│   └── prioritynet_checkpoint.pth
├── 📁 data/                          # Generated datasets
│   ├── dataset/
│   │   ├── train.pkl
│   │   ├── val.pkl
│   │   └── test.pkl
│   └── Gaussian_network.pickle       # FMPE model (neural noise)
├── 📁 outputs/                       # Experiment results
│   ├── phase3a_output_X/
│   ├── phase3b_production/
│   └── logs/
├── 📁 gw_segments/                   # Pre-cached GWOSC segments
│   └── [133 real noise segments]
├── 📁 notebooks/                     # Analysis & visualization
├── 📁 docs/                          # Additional documentation
├── pyproject.toml                    # Package metadata & dependencies
├── setup.py                          # Package setup
├── AGENTS.md                         # Development guidelines
└── README.md                         # This file
```
All parameters are controlled via YAML configuration files in `configs/`:

```yaml
# Core parameters
n_samples: 25000          # Number of samples to generate
sample_rate: 4096         # Hz (LIGO standard)
duration: 4.0             # seconds
detectors: [H1, L1, V1]   # Detector network

# Signal characteristics
overlap_fraction: 0.45    # Realistic O4 rate
edge_case_fraction: 0.08  # Physical/statistical extremes
create_overlaps: true     # Enable multi-signal generation

# Contamination
add_glitches: true
neural_noise_enabled: true  # 10,000× speedup
neural_noise_prob: 0.5      # 50% neural, 50% synthetic
use_real_noise_prob: 0.1    # 10% real GWOSC (cached)

# Event distribution (realistic O4)
event_type_distribution:
  BBH: 0.46    # Most common
  BNS: 0.32    # Rare but important
  NSBH: 0.17   # Intermediate
  noise: 0.05  # Background
```

```yaml
# Hyperparameters
learning_rate: 0.0005
batch_size: 32
epochs: 100
weight_decay: 1e-5

# Loss weights
loss_weights:
  mse: 0.35          # Parameter estimation
  ranking: 0.50      # Ranking loss
  uncertainty: 0.15  # Calibration

# Data augmentation
augment_contamination: true
noise_augmentation_k: 1.0
preprocess: true
```

```yaml
# Architecture
temporal_encoder_dim: 128
hidden_dim: 256
num_heads: 8  # Multi-head attention

# Training
learning_rate: 0.0002
batch_size: 16
epochs: 80
create_overlaps: true  # Enable multi-signal training
```

Run the comprehensive test suite:
```bash
# All tests
pytest

# Specific test
pytest tests/test_priority_net.py::TestPriorityNet::test_forward_pass -v

# With coverage
pytest --cov=ahsd --cov-report=html

# Verbose with print statements
pytest -v -s

# Specific test file
pytest tests/test_neural_pe.py
```

| Test | Purpose | Location |
|---|---|---|
| Neural PE | Forward pass, loss computation | tests/test_neural_pe.py |
| PriorityNet | Signal ranking, feature extraction | tests/test_priority_net.py |
| Dataset | Data generation, splits, validation | tests/test_dataset_generation.py |
| Integration | End-to-end pipeline | tests/test_integration.py |
- Prepare real GW data in HDF5 format
- Implement a data reader in `src/ahsd/data/gwtc_loader.py`
- Update `data_config.yaml` with real data paths
- Run the training pipeline
```python
from ahsd.core.adaptive_subtractor import NeuralPE
import numpy as np

# Load strain data
strain_data = {
    'H1': np.load('H1_data.npy'),
    'L1': np.load('L1_data.npy'),
    'V1': np.load('V1_data.npy'),
}

# Quick estimation
pe = NeuralPE()
result = pe.quick_estimate(strain_data)

print(f"Mass 1: {result['mass_1_mean']:.1f} M☉")
print(f"Distance: {result['luminosity_distance_mean']:.0f} Mpc")
print(f"SNR: {result['network_snr']:.1f}")
```

```python
from ahsd.core.ahsd_pipeline import AHSDPipeline

# Initialize pipeline
pipeline = AHSDPipeline(
    neural_pe_model='models/neural_pe_best.pth',
    priority_net_model='models/prioritynet_best.pth',
    subtractor_model='models/subtractor_best.pth',
)

# Process 4-second segment
result = pipeline.run(strain_data={
    'H1': h1_strain,
    'L1': l1_strain,
    'V1': v1_strain,
})

# Extracted signals
for i, signal in enumerate(result['extracted_signals']):
    print(f"\nSignal {i+1}:")
    print(f"  Mass 1: {signal['mass_1']:.1f} M☉")
    print(f"  SNR: {signal['snr']:.1f}")
    print(f"  Confidence: {signal['priority_score']:.2f}")
```

Approach: Likelihood-free inference using normalizing flows
- Input: Multi-detector strain (whitened, windowed)
- Output: Posterior samples of ~15 astrophysical parameters
- Speed: <100ms per 4s segment
- Training: On clean synthetic waveforms + augmented contamination
Key Features:
- Amortized inference: Single network for all parameters
- Uncertainty quantification: Full posterior ensemble
- Multi-detector coherence: Combines H1, L1, V1 optimally
- Robust to PSD variation: Data augmentation during training
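The normalizing-flow idea behind Neural PE can be shown in miniature: an invertible transform maps a simple base density (a standard normal) to the target, and the change-of-variables formula gives an exact log-density. This single-affine-layer sketch is purely pedagogical; the real model stacks many learned layers and conditions them on the detector strain:

```python
import numpy as np

def affine_flow_logprob(theta, shift, log_scale):
    """log p(theta) for theta = z * exp(log_scale) + shift, z ~ N(0, 1).

    Change of variables: log p(theta) = log N(z; 0, 1) + log|det dz/dtheta|,
    and for this affine map log|det dz/dtheta| = -log_scale.
    """
    z = (theta - shift) * np.exp(-log_scale)    # inverse transform
    base = -0.5 * (z**2 + np.log(2 * np.pi))    # base log-density
    return base - log_scale

# Density of N(2.0, sigma = e^0.5) evaluated at its own mean
lp = affine_flow_logprob(2.0, shift=2.0, log_scale=0.5)
expected = -0.5 * np.log(2 * np.pi) - 0.5       # analytic value
print(lp, expected)
```

Amortization means the network parameters (here `shift`, `log_scale`) are predicted from the data by one shared network, so a new event needs only a forward pass, not a fresh MCMC run.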
Approach: Deep learning on temporal strain features
- Architecture: CNN (multi-scale) + BiLSTM (temporal) + Attention (context)
- Input: Whitened strain for multiple signals
- Output: Ranking order (which signal to subtract first)
- Training: On overlapping synthetic signals
Why Prioritization Matters:
- Extracting loud signal first reduces noise floor
- Removes contamination bias on faint signals
- Improves overall parameter estimation accuracy
- Handles multimodal posteriors better
Approach: Iterative removal with uncertainty weighting
- Step 1: Identify signal with highest priority
- Step 2: Subtract using Neural PE parameters + uncertainties
- Step 3: Bias correction: Account for parameter errors
- Step 4: Validate residual Gaussianity
- Step 5: Repeat for remaining signals
Uncertainty Weighting:
- Larger uncertainties β weaker subtraction (preserve signal)
- Calibrated uncertainties β correct bias
- Cross-detector coherence check
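The first bullet can be made concrete with a toy weighting rule: scale the subtracted template down when the parameter uncertainty is large, so a poorly constrained fit cannot gouge out a neighbouring faint signal. The specific weight formula below is illustrative, not PosteriFlow's exact scheme:

```python
import numpy as np

def weighted_subtract(data, template, sigma_rel):
    """Uncertainty-weighted subtraction.

    sigma_rel: relative parameter uncertainty in [0, inf).
    w -> 1 when the estimate is confident, w -> 0 when it is not.
    """
    w = 1.0 / (1.0 + sigma_rel**2)
    return data - w * template

t = np.linspace(0, 4, 4096)
template = np.sin(2 * np.pi * 30 * t)
data = template + 0.01 * np.random.default_rng(0).standard_normal(t.size)

confident = weighted_subtract(data, template, sigma_rel=0.05)
cautious = weighted_subtract(data, template, sigma_rel=2.0)

# Confident subtraction removes far more residual power
print(np.sum(confident**2), np.sum(cautious**2))
```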
- PyCBC Waveforms: arXiv:1508.01844 (GW waveform generation and detection)
- LIGO Data Conditioning: arXiv:2002.01606 (real gravitational-wave detector noise)
- Normalizing Flows: arXiv:1810.01367 (flexible density estimation, used in Neural PE)
- DINGO: arXiv:2105.12151 (deep inference for GW observations; basis for neural noise models)
- GWOSC: gwosc.readthedocs.io (public gravitational-wave detector data)
- GWTC-3: arXiv:2105.15615 (LIGO-Virgo third catalog of GW transients)
- Create feature branch: `git checkout -b feature/description`
- Code style: Follow AGENTS.md guidelines
- Test: Run `pytest` before committing
- Format: `black . && isort . && flake8 .`
- Commit message: Descriptive, explain "why"
- Push & PR: Create pull request with summary
- Type hints: Always (required for all functions)
- Docstrings: NumPy format for classes and methods
- Line length: 100 characters (black formatter)
- Testing: Unit tests for new modules
- Coverage: Aim for >80% for new code
- Docs: Use the docs/ folder to understand the core functionality and how to run the code
```bash
# Data generation
ahsd-generate --config configs/data_config.yaml

# Validation
ahsd-validate --dataset data/dataset/train.pkl

# Analysis
ahsd-analyze --input-data data.hdf5 --output results.pkl

# Model training
python experiments/phase3a_neural_pe.py --config configs/enhanced_training.yaml

# Validation
python experiments/phase3c_validation.py --phase3a_output outputs/phase3a_output_X/ \
    --phase3b_output outputs/phase3b_production/ --n_samples 2000 --seeds 5
```

MIT License - see LICENSE for details
Author: Bibin Thomas
Email: bibinthomas951@gmail.com
Repository: https://github.com/bibinthomas123/PosteriFlow
If you use PosteriFlow in your research, please cite:
```bibtex
@software{thomas2025posteriflow,
  title={PosteriFlow: Adaptive Hierarchical Signal Decomposition
         for Overlapping Gravitational Waves},
  author={Thomas, Bibin},
  year={2025},
  url={https://github.com/bibinthomas123/PosteriFlow}
}
```

PosteriFlow builds on foundational work from:
- LIGO-Virgo Collaboration for detector design and data access
- PyCBC for waveform generation
- Bilby for Bayesian inference tools
- GWpy for detector data handling
- DINGO for neural density estimation techniques
Built for the next generation of gravitational-wave astronomy.
Last Updated: November 12, 2025