Identity-by-Descent (IBD) detection from pangenome assemblies using haplotype-level identity analysis.
A suite of Rust CLI tools for detecting IBD segments from whole-genome assemblies:
- IBS Detection: Sliding window identity-by-state computation from pangenome alignments
- IBD Inference: Hidden Markov Model (Viterbi + forward-backward) to distinguish true IBD from background IBS
- Jacquard Coefficients: Delta coefficient estimation for relatedness analysis
| Tool | Description | Documentation |
|---|---|---|
| ibs-cli | Window-based IBS detection | README |
| ibd-cli | HMM-based IBD inference | README |
| jacquard-cli | Jacquard delta coefficients | README |
- IBS Detection - Window-based identity analysis
- IBD Inference - HMM-based segment detection
- Jacquard Coefficients - Relatedness estimation
- Haplotype Relatedness - Determine which reference haplotype each segment matches
- Full Pipeline - End-to-end workflow
- Rust 1.70+ (rustup.rs)
# Clone repository
git clone https://github.com/MarsicoFL/HPRCv2-IBD.git
cd HPRCv2-IBD
# Build all tools (workspace build)
cargo build --release
Binaries will be in target/release/ (ibs, ibd, jacquard).
Compute pairwise identity in sliding windows:
ibs \
--sequence-files assemblies.agc \
-a alignments.paf.gz \
--subset-sequence-list samples.txt \
--region chr1:1-10000000 \
--size 5000 \
-c 0.999 \
-m cosine \
--output ibs_results.tsv
Parameters:
- --sequence-files: AGC archive with assemblies
- -a: PAF alignments to reference
- --subset-sequence-list: File with haplotype IDs (one per line)
- --region: Genomic region (chr:start-end)
- --size: Window size in bp
- -c: Identity cutoff threshold
- -m: Similarity metric (cosine, jaccard)
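As a rough illustration of how --region and --size interact, the sketch below parses a chr:start-end string and tiles it into consecutive windows. It assumes 1-based inclusive coordinates, non-overlapping windows (step equal to the window size), and a truncated final window. The names Window, parse_region, and tile_windows are hypothetical and are not taken from the ibs source.

```rust
/// One window on a chromosome, in 1-based inclusive coordinates
/// (assumed to match the chr:start-end convention of --region).
#[derive(Debug, Clone, PartialEq)]
pub struct Window {
    pub chrom: String,
    pub start: u64,
    pub end: u64,
}

/// Parse a region string such as "chr1:1-10000000".
/// Returns None if the string is malformed, start is 0, or start > end.
pub fn parse_region(region: &str) -> Option<Window> {
    // Split on the last ':' so contig names that contain ':' still parse.
    let (chrom, range) = region.rsplit_once(':')?;
    let (start, end) = range.split_once('-')?;
    let start: u64 = start.parse().ok()?;
    let end: u64 = end.parse().ok()?;
    if chrom.is_empty() || start == 0 || start > end {
        return None;
    }
    Some(Window { chrom: chrom.to_string(), start, end })
}

/// Tile a region into consecutive, non-overlapping windows of `size` bp.
/// The last window is cut at the region end (an assumption of this sketch).
pub fn tile_windows(region: &Window, size: u64) -> Vec<Window> {
    let mut windows = Vec::new();
    if size == 0 {
        return windows;
    }
    let mut start = region.start;
    while start <= region.end {
        let end = start.saturating_add(size - 1).min(region.end);
        windows.push(Window { chrom: region.chrom.clone(), start, end });
        start = end + 1;
    }
    windows
}
```

Under these assumptions, chr1:1-10000000 with a window size of 5000 yields 2,000 windows, the first spanning positions 1 to 5000.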
Infer IBD segments using HMM (Viterbi + forward-backward):
ibd \
--sequence-files assemblies.agc \
-a alignments.paf.gz \
-r CHM13 \
--region chr1:1-10000000 \
--size 5000 \
--subset-sequence-list samples.txt \
--population EUR \
--output ibd_segments.tsv \
--posterior-threshold 0.8
Parameters:
- --population: Population for HMM calibration (AFR, EUR, EAS, CSA, AMR, InterPop, Generic)
- --posterior-threshold: Minimum mean P(IBD) for segment (uses forward-backward)
- --output-posteriors: Optional file for per-window P(IBD) values
Output: TSV with segments including coordinates, identity, and posterior statistics.
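To make the roles of Viterbi, forward-backward, and the posterior threshold concrete, here is a minimal two-state HMM sketch over per-window identity values. It is not the model used by ibd: the Bernoulli emission (identity at or above a cutoff, or not), the transition and initial probabilities, and all names (Hmm, call_segments) are assumptions chosen for illustration, and population calibration is not modelled at all.

```rust
/// Two-state HMM over per-window identity: state 0 = non-IBD, state 1 = IBD.
/// Every parameter is a hypothetical placeholder, not the ibd tool's calibration.
pub struct Hmm {
    pub init: [f64; 2],       // initial state probabilities
    pub trans: [[f64; 2]; 2], // trans[from][to]
    pub p_high: [f64; 2],     // P(identity >= cutoff | state)
    pub cutoff: f64,          // identity that counts as "high"
}

impl Hmm {
    /// Bernoulli emission: is this window's identity at or above the cutoff?
    fn emit(&self, state: usize, identity: f64) -> f64 {
        let p = self.p_high[state];
        if identity >= self.cutoff { p } else { 1.0 - p }
    }

    /// Most likely state path (Viterbi, log space).
    pub fn viterbi(&self, obs: &[f64]) -> Vec<usize> {
        let n = obs.len();
        if n == 0 { return Vec::new(); }
        let mut score = vec![[0.0f64; 2]; n];
        let mut back = vec![[0usize; 2]; n];
        for s in 0..2 {
            score[0][s] = self.init[s].ln() + self.emit(s, obs[0]).ln();
        }
        for t in 1..n {
            for s in 0..2 {
                let from0 = score[t - 1][0] + self.trans[0][s].ln();
                let from1 = score[t - 1][1] + self.trans[1][s].ln();
                let (best, prev) = if from1 > from0 { (from1, 1) } else { (from0, 0) };
                score[t][s] = best + self.emit(s, obs[t]).ln();
                back[t][s] = prev;
            }
        }
        // Backtrack from the better final state.
        let mut path = vec![0usize; n];
        path[n - 1] = if score[n - 1][1] > score[n - 1][0] { 1 } else { 0 };
        for t in (1..n).rev() {
            path[t - 1] = back[t][path[t]];
        }
        path
    }

    /// Per-window P(IBD) via forward-backward, renormalised at every step.
    pub fn posteriors(&self, obs: &[f64]) -> Vec<f64> {
        let n = obs.len();
        if n == 0 { return Vec::new(); }
        let mut fwd = vec![[0.0f64; 2]; n];
        let mut bwd = vec![[1.0f64; 2]; n];
        for t in 0..n {
            let mut f = [0.0f64; 2];
            for s in 0..2 {
                let prior = if t == 0 {
                    self.init[s]
                } else {
                    fwd[t - 1][0] * self.trans[0][s] + fwd[t - 1][1] * self.trans[1][s]
                };
                f[s] = prior * self.emit(s, obs[t]);
            }
            fwd[t] = [f[0] / (f[0] + f[1]), f[1] / (f[0] + f[1])];
        }
        for t in (0..n - 1).rev() {
            let mut b = [0.0f64; 2];
            for s in 0..2 {
                for k in 0..2 {
                    b[s] += self.trans[s][k] * self.emit(k, obs[t + 1]) * bwd[t + 1][k];
                }
            }
            bwd[t] = [b[0] / (b[0] + b[1]), b[1] / (b[0] + b[1])];
        }
        let ibd = |t: usize| fwd[t][1] * bwd[t][1];
        (0..n).map(|t| ibd(t) / (fwd[t][0] * bwd[t][0] + ibd(t))).collect()
    }
}

/// Keep runs of Viterbi state 1 whose mean posterior reaches `threshold`.
/// Returns (first_window, last_window, mean_posterior), 0-based inclusive.
pub fn call_segments(path: &[usize], post: &[f64], threshold: f64) -> Vec<(usize, usize, f64)> {
    let mut segments = Vec::new();
    let mut t = 0;
    while t < path.len() {
        let start = t;
        while t < path.len() && path[t] == 1 {
            t += 1;
        }
        if t > start {
            let mean = post[start..t].iter().sum::<f64>() / (t - start) as f64;
            if mean >= threshold {
                segments.push((start, t - 1, mean));
            }
        } else {
            t += 1; // not inside an IBD run, move to the next window
        }
    }
    segments
}
```

In this sketch Viterbi decides where a candidate segment starts and ends, while forward-backward supplies the per-window P(IBD) that is averaged and compared against the threshold, which mirrors the description of --posterior-threshold above.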
Compute Jacquard delta coefficients:
jacquard \
--ibs ibs_results.tsv \
--hap-a1 HG00097#1 \
--hap-a2 HG00097#2 \
--hap-b1 HG00099#1 \
--hap-b2 HG00099#2 \
--output coefficients.json
The tools require:
- Assemblies: AGC-compressed genome assemblies
- Alignments: PAF alignments to a reference genome
- Sample list: Text file with haplotype identifiers
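The haplotype identifiers used in the examples here (such as HG00097#1 and HG00097#2) pair a sample name with a haplotype number. The sketch below builds and checks a sample list under that reading. The sample#haplotype layout with haplotypes 1 and 2, and the function names, are assumptions for illustration rather than documented behaviour of the tools.

```rust
/// Split an identifier such as "HG00097#1" into ("HG00097", 1).
/// Returns None when the '#' separator is missing, the sample name is empty,
/// or the haplotype number is not 1 or 2 (diploid assumption).
pub fn parse_haplotype_id(id: &str) -> Option<(&str, u8)> {
    let (sample, hap) = id.split_once('#')?;
    let hap: u8 = hap.parse().ok()?;
    if sample.is_empty() || !(1..=2).contains(&hap) {
        return None;
    }
    Some((sample, hap))
}

/// Build the text of a sample list: two haplotype IDs per individual,
/// one ID per line.
pub fn sample_list(individuals: &[&str]) -> String {
    let mut out = String::new();
    for ind in individuals {
        for hap in 1..=2 {
            out.push_str(&format!("{ind}#{hap}\n"));
        }
    }
    out
}
```

With two haplotypes per individual, a list built from 30 individuals has 60 lines.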
Population sample lists are included in data/samples/:
| Population | Individuals | Haplotypes |
|---|---|---|
| AFR | 67 | 134 |
| EUR | 30 | 60 |
| EAS | 50 | 100 |
| CSA | 36 | 72 |
| AMR | 44 | 88 |
| File | Size | Download |
|---|---|---|
| HPRC_r2_assemblies_0.6.1.agc | 3.1 GB | Link |
| hprc465vschm13.aln.paf.gz | 5.3 GB | Link |
Optional: create an IMPG index:
impg index hprc465vschm13.aln.paf.gz
MIT License
If you use these tools, please cite this repository.