Hello, and thank you very much for developing Minigraph-Cactus and for providing such helpful documentation. I am relatively new to bioinformatics, but the examples and instructions in this repository have been extremely valuable. Using them, I successfully constructed a pangenome graph from 8 fungal genomes (including one reference). In this run, I used the --noSplit option so that translocations would be preserved instead of being resolved into local alignments.
My current goal is to characterize structural variants (SVs) from the graph, specifically:
- Large insertions and deletions (INDELs)
- Inversions (INV)
- Translocations (TRA)
- Duplications (DUP)
So far, I was able to extract INDELs reliably by using vg deconstruct followed by normalization and size-based filtering.
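For concreteness, the size-based filtering step can be sketched roughly as below. This is a minimal illustration, not the exact pipeline used: it assumes an uncompressed, already normalized VCF with sequence-resolved REF/ALT alleles, and the 50 bp threshold and the function names (`sv_indel_type`, `filter_large_indels`) are hypothetical choices made for the example.

```python
def sv_indel_type(ref, alt, min_len=50):
    """Classify one REF/ALT pair as 'INS', 'DEL', or None by length difference.

    min_len=50 is an assumed threshold for "large"; adjust as needed.
    """
    diff = len(alt) - len(ref)
    if diff >= min_len:
        return "INS"
    if -diff >= min_len:
        return "DEL"
    return None


def filter_large_indels(vcf_lines, min_len=50):
    """Yield (chrom, pos, svtype, size) for each ALT allele passing the size cut."""
    for line in vcf_lines:
        # Skip header lines and blank lines.
        if line.startswith("#") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        # VCF columns: CHROM POS ID REF ALT QUAL FILTER INFO ...
        chrom, pos, ref, alts = fields[0], int(fields[1]), fields[3], fields[4]
        for alt in alts.split(","):
            # Ignore missing, overlapping-deletion, and symbolic alleles.
            if alt in (".", "*") or alt.startswith("<"):
                continue
            svtype = sv_indel_type(ref, alt, min_len)
            if svtype:
                yield chrom, pos, svtype, abs(len(alt) - len(ref))
```

Each ALT allele at a multi-allelic site is evaluated separately, so one VCF record can contribute more than one event to the count.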
However, I am unsure about the best-practice approach for obtaining accurate counts of inversions, translocations, and duplications. I attempted to infer these from the PAF alignments produced during the Minigraph-Cactus run, using custom awk parsing of orientation changes and contig transitions. While this gives reasonable estimates, I am not confident that this approach correctly distinguishes true structural events from repetitive or multi-mapped regions. In particular, duplications appear highly inflated, suggesting over-counting from repetitive alignments.
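To make the heuristic concrete, the PAF-based counting looks roughly like the sketch below (written in Python here rather than awk). It is only an illustration under several assumptions: a plain 12-column PAF with contig-level query and target names, a hypothetical `expected_target` mapping from each query contig to its homologous reference contig, and illustrative `min_len` and `min_mapq` cutoffs. It produces candidates only, does not attempt duplications, and whether MAPQ is informative for the PAF files written by Minigraph-Cactus is itself an assumption.

```python
from collections import Counter


def parse_paf(line):
    """Parse the mandatory PAF columns needed for the heuristic."""
    f = line.rstrip("\n").split("\t")
    return {
        "qname": f[0], "qstart": int(f[2]), "qend": int(f[3]),
        "strand": f[4], "tname": f[5],
        "tstart": int(f[7]), "tend": int(f[8]),
        "mapq": int(f[11]),
    }


def candidate_events(paf_lines, expected_target, min_len=1000, min_mapq=20):
    """Count alignment blocks that look like INV or TRA candidates.

    expected_target: hypothetical dict {query contig -> homologous target contig}.
    Blocks shorter than min_len or below min_mapq are dropped, as one possible
    way to reduce repeat-driven and multi-mapped alignments.
    """
    counts = Counter()
    for line in paf_lines:
        if not line.strip():
            continue
        r = parse_paf(line)
        if r["qend"] - r["qstart"] < min_len or r["mapq"] < min_mapq:
            continue
        if r["tname"] != expected_target.get(r["qname"]):
            # Block lands on a non-homologous contig: translocation candidate.
            counts["TRA_candidate"] += 1
        elif r["strand"] == "-":
            # Reverse-strand block on the expected contig: inversion candidate.
            # A contig assembled in reverse orientation would also match here.
            counts["INV_candidate"] += 1
    return counts
```

Because this counts alignment blocks rather than merged events, a single inversion or translocation split across several blocks is counted more than once, which is one way the totals can become inflated.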
My question:
Is there a recommended or standard workflow to extract INV / TRA / DUP events directly from the outputs of Minigraph-Cactus?
Thank you again for your time and for maintaining this powerful and impactful tool.
Regards,