Modify position of insertions #73

Merged (1 commit) on Feb 15, 2022


bnoyvert
Contributor

Modify the position of insertions: take it from the same read that the insertion allele is taken from.

I propose changing how cuteSV calls insertions. As I understand it, cuteSV records the average position of the inserted sequences across the reads as the variant position, while the variant allele is the inserted sequence from one of the reads that satisfies certain criteria. The reported position and the inserted sequence are therefore not consistent with each other, so the reconstructed variant haplotype is wrong. I suggest a small change: take the position and the inserted allele from the same read.
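To make the difference concrete, here is a minimal sketch of the two strategies. It is not cuteSV's actual code: the `InsSignature` record, the function names, and the rule for picking the representative read (insertion length closest to the median) are all assumptions made for illustration, since the real selection criteria are not spelled out here.

```python
from dataclasses import dataclass
from statistics import mean, median


@dataclass
class InsSignature:
    """Hypothetical per-read insertion signature (not a cuteSV type)."""
    read_name: str
    pos: int   # reference position of the insertion in this read
    seq: str   # inserted sequence observed in this read


def pick_representative(sigs):
    # Assumed criterion: the read whose insertion length is closest
    # to the median insertion length of the cluster.
    target = median(len(s.seq) for s in sigs)
    return min(sigs, key=lambda s: abs(len(s.seq) - target))


def call_by_average(sigs):
    # Position is averaged over all reads, allele comes from one read,
    # so the two values may not belong together.
    rep = pick_representative(sigs)
    return round(mean(s.pos for s in sigs)), rep.seq


def call_from_same_read(sigs):
    # Position and allele both come from the representative read.
    rep = pick_representative(sigs)
    return rep.pos, rep.seq
```

With signatures scattered along a tandem repeat, `call_by_average` can report a position that no read supports together with the chosen allele, whereas `call_from_same_read` always returns a pair that was observed together in one read.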

The attached example contains a 51-base heterozygous insertion in a tandem repeat; the inserted sequences in the reads are scattered across the 1:240699860-240700080 region.
[Image: igv_snapshot]

cuteSV calls the insertion as follows:
1 240699952 cuteSV.INS.0 A AGAGTTTAAGACAAAATGGCAGCGGGGGCTGTAGATCTGGAAGCCATCTGTA
Here the position 240699952 is the average insertion position, while the 51-base variant allele is taken from one of the reads that has its insertion at 1:240699867. As a result, the reconstructed variant haplotype differs considerably from the consensus of the reads carrying the insertion:

P1                 1 GTCCAAGAGGGGGATTGGTAATTGTTCCTCGGATGGGGAAACATGACATG     50
                     ||||||||||||||||||||||||||||||||||||||||||||||||||
S1                 1 GTCCAAGAGGGGGATTGGTAATTGTTCCTCGGATGGGGAAACATGACATG     50

P1                51 TGAAAAATATGGGTCAGGAGTTTAAGACAAAATGGCAGCGGGGGCTGTAG    100
                     ||||||||||||||||||||||||||||||||||||||||||||||||||
S1                51 TGAAAAATATGGGTCAGGAGTTTAAGACAAAATGGCAGCGGGGGCTGTAG    100

P1               101 ATCTGGAAGCCATCTGTAGAGTTTAAGACAAAATGGCAGCGGGGGCTGTA    150
                     ||||||||||||||||||||||||||||||||||||||||||||||||||
S1               101 ATCTGGAAGCCATCTGTAGAGTTTAAGACAAAATGGCAGCGGGGGCTGTA    150

P1               151 GA-----------------GAGTTTAAGACAAAATGGCAGCGGGGGCTGT    183
                     ||                 |||||||||||||||||||||||||||||||
S1               151 GATCTGGAAGCCATCTGTAGAGTTTAAGACAAAATGGCAGCGGGGGCTGT    200

P1               184 AGATCTGGAAGCCATCTGTATCTGGAAGCCATCTGTAGAGTTTAAGACAA    233
                     ||||||||||||||||||||                 |||||||||||||
S1               201 AGATCTGGAAGCCATCTGTA-----------------GAGTTTAAGACAA    233

P1               234 AATGGCAGCGGGGGCTGTAGATCTGGAAGCCATCTGTAGAGTTTAAGACA    283
                     ||||||||||||||||||||||||||||||||||||||||||||||||||
S1               234 AATGGCAGCGGGGGCTGTAGATCTGGAAGCCATCTGTAGAGTTTAAGACA    283

P1               284 AAATGGCAGTGGGGGCTGTAGATCTGGAAGCCATCGGTAGCTACGTGCAT    333
                     ||||||||||||||||||||||||||||||||||||||||||||||||||
S1               284 AAATGGCAGTGGGGGCTGTAGATCTGGAAGCCATCGGTAGCTACGTGCAT    333

P1               334 GGGGATAAGGTCTCTGATGCATAATGTGAGATTTAAAAGAGGCAAATGTT    383
                     ||||||||||||||||||||||||||||||||||||||||||||||||||
S1               334 GGGGATAAGGTCTCTGATGCATAATGTGAGATTTAAAAGAGGCAAATGTT    383

P1               384 GGATCTTGAAGAAAACT    400
                     ||||||||||||||||| 
S1               384 GGATCTTGAAGAAAACT    400

where P1 is the variant haplotype, S1 is the read consensus.
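The effect can be reproduced on a toy sequence. The sketch below is only an illustration with made-up data: `apply_insertion` is a hypothetical helper, not part of cuteSV, and it assumes a VCF-style record with a 1-based position pointing at the anchor base.

```python
def apply_insertion(ref, pos, ref_allele, alt_allele):
    """Build the variant haplotype from a VCF-style insertion record.

    pos is the 1-based position of the anchor base (VCF convention).
    """
    i = pos - 1
    if ref[i:i + len(ref_allele)] != ref_allele:
        raise ValueError("REF allele does not match the reference at pos")
    return ref[:i] + alt_allele + ref[i + len(ref_allele):]


ref = "AACCGGTT"   # toy reference, not real data
inserted = "CG"    # inserted sequence observed in a read after position 4

# Position and allele taken from the same read (anchor base C at position 4).
same_read = apply_insertion(ref, 4, "C", "C" + inserted)

# Same inserted sequence, but placed at a shifted position (anchor G at 6).
shifted = apply_insertion(ref, 6, "G", "G" + inserted)

print(same_read)  # AACCCGGGTT
print(shifted)    # AACCGGCGTT
```

The inserted bases are identical in both records, yet the two haplotypes differ because the surrounding reference context at the shifted position is different.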

The modified version of cuteSV calls the variant as follows:
1 240699867 cuteSV.INS.0 G GGAGTTTAAGACAAAATGGCAGCGGGGGCTGTAGATCTGGAAGCCATCTGTA
Here the position and the variant allele are extracted from the same read. The corresponding haplotype aligns perfectly with the consensus:

P1                 1 GTCCAAGAGGGGGATTGGTAATTGTTCCTCGGATGGGGAAACATGACATG     50
                     ||||||||||||||||||||||||||||||||||||||||||||||||||
S1                 1 GTCCAAGAGGGGGATTGGTAATTGTTCCTCGGATGGGGAAACATGACATG     50

P1                51 TGAAAAATATGGGTCAGGAGTTTAAGACAAAATGGCAGCGGGGGCTGTAG    100
                     ||||||||||||||||||||||||||||||||||||||||||||||||||
S1                51 TGAAAAATATGGGTCAGGAGTTTAAGACAAAATGGCAGCGGGGGCTGTAG    100

P1               101 ATCTGGAAGCCATCTGTAGAGTTTAAGACAAAATGGCAGCGGGGGCTGTA    150
                     ||||||||||||||||||||||||||||||||||||||||||||||||||
S1               101 ATCTGGAAGCCATCTGTAGAGTTTAAGACAAAATGGCAGCGGGGGCTGTA    150

P1               151 GATCTGGAAGCCATCTGTAGAGTTTAAGACAAAATGGCAGCGGGGGCTGT    200
                     ||||||||||||||||||||||||||||||||||||||||||||||||||
S1               151 GATCTGGAAGCCATCTGTAGAGTTTAAGACAAAATGGCAGCGGGGGCTGT    200

P1               201 AGATCTGGAAGCCATCTGTAGAGTTTAAGACAAAATGGCAGCGGGGGCTG    250
                     ||||||||||||||||||||||||||||||||||||||||||||||||||
S1               201 AGATCTGGAAGCCATCTGTAGAGTTTAAGACAAAATGGCAGCGGGGGCTG    250

P1               251 TAGATCTGGAAGCCATCTGTAGAGTTTAAGACAAAATGGCAGTGGGGGCT    300
                     ||||||||||||||||||||||||||||||||||||||||||||||||||
S1               251 TAGATCTGGAAGCCATCTGTAGAGTTTAAGACAAAATGGCAGTGGGGGCT    300

P1               301 GTAGATCTGGAAGCCATCGGTAGCTACGTGCATGGGGATAAGGTCTCTGA    350
                     ||||||||||||||||||||||||||||||||||||||||||||||||||
S1               301 GTAGATCTGGAAGCCATCGGTAGCTACGTGCATGGGGATAAGGTCTCTGA    350

P1               351 TGCATAATGTGAGATTTAAAAGAGGCAAATGTTGGATCTTGAAGAAAACT    400
                     ||||||||||||||||||||||||||||||||||||||||||||||||||
S1               351 TGCATAATGTGAGATTTAAAAGAGGCAAATGTTGGATCTTGAAGAAAACT    400

Comparing the haplotypes of a few thousand insertions called by cuteSV with the haplotypes of a high-confidence set of insertions shows that the haplotypes called by the modified version of cuteSV are much closer to the real haplotypes (their haplotype similarity scores are closer to one):
[Image: cuteSVchange haplotype_scores]
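For readers who want to run a similar comparison, one possible way to score two haplotypes is a normalized edit-distance similarity. This is an assumption for illustration only: the exact metric behind the scores in the plot is not stated here, and `edit_distance` and `haplotype_similarity` are hypothetical names.

```python
def edit_distance(a, b):
    """Levenshtein distance computed with a rolling row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # match or substitution
        prev = cur
    return prev[-1]


def haplotype_similarity(called, truth):
    """Return 1.0 for identical haplotypes, lower values for larger differences."""
    if not called and not truth:
        return 1.0
    return 1.0 - edit_distance(called, truth) / max(len(called), len(truth))
```

Under this definition a called haplotype that matches the truth haplotype exactly scores 1.0, and a haplotype with a misplaced insertion scores lower.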

insertion_example.zip

tjiangHIT merged commit 8a6a178 into tjiangHIT:master on Feb 15, 2022
tjiangHIT mentioned this pull request on Feb 15, 2022
bnoyvert deleted the insertion_position branch on February 15, 2022 17:15