Modify position of insertions #73

bnoyvert · 2022-02-14T13:20:20Z

Modify position of insertions - take it from the same read as the insertion allele is taken from.

I propose to change the way cuteSV calls insertions. As I understand it, cuteSV records the average position of the inserted sequences across the reads as the variant position, while the variant allele is the inserted sequence from one read that satisfies certain criteria. The reported position and the inserted sequence are therefore inconsistent, which makes the reconstructed variant haplotype wrong. I suggest a small change: take the position and the inserted allele from the same read.
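To illustrate the difference, here is a minimal sketch in Python. It is not the actual cuteSV code: the function names, the `(pos, inserted_seq)` tuple layout, and the criterion used to pick the representative read (insertion length closest to the median) are all assumptions made for this example.

```python
from statistics import mean


def pick_representative(signatures):
    """Pick one read's insertion signature.

    Hypothetical criterion (an assumption, not cuteSV's real rule):
    the read whose inserted length is closest to the median length.
    """
    lengths = sorted(len(seq) for _, seq in signatures)
    median_len = lengths[len(lengths) // 2]
    return min(signatures, key=lambda sig: abs(len(sig[1]) - median_len))


def call_insertion_mean_pos(signatures):
    """Position averaged over all reads, allele taken from one read.

    The two values may come from different reads, so they can disagree.
    """
    pos = round(mean(p for p, _ in signatures))
    _, seq = pick_representative(signatures)
    return pos, seq


def call_insertion_same_read(signatures):
    """Proposed behaviour: position and allele from the same read."""
    pos, seq = pick_representative(signatures)
    return pos, seq
```

With signatures such as `[(100, "ACGT"), (120, "ACGTA"), (140, "ACG")]`, the first function pairs the averaged position 120 with a sequence observed at position 100, while the second keeps the pair `(100, "ACGT")` exactly as it was seen in the read.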

The attached example contains a 51-base heterozygous insertion in a tandem repeat; the inserted sequences in the reads are scattered across the 1:240699860-240700080 region.

cuteSV calls the insertion as follows:
1 240699952 cuteSV.INS.0 A AGAGTTTAAGACAAAATGGCAGCGGGGGCTGTAGATCTGGAAGCCATCTGTA
where position 240699952 is the average insertion position, and the 51-base variant allele is taken from one of the reads that has an insertion at 1:240699867. As a result, the reconstructed variant haplotype differs substantially from the consensus of the reads carrying the insertion:

P1                 1 GTCCAAGAGGGGGATTGGTAATTGTTCCTCGGATGGGGAAACATGACATG     50
                     ||||||||||||||||||||||||||||||||||||||||||||||||||
S1                 1 GTCCAAGAGGGGGATTGGTAATTGTTCCTCGGATGGGGAAACATGACATG     50

P1                51 TGAAAAATATGGGTCAGGAGTTTAAGACAAAATGGCAGCGGGGGCTGTAG    100
                     ||||||||||||||||||||||||||||||||||||||||||||||||||
S1                51 TGAAAAATATGGGTCAGGAGTTTAAGACAAAATGGCAGCGGGGGCTGTAG    100

P1               101 ATCTGGAAGCCATCTGTAGAGTTTAAGACAAAATGGCAGCGGGGGCTGTA    150
                     ||||||||||||||||||||||||||||||||||||||||||||||||||
S1               101 ATCTGGAAGCCATCTGTAGAGTTTAAGACAAAATGGCAGCGGGGGCTGTA    150

P1               151 GA-----------------GAGTTTAAGACAAAATGGCAGCGGGGGCTGT    183
                     ||                 |||||||||||||||||||||||||||||||
S1               151 GATCTGGAAGCCATCTGTAGAGTTTAAGACAAAATGGCAGCGGGGGCTGT    200

P1               184 AGATCTGGAAGCCATCTGTATCTGGAAGCCATCTGTAGAGTTTAAGACAA    233
                     ||||||||||||||||||||                 |||||||||||||
S1               201 AGATCTGGAAGCCATCTGTA-----------------GAGTTTAAGACAA    233

P1               234 AATGGCAGCGGGGGCTGTAGATCTGGAAGCCATCTGTAGAGTTTAAGACA    283
                     ||||||||||||||||||||||||||||||||||||||||||||||||||
S1               234 AATGGCAGCGGGGGCTGTAGATCTGGAAGCCATCTGTAGAGTTTAAGACA    283

P1               284 AAATGGCAGTGGGGGCTGTAGATCTGGAAGCCATCGGTAGCTACGTGCAT    333
                     ||||||||||||||||||||||||||||||||||||||||||||||||||
S1               284 AAATGGCAGTGGGGGCTGTAGATCTGGAAGCCATCGGTAGCTACGTGCAT    333

P1               334 GGGGATAAGGTCTCTGATGCATAATGTGAGATTTAAAAGAGGCAAATGTT    383
                     ||||||||||||||||||||||||||||||||||||||||||||||||||
S1               334 GGGGATAAGGTCTCTGATGCATAATGTGAGATTTAAAAGAGGCAAATGTT    383

P1               384 GGATCTTGAAGAAAACT    400
                     ||||||||||||||||| 
S1               384 GGATCTTGAAGAAAACT    400

where P1 is the variant haplotype and S1 is the read consensus.
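The effect can be reproduced on a toy tandem repeat. The sketch below applies a VCF-style insertion record (1-based POS, REF anchor base, ALT = anchor base + inserted sequence) to a reference string; `apply_insertion` and the toy sequences are hypothetical and only serve to show that the same inserted sequence placed at a shifted position yields a different haplotype.

```python
def apply_insertion(reference, pos, ref_allele, alt_allele):
    """Apply a VCF-style record to a reference string (pos is 1-based)."""
    i = pos - 1
    if reference[i:i + len(ref_allele)] != ref_allele:
        raise ValueError("REF allele does not match the reference at POS")
    return reference[:i] + alt_allele + reference[i + len(ref_allele):]


# Toy reference with a tandem repeat of the unit ACGT (not real data).
reference = "TTACGTACGTGG"
inserted = "ACGT"

# Position and allele from the same read: insertion observed after base 2.
same_read = apply_insertion(reference, 2, "T", "T" + inserted)

# Same inserted sequence, but placed at a shifted (e.g. averaged) position.
shifted = apply_insertion(reference, 4, "C", "C" + inserted)

print(same_read)  # TTACGTACGTACGTGG
print(shifted)    # TTACACGTGTACGTGG
```

In the shifted case the inserted unit is out of phase with the repeat, which is the same kind of mismatch visible in the alignment above.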

The modified version of cuteSV calls the variant as follows:
1 240699867 cuteSV.INS.0 G GGAGTTTAAGACAAAATGGCAGCGGGGGCTGTAGATCTGGAAGCCATCTGTA
Here the position and the variant allele are extracted from the same read. The corresponding haplotype aligns perfectly with the consensus:

P1                 1 GTCCAAGAGGGGGATTGGTAATTGTTCCTCGGATGGGGAAACATGACATG     50
                     ||||||||||||||||||||||||||||||||||||||||||||||||||
S1                 1 GTCCAAGAGGGGGATTGGTAATTGTTCCTCGGATGGGGAAACATGACATG     50

P1                51 TGAAAAATATGGGTCAGGAGTTTAAGACAAAATGGCAGCGGGGGCTGTAG    100
                     ||||||||||||||||||||||||||||||||||||||||||||||||||
S1                51 TGAAAAATATGGGTCAGGAGTTTAAGACAAAATGGCAGCGGGGGCTGTAG    100

P1               101 ATCTGGAAGCCATCTGTAGAGTTTAAGACAAAATGGCAGCGGGGGCTGTA    150
                     ||||||||||||||||||||||||||||||||||||||||||||||||||
S1               101 ATCTGGAAGCCATCTGTAGAGTTTAAGACAAAATGGCAGCGGGGGCTGTA    150

P1               151 GATCTGGAAGCCATCTGTAGAGTTTAAGACAAAATGGCAGCGGGGGCTGT    200
                     ||||||||||||||||||||||||||||||||||||||||||||||||||
S1               151 GATCTGGAAGCCATCTGTAGAGTTTAAGACAAAATGGCAGCGGGGGCTGT    200

P1               201 AGATCTGGAAGCCATCTGTAGAGTTTAAGACAAAATGGCAGCGGGGGCTG    250
                     ||||||||||||||||||||||||||||||||||||||||||||||||||
S1               201 AGATCTGGAAGCCATCTGTAGAGTTTAAGACAAAATGGCAGCGGGGGCTG    250

P1               251 TAGATCTGGAAGCCATCTGTAGAGTTTAAGACAAAATGGCAGTGGGGGCT    300
                     ||||||||||||||||||||||||||||||||||||||||||||||||||
S1               251 TAGATCTGGAAGCCATCTGTAGAGTTTAAGACAAAATGGCAGTGGGGGCT    300

P1               301 GTAGATCTGGAAGCCATCGGTAGCTACGTGCATGGGGATAAGGTCTCTGA    350
                     ||||||||||||||||||||||||||||||||||||||||||||||||||
S1               301 GTAGATCTGGAAGCCATCGGTAGCTACGTGCATGGGGATAAGGTCTCTGA    350

P1               351 TGCATAATGTGAGATTTAAAAGAGGCAAATGTTGGATCTTGAAGAAAACT    400
                     ||||||||||||||||||||||||||||||||||||||||||||||||||
S1               351 TGCATAATGTGAGATTTAAAAGAGGCAAATGTTGGATCTTGAAGAAAACT    400

Comparing the haplotypes of a few thousand insertions called by cuteSV with the haplotypes of a high-confidence set of insertions shows that the haplotypes called by the modified version of cuteSV are much closer to the real haplotypes (haplotype similarity scores are closer to one).

insertion_example.zip
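The exact similarity score used for this comparison is not specified here. As a rough stand-in, and purely as an assumption, a score in the range 0 to 1 can be computed with Python's standard `difflib.SequenceMatcher`, where 1.0 means the called haplotype is identical to the true one. The function name `haplotype_similarity` is hypothetical.

```python
from difflib import SequenceMatcher


def haplotype_similarity(called, truth):
    """Similarity in [0, 1]; 1.0 means the two haplotypes are identical.

    Uses difflib's ratio as a stand-in metric (an assumption, not
    necessarily the score used in the comparison described above).
    """
    return SequenceMatcher(None, called, truth, autojunk=False).ratio()


# Toy haplotypes (not real data): an in-phase and an out-of-phase call.
truth = "TTACGTACGTACGTGG"
print(haplotype_similarity("TTACGTACGTACGTGG", truth))
print(haplotype_similarity("TTACACGTGTACGTGG", truth))
```

A call whose position and allele come from the same read reproduces the true haplotype and scores 1.0; the out-of-phase call scores lower.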

Commit cb672f3: Modify position of insertions - take it from the same read as the insertion allele is taken from.

tjiangHIT merged commit 8a6a178 into tjiangHIT:master Feb 15, 2022

tjiangHIT mentioned this pull request Feb 15, 2022

Wrong insertion allele? #66

Closed

bnoyvert deleted the insertion_position branch February 15, 2022 17:15

waltergallegog mentioned this pull request Sep 26, 2023

[Question] How are POS, LEN and SEQ of insertions determined #131

Closed
