repeat at the SV breakpoint #127

Open
charliechen912ilovbash opened this issue Jul 3, 2023 · 4 comments
Labels
help wanted Extra attention is needed

Comments

@charliechen912ilovbash

Hi,
I'm wondering whether a repeat sequence (e.g. a simple repeat) at an SV breakpoint (e.g. of a deletion) affects the accuracy of the reported SV position. If so, how does cuteSV v1.0.12 overcome this issue?

@tjiangHIT
Owner

Hello @charliechen912ilovbash,

Sorry for replying so late.
It is well known that repeat sequences disturb read alignment, so the breakpoints observed on individual reads become less accurate. SV callers collect the breakpoints on each read to infer SV candidates, so treating these inaccurate breakpoints directly as SV signatures would produce low-quality SV positions. To overcome this, cuteSV clusters all breakpoint signatures within a relatively small region to generate "consensus" SV breakpoint groups, then divides them into possible SV events using their length signatures. After that, it reports the final SV calls and the corresponding genotypes. For more details please read our paper here.
I hope this is helpful to you.

Best regards,
Tao
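
To make the two-stage idea in the reply above concrete, here is a minimal sketch: group read-level signatures by position, then split each group by SV length and report a median "consensus" per subgroup. This is not cuteSV's actual code. The function name `cluster_signatures`, the `(position, length)` tuple layout, and the thresholds `max_pos_gap` and `max_len_ratio` are all hypothetical choices made for illustration.

```python
from statistics import median


def cluster_signatures(signatures, max_pos_gap=200, max_len_ratio=0.3):
    """Cluster (position, length) SV signatures into consensus calls.

    Illustrative only: thresholds and data layout are assumptions,
    not cuteSV parameters.
    """
    # Stage 1: group signatures whose positions lie close together.
    pos_groups = []
    for pos, length in sorted(signatures):
        if pos_groups and pos - pos_groups[-1][-1][0] <= max_pos_gap:
            pos_groups[-1].append((pos, length))
        else:
            pos_groups.append([(pos, length)])

    # Stage 2: inside each position group, split by SV length so that
    # different events at the same locus are not merged.
    calls = []
    for group in pos_groups:
        by_len = sorted(group, key=lambda s: s[1])
        len_groups = [[by_len[0]]]
        for sig in by_len[1:]:
            prev_len = len_groups[-1][-1][1]
            if sig[1] - prev_len <= max_len_ratio * prev_len:
                len_groups[-1].append(sig)
            else:
                len_groups.append([sig])
        # The median damps the effect of reads whose breakpoint was
        # shifted by a repeat.
        for lg in len_groups:
            calls.append({
                "pos": int(median(p for p, _ in lg)),
                "length": int(median(l for _, l in lg)),
                "support": len(lg),
            })
    return sorted(calls, key=lambda c: c["pos"])


if __name__ == "__main__":
    reads = [(1000, 50), (1010, 52), (995, 49),
             (1005, 300), (1002, 310), (5000, 100)]
    for call in cluster_signatures(reads):
        print(call)
```

In this toy input, three reads support a deletion of about 50 bp and two reads support one of about 300 bp at nearly the same position; the length-based split keeps them as separate events, while the isolated signature at position 5000 forms its own call.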

@tjiangHIT added the "help wanted" label on Sep 1, 2023
@baozg

baozg commented Sep 5, 2023

Hi, Tao

But for assembly-based SV calling, does cuteSV still cluster breakpoints? Since there is only one read in the SAM file, is it still possible for cuteSV to report these breakpoints?

@tjiangHIT
Owner

Hello @baozg,

Thanks for pointing this out.
Actually, cuteSV achieves assembly-based SV calling by converting the typical SV callsets into diploid-based SV callsets. That is, cuteSV first generates the initial SV callsets using the clustering approach mentioned above (in some places there is still more than one SV signature, even though there is only one contig per haplotype). Then cuteSV resolves the haplotype tags for each SV call to produce a phased genotype.

Tao
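
The last step in the reply above, turning haplotype tags into a phased genotype, could look roughly like the sketch below. It assumes each contig carries an integer haplotype tag of 1 or 2 and that an SV call records the tags of the contigs supporting it. The function name `phased_genotype` and this tag encoding are hypothetical and are not taken from cuteSV.

```python
def phased_genotype(supporting_haplotypes):
    """Return a VCF-style phased genotype string such as "1|0".

    supporting_haplotypes: iterable of haplotype tags (1 or 2) of the
    contigs that carry the SV. The encoding is an assumption made for
    this sketch, not cuteSV's internal representation.
    """
    tags = set(supporting_haplotypes)
    unknown = tags - {1, 2}
    if unknown:
        raise ValueError(f"unexpected haplotype tags: {sorted(unknown)}")
    # Allele 1 (SV present) on a haplotype if any contig with that tag
    # supports the call, otherwise allele 0 (reference).
    first = "1" if 1 in tags else "0"
    second = "1" if 2 in tags else "0"
    return f"{first}|{second}"


if __name__ == "__main__":
    print(phased_genotype([1]))     # SV only on haplotype 1
    print(phased_genotype([1, 2]))  # SV on both haplotypes
```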

@baozg

baozg commented Sep 6, 2023

Hi, Tao

But an inbred plant or a haploid human cell line, such as A. thaliana or CHM13, has only one haplotype. Is the clustering step still needed in that case?

Besides, as you mentioned, if I want to call variants with cuteSV from population-level assemblies, would it be better to put all the assemblies in one alignment file so that the clustering step can refine the breakpoints?

Zhigui


3 participants