Empty output "Insufficiently many confident reads for aggregating across runs" #27
Comments
Hi, QUILT takes BAM files as input and handles them itself. Normally you don't need to preprocess the BAM files, and in particular there is no need for quality-score filtering. The message appears because too few reads are left after your preprocessing. Increasing the buffer won't help with imputing regions that have no reads; a buffer size of 250000 is big enough for most regions.
Hi, I checked the BAM file in IGV and it does have only one read covering that region, but I guess that is expected with shallow sequencing data. Is there a minimum number of reads needed to impute a genotype? Thanks for your answer; your clarifications about BAM preprocessing and the buffer parameter are very useful. Best,
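For readers who want to check coverage without opening IGV, here is a minimal sketch that counts reads overlapping a region before and after a MAPQ cutoff. It parses SAM text lines (for example, the output of `samtools view -h`) using only the Python standard library. The function names, the toy records, and the coordinates are made up for illustration; only the 37 cutoff comes from this thread, and none of this is part of QUILT.

```python
import re

# CIGAR operations that consume reference bases (per the SAM specification).
REF_CONSUMING = set("MDN=X")
CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")


def reference_end(pos, cigar):
    """Return the 1-based inclusive end coordinate of an alignment."""
    span = sum(int(n) for n, op in CIGAR_RE.findall(cigar) if op in REF_CONSUMING)
    return pos + span - 1


def count_reads(sam_lines, chrom, start, end, min_mapq=0):
    """Count mapped reads overlapping chrom:start-end with MAPQ >= min_mapq."""
    n = 0
    for line in sam_lines:
        if line.startswith("@"):  # skip header lines
            continue
        f = line.rstrip("\n").split("\t")
        flag, rname, pos, mapq, cigar = int(f[1]), f[2], int(f[3]), int(f[4]), f[5]
        if flag & 0x4 or rname != chrom or cigar == "*":  # unmapped or other contig
            continue
        if mapq < min_mapq:
            continue
        if pos <= end and reference_end(pos, cigar) >= start:
            n += 1
    return n


# Hypothetical toy records, not real data.
toy_sam = [
    "@SQ\tSN:chr1\tLN:1000000",
    "r1\t0\tchr1\t1000\t60\t100M\t*\t0\t0\t*\t*",
    "r2\t0\tchr1\t1050\t20\t50M10D50M\t*\t0\t0\t*\t*",
    "r3\t16\tchr1\t5000\t60\t100M\t*\t0\t0\t*\t*",
    "r4\t4\t*\t0\t0\t*\t*\t0\t0\t*\t*",
]

before = count_reads(toy_sam, "chr1", 1080, 1120)
after = count_reads(toy_sam, "chr1", 1080, 1120, min_mapq=37)
print(f"reads in region: {before} unfiltered, {after} with MAPQ >= 37")
```

Comparing the two counts for the problematic location shows how many reads the MAPQ filter removed there.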
The messages from your first post relate to how QUILT tries to do phasing. Basically, it tries multiple starts (normally 7) and gets read assignments from each of them. Then, in the final phasing round, it tries to get a best set from them and proceeds. That's what the "There are 5 out of 18 regions that have been flipped by consensus" message means: the consensus process is trying to come up with a best read phasing. Similarly, "Insufficiently many confident reads for aggregating across runs" means it can't do this process, as there are too few "confident" reads (reads that map confidently to one haplotype or the other).

I wouldn't consider any of these to be error messages. The program should still run; they are meant to be informative.

I wouldn't say there's a minimum number of reads or depth needed to impute a sample. With some mouse samples I've seen excellent results with less than 0.1X. It really depends on how related the samples are and how long the LD blocks are.

Hope that helps,
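To make the consensus idea above concrete, here is a toy sketch. It is not QUILT's actual implementation. It assumes each run labels every read as haplotype 1 or 2, that the labels in a run may be swapped relative to another run, and that a read is "confident" when at least a fraction `min_agreement` of runs agree on its label. The function names, both thresholds, and the whole-run flip (rather than a per-region flip) are simplifying assumptions made for illustration.

```python
def align_to_reference(run, reference):
    """Flip the labels of `run` (1 <-> 2) if that agrees better with `reference`."""
    agree = sum(a == b for a, b in zip(run, reference))
    if agree * 2 < len(run):
        return [3 - a for a in run], True
    return list(run), False


def consensus(runs, min_agreement=0.75, min_confident=2):
    """Combine per-read haplotype labels from several runs.

    Returns (labels, n_flipped). `labels` maps read index -> haplotype for
    confident reads only, or is None when fewer than `min_confident` reads
    are confident (the toy analogue of "insufficiently many confident reads").
    """
    reference = runs[0]
    aligned, n_flipped = [], 0
    for run in runs:
        fixed, flipped = align_to_reference(run, reference)
        aligned.append(fixed)
        n_flipped += flipped

    labels = {}
    for i in range(len(reference)):
        frac_h1 = sum(run[i] == 1 for run in aligned) / len(aligned)
        if frac_h1 >= min_agreement:
            labels[i] = 1
        elif frac_h1 <= 1 - min_agreement:
            labels[i] = 2
        # otherwise the runs disagree and the read is not confident

    if len(labels) < min_confident:
        return None, n_flipped
    return labels, n_flipped


# Three hypothetical runs over four reads; the second run has swapped labels.
labels, n_flipped = consensus([[1, 1, 2, 2], [2, 2, 1, 1], [1, 1, 2, 1]])
print(labels, n_flipped)

# A region covered by a single read can never reach min_confident=2.
print(consensus([[1], [2], [1]]))
```

In this toy version, a location covered by only one read cannot produce a consensus, which mirrors why the message shows up at very low coverage.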
Hi,
This is my first time using QUILT. I am trying to impute genotypes at the locations of ~300 SNVs.
My preprocessing of the BAM files included alignment to GRCh38 with BWA, marking and removing PCR duplicates with Picard, and filtering out reads with MAPQ < 37 with samtools.
It works well for most of the SNVs, but one specific location gives me an empty output.
With buffer=1000 it prints the following message: "Insufficiently many confident reads for aggregating across runs"
With buffer=500000 it still gives me an empty output, but prints the following message: "There are 5 out of 18 regions that have been flipped by consensus"
Thank you.
Best,
Maria