Clair3 cannot produce gvcf file (?) #88
Hi Clair3 team,
I recently need to use Clair3's gVCF output instead of the normal VCF. On our system, we have a population-scale dataset of hundreds of ONT samples. I recall it ran OK back then, before I added the --gvcf flag.
I wonder if disk space is an issue, similar to the one highlighted in #48.
Happy to send over the log files if need be.
Best,
Tuan
Hi, it should be a disk space issue, as parallel cannot write its logs.
Yes, it was 100% a disk space issue. My latest two samples (standalone samples) didn't incur this error. I believe it was due to too many samples being deployed to the queue at the same time. I'm closing the ticket for now. Many thanks, Tuan
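A pre-flight free-space check can catch this kind of failure before jobs are queued. A minimal sketch, assuming GNU coreutils df; the path and threshold below are placeholders, not values from this thread:

```bash
#!/usr/bin/env bash
# Abort early if the output filesystem is low on space.
# OUT_DIR and MIN_FREE_GB are placeholders; adjust to your setup.
OUT_DIR=/path/to/clair3_output
MIN_FREE_GB=200

free_gb=$(df --output=avail -BG "$OUT_DIR" | tail -1 | tr -dc '0-9')
if [ "$free_gb" -lt "$MIN_FREE_GB" ]; then
    echo "Only ${free_gb}G free under $OUT_DIR; need ${MIN_FREE_GB}G. Aborting." >&2
    exit 1
fi
```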
Sorry for re-opening the issue, but after fixing the disk space issue, this popped up in the error log. Any suggestion as to what this is about? Cheers, Tuan
Hi, it's the first time we've received a report on this error.
Hi Ruibang, Thanks for your reply. I currently run Clair3 with 24 CPUs and 64 GB of RAM, but I'm able to increase the RAM if need be. I'm rerunning the test samples now with a slightly modified pipeline. As mentioned previously, it seems that dumping multiple runs into the same disk location caused some issues on our institute's server (the original issue). I now direct the output from each Clair3 run into a separate directory. Happy to clarify should any of the above not make sense, Tuan
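For reference, a per-run output layout like the following keeps runs from colliding in one location. A sketch only; the flag names follow the Clair3 README, while the paths, sample names, and model directory are placeholders:

```bash
#!/usr/bin/env bash
# One output directory per sample, so runs never share intermediate files.
# All paths and sample names below are placeholders.
REF=/path/to/ref.fa
MODEL=/path/to/ont_model
for SAMPLE in sample01 sample02 sample03; do
    OUT=/scratch/clair3/${SAMPLE}   # separate directory per run
    mkdir -p "$OUT"
    run_clair3.sh \
        --bam_fn=/path/to/${SAMPLE}.bam \
        --ref_fn="$REF" \
        --model_path="$MODEL" \
        --platform=ont \
        --threads=24 \
        --gvcf \
        --output="$OUT"
done
```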
The intermediate files for GVCF output are not small, and their size depends on depth and sequencing quality, so hosting them in a larger space makes sense. Thanks, and keep us updated.
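To see how much space a run's intermediates actually consume, polling the output directory works. A minimal sketch, assuming the intermediates live under the run's output directory; the path is a placeholder:

```bash
# Poll the size of a running Clair3 output directory every 5 minutes.
# /scratch/clair3/sample01 is a placeholder path.
watch -n 300 du -sh /scratch/clair3/sample01
```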
Hi @aquaskyline, here are some interesting stats I gathered from my last run, where I gave Clair3 48 cores and 600 GB of RAM.
1st run - normal run without --gvcf.
2nd run - run with --gvcf.
I chased down the log file; everything looks OK from steps 1 to 6.
My guess is that something blows up the memory in the parallel merging of the gvcf files. I'm trying two things at the moment, unsure if either will work to be honest... 1 - modifying --chunk_size=1000000. I personally don't think this is going to work, but my naive assumption is that a smaller chunk size means smaller gvcf files? My BAM file is ~35 GB, by the way. If you think of any probable solution for this, please let me know and I will try it out on my system. We also have nodes with 1+ TB of RAM, but only a limited number, and I don't want to go down that route as it would take ages to get a job onto those nodes. Many thanks, Tuan Nguyen
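For reference, the rerun described above would look roughly like this; --gvcf and --chunk_size are documented Clair3 options, and all paths are placeholders:

```bash
# Rerun with a reduced chunk size to test whether smaller per-chunk
# gVCF intermediates lower peak memory during the merge step.
# All paths are placeholders.
run_clair3.sh \
    --bam_fn=/path/to/sample.bam \
    --ref_fn=/path/to/ref.fa \
    --model_path=/path/to/ont_model \
    --platform=ont \
    --threads=48 \
    --gvcf \
    --chunk_size=1000000 \
    --output=/scratch/clair3/sample_chunked
```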
Clair3 is not supposed to use up that much memory; either a bug in Clair3 or a glitch in the input is possible. Could you archive the log files for us to take a look?
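A sketch of how the logs could be bundled for sharing, assuming they sit in a log/ subdirectory of the run's output folder; the path is a placeholder:

```bash
# Bundle the log directory of a Clair3 run for sharing.
# /scratch/clair3/sample01 is a placeholder output path.
tar -czf clair3_logs.tar.gz -C /scratch/clair3/sample01 log
```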
Addressed with r11-Minor 1.
Hello team, I am seeing a runtime increase of more than 50% when running a sample with gVCF mode turned on. I don't see an official release of r11-minor. Could you please push this version to git?
All installation options and the code include the r11 minor fixes. And yes, 50% additional time for gVCF output is somewhat expected.
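To quantify the gVCF overhead on one's own data, GNU time can capture wall time and peak memory for both modes. A sketch; the invocations are abbreviated and all paths are placeholders:

```bash
# Compare wall time and peak RSS with and without --gvcf.
# GNU time's -v output includes "Maximum resident set size".
# All paths are placeholders.
/usr/bin/time -v -o time_vcf.log run_clair3.sh \
    --bam_fn=/path/to/sample.bam --ref_fn=/path/to/ref.fa \
    --model_path=/path/to/ont_model --platform=ont \
    --threads=48 --output=/scratch/run_vcf

/usr/bin/time -v -o time_gvcf.log run_clair3.sh \
    --bam_fn=/path/to/sample.bam --ref_fn=/path/to/ref.fa \
    --model_path=/path/to/ont_model --platform=ont \
    --threads=48 --gvcf --output=/scratch/run_gvcf
```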