Dear Dr. Olm,
I am working with a large dataset (~7,000 genomes) and encountered failures when dRep calls CheckM internally. For example, when I run:
dRep compare output_directory -g path/to/genomes/*.fasta
the workflow completes successfully with 6 genomes.
However, with ~7,200 genomes, it fails during the CheckM step:
Will filter the genome list
7,233 genomes were input to dRep
Calculating genome info of genomes
100.00% of genomes passed length filtering
Running prodigal
Running checkM
Running checkM in 4 chunks
Running checkM chunk 0
!!! checkM failed !!!
To address this, I tried running CheckM separately with the following command:
checkm lineage_wf -t 32 -x fasta genomeB/ checkm_out1/
This produced the file checkm_out1/qa_results.tsv.
Then, I attempted to run dRep using these pre-computed results:
dRep dereplicate drep_out_95_99 \
  -g "genomeB/*.fasta" \
  -p 32 \
  --P_ani 0.95 \
  --S_ani 0.99 \
  --S_algorithm fastANI \
  --genomeInfo checkm_out1/qa_results.tsv
My questions are:
1. Is this the correct way to provide pre-computed CheckM results to dRep?
2. If not, could you clarify the proper format of the --genomeInfo file expected by dRep?
3. Additionally, is there any recommended way to resolve the internal CheckM failure at the step:
Running prodigal
Running checkM
Running checkM in 4 chunks
Running checkM chunk 0
!!! checkM failed !!!
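Regarding questions 1 and 2: my reading of the dRep help text is that --genomeInfo expects a comma-separated file with the columns genome, completeness, and contamination, where genome is the FASTA file name. Below is a minimal sketch of the conversion I would try if that is right. It assumes that qa_results.tsv is a tab-separated CheckM table with the column headers "Bin Id", "Completeness", and "Contamination", and that each Bin Id is the genome file name without its .fasta extension. The function name checkm_to_genome_info and the output file name genome_info.csv are my own placeholders, not part of dRep or CheckM.

```python
import csv


def checkm_to_genome_info(checkm_tsv, out_csv, extension=".fasta"):
    """Convert a tab-separated CheckM table into a three-column CSV.

    Assumed input columns: "Bin Id", "Completeness", "Contamination".
    Output columns: genome, completeness, contamination.
    """
    rows = []
    with open(checkm_tsv, newline="") as handle:
        for record in csv.DictReader(handle, delimiter="\t"):
            rows.append({
                # Assumption: Bin Id is the FASTA file name minus its extension.
                "genome": record["Bin Id"] + extension,
                "completeness": float(record["Completeness"]),
                "contamination": float(record["Contamination"]),
            })

    with open(out_csv, "w", newline="") as handle:
        writer = csv.DictWriter(
            handle, fieldnames=["genome", "completeness", "contamination"]
        )
        writer.writeheader()
        writer.writerows(rows)
    return rows
```

I would call it as checkm_to_genome_info("checkm_out1/qa_results.tsv", "genome_info.csv") and then pass genome_info.csv to --genomeInfo instead of the raw CheckM table. Please correct me if the expected column names or the genome naming differ.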
Thank you very much for your time and help.
Best regards,
zss