Question about using pre-computed CheckM results with dRep #287

@ZhaosuSong

Dear Dr. Olm,
I am working with a large dataset (~7,200 genomes), and dRep fails when it calls CheckM internally. With a small test set of 6 genomes, the following command completes successfully:
dRep compare output_directory -g path/to/genomes/*.fasta
However, with ~7,200 genomes, the run fails during the CheckM step:
Will filter the genome list
7,233 genomes were input to dRep
Calculating genome info of genomes
100.00% of genomes passed length filtering
Running prodigal
Running checkM
Running checkM in 4 chunks
Running checkM chunk 0
!!! checkM failed !!!

To address this, I tried running CheckM separately with the following command:
checkm lineage_wf -t 32 -x fasta genomeB/ checkm_out1/
This produced the file checkm_out1/qa_results.tsv.

Then, I attempted to run dRep using these pre-computed results:
dRep dereplicate drep_out_95_99 \
    -g "genomeB/*.fasta" \
    -p 32 \
    --P_ani 0.95 \
    --S_ani 0.99 \
    --S_algorithm fastANI \
    --genomeInfo checkm_out1/qa_results.tsv

My questions are:

1. Is this the correct way to provide pre-computed CheckM results to dRep?
2. If not, could you clarify the format of the --genomeInfo file that dRep expects?
3. Additionally, is there a recommended way to resolve the internal CheckM failure at this step?
Running prodigal
Running checkM
Running checkM in 4 chunks
Running checkM chunk 0
!!! checkM failed !!!
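
To make question 2 concrete, here is a minimal sketch of the kind of conversion that may be needed. It rests on two assumptions that should be checked against the actual files: that --genomeInfo expects a CSV with the columns genome, completeness, and contamination (with genome being the FASTA file name, as the dRep help text describes), and that the CheckM table is tab-separated with the headers "Bin Id", "Completeness", and "Contamination". The function name checkm_to_genome_info and the example row are hypothetical.

```python
import csv
import io


def checkm_to_genome_info(checkm_tsv, extension=".fasta"):
    """Convert tab-separated CheckM results into genomeInfo-style CSV text.

    Assumes the input has 'Bin Id', 'Completeness', and 'Contamination'
    columns, and that the bin ID is the genome file name without extension.
    """
    reader = csv.DictReader(io.StringIO(checkm_tsv), delimiter="\t")
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    # Header assumed from the dRep help text for --genomeInfo.
    writer.writerow(["genome", "completeness", "contamination"])
    for row in reader:
        writer.writerow([
            row["Bin Id"] + extension,  # restore the FASTA file name
            row["Completeness"],
            row["Contamination"],
        ])
    return out.getvalue()


# Hypothetical one-genome table, for illustration only.
example = (
    "Bin Id\tMarker lineage\tCompleteness\tContamination\n"
    "genome1\tk__Bacteria\t98.5\t1.2\n"
)
print(checkm_to_genome_info(example))
```

In practice the text would be read from the CheckM output file and the result written to a .csv file that is then passed to --genomeInfo.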

Thank you very much for your time and help.
Best regards,
zss
