
GTDBTK_CLASSIFYWF should have an option to run in one go #1007

@sarah-shah-bioinf

Description

GTDBTK in one go

Suggestion: I understand that GTDBTK_CLASSIFYWF is submitted separately for each bin because that is the spirit of Nextflow. However, when there are many bins (roughly 3000 or more), it is actually more efficient to run it in one go, because skani and pplacer can then be run once for the whole dataset instead of being repeated in every individually submitted job.

I ran standalone GTDB-Tk on 3000 bins on a 30-core workstation with 120 GB of RAM, and it completed within 24 hours. Meanwhile, reproducing this step with the nf-core/mag pipeline on a small HPC takes far longer, because 3000 separate jobs have to be submitted.

It would be nice to have a parameter such as --gtdbtk_combo_run that submits a single large job for GTDBTK_CLASSIFYWF.
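To make the idea concrete, here is a minimal sketch, outside Nextflow, of how bins could be grouped into GTDB-Tk batchfiles so that `gtdbtk classify_wf` is invoked once per batch rather than once per bin. The helper names (`chunk`, `batchfile_text`, `classify_cmd`), the `bins/` directory and the batch size are hypothetical; only the `classify_wf` subcommand and its `--batchfile`, `--out_dir` and `--cpus` options come from GTDB-Tk itself. A batch size of 0 stands in for the proposed "one go" mode. Some GTDB-Tk versions also require an option controlling the ANI screening step, so check `gtdbtk classify_wf --help` for the installed version.

```python
"""Sketch: group bins into GTDB-Tk batchfiles so classify_wf runs once per batch.

All helper names, paths and the batch size are hypothetical illustrations.
"""
from pathlib import Path
from typing import List


def chunk(items: List[Path], batch_size: int) -> List[List[Path]]:
    """Split items into consecutive batches; batch_size <= 0 means one single batch."""
    if not items:
        return []
    if batch_size <= 0 or batch_size >= len(items):
        return [list(items)]
    return [items[i:i + batch_size] for i in range(0, len(items), batch_size)]


def batchfile_text(bins: List[Path]) -> str:
    """GTDB-Tk --batchfile format: tab-separated FASTA path and genome ID per line.

    The file stem is used as the genome ID, so stems must be unique.
    """
    return "".join(f"{b}\t{b.stem}\n" for b in bins)


def classify_cmd(batchfile: Path, out_dir: Path, cpus: int) -> List[str]:
    """Build one `gtdbtk classify_wf` call covering every genome in the batchfile."""
    return [
        "gtdbtk", "classify_wf",
        "--batchfile", str(batchfile),
        "--out_dir", str(out_dir),
        "--cpus", str(cpus),
    ]


if __name__ == "__main__":
    bins = sorted(Path("bins").glob("*.fa"))  # hypothetical input directory
    for i, batch in enumerate(chunk(bins, batch_size=0)):  # 0 -> all bins in one job
        bf = Path(f"gtdbtk_batch_{i}.tsv")
        bf.write_text(batchfile_text(batch))
        print(" ".join(classify_cmd(bf, Path(f"gtdbtk_out_{i}"), cpus=30)))
```

A batch size between 1 and the total number of bins would be a middle ground: fewer scheduler submissions than one job per bin, while still allowing several batches to run in parallel.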

Metadata

Assignees: No one assigned
Labels: enhancement (New feature or request)
Type: No type
Projects: No projects
Milestone: No milestone
Relationships: None yet
Development: No branches or pull requests
