GTDBTK in one go
Suggestion: I understand that GTDBTK_CLASSIFYWF is submitted separately for each bin because that is the spirit of Nextflow. However, when there are many bins (roughly 3000 or more), it is actually more efficient to run it in one go. In a single run, skani and pplacer can process the whole dataset at once instead of repeating these steps in every individually submitted job.

I ran GTDB-Tk standalone on 3000 bins on a workstation with 120 GB of memory and 30 cores, and it completed within 24 hours. Recreating this step via the nf-core/mag pipeline on a small HPC, by contrast, takes far longer because the pipeline has to submit 3000 separate jobs.

It would be nice to have a parameter such as --gtdbtk_combo_run that submits one large job for GTDBTK_CLASSIFYWF.
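To illustrate what a single combined run involves outside Nextflow, here is a minimal Python sketch. It assumes every bin is a FASTA file with a `.fa` extension, symlinks all bins into one directory, and builds one `gtdbtk classify_wf` command over that directory, so the skani screen and pplacer placement run once for the whole set. The helpers `stage_bins` and `build_classify_cmd` are hypothetical, not nf-core/mag code. Only the basic `--genome_dir`, `--out_dir`, `--extension` and `--cpus` options are used; check `gtdbtk classify_wf -h` for your installed version, since some releases may require additional options.

```python
import tempfile
from pathlib import Path


def stage_bins(bin_paths, staging_dir):
    """Symlink every bin into one directory so a single GTDB-Tk run sees all of them."""
    staging = Path(staging_dir)
    staging.mkdir(parents=True, exist_ok=True)
    for src in bin_paths:
        src = Path(src).resolve()
        link = staging / src.name
        # Bins from different samples or binners could share a file name; refuse to overwrite.
        if link.exists() or link.is_symlink():
            raise ValueError(f"duplicate bin file name: {src.name}")
        link.symlink_to(src)
    return staging


def build_classify_cmd(genome_dir, out_dir, cpus, extension="fa"):
    """Return one `gtdbtk classify_wf` command covering every genome in genome_dir.

    Assumption: these basic options are sufficient; some GTDB-Tk versions
    may require extra options (check `gtdbtk classify_wf -h`).
    """
    return [
        "gtdbtk", "classify_wf",
        "--genome_dir", str(genome_dir),
        "--out_dir", str(out_dir),
        "--extension", extension,
        "--cpus", str(cpus),
    ]


if __name__ == "__main__":
    # Demo with three tiny fake bins; prints the command instead of running it.
    with tempfile.TemporaryDirectory() as tmp:
        bins = []
        for i in range(3):
            fasta = Path(tmp) / f"bin.{i}.fa"
            fasta.write_text(">contig_1\nACGT\n")
            bins.append(fasta)
        staged = stage_bins(bins, Path(tmp) / "all_bins")
        print(" ".join(build_classify_cmd(staged, Path(tmp) / "gtdbtk_out", cpus=30)))
```

On a node with the GTDB-Tk reference data configured, the returned list could be passed to `subprocess.run(cmd, check=True)`. Within the pipeline, a parameter like the proposed --gtdbtk_combo_run could switch between one such collected job and the current per-bin submission.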