Description
Hi there,
I've been using Vamb for a while (especially TaxVamb, since 4.1.4.dev150+g8fa3280) and recently decided to update. I installed Vamb from master (5.0.5.dev20+g8a13cf5f8) and, after dealing with the stricter checks on the taxonomy file (every contig now needs to be listed), I ran into something odd: the vaevae_clusters_split.tsv file is essentially empty (only the column headers), yet the log contains no error or warning saying that binsplitting did not happen.
(base) z3382651@katana1:.../Functional/binning $ ll -rht taxvamb5_out/
total 881M
-rw-rw----+ 1 z3382651 ferrari 288M Nov 5 00:52 composition.npz
-rw-rw----+ 1 z3382651 ferrari 30M Nov 5 00:53 abundance.npz
-rw-rw----+ 1 z3382651 ferrari 32M Nov 5 02:46 predictor_model.pt
-rw-rw----+ 1 z3382651 ferrari 169M Nov 5 03:25 results_taxometer.tsv
-rw-rw----+ 1 z3382651 ferrari 202M Nov 5 10:26 vaevae_model.pt
-rw-rw----+ 1 z3382651 ferrari 87M Nov 5 10:28 vaevae_latent.npz
-rw-rw----+ 1 z3382651 ferrari 23 Nov 5 10:35 vaevae_clusters_split.tsv
-rw-rw----+ 1 z3382651 ferrari 53M Nov 5 10:35 vaevae_clusters_unsplit.tsv
-rw-rw----+ 1 z3382651 ferrari 22M Nov 5 10:35 vaevae_clusters_metadata.tsv
drwxrws---+ 2 z3382651 ferrari 44K Nov 5 10:36 bins
-rw-rw----+ 1 z3382651 ferrari 159K Nov 5 10:36 log.txt
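To put numbers on this, a small script can count the data rows in the split and unsplit cluster files. This is a minimal sketch, assuming each file has one header line followed by one clustername<TAB>contigname row per contig; the helpers count_cluster_rows and summarise are hypothetical names. For reference, a 23-byte file is consistent with a single header line of exactly clustername<TAB>contigname plus a newline (11 + 1 + 10 + 1 bytes), i.e. no data rows at all.

```python
import csv
from pathlib import Path


def count_cluster_rows(path):
    """Count data rows (header excluded) in a cluster TSV.

    Assumption: one header line, then one clustername<TAB>contigname
    row per contig. Blank lines are ignored.
    """
    with open(path, newline="") as handle:
        reader = csv.reader(handle, delimiter="\t")
        next(reader, None)  # skip the header line if the file is not empty
        return sum(1 for row in reader if row)


def summarise(outdir):
    """Return {file name: data-row count} for the split and unsplit files."""
    outdir = Path(outdir)
    names = ("vaevae_clusters_split.tsv", "vaevae_clusters_unsplit.tsv")
    return {name: count_cluster_rows(outdir / name) for name in names}
```

Running summarise("taxvamb5_out") and summarise("taxvamb_out") side by side would show whether only the Vamb 5 split file is missing its rows.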
My contigs are already labelled in the form {SAMPLE}-{CONTIG}, i.e. using - as the separator. I passed this separator to the vamb bin taxvamb command, and it appears to be detected, as shown at the end of the log:
2025-11-05 10:28:06.293 | INFO | Clustering
2025-11-05 10:28:06.294 | INFO | Windowsize: 300
2025-11-05 10:28:06.294 | INFO | Min successful thresholds detected: 15
2025-11-05 10:28:06.294 | INFO | Max clusters: None
2025-11-05 10:28:06.294 | INFO | Use CUDA for clustering: True
2025-11-05 10:28:06.294 | INFO | Binsplitter: "-"
2025-11-05 10:28:06.456 | INFO | 10 % of contigs clustered
2025-11-05 10:28:33.912 | INFO | 20 % of contigs clustered
2025-11-05 10:28:40.737 | INFO | 30 % of contigs clustered
2025-11-05 10:28:51.559 | INFO | 40 % of contigs clustered
2025-11-05 10:29:09.755 | INFO | 50 % of contigs clustered
2025-11-05 10:29:43.168 | INFO | 60 % of contigs clustered
2025-11-05 10:30:36.292 | INFO | 70 % of contigs clustered
2025-11-05 10:31:44.854 | INFO | 80 % of contigs clustered
2025-11-05 10:32:54.945 | INFO | 90 % of contigs clustered
2025-11-05 10:34:03.634 | INFO | 100 % of contigs clustered
2025-11-05 10:35:11.088 | INFO | Clustered 1120675 contigs in 497327 split bins (480100 clusters)
2025-11-05 10:35:11.090 | INFO | Wrote cluster file(s) in 424.8 seconds.
2025-11-05 10:36:08.708 | INFO | Wrote clusters above 200000 bp to FASTA files in 57.62 seconds.
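For context, binsplitting is meant to turn each cluster into one bin per sample, using the sample prefix in the contig name. The sketch below only illustrates that idea and is not Vamb's code: it assumes the sample name is everything before the first occurrence of the separator, and the split-bin naming <sample><separator><cluster> is a hypothetical scheme. Under this logic every clustered contig whose name contains the separator lands in exactly one split bin, so an empty split file next to a 53M unsplit file would not be expected; together with the log line reporting 497327 split bins, one possible reading is that the split was computed but never written out.

```python
from collections import defaultdict


def binsplit(clusters, separator):
    """Split each cluster into per-sample bins (illustrative sketch).

    clusters:  dict mapping cluster name -> iterable of contig names.
    separator: binsplit separator, e.g. "-" for {SAMPLE}-{CONTIG} names.
    Assumption: the sample is the text before the FIRST separator.
    Returns a dict mapping hypothetical split-bin names
    "<sample><separator><cluster>" -> list of contig names.
    """
    split = defaultdict(list)
    for cluster_name, contigs in clusters.items():
        for contig in contigs:
            sample, found, _ = contig.partition(separator)
            if not found:
                # A contig without the separator cannot be assigned a sample.
                raise ValueError(f"separator {separator!r} not found in {contig!r}")
            split[f"{sample}{separator}{cluster_name}"].append(contig)
    return dict(split)
```

For example, binsplit({"1": ["S1-c1", "S2-c7", "S1-c9"]}, "-") returns {"S1-1": ["S1-c1", "S1-c9"], "S2-1": ["S2-c7"]}.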
I only noticed because running vamb recluster on the Vamb 5 output produces no bins, whether or not I specify the binsplitter.
As a side note, the same data ran successfully with the older Vamb 4.1.4.dev150+g8fa3280, where both binsplitting and reclustering worked without issues despite the non-default binsplitter character:
(base) z3382651@katana1:.../Functional/binning $ ll -rht taxvamb_out/
total 865M
-rw-rw----+ 1 z3382651 ferrari 286M Nov 1 09:47 composition.npz
-rw-rw----+ 1 z3382651 ferrari 30M Nov 1 09:47 abundance.npz
-rw-rw----+ 1 z3382651 ferrari 36M Nov 2 10:14 predictor_model.pt
-rw-rw----+ 1 z3382651 ferrari 127M Nov 2 11:09 results_taxometer.tsv
-rw-rw----+ 1 z3382651 ferrari 216M Nov 5 10:43 vaevae_model.pt
-rw-rw----+ 1 z3382651 ferrari 87M Nov 5 10:47 vaevae_latent.npz
-rw-rw----+ 1 z3382651 ferrari 30M Nov 5 12:46 vaevae_clusters_metadata.tsv
-rw-rw----+ 1 z3382651 ferrari 25M Nov 5 12:46 vaevae_clusters_unsplit.tsv
-rw-rw----+ 1 z3382651 ferrari 30M Nov 5 12:46 vaevae_clusters_split.tsv
drwxrws---+ 2 z3382651 ferrari 40K Nov 5 12:47 bins
-rw-rw----+ 1 z3382651 ferrari 151K Nov 5 12:47 log.txt
Both runs used the same commands, except that --cuda was removed from the Vamb 4 run, because the university HPC has a 12 h limit on the GPU queue and that run was exceeding it:
vamb bin taxvamb -p 8 --cuda -o - \
--outdir taxvamb5_out --fasta merged_contigs.fasta \
--abundance_tsv abundance.tsv \
--taxonomy mmseqs-easy/taxonomy_lca.taxconv.tsv \
--minfasta 200000
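Since Vamb 5 now requires every contig to be listed in the taxonomy file, and binsplitting relies on every contig name containing the separator, both conditions can be checked before launching a long run. A minimal sketch, assuming the contig name is the FASTA header text up to the first whitespace and that the taxonomy TSV has one header line with contig names in the first column; fasta_ids, taxonomy_ids and preflight are hypothetical helpers, not part of Vamb.

```python
import csv


def fasta_ids(path):
    """Yield contig names: header text after '>' up to the first whitespace."""
    with open(path) as handle:
        for line in handle:
            if line.startswith(">"):
                fields = line[1:].split()
                if fields:  # skip malformed empty headers
                    yield fields[0]


def taxonomy_ids(path):
    """Return the set of names in column 1 of a TSV with one header line."""
    with open(path, newline="") as handle:
        reader = csv.reader(handle, delimiter="\t")
        next(reader, None)  # skip the header line
        return {row[0] for row in reader if row}


def preflight(fasta_path, taxonomy_path, separator):
    """Report contigs lacking the separator or missing from the taxonomy."""
    ids = list(fasta_ids(fasta_path))
    listed = taxonomy_ids(taxonomy_path)
    return {
        "contigs": len(ids),
        "missing_separator": [c for c in ids if separator not in c],
        "missing_taxonomy": [c for c in ids if c not in listed],
    }
```

Usage for the run above would be preflight("merged_contigs.fasta", "mmseqs-easy/taxonomy_lca.taxconv.tsv", "-"); both lists should come back empty.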
vamb recluster -p 8 --cuda -o - \
--outdir recluster5_out \
--fasta merged_contigs.fasta \
--abundance taxvamb5_out/abundance.npz \
--latent_path taxvamb5_out/vaevae_latent.npz \
--taxonomy taxvamb5_out/results_taxometer.tsv \
--clusters_path taxvamb5_out/vaevae_clusters_split.tsv \
--hmm_path /srv/scratch/ferrari/utils/vamb/vamb/marker.hmm \
--minfasta 200000