Description
Hello dear developers,
I have been running BiG-SCAPE v2 for over 6 days on a dataset of more than 100,000 GBK files (BGCs annotated by antiSMASH). The run has shown no progress since the "Found 5 connected components" message, which was logged more than 48 hours ago. CPU usage is about 2%, but memory usage is over 850 GB.
Why is this happening, and do I need to stop the run? I would appreciate any guidance.
I ran it with the following command:
bigscape cluster -i input_folder/ -o output_folder/ -p /home/boot/download/BiG-SCAPE-2.0.0-beta.5/Pfam-A.hmm -c 65 --gcf-cutoffs 0.3,0.7 --mix --alignment-mode local --extend-strategy greedy
The log shows the following:
Calculating distances: 100%|██████████| 5076943761/5076943761 [38:07:49<00:00, 36985.18edge/s]
2025-03-25 12:54:24,897 INFO Generated 5076943761 edges
2025-03-25 12:54:25,539 INFO Generating antismash class bins
2025-03-25 12:54:41,838 INFO Bin 'terpene': 258269628 pairs from 22728 BGC records
2025-03-25 12:59:20,377 INFO Bin 'RiPP': 344833191 pairs from 26262 BGC records
2025-03-25 13:05:29,314 INFO Bin 'PKS': 162621595 pairs from 18035 BGC records
2025-03-25 13:08:32,332 INFO Bin 'NRPS': 85392846 pairs from 13069 BGC records
2025-03-25 13:10:14,780 INFO Bin 'other': 136513026 pairs from 16524 BGC records
2025-03-25 13:12:51,531 INFO Bin 'NRPS.other': 11175 pairs from 150 BGC records
2025-03-25 13:12:51,652 INFO Bin 'NRPS.PKS': 3638253 pairs from 2698 BGC records
2025-03-25 13:12:59,078 INFO Bin 'PKS.other': 161596 pairs from 569 BGC records
2025-03-25 13:12:59,891 INFO Bin 'NRPS.RiPP': 10585 pairs from 146 BGC records
2025-03-25 13:12:59,987 INFO Bin 'other.terpene': 3240 pairs from 81 BGC records
2025-03-25 13:13:00,023 INFO Bin 'RiPP.other': 7140 pairs from 120 BGC records
2025-03-25 13:13:00,097 INFO Bin 'PKS.RiPP': 2775 pairs from 75 BGC records
2025-03-25 13:13:00,125 INFO Bin 'PKS.terpene': 3828 pairs from 88 BGC records
2025-03-25 13:13:00,168 INFO Bin 'NRPS.PKS.other': 171 pairs from 19 BGC records
2025-03-25 13:13:00,172 INFO Bin 'saccharide': 276 pairs from 24 BGC records
2025-03-25 13:13:00,177 INFO Bin 'RiPP.terpene': 2850 pairs from 76 BGC records
2025-03-25 13:13:00,208 INFO Bin 'NRPS.terpene': 780 pairs from 40 BGC records
2025-03-25 13:13:00,221 INFO Bin 'NRPS.RiPP.other': 55 pairs from 11 BGC records
2025-03-25 13:13:00,223 INFO Bin 'NRPS.PKS.terpene': 120 pairs from 16 BGC records
2025-03-25 13:13:00,226 INFO Bin 'NRPS.PKS.RiPP': 325 pairs from 26 BGC records
2025-03-25 13:13:00,232 INFO Bin 'NRPS.PKS.RiPP.terpene': 1 pairs from 2 BGC records
2025-03-25 13:13:00,233 INFO Bin 'PKS.saccharide': 0 pairs from 1 BGC records
2025-03-25 13:13:00,234 INFO Bin 'PKS.other.saccharide': 0 pairs from 1 BGC records
2025-03-25 13:13:00,234 INFO Bin 'NRPS.other.terpene': 0 pairs from 1 BGC records
2025-03-25 13:13:00,235 INFO Bin 'PKS.RiPP.terpene': 0 pairs from 1 BGC records
2025-03-25 13:13:00,236 INFO Bin 'PKS.other.terpene': 1 pairs from 2 BGC records
2025-03-25 13:13:00,237 INFO Bin 'other.saccharide': 0 pairs from 1 BGC records
2025-03-25 13:13:00,237 INFO Bin 'PKS.RiPP.other': 0 pairs from 1 BGC records
2025-03-25 13:13:00,242 INFO Saving database to /media/HD1/crz/bigscape_out/bigscape_out.db
100%|██████████| 106994049/106994049 [16:17<00:00, 109461.08it/s]
2025-03-25 13:29:17,712 INFO Generating families
2025-03-25 13:29:17,811 INFO Generating connected components for Bin 'mix': cutoff 0.7
Generating connected components: 100%|██████████| 100767/100767 [18:41:35<00:00, 1.50nodes/s]
2025-03-26 08:10:54,065 INFO Found 5 connected components
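For scale, the pair counts in the log appear to be all-vs-all counts, n(n-1)/2 for n records. The short Python check below reproduces several of the logged numbers. The helper `pair_count` is my own and is not part of BiG-SCAPE, and treating the 100767 nodes from the progress bar as the size of the 'mix' bin is my assumption.

```python
def pair_count(n: int) -> int:
    """Number of unordered pairs among n records: n * (n - 1) / 2."""
    return n * (n - 1) // 2


# Record counts copied from the log above.
# "mix" uses the node count from the connected-components progress bar (assumption).
records = {
    "mix": 100767,
    "terpene": 22728,
    "RiPP": 26262,
    "NRPS.other": 150,
}

for name, n in records.items():
    print(f"{name}: {pair_count(n)} pairs from {n} records")
```

The value for 100767 records matches the "Generated 5076943761 edges" line, so the 'mix' run seems to compare every record against every other one.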