
Regarding --low_ram_primary_clustering, the error message shows OSError: [Errno 7] Argument list too long #288

@QiaofanLi

Dear Matt Olm,

I have a dataset of about 169k MAGs. Running dRep dereplicate with the default parameters exceeds the available memory, so I used the following command:
dRep dereplicate f__Oscillospiraceae.res -g f__Oscillospiraceae.list -comp 70 -con 5 -d --genomeInfo quality_report_without_0.csv -p 96 --skip_plots -pa 0.8 --low_ram_primary_clustering --primary_chunksize 20000

When running with these parameters, I get the following error:


..:: dRep dereplicate Step 1. Filter ::..

Will filter the genome list
Loading genomes from a list
169,453 genomes were input to dRep
Calculating genome info of genomes
100.00% of genomes passed length filtering
409.86% of genomes passed checkM filtering


..:: dRep dereplicate Step 2. Cluster ::..

Running primary clustering
Running pair-wise MASH clustering
Will split genomes into 9 groups for primary clustering
Traceback (most recent call last):
File "/usr/local/bin/dRep", line 32, in <module>
Controller().parseArguments(args)
File "/usr/local/lib/python3.10/dist-packages/drep/controller.py", line 100, in parseArguments
self.dereplicate_operation(**vars(args))
File "/usr/local/lib/python3.10/dist-packages/drep/controller.py", line 48, in dereplicate_operation
drep.d_workflows.dereplicate_wrapper(kwargs['work_directory'],**kwargs)
File "/usr/local/lib/python3.10/dist-packages/drep/d_workflows.py", line 37, in dereplicate_wrapper
drep.d_cluster.controller.d_cluster_wrapper(wd, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/drep/d_cluster/controller.py", line 184, in d_cluster_wrapper
GenomeClusterController(workDirectory, **kwargs).main()
File "/usr/local/lib/python3.10/dist-packages/drep/d_cluster/controller.py", line 32, in main
self.run_primary_clustering()
File "/usr/local/lib/python3.10/dist-packages/drep/d_cluster/controller.py", line 100, in run_primary_clustering
Mdb, Cdb, cluster_ret = drep.d_cluster.compare_utils.all_vs_all_MASH(self.Bdb, self.wd.get_dir('MASH'), **self.kwargs)
File "/usr/local/lib/python3.10/dist-packages/drep/d_cluster/compare_utils.py", line 110, in all_vs_all_MASH
genome_chunks = run_mash_on_genome_chunks(genome_chunks, mash_exe, sketch_folder, MASH_folder, logdir, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/drep/d_cluster/compare_utils.py", line 180, in run_mash_on_genome_chunks
drep.thread_cmds(cmds, logdir=logdir, t=int(p))
File "/usr/local/lib/python3.10/dist-packages/drep/__init__.py", line 56, in thread_cmds
pool.map(thread_cmd_wrapper, tups)
File "/usr/lib/python3.10/multiprocessing/pool.py", line 367, in map
return self._map_async(func, iterable, mapstar, chunksize).get()
File "/usr/lib/python3.10/multiprocessing/pool.py", line 774, in get
raise self._value
File "/usr/lib/python3.10/multiprocessing/pool.py", line 125, in worker
result = (True, func(*args, **kwds))
File "/usr/lib/python3.10/multiprocessing/pool.py", line 48, in mapstar
return list(map(*args))
File "/usr/local/lib/python3.10/dist-packages/drep/__init__.py", line 51, in thread_cmd_wrapper
run_cmd(*tup)
File "/usr/local/lib/python3.10/dist-packages/drep/__init__.py", line 47, in run_cmd
call(cmd,stdout=sto, stderr=ste)
File "/usr/lib/python3.10/subprocess.py", line 345, in call
with Popen(*popenargs, **kwargs) as p:
File "/usr/lib/python3.10/subprocess.py", line 971, in __init__
self._execute_child(args, executable, preexec_fn, close_fds,
File "/usr/lib/python3.10/subprocess.py", line 1863, in _execute_child
raise child_exception_type(errno_num, err_msg, err_filename)
OSError: [Errno 7] Argument list too long: '/home/mash-Linux64-v2.3/mash'

How can this error be resolved? I am also unsure how this scales: if I had 700,000 MAGs, how should I input them all into dRep for a single dereplication run?

Best Regards!
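
A note on the error itself: `[Errno 7]` is `E2BIG`, which the operating system returns when the arguments and environment handed to a new process exceed its size limit (`ARG_MAX`). The traceback shows it being raised when `subprocess` launches `mash`, so a plausible reading is that one mash command carries too many file paths at once. The sketch below only estimates that size. The `mash paste` command line and the file paths are made up for illustration and are not taken from dRep's source, and the byte count is an approximation of what the kernel actually counts.

```python
import os


def argv_bytes(argv):
    """Approximate size of argv: each argument's bytes plus a NUL terminator."""
    return sum(len(os.fsencode(a)) + 1 for a in argv)


def fits_in_arg_max(argv, env=None, limit=None, headroom=4096):
    """Return True if argv plus the environment likely fits under the limit.

    limit defaults to the system's reported ARG_MAX; headroom is an
    arbitrary safety margin chosen for this sketch.
    """
    env = os.environ if env is None else env
    # Each environment entry is stored as "KEY=VALUE" plus a NUL terminator.
    env_bytes = sum(
        len(os.fsencode(k)) + len(os.fsencode(v)) + 2 for k, v in env.items()
    )
    if limit is None:
        limit = os.sysconf("SC_ARG_MAX")
    return argv_bytes(argv) + env_bytes + headroom <= limit


# Hypothetical example: 20,000 invented sketch paths passed to a single command.
paths = [
    f"/data/work/MASH_files/sketches/chunk_0/genome_{i:06d}.fa.msh"
    for i in range(20000)
]
cmd = ["mash", "paste", "out"] + paths

print("argv bytes:", argv_bytes(cmd))
print("ARG_MAX:   ", os.sysconf("SC_ARG_MAX"))
print("fits:      ", fits_in_arg_max(cmd))
```

If the estimate for a realistic chunk lands near or above the reported limit, a smaller `--primary_chunksize` or shorter genome paths might keep each command under it. This is an untested assumption about how dRep builds its mash commands, not a confirmed fix.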
