Closed
Description
We have a use case where we would like to use dask.distributed to parallelize a Python-bound C++ program. The program works best when it can consume 8-32 threads, depending on the problem size, and it manages its threading internally. Normally, a Dask worker runs on a node that has 32 cores, with the worker using 2 processes at 1 thread each, so that we can give each process 16 cores.
Looking at the current code, it seems that nthreads = ncores / nprocesses, without exception. Is there a canonical way to change this so that we can orchestrate our normal Dask worker operation with dask-jobqueue?
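
To make the mismatch concrete, here is a minimal sketch of the arithmetic described in this issue. `threads_per_process` is a hypothetical helper written only for illustration. It is not dask-jobqueue's actual code, and the use of integer division is an assumption.

```python
def threads_per_process(cores: int, processes: int) -> int:
    """Dask threads per worker process under the rule quoted in this issue.

    Hypothetical illustration only; integer division is an assumption.
    """
    if processes < 1:
        raise ValueError("processes must be at least 1")
    return cores // processes


cores, processes = 32, 2

# What the quoted rule yields: the 32 cores are split evenly into Dask threads.
derived = threads_per_process(cores, processes)

# What we want instead: a single Dask thread per process, leaving the
# remaining cores of each process to the C++ program's internal threading.
wanted = 1

print(f"derived nthreads per process: {derived}")
print(f"wanted nthreads per process:  {wanted}")
print(f"cores available to C++ code per process: {cores // processes}")
```

With 32 cores and 2 processes the rule gives 16 Dask threads per process, so Dask may schedule up to 16 tasks concurrently in each process, each of which then tries to start its own C++ threads.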