Add more threading configs #9076


Closed
wants to merge 1 commit into from

Conversation


@flying-sheep flying-sheep commented May 13, 2025

Big question: is this even correct? Dask's docs mention the three env variables referenced here, but why “1”?

What does threads_per_worker in LocalCluster mean?

Ideally I’d start my workers, run 1 Python thread in each of them, and configure each of them so all these parallelization engines use a certain number of threads.

Closes #9075

  • Tests added / passed
  • Passes pre-commit run --all-files

no tests, since you also don’t test OMP_NUM_THREADS and the other documented env variables.
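For context, here is a hedged sketch of the kind of change this PR is after: pinning Numba's thread count the same way Dask already pins the OMP/MKL/OpenBLAS variables. The `distributed.nanny.environ` key is a real config option for extra worker environment variables, but whether this PR touches that exact key, or which variables it adds, is an assumption rather than a copy of the diff.

```python
import dask
from dask.distributed import LocalCluster

dask.config.set({
    "distributed.nanny.environ": {
        # Proposed addition (see #9075). It has to be set before the worker
        # process imports numba, which is why it goes through the nanny.
        "NUMBA_NUM_THREADS": "1",
    }
})

cluster = LocalCluster()  # workers spawned now inherit NUMBA_NUM_THREADS=1
```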

Contributor

github-actions bot commented May 13, 2025

Unit Test Results

See test report for an extended history of previous test failures. This is useful for diagnosing flaky tests.

    27 files ±0      27 suites ±0      11h 15m 24s ⏱️ -5m 30s
 4 113 tests ±0    3 999 ✅ +6      111 💤 ±0     3 ❌ -5
51 569 runs  -1   49 257 ✅ +4    2 285 💤 +1    27 ❌ -5

For more details on these failures, see this check.

Results for commit ce38c35. ± Comparison against base commit 801d0ed.

♻️ This comment has been updated with latest results.

Member

@jacobtomlinson jacobtomlinson left a comment


What does threads_per_worker in LocalCluster mean?

Each worker is a separate process. Each worker process can have multiple threads. In use cases where the task releases the GIL it can be beneficial to use many threads because the inter-thread communication cost is lower. In cases where the tasks aren't thread-safe or don't release the GIL it can be better to have 1 thread with many processes.

By default we use a balanced profile where we look at how many CPU cores you have and then create a few processes, each with a few threads. The product of processes x threads == CPU cores. This doesn't favour any workflow type in particular.
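To make that concrete, here is a small illustrative sketch; the 12-core figures are made-up examples, and the exact split the default picks can vary between versions and machines.

```python
from dask.distributed import LocalCluster

# Default "balanced" profile: processes x threads roughly equals the CPU
# count, e.g. 4 workers x 3 threads on a 12-core machine.
balanced = LocalCluster()

# GIL-releasing, thread-safe workloads: fewer processes, more threads each.
threaded = LocalCluster(n_workers=1, threads_per_worker=12)

# GIL-holding or non-thread-safe workloads: many processes, one thread each.
process_heavy = LocalCluster(n_workers=12, threads_per_worker=1)
```

(Only one of these would be created in practice; they are shown side by side for comparison.)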

We set these variables to 1 because we assume Dask will be running one task per thread/process, and the product of that configuration will be the total CPU cores in your system. So if you have 12 cores Dask will run 12 tasks in parallel. We therefore don't want the libraries they call, like NumPy, to try to use more than 1 core; otherwise we will have too much CPU contention.

Ultimately these things come down to tuning for your specific workflows. You tweak these numbers when you are debugging or trying to squeeze more performance out. If you are finding a benefit from setting the variables you mention here for your workflow that's great. I'm not sure I see the value in setting this default for all Dask users though unless there is a clear problem that many people are running into.

@fjetter
Member

fjetter commented May 14, 2025

We've also seen a bit of fallout from these settings for users who are intentionally running Dask with one thread per worker, assuming that the library underneath parallelizes, only to realize that this isn't happening by default because of these settings. Because of this, I'm generally not enthusiastic about these settings. While some users may benefit from them, they are a surprising side effect when running things inside of a Dask worker.

(I'm not blocking, just adding context)

@flying-sheep
Author

Each worker process can have multiple threads

So that means Python threads that don't actually have any benefit apart from I/O?
I guess that might help in I/O-bound operations, but I'd rather let my native code multithread; it's better at it than Python.

@jacobtomlinson
Member

I’d rather let my native code multithread

Absolutely! In this case you want to set the number of threads and processes per machine to 1 and then let your native code leverage all the cores.
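A minimal sketch of that layout, assuming you also want to lift the default OMP_NUM_THREADS=1 pin through the worker environment; the `distributed.nanny.environ` key exists, but double-check the config keys and variable names for the libraries you actually use.

```python
import os

import dask
from dask.distributed import Client, LocalCluster

# Lift the default pin so the native library can use every core.
dask.config.set({
    "distributed.nanny.environ": {"OMP_NUM_THREADS": str(os.cpu_count())}
})

cluster = LocalCluster(n_workers=1, threads_per_worker=1)
client = Client(cluster)
# Each worker now runs one task at a time, and the native code inside that
# task (BLAS, Numba, ...) is free to fan out across all the cores.
```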

This isn't something we should be setting as the default for all users though. So I'm going to close this out.

@flying-sheep
Author

That’s not what I meant. Setting it to 1 is vastly preferable to having it implicitly be n_cores, for exactly the same reasons you have the other env variables set.

If I run a Numba function in map_blocks without this on a 256-core machine with threads_per_worker=4, I get 64 workers, each starting 256 Numba threads.
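Back-of-envelope for that scenario, assuming Numba's thread pool defaults to the machine's core count (which is what NUMBA_NUM_THREADS falls back to when unset):

```python
cores = 256
threads_per_worker = 4
n_workers = cores // threads_per_worker        # 64 worker processes
numba_threads_per_worker = cores               # Numba's default pool size
total_native_threads = n_workers * numba_threads_per_worker
print(total_native_threads)                    # 16384 threads competing for 256 cores
```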

I think this should be re-opened, but maybe I missed something. I don’t have my head wrapped around this fully, but I’m getting there.

@flying-sheep flying-sheep mentioned this pull request May 22, 2025
@jacobtomlinson
Member

I missed your last comment, otherwise I would've reopened this sooner.

Generally a library's native parallelism is going to be more performant than Dask's, because Dask sits on the Python side. As @fjetter mentioned, the existing settings have already been controversial because they break people's expectations of how these libraries work.

I agree that if you layer Dask parallelism on top of Numba parallelism you're going to get oversubscription. But this isn't surprising; most users would expect it to be the case. It's then your job to tune one or the other intentionally.

Setting these defaults will lead to worse performance on average, even if it results in increased performance OOTB for a minority.

Typically we optimise for "good enough for most people" out of the box, then leave users to tune as they need. I'm hesitant to merge something that will degrade the average experience.

@jacobtomlinson
Member

I think a much better solution here would be to write up a documentation page on this topic with advice and best practices for tuning these variables. Currently our documentation is lacking in this area. If you were to open a PR with that instead of changing the defaults I would merge it in an instant 😃.

@flying-sheep
Author

I agree, I just think the behavior of configuring some threading libraries and not others is worse than configuring all or none.

So I’d say that (in addition to the improved docs) we should either merge this or remove the existing env variables instead.

@jacobtomlinson
Member

Sure. If you want to open a PR that updates the docs and removes the other variables I would be fine with that.

@flying-sheep
Author

OK, since that’s the direction we’re going in, this can actually be closed.

Thanks!

@flying-sheep
Author

Done: #9081 and dask/dask#11966

Successfully merging this pull request may close these issues.

Support NUMBA_NUM_THREADS env variable