feature request: @distributed_threads and pmap_threads #67

Consider a setup where:
- There are multiple worker processes (possibly on different servers in a cluster).
- There are multiple threads in each process (possibly a different number on each server).

Creating such a setup isn't trivial: one needs to tweak how worker processes are launched so that they are multi-threaded. It would be easy if `julia` had a command-line flag that specified the number of threads, as requested in JuliaLang/julia#34309. Still, such a setup is possible today with a bit of effort, and it is useful because all the threads in each worker process automatically share all memory, rather than being restricted to constructs such as `SharedArray`. Of course, this also means one needs to be careful.
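For illustration, here is a minimal sketch of launching multi-threaded workers on the local machine. It assumes a Julia version in which the `--threads` flag is available (1.5 or later); the worker count and thread count are arbitrary choices for the example.

```julia
using Distributed

# Start two local worker processes, each asked to run with 2 threads.
# Assumption: the installed Julia accepts `--threads` (Julia 1.5 or later).
addprocs(2; exeflags="--threads=2")

# Ask every worker how many threads it actually has.
threads_per_worker = Dict(w => remotecall_fetch(Threads.nthreads, w) for w in workers())
println(threads_per_worker)
```

On versions without the flag, setting the `JULIA_NUM_THREADS` environment variable for the worker processes is the usual alternative.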

In such a scenario, the current behavior is very clear:
- A `@threads` loop uses the threads of the current (main or worker) process.
- A `@distributed` loop and `pmap` use a single thread in each worker process.

This has the advantage of simplicity and clarity. It also allows using a nested `@threads` in each iteration of `@distributed` or `pmap` to utilize all the threads in all the machines.
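The nesting described above might look like the following sketch. The function name `threaded_sum_of_squares` and the chunking are hypothetical, chosen only for the example; if no workers have been added, `pmap` simply runs on the main process.

```julia
using Distributed

# Define the per-chunk function on every process so that pmap can call it remotely.
@everywhere function threaded_sum_of_squares(chunk)
    partial = zeros(Int, length(chunk))
    # Inner level: spread this chunk over the threads of the current process.
    Threads.@threads for i in eachindex(chunk)
        partial[i] = chunk[i]^2
    end
    return sum(partial)
end

# Outer level: pmap hands one chunk at a time to each worker process.
chunks = [1:250, 251:500, 501:750, 751:1000]
total = sum(pmap(threaded_sum_of_squares, chunks))
```

The drawback is that the caller has to choose the chunking by hand, which is what the proposed constructs would automate.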

However, it would also be useful to have `@distributed_threads` and `pmap_threads`.

`@distributed_threads` would statically allocate the same number of iterations to each thread across all the machines. That is, it would allocate more iterations to worker processes with more threads, and then internally use `@threads` to execute them on each worker process's threads. This would be the natural extension of `@distributed`, which statically allocates iterations to processes.
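A rough sketch of that static allocation, written as plain functions rather than a macro. All names here (`split_by_threads`, `threaded_map`, `distributed_threads_map`) are hypothetical and not part of `Distributed`; the sketch also assumes `f` is already defined on every worker.

```julia
using Distributed

# Hypothetical helper: split 1:n into one contiguous range per worker,
# sized in proportion to that worker's thread count.
function split_by_threads(n::Integer, thread_counts::AbstractVector{<:Integer})
    total = sum(thread_counts)
    ranges = UnitRange{Int}[]
    start = 1
    assigned_threads = 0
    for count in thread_counts
        assigned_threads += count
        stop = div(n * assigned_threads, total)  # cumulative boundary, rounded down
        push!(ranges, start:stop)
        start = stop + 1
    end
    return ranges
end

# Runs on a worker: apply f to each index of the range using that process's threads.
@everywhere function threaded_map(f, r)
    out = Vector{Any}(undef, length(r))
    Threads.@threads for i in eachindex(r)
        out[i] = f(r[i])
    end
    return out
end

# Hypothetical driver: one static range per worker, weighted by its thread count.
function distributed_threads_map(f, n)
    ws = workers()
    counts = [remotecall_fetch(Threads.nthreads, w) for w in ws]
    ranges = split_by_threads(n, counts)
    futures = [remotecall(threaded_map, w, f, r) for (w, r) in zip(ws, ranges)]
    return reduce(vcat, fetch.(futures))
end

split_by_threads(12, [1, 2, 3])   # three workers with 1, 2 and 3 threads
squares = distributed_threads_map(abs2, 10)
```

With thread counts of 1, 2 and 3, the twelve iterations are split into ranges of length 2, 4 and 6, so every thread ends up with two iterations.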

`pmap_threads` would dynamically allocate tasks to each thread across all machines. The batch size, if specified, would apply to each thread individually. It might be useful to add a second parameter, a batch group size (a positive number of batches), so that each worker process receives a whole group of batches at once and uses its threads to execute the smaller batches. This would reduce the amount of cross-process coordination required. `pmap_threads` would be the natural extension of `pmap`, which dynamically allocates iterations to processes.
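One way the two-level batching could be sketched on top of the existing `pmap`. The names `run_group` and `pmap_threads_sketch`, and the keyword `batches_per_group`, are hypothetical; the sketch assumes a non-empty input and that `f` is available on every worker.

```julia
using Distributed

# Runs on a worker: split the group into batches and let the process's
# threads pick them up dynamically, one task per batch.
@everywhere function run_group(f, group, batch_size)
    batches = collect(Iterators.partition(group, batch_size))
    tasks = map(batches) do batch
        Threads.@spawn map(f, batch)
    end
    return reduce(vcat, fetch.(tasks))
end

# Hypothetical driver: pmap hands out whole groups of batches to worker processes.
function pmap_threads_sketch(f, items; batch_size=1, batches_per_group=4)
    group_size = batch_size * batches_per_group
    groups = collect(Iterators.partition(items, group_size))
    results = pmap(group -> run_group(f, group, batch_size), groups)
    return reduce(vcat, results)
end

squares = pmap_threads_sketch(abs2, 1:10; batch_size=2, batches_per_group=2)
```

Here only one cross-process message is needed per group, while the threads inside each worker still balance the smaller batches among themselves.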
