Open
Opened on Jul 5, 2023
Currently, for non-sticky tasks, Julia wakes all threads in schedule (tid is 0 here). This is a serial loop with a lock/signal/unlock for each thread, which is quite slow when there are 32 threads. The comment in jl_wakeup_thread suggests that this problem was anticipated, but it is unclear what would constitute a proper fix (I'm not convinced that the idea in the comment is a good solution).
The new interactive threadpool is an added complication. We wake all threads, but we should probably only be waking up the threads in the scheduled task's threadpool.
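To make the cost concrete, here is a rough model of the two wake strategies, written in Python rather than taken from the runtime. It is only a sketch: `Worker`, `wake`, `wake_all`, `wake_pool` and `make_workers` are hypothetical names, the pool labels are illustrative, and the real code in jl_wakeup_thread may differ in important ways. Each `wake` call stands for one lock/signal/unlock round trip, so the return values count how many of those a single schedule would pay.

```python
import threading
from dataclasses import dataclass, field


@dataclass
class Worker:
    """Hypothetical stand-in for one runtime thread's sleep state."""
    tid: int
    pool: str  # illustrative labels: "default" or "interactive"
    cond: threading.Condition = field(default_factory=threading.Condition)
    wake_requested: bool = False


def wake(worker):
    """One lock/signal/unlock round trip for a single worker."""
    with worker.cond:              # lock
        worker.wake_requested = True
        worker.cond.notify()       # signal (no-op if nobody is waiting)
    # the lock is released when the with-block exits


def wake_all(workers):
    """Serially wake every worker; returns the number of round trips."""
    for w in workers:
        wake(w)
    return len(workers)


def wake_pool(workers, pool):
    """Serially wake only the workers in the given pool."""
    targets = [w for w in workers if w.pool == pool]
    for w in targets:
        wake(w)
    return len(targets)


def make_workers(n_default, n_interactive):
    """Build a list of workers; the ordering of pools here is arbitrary."""
    pools = ["interactive"] * n_interactive + ["default"] * n_default
    return [Worker(tid=i, pool=p) for i, p in enumerate(pools)]


if __name__ == "__main__":
    workers = make_workers(n_default=30, n_interactive=2)
    print(wake_all(workers))                  # 32 round trips
    print(wake_pool(workers, "interactive"))  # 2 round trips
```

In this model, restricting the wake to the task's pool shrinks the loop but leaves it serial and linear in the size of that pool, so it would only address part of the problem.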
The problem is exacerbated if we oversubscribe cores.
This is at least one of the reasons we so easily run into negative scaling when we add threads.
Opened this issue for tracking.