Description
openedon Jun 23, 2021
Commented on this being a potential issue in #40715, and have I believe verified it in 1.7.0-beta2.
Reproducible:
function test()
println("start")
root_task = Threads.@spawn begin
tasks = []
active = Ref(Threads.threadid())
for i in 1:10000
active[] = -1
yield()
push!(
tasks,
@async begin
for j in 1:1000
yield()
active_thread = $active[]
if active_thread != -1
@show active_thread, Threads.threadid()
break
end
end
nothing
end
)
active[] = Threads.threadid()
end
active[] = -1
wait.(tasks)
end
wait(root_task)
println("finished")
end
Since this is a race issue output isn't deterministic, however running this on 2 threads, I get
1.6.0:
start
finished
1.7.0-beta2:
start
(active_thread, Threads.threadid()) = (1, 2)
(active_thread, Threads.threadid()) = (2, 1)
(active_thread, Threads.threadid()) = (2, 1)
(active_thread, Threads.threadid()) = (1, 2)
(active_thread, Threads.threadid()) = (1, 2)
(active_thread, Threads.threadid()) = (1, 2)
(active_thread, Threads.threadid()) = (1, 2)
(active_thread, Threads.threadid()) = (1, 2)
(active_thread, Threads.threadid()) = (1, 2)
(active_thread, Threads.threadid()) = (2, 1)
(active_thread, Threads.threadid()) = (1, 2)
(active_thread, Threads.threadid()) = (1, 2)
(active_thread, Threads.threadid()) = (1, 2)
(active_thread, Threads.threadid()) = (1, 2)
(active_thread, Threads.threadid()) = (1, 2)
(active_thread, Threads.threadid()) = (1, 2)
(active_thread, Threads.threadid()) = (2, 1)
(active_thread, Threads.threadid()) = (1, 2)
(active_thread, Threads.threadid()) = (1, 2)
(active_thread, Threads.threadid()) = (1, 2)
(active_thread, Threads.threadid()) = (1, 2)
(active_thread, Threads.threadid()) = (1, 2)
(active_thread, Threads.threadid()) = (2, 1)
(active_thread, Threads.threadid()) = (1, 2)
(active_thread, Threads.threadid()) = (1, 2)
(active_thread, Threads.threadid()) = (2, 1)
(active_thread, Threads.threadid()) = (2, 1)
(active_thread, Threads.threadid()) = (1, 2)
(active_thread, Threads.threadid()) = (1, 2)
finished
Maybe I've messed something up in my test here, it's not trivial to check these scheduler behaviors, but I think this confirms what I was concerned about in this comment: #40715 (comment)
My core concern is that there are two kinds of scheduler stickiness you might want, either to stick on the system thread, or to stick on the thread as your parent task is (at any moment) scheduled on. At present these are the same, however with task migration they will no longer be. The way I see it, you want to pin your task to system threads when you need access to thread local state or you're interacting with libraries that need to run on particular threads. For the more general Julia async use cases however, what you want is to guarantee non-simultaneous execution of tasks in some group (e.g., parent and sibling async tasks).
It would be nice to consider this requirement prior to making the scheduler more flexible, and potentially also cleaning up the task scheduling API. I may be wrong, but I believe the current Threads.@Spawn and @async APIs are separate mostly for legacy reasons, and a cleaner more future proof approach may be to have a single scheduling API with a scheduling policy argument. Policies could include affinity (e.g., free, current-thread, parent-thread, initial-thread), and later would easily accommodate priority levels, NUMA based affinities, or other features that may be interesting as the scope and functionality grows.
Different @async
tasks also execute simultaneously with one another (as they may have been launched on different threads), however I took out the detection of that to simplify the code.