WIP: parallel task runtime #22631
Any chance you could give a very simple example of what the interface would look like in user code?
The Julia interface is not designed yet. While elaborations are possible, the essence of the interface is similar to Cilk so something like:
@kpamnany: Does this mean I can have a background thread/task running in parallel with the foreground thread? In other words: can I run a thread asynchronously?
@tknopp: yes, that's what spawn will do. If you spawn a task and there's more than one thread, it will start running right away. It will continue to run until a yield point (another spawn, a sync, a parfor, or an explicit yield).
src/forkjoin-ti.c
Outdated
    while (jl_atomic_load_acquire(&tiarg->state) == TI_THREAD_INIT)
        jl_cpu_pause();

    // Assuming the functions called below doesn't contain unprotected GC
"doesn't" -> "don't"?
STATIC_INLINE uint64_t cong(uint64_t max, uint64_t unbias, uint64_t *seed)
{
    while ((*seed = 69069 * (*seed) + 362437) > unbias)
        ;
Semicolon on separate line as a ghost of the empty loop body or unintentional extra space?
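For readers following this thread: the loop in `cong` looks like rejection sampling on top of a linear congruential generator, with the lone semicolon as the empty loop body. Below is a minimal sketch, not the PR's code. It assumes the truncated function ends by returning `*seed % max`, and that `unbias` is the largest value for which the accepted range divides evenly into `max` buckets. `cong_unbias` is a hypothetical helper name, and `static inline` stands in for the `STATIC_INLINE` macro.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: the accepted range [0, unbias] has
 * UINT64_MAX - (UINT64_MAX % max) values, which is a multiple of max,
 * so every residue modulo max is equally likely. */
static inline uint64_t cong_unbias(uint64_t max)
{
    assert(max > 0);
    return UINT64_MAX - ((UINT64_MAX % max) + 1);
}

/* Advance the generator until the new state falls inside the accepted
 * range, then reduce it to [0, max). The final `return` is an assumption;
 * the diff snippet is cut off before it. */
static inline uint64_t cong(uint64_t max, uint64_t unbias, uint64_t *seed)
{
    while ((*seed = 69069 * (*seed) + 362437) > unbias)
        ; /* empty body: all the work happens in the loop condition */
    return *seed % max;
}
```

A caller would compute `unbias` once per `max` and then pass the same `seed` pointer on every call, so the generator state persists between draws.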
src/partr.c
Outdated
    init_started_thread();

    // Assuming the functions called below doesn't contain unprotected GC
"doesn't" -> "don't" here as well?
Seems worthwhile to experiment with the Projects feature for this one.
Just out of curiosity, is this something that might make it into 1.0?
We will try to get it into 1.0 if it is ready before feature freeze, but not hold 1.0. I am personally hopeful that it will be ready by 1.0. Hope that helps.
If it can't make it into 1.0 in complete form, we can at least include it in experimental form and try our damnedest to leave room for it in the 1.x series – we really don't want to have to wait until 2.0 for full-on threading support.
@kpamnany, will the spawned tasks be able to perform asynchronous IO via the libuv event loop? Is the plan to have the main Julia thread run the event loop and perform all compute-only tasks in separate threads? Currently, in the
@amitmurthy: who runs the event loop (and how) is a good question. As a general statement, irrespective of Julia, unless you reserve a thread for I/O, it is possible for requests to be serviced late/very slowly. But you don't always have/want a thread to reserve. Ideally, this should be a program choice. Executing tasks are not preempted. The API entry points (spawn, sync, and parfor) may cause the calling task to yield. However, this runtime allows for sticky tasks, i.e. tasks that only run on the thread that started them. Sticky tasks do not yield in spawn and parfor. So, you can create a sticky I/O task and drive the event loop from it. It's pretty straightforward to allow tasks to perform asynchronous I/O requests, but it isn't obvious how to get completion notifications. I'm not entirely sure how to do this right now but @vtjnash and @JeffBezanson have probably thought this out in greater detail (they suggested sticky tasks). Clearly it would be a useful enhancement to this runtime to add the ability to trigger a task based on an event. But that gets us into having to define events, and decide semantics for event mux/demux and that opens many questions -- are there system events? Can multiple tasks be triggered by the same event? How about the conjunction or disjunction of multiple events? Not sure we should go down this rabbit hole right now.
Our existing Tasks can already be triggered by events, so we're already in the rabbit hole. We can't fully leave this up to applications; we need to make some default choice for people.
It seems like the default should probably be to have a sticky I/O thread since most applications don't need all of the threads. For really high performance situations where one wants to defer I/O until the I/O thread wakes up, we should probably have people opt into that.
Googled a bit on integrating libuv and multithreading. See I would like to try out the following simple model in parallel to the work being done
At the very least it will help in getting a handle on libuv event loop integration in a multi-threaded environment.
Thanks @amitmurthy that sounds basically good. I suspect this can work with normal Tasks, though. When a Task (running on any thread) wants to do I/O, it queues its request and yields. When the I/O completes, the requesting Task can be restarted as usual.
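To make the "queue the request, yield, restart on completion" model concrete, here is a rough single-threaded sketch. Everything in it is hypothetical and for illustration only: the type and function names (`task_t`, `io_queue_t`, `ioq_push`, `ioq_drain`) are not Julia runtime or libuv APIs, the "I/O" is a stand-in computation, and there is no locking, which a real multi-threaded runtime would need around the queue.

```c
#include <assert.h>
#include <stddef.h>

typedef enum { TASK_RUNNABLE, TASK_WAITING_IO } task_state_t;

typedef struct {
    int id;
    task_state_t state;
    int io_result;          /* filled in when the request completes */
} task_t;

typedef struct {
    task_t *task;           /* who to restart on completion */
    int request;            /* stand-in for a real I/O request */
} io_req_t;

#define IOQ_CAP 16
typedef struct {
    io_req_t items[IOQ_CAP];
    size_t head, tail;      /* monotonically increasing ring indices */
} io_queue_t;

/* Called by any task: queue the request, then "yield" by leaving the
 * runnable state. Returns -1 if the queue is full. */
static int ioq_push(io_queue_t *q, task_t *t, int request)
{
    assert(q != NULL && t != NULL);
    if (q->tail - q->head == IOQ_CAP)
        return -1;
    q->items[q->tail % IOQ_CAP] = (io_req_t){ t, request };
    q->tail++;
    t->state = TASK_WAITING_IO;
    return 0;
}

/* Called by the (sticky) I/O task: complete every pending request and
 * mark the requesting task runnable again. Returns the number completed. */
static size_t ioq_drain(io_queue_t *q)
{
    size_t completed = 0;
    while (q->head != q->tail) {
        io_req_t r = q->items[q->head % IOQ_CAP];
        q->head++;
        r.task->io_result = r.request + 1;   /* pretend I/O */
        r.task->state = TASK_RUNNABLE;       /* restart as usual */
        completed++;
    }
    return completed;
}
```

In this sketch the scheduler would simply skip tasks in `TASK_WAITING_IO` until the I/O task has drained the queue. How completion notifications are actually delivered is the open question raised above.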
src/julia_internal.h
Outdated
int last_arriver(arriver_t *, int);
void *reduce(arriver_t *, reducer_t *, void *(*rf)(void *, void *), void *, int);
#endif // JULIA_ENABLE_PARTR
All of these should maybe be in a scheduler.h; they probably shouldn't be called by miscellaneous run time system code.
@kpamnany
Any thoughts on making this a package first?
Hi folks! #iamintel here to help Kiran push multi-threading forward as he's transitioned to other projects. He suggested I work on libuv-related stuff while he's finishing some other parts. @amitmurthy, are you working on the approach you suggested on Jul 19th?
@amitmurthy is off grid for the next couple of weeks.
Shall I merge #29791 in here?
Looks like a big patch, and this is a big patch too with some time pressure to merge. @JeffBezanson can make the call.
Ah, I see now, it is already on master so it has to be merged. It'd be best if we could get
The new code is likely to help in the circumstance that you're losing stack traces of exceptions thrown in tasks due to context switching. Other than that, it only resolves the conflicts with master.
Ok I think I found the next problem. wait/isready/n_avail depend on the length of
Now other tests pass but there is a mystery crash in the embedding test. 😡
src/task.c
Outdated
static void record_backtrace(jl_ptls_t ptls) JL_NOTSAFEPOINT
{
    // storing bt_size in ptls ensures roots in bt_data will be found
    ptls->bt_size = rec_backtrace(ptls->bt_data, JL_MAX_BT_SIZE);
I agree this is nice to have factored out (especially for having a place to put the note about rooting). I only removed it for symmetry with the equivalent code which had removed it in partr.c. We should probably just call record_backtrace there as well.
Either is fine, I was just minimizing the diff for now.
Such good progress! The Win* failures are mystifying -- I see a lot of We're still going to turn
I don't understand all the complexity and current state of this PR, so please forgive me for sounding impatient or pushy, but I was just wondering what the chances are of seeing some part of this released in the upcoming 1.1? Are there any easy tasks that someone (like me), who perhaps isn't very familiar with the language implementation side of Julia, could work on to help push things along?
The remaining blocker is unresolved bugs, mainly on Windows. So, if you have a Windows system (or anything else, actually), you could clone this branch, build it and run all the tests. If there are crashes, try to debug them. I don't expect that will be particularly easy, but help is welcome. Note that you probably need to also be on #30186 or one of the other Channel API revision branches. I'm a bit unclear on which one you should be on at this point.
This branch still ought to work on its own, at least with 1 thread.
Added partr code. Abstracted interface to threading infrastructure.
7a6d4ba to 8714c98
win32: generate_precompile hanging
This replaces the existing fork-join threading infrastructure with a parallel task runtime (partr) that implements parallel depth first scheduling. This model fully supports nested parallelism.
The default remains the original threading code. Enable partr by setting JULIA_PARTR := 1 in your Make.user.
The core idea is simple -- Julia tasks can now be run by any thread. The task scheduler attempts to order task execution depth-first for provably better cache efficiency, and for true nested parallelism.
However, as tasks are an existing thing in Julia and used in a number of places, we're first introducing the infrastructure that will enable parallel tasks with this PR, keeping (hopefully) the serial semantics of the existing task interface. This PR does not introduce any new interface calls for parallel tasks -- those will be in future PRs.
All test-cases pass with JULIA_PARTR off (as they should). With JULIA_PARTR on, all test cases are currently passing on Linux and OS-X.
Cc: @JeffBezanson, @vtjnash, @yuyichao, @ViralBShah, @vchuravy, @anton-malakhov.