Parallel task evaluation #480
Conversation
Thanks! Will take a look. Some initial comments:
Sounds good; just a flag you pass similar to
This is fine for now. I think long term the way forward is to remove the outer
Probably fine for now. Finer-grained parallelism may or may not be better than terminal-group-level parallelism, but let's leave that for future follow-up investigation. Terminal-group-level parallelism also has the benefit that it is simpler for a user to understand, since most of the time they are interacting with terminal NamedTasks at the head of a terminal group, and not with individual tasks themselves.
Force-pushed from b30fad6 to c49e1b1
I suspect tasks which get an Evaluator instance are tricky to evaluate in parallel. As they can impact the whole tree (e.g. the clean task can be destructive), the work to do must be re-evaluated after those tasks complete.
I'd like to eventually get to a state where tasks that currently take an evaluator instance are not tasks at all, but something else, and cannot be part of the task graph or take part in caching, the dependency-based evaluation ordering, or any of that stuff. Not sure how that applies to your PR, but that's where I hope to get to.
This initial proof-of-concept code uses a hardcoded thread pool with 4 threads. It is based on the Java `ExecutorService` and `CompletionService` to gain the following features:

* control work and state from a single thread
* easy waiting for completion
* easy abortion of all running jobs
* no dependency on an external library

We only parallelize whole terminal groups (which are most probably collections of multiple tasks). All elements of a terminal group are very likely dependent on each other and also very fine-grained, so parallelizing them would not provide much gain; on the contrary, it could produce more stress and overhead. But this is only an unverified assumption. There is certainly a more elegant solution to calculate the task tree (which is a terminal-group tree), but I just wanted to concentrate on the parallel processing for now. See com-lihaoyi#32
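For readers unfamiliar with the `ExecutorService`/`CompletionService` combination mentioned above, here is a minimal sketch of the pattern: a single control thread submits ready terminal groups to a fixed pool and reaps results as they complete. The class and the string-based "evaluation" are hypothetical stand-ins, not code from this branch.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.CompletionService;
import java.util.concurrent.ExecutorCompletionService;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class TerminalGroupRunner {

    // Submit each ready terminal group to the pool and collect results as they
    // complete; results are sorted for a deterministic return value.
    static List<String> runAll(List<String> groups) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(4); // hardcoded 4 threads, as in the PoC
        CompletionService<String> completion = new ExecutorCompletionService<>(pool);
        for (String group : groups) {
            // Hypothetical "evaluation": a real scheduler would run the group's tasks here.
            completion.submit(() -> group + " done");
        }
        List<String> results = new ArrayList<>();
        for (int i = 0; i < groups.size(); i++) {
            // The control thread blocks for the next finished group; on failure it
            // could abort all remaining work via pool.shutdownNow().
            results.add(completion.take().get());
        }
        pool.shutdown();
        Collections.sort(results);
        return results;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(runAll(List.of("core.compile", "docs.compile", "app.compile")));
    }
}
```

This shape delivers the features listed above almost for free: `take()` gives easy waiting for completion, `shutdownNow()` gives easy abortion, and all bookkeeping stays on the control thread.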
This avoids concurrent cursor movements resulting in the ticker (over)drawing multiple lines
@lefou thanks, will find time to take another look
I'm using mill-par exclusively for a while now. No issues at all for me. Nice speedup, especially when running tests.
Travis-CI is now green, yay!
So out of curiosity, what's the plan regarding this branch ? |
Not sure about Tobias’ plans, but my last ask is to port the custom thread
scheduler to use Futures before merging. Whether he does it, or someone
else does it, or I end up doing it is immaterial, but I do think it’s a
significant enough win in code cleanliness that I’d like it to happen
before merging
@Baccata, glad you asked. The branch itself is pretty stable for me. I'm using it almost on a daily basis. I'm eager to merge it, but asked @lihaoyi for a second review a month ago. He already requested a change to Scala Futures (initially, I chose Java Futures because they are cancelable, but that isn't really required and they are less readable). But we agreed to defer that to another PR after the merge, mostly because of lack of time and because this PR is already pretty big. What I ask myself: should we make the parallel evaluation the default (as it is now in the branch), or should we default to serial evaluation?
Oh, looks like Haoyi already answered (concurrently) and we reached different conclusions regarding the Futures change.
I’d say default to non-parallel for now. For me at least, I have quite a number of workers which are not thread-safe, and passing in a flag for parallelism is no real hardship.
Yeah, I still would like the implementation to use Scala futures. For simple things I don't mind a little redundancy or reimplementation, but the future/thread scheduler we have hand-rolled here is gnarly and complicated, and seems like a maintenance headache going forward. Concurrency/parallelism-related code in particular is always something I would try extra hard to keep simple. If we discover a reason Scala futures don't work, that's fine, but we should have a good reason before committing to our hand-rolled scheduler.
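To illustrate the alternative being argued for here (not the PR's actual code): with standard-library futures, the dependency ordering lives in the future graph itself and the executor does the scheduling, so there is no hand-written ready-queue to maintain. The sketch below uses Java's `CompletableFuture` with hypothetical target names; Scala's `Future` composes the same way via `map`/`zip`.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class FutureGraph {

    // Hypothetical target body: pretend to compile a module.
    static String compile(String module) {
        return module + " compiled";
    }

    static String run() {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        // Two independent upstream targets run in parallel on the pool.
        CompletableFuture<String> core = CompletableFuture.supplyAsync(() -> compile("core"), pool);
        CompletableFuture<String> util = CompletableFuture.supplyAsync(() -> compile("util"), pool);
        // The downstream target runs only after both dependencies complete: the
        // future graph encodes the ordering, no manual scheduler bookkeeping.
        CompletableFuture<String> test =
                core.thenCombineAsync(util, (c, u) -> "tests ran after: " + c + ", " + u, pool);
        String result = test.join();
        pool.shutdown();
        return result;
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```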
The thing is, we won't get rid of the hand-rolled scheduling just by changing from Java Futures to Scala Futures. As we plug in the multithreading at a high level, we need to handle some concurrency constraints. When creating the futures, we have no idea if a target is cached before we actually start the future. And we need to make sure that only one target with the same name runs at the same time, which is not only the case when the user requests the same target twice.

I think getting rid of the hand-rolled scheduling would mean we also need to refactor the `Evaluator`. And that's a bitter pill to me, especially as I've already invested a great amount of time to make the current branch work properly with all projects I know.

So my request is to reconsider adding the current branch as an experimental feature right now. It's disabled by default and easy to remove or improve, too. But adding it now allows me and many others to profit from the huge speedups when running compiles and tests right now. Also, the additional merge overhead with each change in master was not fun in the past and will not be in the future. It's the reason why I was no longer able to rebase this branch, but needed to merge with master now and then.

As an alternative, we could prepare the `Evaluator` to have an injectable scheduling strategy and some mechanism to let the user configure it.
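One of the constraints raised above — never run two evaluations of the same target name concurrently — has a reasonably compact solution even with library futures: memoize the in-flight future per target name, so duplicate requests share a single running evaluation. A sketch with hypothetical names (none of this is the branch's code):

```java
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicInteger;

public class TargetDeduper {
    private final Map<String, CompletableFuture<String>> inFlight = new ConcurrentHashMap<>();
    private final ExecutorService pool = Executors.newFixedThreadPool(4);
    final AtomicInteger evaluations = new AtomicInteger(); // counts actual evaluation runs

    // Returns the single future for this target name; computeIfAbsent guarantees
    // the evaluation is created at most once, so duplicate requests share it.
    CompletableFuture<String> evaluate(String target) {
        return inFlight.computeIfAbsent(target, t ->
                CompletableFuture.supplyAsync(() -> {
                    evaluations.incrementAndGet();
                    return t + " -> result";
                }, pool));
    }

    static boolean demo() {
        TargetDeduper d = new TargetDeduper();
        String r1 = d.evaluate("foo.compile").join();
        String r2 = d.evaluate("foo.compile").join(); // served from the memoized future
        d.pool.shutdown();
        // Same result, and the body ran exactly once despite two requests.
        return r1.equals(r2) && d.evaluations.get() == 1;
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

This does not address the cache-check-before-start problem, which is the harder constraint; it only shows the duplicate-name half.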
What you say makes sense. Let's merge this as is then! If I want to try a different evaluation strategy, or if there remain issues, we can do that in a later PR. Looks good to me!
Thank you! I will polish a bit more and merge. I think tagging mill afterwards as 0.7.0 is a good idea. |
@lefou give me a few days before tagging 0.7.0, I also want to try and squeeze in a Scala 2.13 upgrade now that sbt/zinc#754 has landed :) |
@lihaoyi Sure. |
This branch experiments with parallel task execution in mill.
This PR is an implementation of #32. Please check it out and give feedback! If you need a more recent mill version, just ping me (@lefou) and I'll re-sync the branch.
I try to update this description as things progress, to reflect the current state. For details, have a look at the commit messages. Everything is open for discussion, either here or on Gitter. Just ping me.
How to test
Check out and build locally.

The last line copies the mill executable to `~/mill-par`, but you can freely choose a different location.

By default, mill uses as many build threads as there are logical CPU cores. You can change it with the `--jobs` or `-j` option. Use `-j 1` to disable parallel execution entirely.

Please note that the `all` target enforces serial processing, to ensure that calls like `mill -i all clean __.compile` still work. In most cases (e.g. when you don't include a `clean`) you might want to use the new `par` target, which runs the given targets in parallel, for example: `mill -i par __.compile __.testCached`.

If you find issues, please report! If it works for you, please report, too!
How to compare compile speeds
After a change of the mill version, various files need to be re-compiled (the Ammonite env, `build.sc`, and possibly more). So, before you compare build times with different mill versions, you should run `mill -i version`, which ensures these things are done and won't foul your timings.

Also, if you want to compare times side-by-side, you'd better clean your `out` directory with `rm -rf out`.

For starters, if you just want to compare compile times, compare the outputs of a serial run (`-j 1`) and a parallel run.
Don't expect wonders! The more cores you have, the better the speedup. Many laptops throttle their CPU frequency when they get too hot. Higher CPU usage means more heat, so in the end the build may even get slower when running in parallel. Look at the ratio between real and user time.
Reporting bugs
If you think you found a bug, here is some info that helps me understand and fix it:

* the output of `mill-par --version`
* `out/evaluator.log` is interesting (or run mill with `-d`)
* does it also happen with `mill-par -j 1`?
?Pull Request Details
Task Evaluation
In this attempt, I use a Java `ExecutorService` for the job management, to avoid additional dependencies. It also allows some easy reasoning about job scheduling, as the control work and state handling are managed from a single thread. Only task execution happens in the separate thread pool.

It uses a fixed thread pool with a configurable thread count. The thread pool is managed inside the `Evaluator` and is created and shut down for each evaluation phase (each invocation of the `evaluate` def). That also means that when running tasks which themselves use an `Evaluator` instance, we end up with another thread pool. As an example, `mill all clean __.compile` will run the `all` task in an outer `Evaluator` instance, which will block until an inner `Evaluator` instance completes the other tasks. Thus, I plan to redesign it to manage parallelism outside of the `Evaluator` and to increase the pool size for each new `Evaluator` instance, to gracefully adapt to the blocking threads that wait for the inner `Evaluator` instance to finish.

Currently, we only parallelize whole terminal groups (which are most probably collections of multiple tasks). All elements of a terminal group are very likely dependent on each other and also very fine-grained, so parallelizing them would not provide much gain; on the contrary, it could produce more stress and overhead. But this is only an unverified assumption.
Logging and output
Except for ticker messages, I try to prefix all output that goes through the `Logger` with the current task name.
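The prefixing idea can be sketched as follows; `PrefixLogger` and its methods are hypothetical names for illustration, not the branch's actual logger classes.

```java
import java.io.PrintStream;

public class PrefixLogger {
    private final String taskName;
    private final PrintStream out;

    PrefixLogger(String taskName, PrintStream out) {
        this.taskName = taskName;
        this.out = out;
    }

    // Every non-ticker line carries the task name, so interleaved output from
    // tasks running in parallel stays attributable to its originating task.
    String format(String msg) {
        return "[" + taskName + "] " + msg;
    }

    void info(String msg) {
        out.println(format(msg));
    }

    public static void main(String[] args) {
        new PrefixLogger("core.compile", System.out).info("compiling 12 sources");
    }
}
```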
Caveats
Tasks that get an `Evaluator` instance are potentially sensitive to parallel execution. For example, the built-in `clean` task will most probably not work as expected when given as an argument to the built-in `all` task. This will be fixed once #502 is addressed.