
Parallel task evaluation #480

Merged · 88 commits · Apr 14, 2020

Conversation

lefou (Member) commented Nov 5, 2018

This branch experiments with parallel task execution in mill.

This PR is an implementation of #32. Please check it out and give feedback! If you need a more recent mill version, just ping me (@lefou) and I'll re-sync the branch.

I try to keep this description updated to reflect the current state. For details, have a look at the commit messages. Everything is open for discussion, either here or on Gitter; just ping me.

How to test

Check out and build locally:

$ git clone -b par https://github.com/lefou/mill.git
Cloning into 'mill'...
$ cd mill
mill $ ./mill -i all __.publishLocal assembly
mill $ cp out/assembly/dest/mill ~/mill-par

The last line copies the mill executable to ~/mill-par, but you can freely choose a different location.

By default, mill uses as many build threads as there are logical CPU cores. You can change this with the --jobs or -j option. Use -j 1 to disable parallel execution entirely.
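
For example, to run all compile targets with four build threads (using the ~/mill-par executable built above):

$ ~/mill-par -i -j 4 __.compile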

Please note that the all target enforces serial processing, to ensure that calls like mill -i all clean __.compile still work. In most cases (e.g. when you don't include a clean) you might want to use the new par target, which runs the given targets in parallel. For example: mill -i par __.compile __.testCached.

If you find issues, please report them!
If it works for you, please report that, too!

How to compare compile speeds

After a change of the mill version, various files need to be re-compiled (the Ammonite environment, build.sc, and possibly more). So, before you compare build times across different mill versions, you should run mill -i version, which ensures these things are done and won't skew your timings.

Also, if you want to compare times side by side, you should first clean your out directory with rm -r out.

For starters, if you just want to compare compile times, compare the outputs of:

rm -r out
time ~/mill-par -i __.compile

and

rm -r out
time ~/mill-par -i -j 1 __.compile

Don't expect wonders! The more cores you have, the better the speedup. Many laptops throttle their CPU frequency when they run too hot; higher CPU usage means more heat, so build times may even increase when running in parallel. Look at the ratio between real and user time.
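
To make that ratio concrete (illustrative numbers, not a measurement): if time reports real 1m00s and user 3m30s, roughly 3.5 cores were busy on average, i.e. about 3.5 minutes of single-threaded work were packed into one minute of wall-clock time. If real and user are nearly equal, the build effectively ran serially.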

Reporting bugs

If you think you found a bug, here is some information that helps me understand and fix it:

  • Output of mill-par --version
  • If possible, a link to the public repository and steps to reproduce the error
  • The build output, especially the error message; out/evaluator.log is also interesting (or run mill with -d)
  • Does the build succeed with mill-par -j 1?
  • Does the build succeed with mainline mill?

Pull Request Details

Task Evaluation

In this attempt, I use a Java ExecutorService for the job management, to avoid additional dependencies. It also allows easy reasoning about job scheduling, as all control work and state handling is managed from a single thread. Only task execution happens in a separate thread pool.

It uses a fixed thread pool with a configurable thread count. The thread pool is managed inside the Evaluator and is created and shut down for each evaluation phase (each invocation of the evaluate def). That also means that when running tasks which themselves use an Evaluator instance, we end up with another thread pool. As an example, mill all clean __.compile will run the all task in an outer Evaluator instance, which will block until an inner Evaluator instance completes the other tasks. Thus, I plan to redesign it to manage parallelism outside of Evaluator and to increase the pool size for each new Evaluator instance, to gracefully adapt to the threads that block while waiting for an inner Evaluator instance to finish.
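
To illustrate the idea only (a minimal sketch with hypothetical names like TerminalGroup and evaluate, not mill's actual Evaluator code; dependency ordering between groups is omitted):

import java.util.concurrent.{Callable, ExecutorCompletionService, Executors}

// Hypothetical stand-in for a terminal group: a name plus the work to run
case class TerminalGroup(name: String, run: () => Unit)

def evaluate(groups: Seq[TerminalGroup], jobs: Int): Unit = {
  // fixed pool, created per call to evaluate and shut down afterwards
  val pool = Executors.newFixedThreadPool(jobs)
  val completion = new ExecutorCompletionService[String](pool)
  try {
    // only the group bodies run on the pool; scheduling state stays on this thread
    groups.foreach { g =>
      completion.submit(new Callable[String] {
        def call(): String = { g.run(); g.name }
      })
    }
    // the single coordinating thread waits for completions, one at a time
    var remaining = groups.size
    while (remaining > 0) {
      val finished = completion.take().get() // rethrows failures from the pool
      println(s"[$finished] finished")
      remaining -= 1
    }
  } finally pool.shutdown()
}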

Currently, we only parallelize whole terminal groups (which most probably consist of multiple tasks). The elements of a terminal group very likely depend on each other and are also very fine-grained, so parallelizing them would not provide much gain; on the contrary, it could produce more contention and overhead. But this is only an unverified assumption.

Logging and output

Except for ticker messages, I try to prefix all output that goes through the Logger with the current task name.
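
As a rough sketch of that prefixing (assuming a simplified, hypothetical Logger trait; mill's real Logger API differs):

// Simplified, hypothetical logger interface
trait Logger {
  def info(msg: String): Unit
  def error(msg: String): Unit
  def ticker(msg: String): Unit
}

// Wraps an underlying logger and prefixes regular output with the task name,
// while letting ticker messages pass through unchanged
class PrefixLogger(underlying: Logger, taskName: String) extends Logger {
  def info(msg: String): Unit   = underlying.info(s"[$taskName] $msg")
  def error(msg: String): Unit  = underlying.error(s"[$taskName] $msg")
  def ticker(msg: String): Unit = underlying.ticker(msg)
}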

Caveats

Tasks that get an Evaluator instance are potentially sensitive to parallel execution. For example, the built-in clean task will most probably not work as expected when given as an argument to the built-in all task. This will be fixed once #502 is addressed.

lihaoyi (Member) commented Nov 5, 2018

Thanks! Will take a look.

Some initial comments:

> Currently, I use a hardcoded threadpool with 4 threads, but it is planned to make it configurable.

Sounds good; just a flag you pass similar to -w/--watch seems fine

> That also means that when running tasks which themselves use an Evaluator instance, we end up with another thread pool. As an example, mill all clean __.compile will run the all task in an outer Evaluator instance, which will block until an inner Evaluator instance completes the other tasks.

This is fine for now. I think long term the way forward is to remove the outer Evaluator from the top-level entry points like all that currently take an Evaluator as a parameter, so for now having an extra thread pool lying around is probably fine.

> Currently, we only parallelize whole terminal groups (which most probably consist of multiple tasks). The elements of a terminal group very likely depend on each other and are also very fine-grained, so parallelizing them would not provide much gain; on the contrary, it could produce more contention and overhead. But this is only an unverified assumption.

Probably fine for now. Finer-grained parallelism may or may not be better than terminal-group-level parallelism, but let's leave that for future follow-up investigation.

Terminal-group-level parallelism also has the benefit that it is simpler for a user to understand, since most of the time they are interacting with the terminal NamedTasks at the head of a terminal group, and not with individual tasks themselves.

lefou (Member, Author) commented Nov 8, 2018

I suspect tasks which get an Evaluator instance are tricky to evaluate in parallel. As they can impact the whole tree (e.g. the clean task can be destructive), the remaining work to do must be re-evaluated after those tasks complete.

lihaoyi (Member) commented Nov 8, 2018

I'd like to eventually get to a state where tasks that currently take an evaluator instance are not tasks at all, but something else, and cannot be part of the task graph, take part in caching, the dependency-based evaluation ordering, or any of that stuff.

Not sure how that applies to your PR but that's where I hope to get to.

lefou added 17 commits December 4, 2018 23:18
This initial proof of concept code uses a hardcoded threadpool with 4
threads. It is based on the Java ExecutorService and CompletionService
to gain the following features:
* control work and state from a single thread
* easy waiting for completion
* easy cancellation of all running jobs
* no dependency on an external library

We only parallelize whole terminal groups (which most probably consist
of multiple tasks). The elements of a terminal group very likely depend
on each other and are also very fine-grained, so parallelizing them
would not provide much gain; on the contrary, it could produce more
contention and overhead. But this is only an unverified assumption.

There is certainly a more elegant solution to calculate the task tree
(which is a terminal group tree), but I just wanted to concentrate on
the parallel processing for now.

See com-lihaoyi#32
This avoids concurrent cursor movements resulting in the ticker (over)drawing multiple lines
@lefou lefou changed the title from "WIP: parallel task evaluation (POC)" to "Parallel task evaluation (POC) - Test / Feedback appreciated" Dec 12, 2018
@lefou lefou mentioned this pull request Feb 18, 2019
lihaoyi-databricks (Contributor) commented

@lefou thanks, will find time to take another look

lefou (Member, Author) commented Mar 18, 2020

I've been using mill-par exclusively for a while now. No issues at all for me. Nice speedup, especially when running tests.

@lefou lefou mentioned this pull request Mar 20, 2020
lefou (Member, Author) commented Mar 27, 2020

Travis CI is now green, yay!

Baccata (Contributor) commented Apr 11, 2020

So out of curiosity, what's the plan regarding this branch?

Ammonite-Bot (Collaborator) commented Apr 11, 2020 via email

lefou (Member, Author) commented Apr 11, 2020

@Baccata, glad you asked. The branch itself is pretty stable for me. I'm using it almost on a daily basis. I'm eager to merge it, but asked @lihaoyi for a second review a month ago.

He already asked for a change to Scala Futures (initially, I chose Java Futures because they are cancellable, but that isn't really required and is less readable). But we agreed to defer that to another PR after the merge, mostly because of lack of time and because this PR is already pretty big.

What I ask myself: should we make parallel evaluation the default (as it is now in the branch), or should we default to --jobs/-j 1? If we do the latter (which means no change in mill's behavior by default), then it is pretty safe to merge as-is, I think. We can mark the feature "experimental" for a release round or two and ask for feedback. There was not much feedback IMHO, and I can't tell whether that's because nobody tried it or because all who tried it are happy.

lefou (Member, Author) commented Apr 11, 2020

Oh, it looks like Haoyi already answered (concurrently) and we came to different conclusions regarding the Futures change.

lihaoyi (Member) commented Apr 11, 2020 via email

Ammonite-Bot (Collaborator) commented Apr 11, 2020 via email

lefou (Member, Author) commented Apr 14, 2020

The thing is, we won't get rid of the hand-rolled scheduling just by changing from Java Futures to Scala Futures. As we plug the multithreading in at a high level, we need to handle some concurrency constraints.

When creating the futures, we have no idea whether a target is cached before we actually start the future. And we need to make sure that only one target with the same name runs at a time. This is not only the case when the user requests the same target twice (e.g. __.test, which resolves to modulex.test.test and to modulex.test with its default command modulex.test.test); each compile task also depends on different versions of ZincModule.worker, and we can't run these concurrently before we know they are cached properly. That's something that wasn't obvious to me before, and which I only learned by battle-testing and adapting the scheduling.
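
To illustrate that constraint (hypothetical names, not mill's internals): before a terminal group is submitted to the pool, the scheduler has to check whether a group with the same terminal name is already in flight, and defer it if so:

import scala.collection.mutable

// Hypothetical guard keeping track of in-flight terminal names
final class InFlightGuard {
  private val running = mutable.Set.empty[String]

  // true if the named terminal may start now; if false, the caller must
  // re-queue the group and retry once something else has finished
  def tryStart(terminalName: String): Boolean = synchronized {
    if (running.contains(terminalName)) false
    else { running += terminalName; true }
  }

  def finish(terminalName: String): Unit = synchronized {
    running -= terminalName
  }
}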

I think getting rid of the hand-rolled scheduling would mean we also need to refactor the evaluateGroup* methods, split them up, and be more specific about their inputs and outputs. Currently, these do not hand over concrete results to each other, but simply fill and use some mutable state. Also, we might want to investigate why there are multiple instances of the (to my naive understanding) same workers. Of course we can improve those things, and I'm sure mill would greatly benefit from these changes, and we should do that sooner or later, but it's still additional non-trivial work.

And that's a bitter pill for me, especially as I've already invested a great amount of time in making the current branch work properly with all the projects I know.

So my request is to reconsider merging the current branch as an experimental feature right now. It's disabled by default and easy to remove or improve later. But adding it now allows me and many others to profit from the huge speedups when running compiles and tests right now. Also, the additional merge overhead with each change in master was not fun in the past and will not be in the future; it's the reason why I was no longer able to rebase this branch, but needed to merge with master now and then.

As an alternative, we could prepare the Evaluator to have an injectable scheduling strategy and some mechanism to let the user configure it.
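
A rough sketch of what such an injectable strategy could look like (the names are hypothetical and not part of mill's API):

// Hypothetical strategy interface
trait SchedulingStrategy {
  // Run all groups, where exec evaluates a single terminal group by name;
  // implementations decide how much of this happens concurrently.
  def run(groups: Seq[String], exec: String => Unit): Unit
}

// Serial strategy: equivalent to running with -j 1
object SerialStrategy extends SchedulingStrategy {
  def run(groups: Seq[String], exec: String => Unit): Unit =
    groups.foreach(exec) // groups assumed to arrive in topological order
}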

lihaoyi (Member) commented Apr 14, 2020

What you say makes sense. Let's merge this as is then! If I want to try a different evaluation strategy, or if there remain issues, we can do that in a later PR.

Looks good to me!

lefou (Member, Author) commented Apr 14, 2020

Thank you! I will polish a bit more and merge. I think tagging mill afterwards as 0.7.0 is a good idea.

lihaoyi (Member) commented Apr 14, 2020

@lefou give me a few days before tagging 0.7.0, I also want to try and squeeze in a Scala 2.13 upgrade now that sbt/zinc#754 has landed :)

lefou (Member, Author) commented Apr 14, 2020

@lihaoyi Sure.

@lefou lefou changed the title from "Parallel task evaluation - Please test" to "Parallel task evaluation" Apr 14, 2020
@lefou lefou merged commit 8d6b236 into com-lihaoyi:master Apr 14, 2020
@lefou lefou added this to the after 0.6.1 milestone Apr 14, 2020
@lefou lefou linked an issue Apr 14, 2020 that may be closed by this pull request
@lefou lefou deleted the par branch April 16, 2020 08:25
@lefou lefou removed the "feedback wanted" (Additional feedback or testing is appreciated) label Apr 16, 2020