
Persistent test execution worker (TestRunner supporting worker strategy) #7595

Closed
cgruber opened this issue Mar 1, 2019 · 27 comments
Labels
P3 We're not considering working on this, but happy to review a PR. (No assignee) stale Issues or PRs that are stale (no activity for 30 days) team-Local-Exec Issues and PRs for the Execution (Local) team type: feature request

Comments

@cgruber
Contributor

cgruber commented Mar 1, 2019

Description of the problem / feature request:

In our bazel conversion experiment we have moved a small subset of apps, libraries, and tests over to bazel. Running head-to-head in CI, we see nearly 2x the wall time on bazel. Some of that we can optimize away; however, 73 of 75 tests run on Buck in under 100ms each, whereas bazel runs them in 1-3 seconds each. This leads to 3 minutes for Buck, 3 minutes for Gradle, and 6 minutes for Bazel executing the same build set.

While we have not fully tweaked all the optimization settings (some bits of the build make use of workers, and sandboxing is expensive), our investigation accounted for that - the biggest cost is in test execution, which takes around 10x as long per test on Bazel vs. Gradle or Buck in our situation.

A few relevant pieces of the puzzle include:

  • When we create an "AllTest" suite (using @RunWith(Suite.class) and @Suite.SuiteClasses({...});
    a minimal sketch follows this list), the suite runs all the test code (confirmed by forcing
    failures throughout the tests), and it runs in vastly less time than the sum of all the
    individual executions. In an example run, AllTest executed 16 tests' worth of code in 2.7s,
    while the 16 tests run individually took between 0.8 and 1.9 seconds each. This "AllTest"
    suite only incurs the penalty of test runner setup once, for N tests, and that made a huge
    difference.
  • These tests were parallelized, and the 2.7 seconds above was close to the overall wall time
    in the small build, but in a clean build these times were closer to 0.9-3.2s, and when all 75
    tests were run, the tests got slower individually, by at least 0.4 seconds, and the whole
    build lengthened disproportionately. This is partly limited by the number of worker
    processes, but neither Buck nor Gradle hit this, and their test times were still sub-second
    for nearly all of them.
  • In the trace profiles, test execution is not further broken down, so there's little ability to
    determine how much time is being taken by the test execution itself, except by inference as
    above.
  • When building an APK, bazel's times were quite competitive - faster than Gradle and
    comparable to Buck. The slowdown shows up only in the test scenario.
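
For illustration, a minimal sketch of the kind of "AllTest" suite described in the first bullet above (the member classes here are trivial placeholders; in practice they would be the existing per-class tests):

import org.junit.Test;
import org.junit.runner.RunWith;
import org.junit.runners.Suite;

// Hypothetical aggregate suite: one java_test target runs every member class in a
// single JVM, so the test runner setup cost is paid once for all of them.
@RunWith(Suite.class)
@Suite.SuiteClasses({AllTest.FooTest.class, AllTest.BarTest.class})
public class AllTest {
  // Placeholder members; a real suite would list the existing test classes instead.
  public static class FooTest {
    @Test public void passes() {}
  }

  public static class BarTest {
    @Test public void alsoPasses() {}
  }
}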

Feature requests: what underlying problem are you trying to solve with this feature?

Implement persistent worker support for the TestRunner, to avoid the setup/teardown costs associated with invoking a new TestRunner.
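
As a rough sketch of what such a worker could look like (my assumption, not Bazel's actual implementation): this assumes the WorkRequest/WorkResponse classes generated from Bazel's worker_protocol.proto and a made-up convention that the first request argument names the test class; a real runner would also need classpath handling, test filtering, and the usual XML output.

import com.google.devtools.build.lib.worker.WorkerProtocol.WorkRequest;
import com.google.devtools.build.lib.worker.WorkerProtocol.WorkResponse;
import org.junit.runner.JUnitCore;
import org.junit.runner.Result;

// Hypothetical persistent test runner: started once with --persistent_worker, it then
// reads length-delimited WorkRequest protos from stdin and answers on stdout.
public final class PersistentTestRunner {
  public static void main(String[] args) throws Exception {
    while (true) {
      WorkRequest request = WorkRequest.parseDelimitedFrom(System.in);
      if (request == null) {
        return; // stdin closed: Bazel wants the worker to shut down
      }
      // Assumed convention: argument 0 is the fully qualified test class name.
      Class<?> testClass = Class.forName(request.getArguments(0));
      Result result = new JUnitCore().run(testClass);
      WorkResponse.newBuilder()
          .setExitCode(result.wasSuccessful() ? 0 : 1)
          .setOutput(result.getFailures().toString())
          .build()
          .writeDelimitedTo(System.out);
      System.out.flush();
    }
  }
}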

Bugs: what's the simplest, easiest way to reproduce this bug? Please provide a minimal example if possible.

Make a project, make a bunch of tests. I don't have a repro setup at present, but will be working one up.

What operating system are you running Bazel on?

CentOS and MacOS (different numbers, same proportions)

What's the output of bazel info release?

INFO: Invocation ID: 329bc936-43a5-48ba-b3b3-17ea3f158122
release 0.22.0

Have you found anything relevant by searching the web?

In looking around, there have been discussions of an experimental persistent worker supporting the TestRunner, but the code seems to have been deleted, and none of the instructions work anymore.

Any other information, logs, or outputs that you want to share?

Example (redacted) test run with 16 tests.

internal:AllTest  PASSED in 2.7s
internal:01Test   PASSED in 0.8s
internal:02Test   PASSED in 1.4s
internal:03Test   PASSED in 1.9s
internal:04Test   PASSED in 1.4s
internal:05Test   PASSED in 1.3s
internal:06Test   PASSED in 1.0s
internal:07Test   PASSED in 1.7s
internal:08Test   PASSED in 0.9s
internal:09Test   PASSED in 1.9s
internal:10Test   PASSED in 1.7s
internal:11Test   PASSED in 1.5s
internal:12Test   PASSED in 1.0s
internal:13Test   PASSED in 1.4s
internal:14Test   PASSED in 1.9s
internal:15Test   PASSED in 0.8s
internal:16Test   PASSED in 1.6s

Buck equivalent (didn't run the AllTest in this codebase)

[2019-02-26 02:29:04] PASS    <100ms  3 Passed   0 Skipped   0 Failed   01Test
[2019-02-26 02:29:04] PASS    <100ms  8 Passed   0 Skipped   0 Failed   02Test
[2019-02-26 02:29:04] PASS    <100ms  7 Passed   0 Skipped   0 Failed   03Test
[2019-02-26 02:29:04] PASS    <100ms 12 Passed   0 Skipped   0 Failed   04Test
[2019-02-26 02:29:04] PASS    <100ms 16 Passed   0 Skipped   0 Failed   05Test
[2019-02-26 02:29:04] PASS    <100ms 10 Passed   0 Skipped   0 Failed   06Test
[2019-02-26 02:29:04] PASS     109ms  9 Passed   0 Skipped   0 Failed   07Test
[2019-02-26 02:29:04] PASS    <100ms 14 Passed   0 Skipped   0 Failed   08Test
[2019-02-26 02:29:04] PASS    <100ms 20 Passed   0 Skipped   0 Failed   09Test
[2019-02-26 02:29:04] PASS    <100ms 10 Passed   0 Skipped   0 Failed   10Test
[2019-02-26 02:29:04] PASS    <100ms  2 Passed   0 Skipped   0 Failed   11Test
[2019-02-26 02:29:04] PASS    <100ms  9 Passed   0 Skipped   0 Failed   12Test
[2019-02-26 02:29:04] PASS     120ms  9 Passed   0 Skipped   0 Failed   13Test
[2019-02-26 02:29:04] PASS    <100ms 18 Passed   0 Skipped   0 Failed   14Test
[2019-02-26 02:29:04] PASS    <100ms  2 Passed   0 Skipped   0 Failed   15Test
[2019-02-26 02:29:04] PASS     148ms 30 Passed   0 Skipped   0 Failed   16Test
@benjaminp
Collaborator

Interestingly, there was an implementation of this but then it was deleted (0c9f2d4).

@irengrig
Contributor

irengrig commented Mar 1, 2019

/cc @meisterT

@irengrig irengrig added team-Performance Issues for Performance teams untriaged labels Mar 1, 2019
@cgruber
Contributor Author

cgruber commented Mar 1, 2019

Worth noting, one mitigation is auto-generating per-package or per-top-level-package "AllTest" classes that contain the @Suite declarations, and replacing any other java_test statements. That definitely reduces test execution times, but at the cost of smooshing together all the dependencies and removing any ability to do "affected test" filtering.

@meisterT meisterT added team-Local-Exec Issues and PRs for the Execution (Local) team and removed team-Performance Issues for Performance teams labels Mar 6, 2019
@jmmv
Contributor

jmmv commented Mar 14, 2019

@meisterT Hmm, I was about to move this to team-Performance, but you have just done the opposite. I think I heard from you that test setup is one of the major penalties we have now, performance-wise? Also, while this sounds like a "local execution request", is it really? I mean, is there something to change in the worker code in Bazel, or do we really have to modify the workers themselves to support this request? Or maybe I don't understand the goal of team-Performance properly...

@cgruber
Contributor Author

cgruber commented May 29, 2019

Pinging this again. Now that we have thousands of tests in our corpus, doing a per-package aggregate generated test suite (so as to run all JUnit test cases in one single test target) has shaved about 1/4-1/2 off our build time, just by itself. Not having some way to avoid the extra tax of non-persistent worker invocation is a pretty big deal when you don't have a build farm.

@cgruber
Contributor Author

cgruber commented May 29, 2019

To give it some meat, an example run on a commit from yesterday (doing full builds, not "affected test" query magic) did this:

          aggregate   individual
count     396         1826
nocache   01:02:32    02:00:12 (timeout)
cache     49:03       01:49:04

Now, times vary a lot, and we're working to reduce the deviation, but this is representative of build times on these machines.

@jin
Member

jin commented May 29, 2019

Pinging @lberki, author of 0c9f2d4 - do you have more context / background on why the experiment didn't work out?

@lberki
Contributor

lberki commented Jun 4, 2019

Huh, that was a while ago... more context is on (internal) b/25006099, but the short version is that it proved to be difficult to be both correct enough and fast enough. The following issues come to mind:

  1. There were classes that are common dependencies of the test runner and the code under test (think Guava), so separating them was difficult
  2. It proved to be hard to "purge" the state of the JVM from previous test runs (although we could probably have tried harder with classloading magic)
  3. A lot of time is spent in JIT compiling the code under test, which you can't avoid on each test run if you want to be correct

If I had to try again, I would try jury-rigging something with native-image or CRIU so that the JVM startup and the test runner are AOT compiled, then dynamically load the actual code under test (handwave-handwave). That way, we'd get easily provable correctness without incurring (most of) the overhead.

That wouldn't help with JIT compiling potentially large amounts of code under test, though.
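
To make the classloading idea in point 2 concrete, a minimal sketch of the throwaway-classloader approach (my assumption of how it could look, not the deleted implementation): the runner and shared deps live in the parent loader, and the code under test is loaded from its jar in a fresh loader per run, so its static state is dropped. Point 1 is exactly what makes this hard in practice: anything shared between the runner and the code under test resolves through the parent and is never reloaded.

import java.net.URL;
import java.net.URLClassLoader;
import org.junit.runner.JUnitCore;
import org.junit.runner.Result;

// Hypothetical helper: run one test class from a jar in a throwaway classloader.
final class IsolatedTestRun {
  static Result run(URL testJar, String testClassName) throws Exception {
    // Classes found only in testJar are loaded anew on every call; the runner,
    // JUnit, and other shared deps are delegated to the parent loader.
    try (URLClassLoader loader =
        new URLClassLoader(new URL[] {testJar}, IsolatedTestRun.class.getClassLoader())) {
      Class<?> testClass = Class.forName(testClassName, true, loader);
      return new JUnitCore().run(testClass);
    }
  }
}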

@susinmotion susinmotion added P3 We're not considering working on this, but happy to review a PR. (No assignee) type: feature request and removed more data needed untriaged labels Sep 25, 2019
@benjaminp
Collaborator

From 02f8da9, it seems this idea has made a comeback?

@meisterT
Member

meisterT commented Jan 14, 2020

Indeed, Irina is giving it another try.

@laurentlb
Contributor

What's the status of this?

@iirina
Contributor

iirina commented Mar 1, 2020

There's a working version in blaze, but not in bazel.

@TheImplementer

@iirina any chance this could be ported to bazel as well?

@cgruber
Contributor Author

cgruber commented May 28, 2020

Didn't this just get released as an experimental flag?

@nkoroste
Contributor

Any updates on this?
As an experiment I just migrated our code base to have 1 test class per target instead of 1 large target with multiple test classes - this results in significantly slower overall build times.

11:07:20 18:07:19  //components/screen/chat:test PASSED in 112.6s

vs

11:15:12 18:15:12  //components/screen/chat:src/test/java/com/foo/messaging/chat/ChatFragTest PASSED in 87.3s
11:15:12 18:15:12  //components/screen/chat:src/test/java/com/foo/messaging/chat/features/CreateReporterTest PASSED in 91.8s
11:15:12 18:15:12  //components/screen/chat:src/test/java/com/foo/messaging/chat/features/lenses/SuggestedTest PASSED in 72.0s
11:15:12 18:15:12  //components/screen/chat:src/test/java/com/foo/messaging/chat/features/messagelist/MessagePresenterTest PASSED in 104.8s
11:15:12 18:15:12  //components/screen/chat:src/test/java/com/foo/messaging/chat/features/sticker/ActiveTest PASSED in 73.7s

@meisterT
Member

We currently don't have the bandwidth to work on this. If you want to contribute, feel free to improve --experimental_persistent_test_runner.

Nick: what's the slow part of your tests?

@nkoroste
Contributor

nkoroste commented Apr 21, 2021

I need more time to answer that question with certainty. Since I'm testing this on Android, here are three theories:

  1. Each target has to build resources if resources have changed
  2. Robolectric pays the cost of bytecode instrumentation of the Android jars for each JVM test invocation.
  3. JVM/JIT overhead

The question is which of these three accounts for the most time.

@nkoroste
Contributor

Actually, do you know the difference between the following 2 outputs?

//components/screen/chat:src/test/java/com/foo/messaging/chat/features/CreateReporterTest PASSED in 91.8s

vs

11:08:21 18:08:21  ==================== Test output for //components/screen/chat:src/test/java/com/foo/messaging/chat/features/CreateReporterTest:
11:08:21 18:08:21  JUnit4 Test Runner
11:08:21 18:08:21  .....
11:08:21 18:08:21  Time: 7.656

I'm not sure if I'm reading this right, but I believe the former is the total time for the target, including build time, while the latter is pure test execution time?

@cgruber
Contributor Author

cgruber commented Sep 28, 2021

Happy to look at improving this - right now, we're getting stdout from the worker that is breaking the API contract of workers, but it seems empty. Gonna look at the code for this, and see if I can redirect output into the worker protocol message.

If anyone has any pointers to code entry points that could help me get started, that would be awesome. If not, I'll figure it out.
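
For reference, a minimal sketch of one way to do that redirection (CapturedStdout is a hypothetical helper, not existing Bazel code): park the real stdout, which Bazel reads protos from, point System.out at a buffer while the tests run, and put the buffered text into the WorkResponse output field.

import java.io.ByteArrayOutputStream;
import java.io.PrintStream;

// Hypothetical helper: while open, test output goes into a buffer instead of the
// worker's protocol stream.
final class CapturedStdout implements AutoCloseable {
  private final PrintStream protocolOut = System.out; // the stream Bazel reads protos from
  private final ByteArrayOutputStream buffer = new ByteArrayOutputStream();

  CapturedStdout() {
    System.setOut(new PrintStream(buffer, true));
  }

  String contents() {
    return buffer.toString();
  }

  @Override
  public void close() {
    System.setOut(protocolOut); // restore the protocol stream before responding
  }
}

The worker loop would wrap each test run in try (CapturedStdout out = new CapturedStdout()) { ... } and call setOutput(out.contents()) on the WorkResponse; System.err would likely need the same treatment.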

@twheys

twheys commented Jan 25, 2022

Hi - just curious, what's the status of this issue? I am having issues with the flag --experimental_persistent_test_runner due to some output which I presume is being written to stdout by some Bazel class. Any ideas where this output comes from and if it's possible to disable it? I also tried implementing my own PersistentWorker test runner class, but same issue. Seems like this might be happening from one of the shell wrapper scripts or somewhere before java gets invoked.

---8<---8<--- Start of response ---8<---8<---
exec ${PAGER:-/usr/bin/less} "$0" || exit 1
Executing tests from //zoom/test/java/com/r9/zoom/nearby/processing:java_test
rocessing:java_test---8<---8<--- End of response ---8<---8<---

Update: I found this by searching the bazel source code:

echo 'exec ${PAGER:-/usr/bin/less} "$0" || exit 1'

Seems like there is a --no_echo arg for this script that used to get set by WorkerTestStrategy:
startupArgs.add(args.get(0)).add("--no_echo");

From googling, it appears that that class was rolled back at some point.

@ittaiz
Member

ittaiz commented Jul 10, 2022

This isn't progressing anywhere, right?
I'm taking a look at the local feedback loop when working on bazel, and I have a hunch these costs are significant.

@Silic0nS0ldier
Contributor

Looks like the remaining experiment source was removed in v6 (1f33504).

@ittaiz
Member

ittaiz commented Aug 16, 2022

@meisterT can you share why it was removed?
And maybe more importantly if there is willingness to have this capability be contributed to bazel?

@meisterT
Member

(Sorry for the late response, I was on a longer vacation and am still working through my pile of email.)

The implementation was mostly unused and it was buggy (not always reloading changed jars correctly IIRC). The internal test that we had became flaky and we didn't have the bandwidth to properly fix the functionality, so we decided to delete the experiment.

It is always available in version control if someone wants to spend time and energy to work on it (we don't in the foreseeable future).
If you (or someone else) wants to bring this back, we need to make sure it reloads classes correctly when it should.

@ittaiz
Member

ittaiz commented Aug 29, 2022

Thanks for the additional feedback!
I've taken a turn away from bazel and towards "fast test" in IntelliJ, but I'm not loving this direction.
I might take up this issue if the "fast test" direction falls through.
Thanks!


github-actions bot commented Nov 3, 2023

Thank you for contributing to the Bazel repository! This issue has been marked as stale since it has not had any activity in the last 1+ years. It will be closed in the next 90 days unless any other activity occurs or one of the following labels is added: "not stale", "awaiting-bazeler". Please reach out to the triage team (@bazelbuild/triage) if you think this issue is still relevant or you are interested in getting the issue resolved.

@github-actions github-actions bot added the stale Issues or PRs that are stale (no activity for 30 days) label Nov 3, 2023

github-actions bot commented Feb 1, 2024

This issue has been automatically closed due to inactivity. If you're still interested in pursuing this, please post @bazelbuild/triage in a comment here and we'll take a look. Thanks!
