
Persistent test execution worker (TestRunner supporting worker strategy) #7595

Closed
cgruber opened this issue Mar 1, 2019 · 27 comments
Labels
P3 We're not considering working on this, but happy to review a PR. (No assignee) stale Issues or PRs that are stale (no activity for 30 days) team-Local-Exec Issues and PRs for the Execution (Local) team type: feature request

Comments

@cgruber
Contributor

cgruber commented Mar 1, 2019

Description of the problem / feature request:

In our bazel conversion experiment we have moved a small subset of apps, libraries, and tests over to bazel. Running head-to-head in CI, we see nearly 2x the wall time on bazel. Some of that we can optimize away; however, 73 of 75 tests run on Buck in under 100ms each, whereas bazel runs them in 1-3 seconds each. This leads to 3 minutes for Buck, 3 minutes for Gradle, and 6 minutes for Bazel executing the same build set.

While we have not fully tweaked all the optimization settings (some bits of the build make use of workers, and sandboxing is expensive), our investigation accounted for that - the biggest cost is in test execution, which takes around 10x as long per test on Bazel vs. Gradle or Buck in our situation.

A few relevant pieces of the puzzle include:

  • When we create an "AllTest" suite (using @RunWith(Suite.class) and @Suite.SuiteClasses({...});
    a minimal sketch follows this list), the suite runs all the test code (confirmed by forcing
    failures throughout the tests), and it runs in vastly less time than the sum of all the
    individual executions. In an example run, AllTest executed 16 tests' worth of code in 2.7s,
    while the 16 tests run individually took between 0.8 and 1.9 seconds each. This "AllTest"
    suite only incurs the penalty of test runner setup once, for N tests, and that made a huge
    difference.
  • These tests were parallelized, and the 2.7 seconds above was close to the overall wall time
    in the small build, but in a clean build these times were closer to 0.9-3.2s, and when all 75
    tests were run, the tests got slower individually, by at least 0.4 seconds, and the whole
    build lengthened disproportionately. This is partly limited by the number of worker
    processes, but neither Buck nor Gradle hit this, and their test times were still sub-second
    for nearly all of them.
  • In the trace profiles, test execution is not further broken down, so there's little ability to
    determine how much time is being taken by the test execution itself, except by inference as
    above.
  • When building an APK, bazel's times were quite competitive - faster than Gradle and
    comparable to Buck. The slowdown shows up only in the test scenario.
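
For illustration, a minimal sketch of the kind of "AllTest" suite described in the first bullet above (the member classes here are trivial placeholders; in practice they would be the existing per-class tests):

import org.junit.Test;
import org.junit.runner.RunWith;
import org.junit.runners.Suite;

// Hypothetical aggregate suite: one java_test target runs every member class in a
// single JVM, so the test runner setup cost is paid once for all of them.
@RunWith(Suite.class)
@Suite.SuiteClasses({AllTest.FooTest.class, AllTest.BarTest.class})
public class AllTest {
  // Placeholder members; a real suite would list the existing test classes instead.
  public static class FooTest {
    @Test public void passes() {}
  }

  public static class BarTest {
    @Test public void alsoPasses() {}
  }
}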

Feature requests: what underlying problem are you trying to solve with this feature?

Implement persistent worker support for the TestRunner, to avoid the setup/teardown costs associated with invoking a new TestRunner.
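
As a rough sketch of what such a worker could look like (my assumption, not Bazel's actual implementation): this assumes the WorkRequest/WorkResponse classes generated from Bazel's worker_protocol.proto and a made-up convention that the first request argument names the test class; a real runner would also need classpath handling, test filtering, and the usual XML output.

import com.google.devtools.build.lib.worker.WorkerProtocol.WorkRequest;
import com.google.devtools.build.lib.worker.WorkerProtocol.WorkResponse;
import org.junit.runner.JUnitCore;
import org.junit.runner.Result;

// Hypothetical persistent test runner: started once with --persistent_worker, it then
// reads length-delimited WorkRequest protos from stdin and answers on stdout.
public final class PersistentTestRunner {
  public static void main(String[] args) throws Exception {
    while (true) {
      WorkRequest request = WorkRequest.parseDelimitedFrom(System.in);
      if (request == null) {
        return; // stdin closed: Bazel wants the worker to shut down
      }
      // Assumed convention: argument 0 is the fully qualified test class name.
      Class<?> testClass = Class.forName(request.getArguments(0));
      Result result = new JUnitCore().run(testClass);
      WorkResponse.newBuilder()
          .setExitCode(result.wasSuccessful() ? 0 : 1)
          .setOutput(result.getFailures().toString())
          .build()
          .writeDelimitedTo(System.out);
      System.out.flush();
    }
  }
}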

Bugs: what's the simplest, easiest way to reproduce this bug? Please provide a minimal example if possible.

Make a project, make a bunch of tests. I don't have a repro setup at present, but will be working one up.

What operating system are you running Bazel on?

CentOS and MacOS (different numbers, same proportions)

What's the output of bazel info release?

INFO: Invocation ID: 329bc936-43a5-48ba-b3b3-17ea3f158122
release 0.22.0

Have you found anything relevant by searching the web?

In looking around, there have been discussions of an experimental persistent worker supporting the TestRunner, but the code seems to have been deleted, and none of the instructions work anymore.

Any other information, logs, or outputs that you want to share?

Example (redacted) test run with 16 tests.

internal:AllTest  PASSED in 2.7s
internal:01Test   PASSED in 0.8s
internal:02Test   PASSED in 1.4s
internal:03Test   PASSED in 1.9s
internal:04Test   PASSED in 1.4s
internal:05Test   PASSED in 1.3s
internal:06Test   PASSED in 1.0s
internal:07Test   PASSED in 1.7s
internal:08Test   PASSED in 0.9s
internal:09Test   PASSED in 1.9s
internal:10Test   PASSED in 1.7s
internal:11Test   PASSED in 1.5s
internal:12Test   PASSED in 1.0s
internal:13Test   PASSED in 1.4s
internal:14Test   PASSED in 1.9s
internal:15Test   PASSED in 0.8s
internal:16Test   PASSED in 1.6s

Buck equivalent (didn't run the AllTest in this codebase)

[2019-02-26 02:29:04] PASS    <100ms  3 Passed   0 Skipped   0 Failed   01Test
[2019-02-26 02:29:04] PASS    <100ms  8 Passed   0 Skipped   0 Failed   02Test
[2019-02-26 02:29:04] PASS    <100ms  7 Passed   0 Skipped   0 Failed   03Test
[2019-02-26 02:29:04] PASS    <100ms 12 Passed   0 Skipped   0 Failed   04Test
[2019-02-26 02:29:04] PASS    <100ms 16 Passed   0 Skipped   0 Failed   05Test
[2019-02-26 02:29:04] PASS    <100ms 10 Passed   0 Skipped   0 Failed   06Test
[2019-02-26 02:29:04] PASS     109ms  9 Passed   0 Skipped   0 Failed   07Test
[2019-02-26 02:29:04] PASS    <100ms 14 Passed   0 Skipped   0 Failed   08Test
[2019-02-26 02:29:04] PASS    <100ms 20 Passed   0 Skipped   0 Failed   09Test
[2019-02-26 02:29:04] PASS    <100ms 10 Passed   0 Skipped   0 Failed   10Test
[2019-02-26 02:29:04] PASS    <100ms  2 Passed   0 Skipped   0 Failed   11Test
[2019-02-26 02:29:04] PASS    <100ms  9 Passed   0 Skipped   0 Failed   12Test
[2019-02-26 02:29:04] PASS     120ms  9 Passed   0 Skipped   0 Failed   13Test
[2019-02-26 02:29:04] PASS    <100ms 18 Passed   0 Skipped   0 Failed   14Test
[2019-02-26 02:29:04] PASS    <100ms  2 Passed   0 Skipped   0 Failed   15Test
[2019-02-26 02:29:04] PASS     148ms 30 Passed   0 Skipped   0 Failed   16Test
@benjaminp
Collaborator

Interestingly, there was an implementation of this but then it was deleted (0c9f2d4).

@irengrig
Contributor

irengrig commented Mar 1, 2019

/cc @meisterT

@irengrig irengrig added team-Performance Issues for Performance teams untriaged labels Mar 1, 2019
@cgruber
Contributor Author

cgruber commented Mar 1, 2019

Worth noting, one mitigation is auto-generating per-package or per-top-level-package "AllTest" classes that contain the @Suite declarations, and replacing any other java_test statements. That definitely reduces test execution times, but at the cost of smooshing together all the dependencies and removing any ability to do "affected test" filtering.

@meisterT meisterT added team-Local-Exec Issues and PRs for the Execution (Local) team and removed team-Performance Issues for Performance teams labels Mar 6, 2019
@jmmv
Contributor

jmmv commented Mar 14, 2019

@meisterT Hmm, I was about to move this to team-Performance, but you have just done the opposite. I think I heard from you that test setup is one of the major penalties we have now, performance-wise? Also, while this sounds like a "local execution request", is it really? I mean, is there something to change in the worker code in Bazel, or do we really have to modify the workers themselves to support this request? Or maybe I don't understand the goal of team-Performance properly...

@cgruber
Contributor Author

cgruber commented May 29, 2019

Pinging this again. Now that we have thousands of tests in our corpus, doing a per-package aggregate generated test suite (so as to run all JUnit test cases in one single test target) has shaved about 1/4-1/2 off our build time, just by itself. Not having some way to avoid the extra tax of non-persistent worker invocation is a pretty big deal when you don't have a build farm.

@cgruber
Contributor Author

cgruber commented May 29, 2019

To give it some meat, an example run on a commit from yesterday (doing full builds, not "affected test" query magic) did this:

          aggregate   individual
count     396         1826
nocache   01:02:32    02:00:12 (timeout)
cache     49:03       01:49:04

Now, times vary a lot, and we're working to reduce the deviation, but this is representative of build times on these machines.

@jin
Member

jin commented May 29, 2019

Pinging @lberki, author of 0c9f2d4 - do you have more context / background on why the experiment didn't work out?

@lberki
Contributor

lberki commented Jun 4, 2019

Huh, that was a while ago... more context is on (internal) b/25006099, but the short version is that it proved to be difficult to be both correct enough and fast enough. The following issues come to mind:

  1. There were classes that are common dependencies of the test runner and the code under test (think Guava), so separating them was difficult
  2. It proved to be hard to "purge" the state of the JVM from previous test runs (although we could probably have tried harder with classloading magic)
  3. A lot of time is spent in JIT compiling the code under test, which you can't avoid on each test run if you want to be correct

If I had to try again, I would try jury-rigging something with native-image or CRIU so that the JVM startup and the test runner are AOT compiled, then dynamically load the actual code under test (handwave-handwave). That way, we'd get easily provable correctness without incurring (most of) the overhead.

That wouldn't help with JIT compiling potentially large amounts of code under test, though.
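
To make the classloading idea in point 2 concrete, a minimal sketch of the throwaway-classloader approach (my assumption of how it could look, not the deleted implementation): the runner and shared deps live in the parent loader, and the code under test is loaded from its jar in a fresh loader per run, so its static state is dropped. Point 1 is exactly what makes this hard in practice: anything shared between the runner and the code under test resolves through the parent and is never reloaded.

import java.net.URL;
import java.net.URLClassLoader;
import org.junit.runner.JUnitCore;
import org.junit.runner.Result;

// Hypothetical helper: run one test class from a jar in a throwaway classloader.
final class IsolatedTestRun {
  static Result run(URL testJar, String testClassName) throws Exception {
    // Classes found only in testJar are loaded anew on every call; the runner,
    // JUnit, and other shared deps are delegated to the parent loader.
    try (URLClassLoader loader =
        new URLClassLoader(new URL[] {testJar}, IsolatedTestRun.class.getClassLoader())) {
      Class<?> testClass = Class.forName(testClassName, true, loader);
      return new JUnitCore().run(testClass);
    }
  }
}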

@susinmotion susinmotion added P3 We're not considering working on this, but happy to review a PR. (No assignee) type: feature request and removed more data needed untriaged labels Sep 25, 2019
@benjaminp
Collaborator

From 02f8da9, it seems this idea has made a comeback?

@meisterT
Member

meisterT commented Jan 14, 2020

Indeed, Irina is giving it another try.

@laurentlb
Contributor

What's the status of this?

@iirina
Contributor

iirina commented Mar 1, 2020

There's a working version in blaze, but not in bazel.

@TheImplementer

@iirina any chance this could be ported to bazel as well?

@cgruber
Contributor Author

cgruber commented May 28, 2020

Didn't this just get released as an experimental flag?

@nkoroste
Contributor

Any updates on this?
As an experiment I just migrated our code base to have 1 test class per target instead of 1 large target with multiple test classes - this results in significantly slower overall build times.

11:07:20 18:07:19  //components/screen/chat:test PASSED in 112.6s

vs

11:15:12 18:15:12  //components/screen/chat:src/test/java/com/foo/messaging/chat/ChatFragTest PASSED in 87.3s
11:15:12 18:15:12  //components/screen/chat:src/test/java/com/foo/messaging/chat/features/CreateReporterTest PASSED in 91.8s
11:15:12 18:15:12  //components/screen/chat:src/test/java/com/foo/messaging/chat/features/lenses/SuggestedTest PASSED in 72.0s
11:15:12 18:15:12  //components/screen/chat:src/test/java/com/foo/messaging/chat/features/messagelist/MessagePresenterTest PASSED in 104.8s
11:15:12 18:15:12  //components/screen/chat:src/test/java/com/foo/messaging/chat/features/sticker/ActiveTest PASSED in 73.7s

@meisterT
Member

We currently don't have the bandwidth to work on this. If you want to contribute, feel free to improve --experimental_persistent_test_runner.

Nick: what's the slow part of your tests?

@nkoroste
Contributor

nkoroste commented Apr 21, 2021

I need more time to answer that question with certainty. Since I'm testing this on Android, here are three theories:

  1. Each target has to build resources if resources have changed
  2. Robolectric pays the cost of bytecode instrumentation of the Android jars for each JVM test invocation.
  3. JVM/JIT overhead

The question is which of these three accounts for the most time.

@nkoroste
Contributor

Actually, do you know the difference between the following 2 outputs?

//components/screen/chat:src/test/java/com/foo/messaging/chat/features/CreateReporterTest PASSED in 91.8s

vs

11:08:21 18:08:21  ==================== Test output for //components/screen/chat:src/test/java/com/foo/messaging/chat/features/CreateReporterTest:
11:08:21 18:08:21  JUnit4 Test Runner
11:08:21 18:08:21  .....
11:08:21 18:08:21  Time: 7.656

I'm not sure if I'm reading this right, but I believe the former is the total time for the target, including build time, while the latter is pure test execution time?

@cgruber
Contributor Author

cgruber commented Sep 28, 2021

Happy to look at improving this - right now, we're getting stdout from the worker that is breaking the API contract of workers, but it seems empty. Gonna look at the code for this, and see if I can redirect output into the worker protocol message.

If anyone has any pointers to code entry points that could help me get started, that would be awesome. If not, I'll figure it out.
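
For reference, a minimal sketch of one way to do that redirection (CapturedStdout is a hypothetical helper, not existing Bazel code): park the real stdout, which Bazel reads protos from, point System.out at a buffer while the tests run, and put the buffered text into the WorkResponse output field.

import java.io.ByteArrayOutputStream;
import java.io.PrintStream;

// Hypothetical helper: while open, test output goes into a buffer instead of the
// worker's protocol stream.
final class CapturedStdout implements AutoCloseable {
  private final PrintStream protocolOut = System.out; // the stream Bazel reads protos from
  private final ByteArrayOutputStream buffer = new ByteArrayOutputStream();

  CapturedStdout() {
    System.setOut(new PrintStream(buffer, true));
  }

  String contents() {
    return buffer.toString();
  }

  @Override
  public void close() {
    System.setOut(protocolOut); // restore the protocol stream before responding
  }
}

The worker loop would wrap each test run in try (CapturedStdout out = new CapturedStdout()) { ... } and call setOutput(out.contents()) on the WorkResponse; System.err would likely need the same treatment.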

@twheys

twheys commented Jan 25, 2022

Hi - just curious, what's the status of this issue? I am having issues with the flag --experimental_persistent_test_runner due to some output which I presume is being written to stdout by some Bazel class. Any ideas where this output comes from and if it's possible to disable it? I also tried implementing my own PersistentWorker test runner class, but same issue. Seems like this might be happening from one of the shell wrapper scripts or somewhere before java gets invoked.

---8<---8<--- Start of response ---8<---8<---
exec ${PAGER:-/usr/bin/less} "$0" || exit 1
Executing tests from //zoom/test/java/com/r9/zoom/nearby/processing:java_test
rocessing:java_test---8<---8<--- End of response ---8<---8<---

Update: I found this by searching the bazel source code:

echo 'exec ${PAGER:-/usr/bin/less} "$0" || exit 1'

Seems like there is a --no_echo arg for this script that used to get set by WorkerTestStrategy:
startupArgs.add(args.get(0)).add("--no_echo");

From googling, it appears that that class was rolled back at some point.

@ittaiz
Member

ittaiz commented Jul 10, 2022

This isn't progressing anywhere, right?
I'm taking a look at the local feedback loop when working on bazel, and I have a hunch these costs are significant.

@Silic0nS0ldier
Contributor

Looks like the remaining experiment source was removed in v6 (1f33504).

@ittaiz
Member

ittaiz commented Aug 16, 2022

@meisterT can you share why it was removed?
And maybe more importantly if there is willingness to have this capability be contributed to bazel?

@meisterT
Member

(Sorry for the late response, I was on a longer vacation and am still working through my pile of email.)

The implementation was mostly unused and it was buggy (not always reloading changed jars correctly IIRC). The internal test that we had became flaky and we didn't have the bandwidth to properly fix the functionality, so we decided to delete the experiment.

It is always available in version control if someone wants to spend time and energy to work on it (we don't in the foreseeable future).
If you (or someone else) wants to bring this back, we need to make sure it reloads classes correctly when it should.

@ittaiz
Member

ittaiz commented Aug 29, 2022

Thanks for the additional feedback!
I've taken a turn away from bazel and towards "fast test" in IntelliJ, but I'm not loving this direction.
I might take up this issue if the "fast test" direction falls through.
Thanks!


github-actions bot commented Nov 3, 2023

Thank you for contributing to the Bazel repository! This issue has been marked as stale since it has not had any activity in the last 1+ years. It will be closed in the next 90 days unless any other activity occurs or one of the following labels is added: "not stale", "awaiting-bazeler". Please reach out to the triage team (@bazelbuild/triage) if you think this issue is still relevant or you are interested in getting the issue resolved.

@github-actions github-actions bot added the stale Issues or PRs that are stale (no activity for 30 days) label Nov 3, 2023

github-actions bot commented Feb 1, 2024

This issue has been automatically closed due to inactivity. If you're still interested in pursuing this, please post @bazelbuild/triage in a comment here and we'll take a look. Thanks!
