Persistent test execution worker (TestRunner supporting worker strategy) #7595
Comments
Interestingly, there was an implementation of this but then it was deleted (0c9f2d4).
/cc @meisterT
Worth noting, one mitigation is auto-generation of per-package or per-top-level-package "AllTest" classes that contain @Suite stuff, and replacing any other java_test statements. That definitely reduces test execution times, but at the cost of smooshing together all the dependencies and removing any ability to do "affected test" filtering.
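As a sketch of the mitigation described above (assuming JUnit 4; the FooTest/BarTest names and the exact generated shape are hypothetical), a generated per-package aggregate suite might look like:

```java
// Hypothetical auto-generated aggregate suite for one package.
// FooTest and BarTest stand in for the package's real java_test classes.
import org.junit.runner.RunWith;
import org.junit.runners.Suite;

@RunWith(Suite.class)
@Suite.SuiteClasses({
    FooTest.class,
    BarTest.class,
})
public class AllTest {
  // Intentionally empty: the annotations drive test discovery.
}
```

A single java_test wrapping this class pays the runner setup cost once for the whole package, which is the speedup (and the dependency-smooshing downside) described above.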
@meisterT Hmm, I was about to move this to
Pinging this again. Now that we have thousands of tests in our corpus, doing a per-package aggregate generated test suite (so as to run all junit test cases in one single test target) has shaved about 1/4-1/2 of our build time off, just by itself. Not having some way to avoid the extra tax of non-persistent worker invocation is a pretty big deal, when you don't have a build farm.
To give it some meat, an example run on a commit from yesterday (doing full builds, not "affected test" query magic) did this:
Now, times vary a lot, and we're working to reduce the deviation, but this is representative of build times, on these machines.
Huh, that was a while ago... more context is on (internal) b/25006099, but the short version is that it proved to be difficult to be both correct enough and fast enough. The following issues come to mind:
If I had to try again, I would try either jury-rigging something with
That wouldn't help with JIT compiling potentially large amounts of code under test, though.
From 02f8da9, it seems this idea has made a comeback?
Indeed, Irina is giving it another try.
What's the status of this?
There's a working version in blaze, but not in bazel.
@iirina any chance this could be ported to bazel as well?
Didn't this just get released as an experimental flag?
Any updates on this?
We currently don't have the bandwidth to work on this. If you want to contribute, feel free to improve it. Nick: what's the slow part of your tests?
I need more time to answer that question with certainty. Since I'm testing this in Android, here are the three theories:
The question is which of these 3 accounts for the most time?
Actually, do you know the difference between the following 2 outputs?
vs
I'm not sure if I'm reading this right, but I believe the former is the total time for the test target (including build time), while the latter is pure test execution time?
Happy to look at improving this - right now, we're getting stdout from the worker that is breaking the API contract of workers, but it seems empty. Gonna look at the code for this, and see if I can redirect output into the worker protocol message. If anyone has any pointers for code entry points, who could help me get started, that would be awesome. If not I'll figure it out.
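One possible direction for the stdout issue described above (a sketch only, not Bazel's actual implementation): capture System.out into a buffer for the duration of a work request, then embed the captured text in the worker's response instead of letting it leak onto the process's real stdout, which the worker protocol reserves for response messages.

```java
import java.io.ByteArrayOutputStream;
import java.io.PrintStream;

public class CapturedRun {
  // Runs a task with System.out redirected to a buffer and returns
  // whatever the task printed, so it can be embedded in a
  // WorkResponse-style message rather than corrupting the protocol stream.
  static String runCapturingStdout(Runnable task) {
    PrintStream original = System.out;
    ByteArrayOutputStream buffer = new ByteArrayOutputStream();
    System.setOut(new PrintStream(buffer, true));
    try {
      task.run();
    } finally {
      System.setOut(original);
    }
    return buffer.toString();
  }

  public static void main(String[] args) {
    String captured = runCapturingStdout(() -> System.out.println("test output"));
    // The real stdout is restored; the captured text travels in the response.
    System.out.print("captured=[" + captured.trim() + "]");
  }
}
```

This is the same trick many persistent-worker wrappers use: anything the test prints goes into the response payload, keeping the worker's own stdout clean for the protocol.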
Hi - just curious, what's the status of this issue? I am having issues with the flag
Update: I found this by searching the bazel source code: tools/test/test-setup.sh, line 27 (at 09c621e).
Seems like there is a --no_echo arg for this script that used to get set by WorkerTestStrategy (src/main/java/com/google/devtools/build/lib/worker/WorkerTestStrategy.java, line 201 at 25120df).
From googling, it appears that that class was rolled back at some point.
This isn't progressing anywhere, right?
Looks like the remaining experiment source has been removed in v6. 1f33504
@meisterT can you share why it was removed?
(Sorry for the late response, I was on a longer vacation and am still working through my pile of email.) The implementation was mostly unused and it was buggy (not always reloading changed jars correctly, IIRC). The internal test that we had became flaky and we didn't have the bandwidth to properly fix the functionality, so we decided to delete the experiment. It is always available in version control if someone wants to spend time and energy to work on it (we don't in the foreseeable future).
Thanks for the additional feedback!
Thank you for contributing to the Bazel repository! This issue has been marked as stale since it has not had any activity in the last 1+ years. It will be closed in the next 90 days unless any other activity occurs or one of the following labels is added: "not stale", "awaiting-bazeler". Please reach out to the triage team (
This issue has been automatically closed due to inactivity. If you're still interested in pursuing this, please post
Description of the problem / feature request:
In our bazel conversion experiment we have moved a small subset of apps, libraries, and tests over to bazel. Running head-to-head in CI, we get nearly 2x wall time on bazel. Some of that we can optimize away; however, 73 of 75 tests run on Buck in under 100ms, whereas bazel runs them in 1-3 seconds each. This leads to 3 minutes of Buck, 3 minutes of Gradle, and 6 minutes of Bazel executing the same build set.
While we have not fully tweaked all the optimization settings (some parts of the build make use of workers, and sandboxing is expensive), our investigation accounted for that - the biggest cost is test execution, taking around 10x per test on Bazel vs. Gradle or Buck in our situation.
A few relevant pieces of the puzzle include:
- the suite runs all the test code (confirmed by forcing failures first throughout the tests), and it runs in vastly less time than the sum of all the executions. In an example run, AllTest executed 16 tests' worth of code in 2.7s, while all 16 individually took between 0.8 and 1.9 seconds. This "AllTest" suite only incurs the penalty of test runner setup once, for N tests, and that made a huge difference.
- but in a clean build these times were closer to 0.9-3.2 secs, and when all 75 tests are run, the tests got slower individually, by a minimum of 0.4 seconds, and the whole build lengthened disproportionately. This is somewhat limited by the number of worker processes, but neither Buck nor Gradle hit this, and their test times were still sub-second for nearly all of them.
- determine how much time is being taken by the test execution itself, except by inference as above.
- to Buck. It's only in the test scenario.
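Working through the numbers in the first point: 16 tests sharing one runner setup finished in 2.7s, or roughly 0.17s per test, so most of each 0.8-1.9s individual run is plausibly setup overhead. A quick sketch of that arithmetic (figures taken from the report above):

```java
import java.util.Locale;

public class OverheadEstimate {
  public static void main(String[] args) {
    int tests = 16;
    double suiteSeconds = 2.7;  // AllTest wall time for all 16 tests
    double perTestInSuite = suiteSeconds / tests;
    // Individual runs of the same tests took 0.8-1.9 seconds each.
    double minIndividual = 0.8, maxIndividual = 1.9;
    double minOverhead = minIndividual - perTestInSuite;
    double maxOverhead = maxIndividual - perTestInSuite;
    System.out.printf(Locale.ROOT, "amortized per-test: %.3fs%n", perTestInSuite);
    System.out.printf(Locale.ROOT,
        "implied setup overhead per test: %.2fs to %.2fs%n",
        minOverhead, maxOverhead);
  }
}
```

That is, setup/teardown appears to dominate each individual invocation, which is exactly the cost a persistent worker would amortize.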
Feature requests: what underlying problem are you trying to solve with this feature?
Implement persistent worker support for the TestRunner, to avoid the setup/teardown costs associated with invoking a new TestRunner.
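A minimal sketch of the requested strategy, using a deliberately simplified in-memory stand-in for Bazel's real protobuf-based worker protocol (class names and the runTest hook are hypothetical): a long-lived process pays JVM startup and runner setup once, then handles each test request in turn.

```java
import java.util.List;
import java.util.function.Function;

public class PersistentTestWorker {
  // Processes a stream of work requests in one long-lived process, so
  // JVM startup and test-runner setup are paid once rather than per test.
  // Real workers would read length-delimited WorkRequest protos on stdin;
  // here a plain list of test-class names stands in for that stream.
  static void serve(List<String> requests, Function<String, Integer> runTest) {
    for (String testClass : requests) {
      int exitCode = runTest.apply(testClass);
      // A real worker would emit a WorkResponse with this exit code.
      System.out.println(testClass + " -> exit " + exitCode);
    }
  }

  public static void main(String[] args) {
    serve(List.of("FooTest", "BarTest"),
        testClass -> 0 /* pretend every test passes */);
  }
}
```

The hard parts noted elsewhere in this thread (correctly reloading changed jars between requests, keeping per-request state isolated) live inside the runTest hook and are exactly where the deleted experiment reportedly got buggy.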
Bugs: what's the simplest, easiest way to reproduce this bug? Please provide a minimal example if possible.
Make a project, make a bunch of tests. I don't have a repro setup at present, but will be working one up.
What operating system are you running Bazel on?
CentOS and MacOS (different numbers, same proportions)
What's the output of bazel info release?
INFO: Invocation ID: 329bc936-43a5-48ba-b3b3-17ea3f158122
release 0.22.0
Have you found anything relevant by searching the web?
In looking, there have been discussions of an experimental persistent worker supporting TestRunner, but the code seems to have been deleted, and none of the instructions work anymore.
Any other information, logs, or outputs that you want to share?
Example (redacted) test run with 16 tests.
Buck equivalent (didn't run the AllTest in this codebase)