Skip to content

Speed up the gradle test invocation #37

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 3 commits into from

Conversation

andreaTP
Copy link
Contributor

@andreaTP andreaTP commented Nov 5, 2021

This is the aggregated result of digging into:
exercism/exercism#6097

Feel free to ask me to split this into multiple PRs if you are uncomfortable merging such a big change, I will do it tomorrow.

I'll comment inline the motivation for the various changes.

@andreaTP andreaTP requested a review from a team as a code owner November 5, 2021 17:35
@iHiD iHiD requested a review from ErikSchierboom November 5, 2021 17:42
@@ -1,6 +1,6 @@
# === Build builder image ===

FROM gradle:6.8.3-jdk11 AS build
FROM gradle:7.1-jdk11 AS build
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Align version with the version used in the exercises

@@ -30,4 +30,9 @@ COPY --from=build /home/builder/autotest-runner.jar ./
# Copy cached dependencies
COPY --from=build /home/gradle /home/gradle

RUN pkill -f '.*GradleDaemon.*'
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is one stopped gradle daemon in the image, those two lines remove it (and the time spent to look it up)

RUN pkill -f '.*GradleDaemon.*'
RUN rm -rf ~/.gradle/daemon

ENV GRADLE_OPTS="--add-opens java.base/java.util=ALL-UNNAMED --add-opens java.base/java.lang=ALL-UNNAMED --add-opens java.base/java.lang.invoke=ALL-UNNAMED --add-opens java.base/java.util=ALL-UNNAMED --add-opens java.prefs/java.util.prefs=ALL-UNNAMED --add-opens java.prefs/java.util.prefs=ALL-UNNAMED --add-opens java.base/java.nio.charset=ALL-UNNAMED --add-opens java.base/java.net=ALL-UNNAMED --add-opens java.base/java.util.concurrent.atomic=ALL-UNNAMED -Xms256m -Xmx2048M -Dfile.encoding=UTF-8 -Duser.country=US -Duser.language=en -Duser.variant"
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is one core change, basically, the --no-daemon option passed from the command line was ineffective causing the startup of a daemon every execution (it was dominating the used time).
I struggled with this for a long and the only "solution/workaround" I've found is this:
gradle/gradle#2660 (comment)

This is ugly and weak (especially maintenance-wise) but is the only thing that actually works.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah yes, my investigations pointed out that no daemon was used, but I didn't know how to fix it. Great work!

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@andreaTP It looks like in the above changes you're killing all Gradle daemons, right? If so, would a running Gradle daemon that can be re-used when running the tests be faster than --no-daemon? In some tracks, what I did is to build a "dummy" project to pre-load/cache things. I have no idea if that makes sense here though.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what I did is to build a "dummy" project to pre-load/cache things

do you mind elaborating or providing a link? might be interesting to investigate further this path!

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure! For Purescript, we have a regular exercise https://github.com/exercism/purescript-test-runner/tree/main/pre-compiled that we copy into the Docker image and then build/compile https://github.com/exercism/purescript-test-runner/blob/main/Dockerfile#L14. Doing a build/compile forces the Docker container to at least have loaded all the infrastructure once. Whether that is beneficial depends on the runtime that is used.

I tried using this approach with the Java test runner too, as I hoped that being able to re-use an existing Daemon that had already done some work might allow things to be sped up later. I couldn't get the gradle test command to re-use the started Gradle daemon, but I think you could with the GRADLE_OPTS specified.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I couldn't get the gradle test command to re-use the started Gradle daemon, but I think you could with the GRADLE_OPTS specified.

correct but it doesn't seem to produce great results ...

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, that could be because there actually isn't any Gradle daemon already running, right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

correct

bin/run.sh Outdated
@@ -22,5 +22,7 @@ cp -R $input_folder/* .

find . -mindepth 1 -type f | grep 'Test.java' | xargs -I file sed -i "s/@Ignore(.*)//g;s/@Ignore//g;" file

java -jar /opt/test-runner/autotest-runner.jar $problem_slug
echo -e "test {\n maxHeapSize = '2G'\n reports.html.required = false\n}" >> build.gradle
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another important change in this PR, disabling unused reporting as mentioned here:
https://docs.gradle.org/current/userguide/performance.html#report_generation

and increasing the heap for the tests makes them much faster.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

@@ -22,5 +22,7 @@ cp -R $input_folder/* .

find . -mindepth 1 -type f | grep 'Test.java' | xargs -I file sed -i "s/@Ignore(.*)//g;s/@Ignore//g;" file

java -jar /opt/test-runner/autotest-runner.jar $problem_slug
echo -e "test {\n maxHeapSize = '2G'\n reports.html.required = false\n}" >> build.gradle
timeout 20 gradle test --offline --no-daemon --warning-mode=none --no-watch-fs --console=plain 2> gradle-test.err
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I moved the external process execution from Java to the command line for a couple of reasons:

  • faster iteration cycles (no need to recompile to change options passed to gradle)
  • easier to change
  • log handling for debug purposes

I think that I prefer this here, but I can revert this change if required, not fussy about it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think those are all good arguments in favor of running this in bash.

java -jar /opt/test-runner/autotest-runner.jar $problem_slug
echo -e "test {\n maxHeapSize = '2G'\n reports.html.required = false\n}" >> build.gradle
timeout 20 gradle test --offline --no-daemon --warning-mode=none --no-watch-fs --console=plain 2> gradle-test.err
java -jar /opt/test-runner/autotest-runner.jar $?
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

slug was unused (a leftover?) and here we are passing the return code of the previous command to re-use the error handling logic.

@andreaTP
Copy link
Contributor Author

andreaTP commented Nov 5, 2021

The observed speedup on my local machine running in docker is about:

  • before ~19-20 seconds
  • after ~11-12 seconds

I would be happy if someone from the maintainers can confirm or deny those numbers (and maybe test in an environment similar to prod).

@andreaTP
Copy link
Contributor Author

andreaTP commented Nov 5, 2021

For the records, I have quickly attempted to use GraalVM native-image ( here ) but realized the implementation is reflection heavy.

We need either some sort of tests or integration tests to run this.

@ErikSchierboom
Copy link
Member

Thanks a ton work for working on this! I'll have a look next week, I'm terribly busy at the moment.

@andreaTP
Copy link
Contributor Author

andreaTP commented Nov 5, 2021

@ErikSchierboom enjoy the weekend! 🙂

@SleeplessByte
Copy link
Member

Wow. 1/3 speed-up already is really great @andreaTP!!

@iHiD iHiD requested review from a team and jmrunkle November 5, 2021 20:38
@iHiD
Copy link
Member

iHiD commented Nov 5, 2021

This is great :)

For the records, I have quickly attempted to use GraalVM native-image

For those of us not au fait, would you mind explaining a little more about what this would do pls?

@andreaTP
Copy link
Contributor Author

andreaTP commented Nov 5, 2021

Sure @iHiD !
Graal is an umbrella project from Oracle, one aspect of it is native-image that enables producing binaries (as opposed to jars) for jvm projects, reducing a lot startup time.
The test runner has a lot in common with command line applications and it will probably benefits from running it as a binary.

Ref: https://www.graalvm.org/reference-manual/native-image/

@andreaTP andreaTP closed this Nov 5, 2021
@andreaTP andreaTP reopened this Nov 5, 2021
@andreaTP
Copy link
Contributor Author

andreaTP commented Nov 5, 2021

@iHiD I hope this explains!

(Closed by mistake, sorry for the noise)

@andreaTP
Copy link
Contributor Author

andreaTP commented Nov 8, 2021

Quick update just to share the experience, (aka another failed experiment 🙂 ).

I thought that running the TestRunner within Gradle itself(as a plugin) would save the additional JVM startup time.
A quick experiment (available here) demonstrates that this is not the case (back to ~16-17 seconds), possibly due to Gradle plugins loading overhead.

@iHiD
Copy link
Member

iHiD commented Nov 8, 2021

The test runner has a lot in common with command line applications and it will probably benefits from running it as a binary.

Nice. That sounds useful indeed!

@ErikSchierboom
Copy link
Member

@andreaTP If I try to test this locally, I get the following error:

Exception in thread "main" java.nio.file.NoSuchFileException: build/test-results/test
        at java.base/sun.nio.fs.UnixException.translateToIOException(UnixException.java:92)
        at java.base/sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:111)
        at java.base/sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:116)
        at java.base/sun.nio.fs.UnixFileSystemProvider.newDirectoryStream(UnixFileSystemProvider.java:432)
        at java.base/java.nio.file.Files.newDirectoryStream(Files.java:472)
        at com.google.common.io.MoreFiles.listFiles(MoreFiles.java:261)
        at com.exercism.runner.TestRunner.run(TestRunner.java:49)
        at com.exercism.runner.TestRunner.main(TestRunner.java:22)

Any ideas? How did you test on your machine?

@andreaTP
Copy link
Contributor Author

andreaTP commented Nov 9, 2021

@ErikSchierboom you are right indeed:
here is the fix: 1cd917c

I was unconsciously relying on cached results 🙁 still this PR trim down execution times around ~14-15 seconds per run.

@ErikSchierboom
Copy link
Member

@andreaTP Yes, that fixes it. I've tried to benchmark the differences but could not find any on my machine. Maybe I'm doing something wrong? I've tried to benchmark it by adding time before the docker run command in the run_in_docker.sh command and then running ./run_in_docker.sh anagram $PWD/exercise $PWD/exercise. Via this "benchmarking" I couldn't really find any difference between the current test runner and your PR. How did you test the performance?

@andreaTP
Copy link
Contributor Author

andreaTP commented Nov 9, 2021

Hey @ErikSchierboom , I was testing against bird-watcher the same way you are doing, will have another look later.

@andreaTP
Copy link
Contributor Author

andreaTP commented Nov 9, 2021

Thanks for having a second look @ErikSchierboom !
I figured out that local caches have been tripping me 🙁 , to measure something meaningful I need something like this:
https://github.com/andreaTP/java-test-runner/blob/06f0ad2ae6934765ef3d0b4fcfbda1cc94bcdf67/bin/run.sh#L25-L26
and with this, the performance is about the same as you correctly observed.

Takeaways:

  • the caches are massively influencing gradle performance
  • without caches differences in using the daemon or not are irrelevant

I will try to have another look but seems like you have already maxed the performance of this design ...
What to try next?

  • aggressive caching? (mounting external folders or in docker etc. etc.)
  • test using a different build tool (maven?)
  • increase the timeouts 😜

I'm sorry that those experiments have failed, I'm available to have a chat on Slack about possible different approaches.

@andreaTP andreaTP closed this Nov 9, 2021
@ErikSchierboom
Copy link
Member

I think discussing things here is probably best, as we then have a public record of the things we've looked into, which makes it easier for other people

aggressive caching? (mounting external folders or in docker etc. etc.)
What would be cached? Caching within Docker is perfectly fine, so if you have an idea what we could cache, let me know!

test using a different build tool (maven?)

I have no idea what the ramifications of that change would be, but if the test results are still the same, I'm fine with that. I'll leave it to @exercism/java to judge whether or not this is something to pursue.

increase the timeouts 😜
We can't really do that, as we're already at the maximum runtime we can allow on our infrastructure.

@andreaTP
Copy link
Contributor Author

I've investigated a bit more:

aggressive caching

  • attempted with --build-cache, --configuration-cache and more: it's not effective

Basically, the very first gradle invocation on a clean project takes 11-12 seconds(on my machine) to resolve the configuration (startup phase of gradle itself).

Given all those failed attempts I'm drafting a PR to switch to Maven, exercism's builds are simple and pretty much constrained, of course, there are trade-offs, but might be worth giving this approach a spin!

@iHiD
Copy link
Member

iHiD commented Nov 11, 2021

Nice. Thanks @andreaTP!

@ErikSchierboom
Copy link
Member

Great! It's definitely worth a look.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants