-
Notifications
You must be signed in to change notification settings - Fork 458
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Petition Gradle to support JVM-local configuration-cache #987
Comments
One of Gradle's best features is the build cache ( In 6.6, Gradle shipped configuration cache ( But the really tricky restriction, on both incidental complexity and performance, is that all tasks must be able to roundtrip their state losslessly through Java's semi-deprecated serialization system. For Spotless that would require something close to a rewrite, but we were able to hack around it with ~300 lines (#986). The hack is much faster and easier to adopt than Gradle's current approach, so we hope that they will switch. The advantage of serialization
The disadvantages of serialization
Sharing configuration cache between machines is going to be error-prone and labor-intensiveThere is a feature proposal (gradle/gradle#13510) to share configuration cache information between machines, same idea as remote build cache. Remember that to make that work for the remote build cache, you needed to annotate every input property with a
How is a future version of Gradle going to know which need to be rewritten machine to machine, versus which are intended to be left alone? Plus, who cares! Build cache is already effective at offloading work from developer machines to CI machines. The real gold of configuration cache is improving iteration times on a single dev machine (time-to-test), and for that environment you'd happily trade away relocatability to get the fastest possible iteration time. The fastest possible iteration time is not going to have a serialization roundtrip as one of the required steps. |
Configuration cache is faster and easier without serializationHere is an example: // user-facing
org.gradle.api.LiveCache<T> { get/set }
class MyServerConnection { ... } // implements .equals and .hashCode, but not serializable
class MyTask {
private final LiveCache<ServerConnection> serverConnection = createLive<>("serverConnection")
public void setConnection(ServerConnection connection) {
serverConnection.set(connection);
}
@Input
public ServerConnection getConnection() {
return serverConnection.get();
}
// field/getter/setter could be replaced with @Inject LiveCacheableProperty<ServerConnection> getConnection()
@TaskAction
public void performAction(InputChanges inputs) throws Exception {
serverConnection.get().doStuff()
}
}
// implementation
data class InternalCacheKey {
File projectDir
String taskPath
String propertyName
}
private static Map<InternalCacheKey, Object> daemonGlobalState = ...
class GradleInternalTaskImplementation implements Task {
LiveCache<T> createLive(String name) {
InternalCacheKey internalKey = new InternalCacheKey(projectDir, getPath(), name);
return new LiveCache<T>() {
@Override public void set(T value) {
daemonGlobalState.put(internalKey, value);
}
@Override public T get() {
Object value = daemonGlobalState.get(internalKey);
if (value == null) {
throw TriggerConfigurationError(); // we need to rerun configuration to populate our cache
} else {
return (T) value;
}
}
}
} We implemented the "Gradle side" of this as |
Hi Ned, Thanks for bringing this up. First, with regards to the configuration-cache serialization: it is not based on Java Serialization - we are not that crazy :) Gradle uses its own serialization mechanism and format for configuration cache entries, it is optimized for i/o, computation and space saving. There is some support for Java Serialization as a fallback. But it is not recommended relying on it and handcrafting serialization. I just tried to make this clearer in the docs in gradle/gradle#19005 We may implement an in-memory cache at some point, for example we’ve talked about pre-creating the task graph at the end of the build in anticipation of the next build. But we think it would still make copies of any mutable structures (for isolation) and would still be backed by a persistent copy (for daemon restarts). Isolating tasks enables running all of them in parallel safely. Using shared mutable state wouldn't work. We’d need some data that shows that the current approach actually has enough of a negative impact on build performance to do something about it (not saying it doesn’t, only that we don’t yet know whether it does or does not). In terms of relocatabilty, let’s see. We’ve talked about using the remote cache, but we have some work to do before that would be possible. It’s hard to see us doing this without adding some more constraints or asking you to be more explicit about relocation. |
If the cache is only shared between sequential builds, then it is not shared mutable state. Just mutable state. And that's a risky thing, but not so risky if you've opted-in to it. And if Gradle never builds that API, that's okay, because it's very easy to do in userspace. The only part we can't build in userspace is I think our biggest disagreement is around user story. IMO, the dominant user story is a serial chain of |
I meant state shared between tasks. |
I might be naive, but that seems like another "design to the wrong point" problem. The worrycase as I understand it is a poorly implemented userland plugin which:
I agree it's possible, but it doesn't seem like it's so common that it's worth designing to. Especially when the builds where correctness matters the most are CI builds where configuration-cache doesn't make much sense in the first place. Let's say that Gradle came with a built-in
With some kind of verification tool, the implementation is no longer responsible for verification, and you open up a lot of design space for the implementation. The current "verification" tool (roundtrip serialization baked into every execution) is trivially defeated with a static variable containing mutable state. In Spotless, we are doing this on purpose, but it's easy to do this on accident too. A serialization roundtrip won't be able to catch bugs of this type. You could add some classloader magic, but now plugin authors are going to be debugging not only magic serialization, but also magic classloading. You'll also be paying another performance penalty by thrashing the JIT. With each release of Gradle, I would hope that authoring builds will get easier, and that I will spend less time on the incidental complexity of the jobs I want Gradle to do. My concern with the configuration-cache is that there is a big set of stuff that is fundamentally not serializable, and any build with those things in it makes the entire build a second-class experience. Besides Spotless, I have updated four other public build plugins for configuration-cache, and more private ones:
In every case, I am forced to write quite a lot of boilerplate, and formerly straightforward things become quite complicated (e.g. reading project / environment variables). I would love a feature like "developer-cache", marketed as so:
A "developer-cache" design would be unwilling to incur any runtime costs on verification responsibilities, and it wouldn't spend developer time on relocatability. |
Have you thought to implement the "automatic mode" for deleting the configuration cache? When I tried this feature it was failing mainly in CI which is annoying. Is it not possible to change the implementation to be really compatible with the Gradle way? |
Have you looked at what kind of speedup/slowdown you get if you disable configuration cache on the CI machines? If the inbound CI queue to a single runner is even just a little bit heterogenous, you can easily spend more time writing
We tried this in #644, and it's pretty close to a rewrite for us, for this reason. I'd be okay to merge PRs in that direction, but it would be a lot of work, and at the end of all the work you'd be a lot slower than a within-daemon JVM-local cache. |
I haven't tried it personally, but looks like having it enabled is the direction that tools like Gradle Build Action are taking. |
We were seeing a lot of this message:
So we end up disabling the configuration cache for PLEASE DON'T COPY&PASTE THIS IF YOU DON'T KNOW WHAT IT DOES! tasks.withType<SpotlessTask>().configureEach {
notCompatibleWithConfigurationCache("https://github.com/diffplug/spotless/issues/987")
} |
I am just trying to make the Micronaut build compatible with the configuration cache and Spotless compatibility is the last missing piece. I had to disable the cache with the technique @BraisGabin demonstrated, but this is far from ideal, since basically any
So I would personally really appreciated if you adapted the plugin. |
PR's welcome :) |
To me this issue/idea has a flaw. The configuration cache is meant to be reused locally by any daemon. Gradle daemon can disappear at any time and having a JVM-only cached configuration is against the whole concept. IMO there is a design flaw in the spotless plugin. I tried to make it CC compatible but the code needs to be seriously re-architectured. Currently, spotless creates a lot of JVM-lazy (not Gradle-lazy) objects e.g. lambdas that define behaviours of concrete formatters. These lambdas are not serializable in any way and that's the problem. At least as I understand it. The way to move forward, as I see it, is to delay creation of those lambdas until the Gradle execution phase. Spotless configuration should only know which formatters to use (e.g. scala or kotlin) and where the formatter configuration file is located. It's not an easy-peasy change, as it requires quite some changes in the spotless codebase. |
I'm super bummed that serialization roundtrip is a base requirement for Gradle plugins. IMO it's a drag on plugin development and maintenance, as well as execution time. I really wish Gradle plugins had access to a JVM-local cache designed for serial development. Try something, run a test, try something, run a test; that's the loop that I want to optimize for. But Gradle is clearly heavily committed to this path at this point, and we finally have a reasonable path forward for supporting roundtrip serialization. If you'd like to help:
|
Please thumbs-up this issue if you'd like Gradle to build this. TLDR: Unlike remote build cache, configuration cache is never going to be relocatable. Because of that, it might as well be a JVM-local cache (within-daemon), which removes the performance and incidental-complexity penalties of a serialization roundtrip. Spotless #986 hacks a workaround to demonstrate this approach, example API in comment below. Gradle could support this natively so that configuration cache is faster to use and easier to develop plugins against.
The text was updated successfully, but these errors were encountered: