[6.1.0] Make Bazel more responsive and use less memory when --jobs is high #17398

When using Bazel in combination with a larger remote execution cluster, it's not uncommon to call it with something like --jobs=512. We have observed that this is currently problematic for a couple of reasons: 1. It causes Bazel to launch 512 local threads, each being responsible for running one action remotely. All of these local threads may spend a lot of time in buildRemoteAction(), generating input roots in the form of Merkle trees. As the local system tends to have fewer than 512 CPUs, all of these threads will unnecessarily compete with each other. One practical downside of that is that interrupting Bazel using ^C takes a very long time, as it first wants to complete the computation of all 512 Merkle trees. Let's put a semaphore in place, limiting the number of concurrent Merkle tree computations to the number of CPU cores available. By making the semaphore interruptible, Bazel will immediately stop processing the `512 - nCPU` actions that were waiting for the semaphore. 2. Related to the above, Bazel will end up keeping 512 Merkle trees in memory throughout all stages of execution. This makes sense, as we may get cache misses, requiring us to upload the input root afterwards. Or the execution of a remote action may fail, requiring us to upload the input root. That said, generally speaking these cases are fairly uncommon. Most builds have relatively high cache hit rates and execution retries only happen rarely. It is therefore not worth keeping these Merkle trees in memory constantly. We only need it when computing the action digest for GetActionResult(), and while uploading it into the CAS. 3. AbstractSpawnStrategy.getInputMapping() has some smartness to memoize its results. This makes a lot of sense for local execution, where the input mapping is used in a couple of places. For remote caching/execution it is not evident that this is a good idea. Assuming you end up having a remote cache hit, you don't need it. Let's make the memoization optional, only using it in cases where we do local execution (which may also happen when you get a cache miss when doing remote caching without remote exection). Similar changes against Bazel 5.x have allowed me to successfully do builds of a large monorepo using --jobs=512 using the default heap size limits, whereas I would normally see occasional OOM behaviour when providing --host_jvm_args=-Xmx64g. Closes bazelbuild#17120. PiperOrigin-RevId: 500990181 Change-Id: I6d1ba03470b79424ce2e1c2e83abd8fa779dd268

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[6.1.0] Make Bazel more responsive and use less memory when --jobs is high #17398

[6.1.0] Make Bazel more responsive and use less memory when --jobs is high #17398

Commits on Feb 2, 2023

Commits on Feb 7, 2023