Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[6.1.0] Make Bazel more responsive and use less memory when --jobs is high #17398

Merged
merged 2 commits into from
Feb 7, 2023

Commits on Feb 2, 2023

  1. Make Bazel more responsive and use less memory when --jobs is high

    When using Bazel in combination with a larger remote execution cluster, it's not uncommon to call it with something like --jobs=512. We have observed that this is currently problematic for a couple of reasons:
    
    1. It causes Bazel to launch 512 local threads, each being responsible for running one action remotely. All of these local threads may spend a lot of time in buildRemoteAction(), generating input roots in the form of Merkle trees.
    
       As the local system tends to have fewer than 512 CPUs, all of these threads will unnecessarily compete with each other. One practical downside of that is that interrupting Bazel using ^C takes a very long time, as it first wants to complete the computation of all 512 Merkle trees.
    
       Let's put a semaphore in place, limiting the number of concurrent Merkle tree computations to the number of CPU cores available. By making the semaphore interruptible, Bazel will immediately stop processing the `512 - nCPU` actions that were waiting for the semaphore.
    
    2. Related to the above, Bazel will end up keeping 512 Merkle trees in memory throughout all stages of execution. This makes sense, as we may get cache misses, requiring us to upload the input root afterwards. Or the execution of a remote action may fail, requiring us to upload the input root.
    
       That said, generally speaking these cases are fairly uncommon. Most builds have relatively high cache hit rates and execution retries only happen rarely. It is therefore not worth keeping these Merkle trees in memory constantly. We only need it when computing the action digest for GetActionResult(), and while uploading it into the CAS.
    
    3. AbstractSpawnStrategy.getInputMapping() has some smartness to memoize its results. This makes a lot of sense for local execution, where the input mapping is used in a couple of places. For remote caching/execution it is not evident that this is a good idea. Assuming you end up having a remote cache hit, you don't need it.
    
       Let's make the memoization optional, only using it in cases where we do local execution (which may also happen when you get a cache miss when doing remote caching without remote exection).
    
    Similar changes against Bazel 5.x have allowed me to successfully do builds of a large monorepo using --jobs=512 using the default heap size limits, whereas I would normally see occasional OOM behaviour when providing --host_jvm_args=-Xmx64g.
    
    Closes bazelbuild#17120.
    
    PiperOrigin-RevId: 500990181
    Change-Id: I6d1ba03470b79424ce2e1c2e83abd8fa779dd268
    EdSchouten authored and coeuvre committed Feb 2, 2023
    Configuration menu
    Copy the full SHA
    d4542bc View commit details
    Browse the repository at this point in the history

Commits on Feb 7, 2023

  1. Configuration menu
    Copy the full SHA
    1022994 View commit details
    Browse the repository at this point in the history