Managing growth of the filepool for remote output service #32

@adam-azarchs

Description

I'm not sure whether this is a feature request, or simply a request for more documentation in case I'm missing something in my configuration or fundamentally misunderstanding how all of this works.

I've been trying to set up bb-clientd as a remote output service in a CI environment. This should be a big win for us, since it lets multiple concurrently running builds share a local cache. However, I've run into an apparent blocker around the filepool. As far as I can tell, there is no eviction policy for the filepool, nor any deduplication between builds, so it grows without bound until it fills up and things start breaking.

As far as I can tell, here's what's going on:

  1. Every bazel build invocation gets its own virtual output directory in ~/bb_clientd/outputs.
  2. For the most part, that directory is populated with metadata entries pointing at digests for content produced by the remote executors. That content is backed by the CAS, so the entries themselves are relatively small. This still eventually becomes a problem without any cleanup of old output directories: even slow unbounded growth is still unbounded.
  3. Files created by local actions are written into the output tree directly and wind up backed by the filepool. In our build this is mostly a lot of ctx.actions.symlink, which is mostly fine (same caveats as the previous point), but also a lot of ctx.actions.write and ctx.actions.expand_template, which can be significantly less fine. There can be a lot of these, for example the generated stubs for py_test targets.
  4. Most of the content in the filepool is identical from one build to the next, but there isn't any deduplication for it as far as I can tell.
  5. I haven't been able to find any configuration options which would cause old output trees to be cleaned up, so they continue to accumulate until the filepool fills up.
  6. Cleaning them up in a CI environment can be tricky. A build node may have multiple concurrent executors running (sharing caches between them is part of the point of bb_clientd here, after all), so it isn't easy to tell which output trees are still relevant. A build job can include a post-build cleanup step to delete the output tree it just created (see the sketch after this list), but then an interrupted build could still leak its tree.
  7. I'm also not entirely sure whether deleting an output tree actually frees the space from the filepool; as far as I can tell it's using bitmapSectorAllocator, the commentary for which suggests it was designed to be an ephemeral storage arena for remote execution workers, which is not really all that close to the usage pattern we're talking about here.
  8. There isn't an easy way to monitor the utilization of the filepool to figure out if we're close to hitting the limit. Gemini suggested the Prometheus metric bbclientd_filepool_used_bytes, which would be great if it weren't a hallucination.
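
To make point 6 concrete, here is roughly what a post-build cleanup step might look like. This is only a sketch: the tree name and the path layout under ~/bb_clientd/outputs are assumptions on my part, and it also assumes that unlinking a tree through the FUSE mount actually releases the backing filepool sectors (which point 7 questions).

```go
// Post-build cleanup step for a CI job (sketch, not bb_clientd code).
// Usage: cleanup_output_tree <tree-name>
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

func main() {
	if len(os.Args) != 2 {
		fmt.Fprintln(os.Stderr, "usage: cleanup_output_tree <tree-name>")
		os.Exit(2)
	}
	home, err := os.UserHomeDir()
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	// Assumed layout: one virtual output tree per invocation under
	// ~/bb_clientd/outputs/<tree-name>.
	tree := filepath.Join(home, "bb_clientd", "outputs", os.Args[1])
	// Recursively unlink the virtual tree; as far as I can tell this is the
	// only way to drop its metadata (and hopefully filepool) usage.
	if err := os.RemoveAll(tree); err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
}
```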

I'm trying to wrap my head around how this is supposed to work: how are stale output trees supposed to be managed? I feel like there really needs to be some way to configure automatic cleanup of old output trees. Examples would include keeping the most recent N invocations' worth, or defining the limit in terms of bytes or age. Recency could be based on when the invocation started, when it was finalized, or when the output tree was last accessed (which the daemon knows about, because it is involved in every access to it). I don't want to bike-shed the precise details of which trees get dropped and when, as long as there's some way to bound the growth.
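
To make that concrete without bike-shedding, here is a rough sketch of the kind of policy I have in mind. None of this corresponds to existing bb_clientd code or configuration; the type names and limits are made up for illustration. The idea is simply: order output trees by last access and evict the oldest ones until both a count limit and a byte budget are satisfied (an age cutoff would just be one more condition in the keep test).

```go
// Hypothetical eviction policy sketch; not existing bb_clientd behaviour.
package eviction

import (
	"sort"
	"time"
)

type OutputTree struct {
	Path       string
	LastAccess time.Time
	SizeBytes  int64
}

// TreesToEvict returns the trees to delete so that at most maxTrees remain
// and their combined filepool usage stays under maxBytes, preferring to keep
// the most recently accessed trees.
func TreesToEvict(trees []OutputTree, maxTrees int, maxBytes int64) []OutputTree {
	sorted := append([]OutputTree(nil), trees...)
	sort.Slice(sorted, func(i, j int) bool {
		return sorted[i].LastAccess.After(sorted[j].LastAccess)
	})

	var kept int
	var keptBytes int64
	var evict []OutputTree
	for _, t := range sorted {
		if kept < maxTrees && keptBytes+t.SizeBytes <= maxBytes {
			kept++
			keptBytes += t.SizeBytes
			continue
		}
		evict = append(evict, t)
	}
	return evict
}
```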

Probably a separate feature entirely, but it would also be neat if, when an output tree was finalized, file contents in the filepool were moved into a CAS. It should probably be a separate CAS from the one caching remote outputs, with its own eviction policy, since in this case it wouldn't just be a cache. I understand this would be tricky; you'd at least need to compute digests, copy the content from the filepool into the CAS file, and then atomically replace the metadata entries. If you're careful about how you do the copy, the Go standard library will use copy_file_range, so if the underlying filesystem supports CoW reflinks (e.g. btrfs or XFS) no bytes actually need to be copied until that section of the filepool is overwritten.
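
To show what I mean by the digest-then-copy step, here is a sketch written from the outside; it is not how bb_clientd is structured, and the CAS layout (<casRoot>/<sha256-hex>) is made up. To actually get the reflink benefit, the copy would have to happen inside the daemon between the filepool's backing file and the CAS file on the underlying filesystem, not through the FUSE mount. On Linux, io.Copy into an *os.File uses copy_file_range when the kernel and filesystem support it, which is what makes CoW reflinks possible on btrfs/XFS.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"io"
	"os"
	"path/filepath"
)

// promoteToCAS copies one filepool-backed file into a local CAS laid out as
// <casRoot>/<sha256-hex>, returning the digest. Illustrative only.
func promoteToCAS(srcPath, casRoot string) (string, error) {
	src, err := os.Open(srcPath)
	if err != nil {
		return "", err
	}
	defer src.Close()

	// First pass: compute the digest that names the CAS object.
	h := sha256.New()
	if _, err := io.Copy(h, src); err != nil {
		return "", err
	}
	digest := hex.EncodeToString(h.Sum(nil))

	dstPath := filepath.Join(casRoot, digest)
	if _, err := os.Stat(dstPath); err == nil {
		return digest, nil // Already present: deduplication for free.
	}

	dst, err := os.CreateTemp(casRoot, "tmp-")
	if err != nil {
		return "", err
	}
	defer os.Remove(dst.Name()) // No-op once the rename below succeeds.
	defer dst.Close()

	// Second pass: copy the bytes. io.Copy into an *os.File uses
	// copy_file_range on Linux where supported, so on btrfs/XFS this can
	// be a CoW reflink rather than a real copy.
	if _, err := src.Seek(0, io.SeekStart); err != nil {
		return "", err
	}
	if _, err := io.Copy(dst, src); err != nil {
		return "", err
	}
	if err := dst.Sync(); err != nil {
		return "", err
	}

	// Atomically publish the CAS entry; the output tree's metadata entry
	// could then be swapped to point at this digest.
	if err := os.Rename(dst.Name(), dstPath); err != nil {
		return "", err
	}
	return digest, nil
}

func main() {
	if len(os.Args) != 3 {
		fmt.Fprintln(os.Stderr, "usage: promote <src-file> <cas-root>")
		os.Exit(2)
	}
	digest, err := promoteToCAS(os.Args[1], os.Args[2])
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Println(digest)
}
```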
