Description
I'm not sure whether this is a feature request or simply a request for more documentation; I may just be missing something in my configuration, or fundamentally misunderstanding how all of this works.
I've been trying to set up bb-clientd as a remote output service in a CI environment. This should be a huge benefit for us, letting multiple concurrently running builds share a local cache. However, I've run into an apparent blocker around the filepool: as far as I can tell there is no eviction policy for it, and no deduplication between builds, so it grows without bound until it fills up and things start breaking.
As far as I can tell, here's what's going on:
- Every `bazel build` invocation gets its own virtual output directory in `~/bb_clientd/outputs`.
- For the most part, that directory is populated with metadata entries pointing at digests of content from the remote executors, which winds up backed by the CAS, so those entries are relatively small. This still eventually becomes a problem without any kind of cleanup of old output directories; even slow unbounded growth is still unbounded.
- Files created by local actions are written into the output tree directly and wind up backed by the `filepool`. In our build these are largely `ctx.actions.symlink` outputs, which are mostly fine (same caveats as the previous point), but also a lot of `ctx.actions.write` or `ctx.actions.expand_template` outputs, which can be significantly less fine. There can be a lot of these, for example the generated stubs for `py_test` targets.
- Most of the content in the `filepool` is identical from one build to the next, but as far as I can tell there isn't any deduplication for it.
- I haven't been able to find any configuration options that would cause old output trees to be cleaned up, so they continue to accumulate until the `filepool` fills up.
- Cleaning them up from the CI side can be tricky (see the sketch after this list). A build node may have multiple concurrent executors running (part of the point of `bb_clientd` here is to share caches between them, after all), so it isn't easy to tell which output trees are still relevant. A build job can include a post-build cleanup step to delete the output tree it just created, but then an interrupted build could lead to leakage.
- I'm also not entirely sure whether deleting an output tree actually frees its space in the `filepool`; as far as I can tell it's using `bitmapSectorAllocator`, whose commentary suggests it was designed as an ephemeral storage arena for remote execution workers, which is not really the usage pattern we're talking about here.
- There isn't an easy way to monitor the utilization of the `filepool` to figure out whether we're close to hitting the limit. Gemini suggested the Prometheus metric `bbclientd_filepool_used_bytes`, which would be great if it weren't a hallucination.
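
For concreteness, here's roughly the best cleanup we can do from the CI side today. This is a minimal Go sketch, assuming output trees show up as directories under `~/bb_clientd/outputs`, that mtime is an acceptable proxy for "last used", and that deleting a tree through the mount actually releases its filepool space (which, per the allocator question above, I'm not sure of). The 24-hour threshold is made up.

```go
// stale_outputs.go: best-effort cleanup of old bb-clientd output trees.
// Assumes output trees are directories under ~/bb_clientd/outputs and that
// deleting them through the mount releases the underlying filepool space
// (unverified; see the bitmapSectorAllocator question above).
package main

import (
	"log"
	"os"
	"path/filepath"
	"time"
)

func main() {
	outputs := filepath.Join(os.Getenv("HOME"), "bb_clientd", "outputs")
	maxAge := 24 * time.Hour // made-up threshold for illustration

	entries, err := os.ReadDir(outputs)
	if err != nil {
		log.Fatalf("reading %s: %v", outputs, err)
	}
	for _, e := range entries {
		info, err := e.Info()
		if err != nil {
			continue
		}
		// Mtime is a weak proxy for "last used"; a tree belonging to a
		// still-running build on this node could be deleted by mistake.
		if time.Since(info.ModTime()) > maxAge {
			path := filepath.Join(outputs, e.Name())
			log.Printf("removing stale output tree %s", path)
			if err := os.RemoveAll(path); err != nil {
				log.Printf("failed to remove %s: %v", path, err)
			}
		}
	}
}
```

The mtime race is exactly why I'd rather the daemon owned this: it's the only component that actually knows which invocations are still live.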
I'm trying to wrap my head around how this is supposed to work: how are stale output trees meant to be managed? I feel like there really needs to be some way to configure automatic cleanup of old output trees, for example keeping the most recent N invocations' worth, bounding the total size in bytes, or bounding by age. Recency could be based on when the invocation started, when it was finalized, or when the output tree was last accessed (which the daemon knows about, because it mediates every access to it). I don't want to bike-shed the precise details of which trees get dropped and when, as long as there's some way to bound the growth; a rough sketch of what I mean follows.
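
To pin down the shape of the knob I'm asking for, here's a minimal sketch of a retention policy in Go. All of these names are invented; nothing like this exists in bb-clientd today as far as I know.

```go
// Hypothetical retention policy for finalized output trees. None of these
// types exist in bb-clientd; this only illustrates the idea.
package outputtrees

import (
	"sort"
	"time"
)

type Tree struct {
	InvocationID string
	SizeBytes    int64
	LastAccess   time.Time // the daemon sees every access, so it can track this
}

type RetentionPolicy struct {
	MaxTrees int           // keep at most the N most recently used trees...
	MaxBytes int64         // ...within this much total filepool usage...
	MaxAge   time.Duration // ...and drop anything idle longer than this
}

// Evict returns the trees that should be removed under the policy,
// preferring to drop the least recently accessed ones first.
func (p RetentionPolicy) Evict(trees []Tree, now time.Time) []Tree {
	sort.Slice(trees, func(i, j int) bool {
		return trees[i].LastAccess.After(trees[j].LastAccess) // MRU first
	})
	var kept int64
	var victims []Tree
	for i, t := range trees {
		tooMany := p.MaxTrees > 0 && i >= p.MaxTrees
		tooOld := p.MaxAge > 0 && now.Sub(t.LastAccess) > p.MaxAge
		tooBig := p.MaxBytes > 0 && kept+t.SizeBytes > p.MaxBytes
		if tooMany || tooOld || tooBig {
			victims = append(victims, t)
			continue
		}
		kept += t.SizeBytes
	}
	return victims
}
```

Again, I don't care much which of these knobs exists, as long as one of them does.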
Probably a separate feature entirely, but it would also be neat if, when an output tree was finalized, file contents in the filepool got moved into a CAS. This would probably be a separate CAS from the one caching remote outputs, with its own eviction policy, since in this case it wouldn't just be a cache. I understand this would be tricky: you'd at least need to compute digests, copy the content from the filepool into the CAS file, and then atomically replace the metadata entries. At least if you're careful about how you do the copy, the Go standard library will use copy_file_range, so if the underlying filesystem supports CoW reflinks (e.g. btrfs or xfs) it won't have to copy any actual bytes until that section of the filepool gets overwritten.
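
Here's a sketch of that copy step, assuming plain files on the same filesystem, a flat `casRoot` directory keyed by digest, and SHA-256 digests; all of those are placeholders, not bb-clientd's actual layout. `io.Copy` between two `*os.File` values already dispatches to `copy_file_range` on Linux:

```go
// Sketch: hash a filepool-backed file and move it into a CAS directory
// without rewriting the bytes where the filesystem supports reflinks.
// The casRoot layout and "hash-size" digest format are assumptions.
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"io"
	"os"
	"path/filepath"
)

func moveIntoCAS(src, casRoot string) (string, error) {
	in, err := os.Open(src)
	if err != nil {
		return "", err
	}
	defer in.Close()

	// First pass: compute the digest the CAS entry will be keyed by.
	h := sha256.New()
	size, err := io.Copy(h, in)
	if err != nil {
		return "", err
	}
	digest := fmt.Sprintf("%s-%d", hex.EncodeToString(h.Sum(nil)), size)

	if _, err := in.Seek(0, io.SeekStart); err != nil {
		return "", err
	}
	out, err := os.CreateTemp(casRoot, "incoming-*")
	if err != nil {
		return "", err
	}
	defer out.Close()

	// Second pass: io.Copy between two *os.File values uses copy_file_range
	// on Linux, so on btrfs/xfs this can be a reflink, not a byte copy.
	if _, err := io.Copy(out, in); err != nil {
		os.Remove(out.Name())
		return "", err
	}
	// Rename is atomic within a filesystem; the daemon would then swap the
	// output-tree entry for a metadata reference to this digest.
	dst := filepath.Join(casRoot, digest)
	if err := os.Rename(out.Name(), dst); err != nil {
		os.Remove(out.Name())
		return "", err
	}
	return digest, nil
}

func main() {
	digest, err := moveIntoCAS(os.Args[1], os.Args[2])
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Println(digest)
}
```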