RFE: concurrency >1 for run_command_cleaner cleanup #163

@masterlongshanks

This is a Request for Enhancement: allow the post-build-cleanup hook to support a concurrency greater than one.

Running the worker and runner as separate users is the recommended approach, as it prevents the runner from poisoning the local cache.
Running the worker as a privileged user (i.e., root) is also recommended: partly to protect the cache, and partly to remove obstacles to cleanup. Additional motivations may also apply.

In some corporate environments, running workers as a privileged user (i.e., root) is difficult, bordering on impossible.

We have hit cleanup permission issues [ failed readiness check...permission denied ] and attempted to address them by placing the worker and runner in the same Unix group and using the cleanup hook to call a script that adds group rwx to all directories, and group rw to all files, owned by the runner. The intention is that this clears the way for the worker's cleanup routines. We recognise this is a stretch, given that the documented purpose of the hook is to wipe external artifacts, not build dirs.
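For illustration, here is a minimal sketch of the kind of script described above. It is not our production script and not part of Buildbarn; the names `open_tree_to_group` and `_add_group_bits` are made up for this example. It assumes it runs as the runner user, skips symlinks, and tolerates entries that vanish while it walks the tree.

```python
import os
import stat

DIR_BITS = stat.S_IRGRP | stat.S_IWGRP | stat.S_IXGRP  # g+rwx for directories
FILE_BITS = stat.S_IRGRP | stat.S_IWGRP                # g+rw for everything else


def _add_group_bits(path, owner_uid):
    """Add group bits to one entry owned by owner_uid; return True if its mode changed."""
    try:
        st = os.lstat(path)
        # chmod follows symlinks, so leave them alone; also skip other users' files.
        if stat.S_ISLNK(st.st_mode) or st.st_uid != owner_uid:
            return False
        extra = DIR_BITS if stat.S_ISDIR(st.st_mode) else FILE_BITS
        old_mode = stat.S_IMODE(st.st_mode)
        new_mode = old_mode | extra
        if new_mode == old_mode:
            return False
        os.chmod(path, new_mode)
        return True
    except (FileNotFoundError, PermissionError):
        # The entry disappeared or is inaccessible (e.g. an action still in flight).
        return False


def open_tree_to_group(root, owner_uid=None):
    """Walk root top-down and open every entry owned by owner_uid to its group.

    owner_uid defaults to the effective UID of the calling process (assumed
    here to be the runner). Returns the number of entries whose mode changed.
    """
    if owner_uid is None:
        owner_uid = os.geteuid()
    changed = int(_add_group_bits(root, owner_uid))
    # Top-down order means each subdirectory is chmod-ed before os.walk descends into it.
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            changed += _add_group_bits(os.path.join(dirpath, name), owner_uid)
    return changed
```

Calling `open_tree_to_group("build")` corresponds to what we do today, i.e. it touches every action directory under build/ rather than one specific action.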

However, there are three problems:

  1. We understand that concurrency needs to be set to one for this to be effective. We have 32-core VMs, with 31 cores reserved for Buildbarn via cgroups. We need concurrency to be 31.
  2. We have not been able to work out how to pass a particular action dir to the chmod script, so it simply hits all of build/. Clearly this is not smart: some of the action dirs are still in flight.
  3. We hit deadlock from time to time when the runner loses read-execute permission on the action directory (build/$HASH), so the cleanup hook cannot run. It doesn't matter how clever the script called by the hook is if it has no access.
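To make problem 2 concrete, the sketch below shows how a cleanup script could scope itself to a single action directory if the hook were to pass one. This is purely hypothetical: as far as we can tell the hook does not pass such an argument today, and `resolve_action_dir` is a made-up helper, not a Buildbarn API.

```python
import os


def resolve_action_dir(build_root, action_dir):
    """Return the real path of action_dir if it is a direct child of build_root.

    Raises ValueError otherwise, so a cleanup script never touches the whole
    build root or a path outside it. Both arguments are assumed to be supplied
    by the (hypothetical) hook invocation.
    """
    root = os.path.realpath(build_root)
    target = os.path.realpath(action_dir)
    if os.path.dirname(target) != root:
        raise ValueError(f"{action_dir!r} is not a direct child of {build_root!r}")
    if not os.path.isdir(target):
        raise ValueError(f"{action_dir!r} is not a directory")
    return target
```

With a check like this, each hook invocation would touch only its own build/$HASH tree, so up to 31 invocations could run side by side without interfering with actions that are still in flight.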

[ Problem 3 appears to be caused by customers running Gradle within Bazel. We are educating our customers, but... ]

Allowing the post-build-cleanup hook to support a concurrency greater than one would fix problem 1 and, presumably, its implementation would also fix problem 2.

How does that sound?
