Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Self-hosted runner with Docker step creates files that trip up the checkout step #434

Open
j3parker opened this issue Apr 17, 2020 · 61 comments
Labels
question Further information is requested Runner Feature Feature scope to the runner

Comments

@j3parker
Copy link

j3parker commented Apr 17, 2020

Describe the bug
When using self-hosted runners, git checkouts are cached between runs (this is nice, because it greatly speeds up our builds).

However, if a docker-based step writes a file to the workspace it will (possibly) be owned by the root user. If the permissions don't give the (non-root) action runner user +w permission then a checkout step in a future workflow run will fail to remove this file. The first time, the error will look like this:

##[group]Cleaning the repository
[command]/usr/bin/git clean -ffdx
warning: could not open directory 'foo/': Permission denied
warning: failed to remove foo/: Directory not empty
##[endgroup]
##[warning]Unable to clean or reset the repository. The repository will be recreated instead.
Deleting the contents of '/home/jparker/actions-runner/_work/self-hosted-runner-permissions-issue-repro/self-hosted-runner-permissions-issue-repro'
##[error]Command failed: rm -rf "/home/jparker/actions-runner/_work/self-hosted-runner-permissions-issue-repro/self-hosted-runner-permissions-issue-repro/foo"
rm: cannot remove '/home/jparker/actions-runner/_work/self-hosted-runner-permissions-issue-repro/self-hosted-runner-permissions-issue-repro/foo': Permission denied

So git clean -ffdx tried to stat() this foo/ directory (created via a container in a previous build) but failed. It was then unable to remove the directory because it wasn't empty. It tried to fall back to rm -rf which failed for the same reasons.

In future builds it goes straight to rm -rf because the .git folder did get cleaned up. It continues to fail in the same way for all future builds. Here's a screenshot:

image

To Reproduce

I've created a repo that reproduces the error: https://github.com/Brightspace/self-hosted-runner-permissions-issue-repro

Here's an example of a workflow failing: https://github.com/Brightspace/self-hosted-runner-permissions-issue-repro/runs/596011452?check_suite_focus=true

Expected behavior

I guess I'd expect all the files to be owned by the runner user... in a perfect world. Maybe that can be done with user namespace maps? Documentation. Not sure what that would entail though or if it makes sense for what the runner is doing.

I think this is not an issue with the checkout action because I don't think there is anything they could do about it - it'd impact other actions too, checkout was just the first one I hit the issue with.

Runner Version and Platform

Ubuntu 18.04, runner version 2.168.0

These are org-level runners but I imagine it's not specific to that.

@j3parker j3parker added the bug Something isn't working label Apr 17, 2020
@TingluoHuang
Copy link
Member

@j3parker We should document this, basically, if you are using any container feature of GitHub Actions (container step or job container) you probably should run the runner as ROOT to avoid any potential file permission issue caused by files created by the container.

@chrispat thoughts?

@karancode
Copy link

karancode commented Apr 21, 2020

@j3parker thanks for reporting this. I am also facing the same issue.
@TingluoHuang I thought to start the runner as root but the utility run.sh has a check to not start runner as root. https://github.com/actions/runner/blob/master/src/Misc/layoutroot/run.sh#L3

# Validate not sudo
user_id=`id -u`
if [ $user_id -eq 0 -a -z "$RUNNER_ALLOW_RUNASROOT" ]; then
    echo "Must not run interactively with sudo"
    exit 1
fi

EDIT : @j3parker I think you can simply achieve this by exporting the variable RUNNER_ALLOW_RUNASROOT=1. Check here.
Thank you @TingluoHuang

@chrispat
Copy link
Member

If you are going to use container based actions I would recommend you use a job container as well. Mixing container and host environments does not work very well. The other option is to add a step to change permissions of files after the container action runs.

@bryanmacfarlane
Copy link
Member

Also, I don't think we can say must run runner as root since many folks run as systemd service and I don't think that will allow us to run as root.

@bryanmacfarlane
Copy link
Member

bryanmacfarlane commented Apr 21, 2020

@karancode the reason that's an envvar is there's many scenarios (service) where it won't work as the service ends up running as a user (the user configured or one specified). but there are some scenarios (like putting the runner in a container) where you want / need to explicitly run as root and that's why the override is there. We should formalize the run as root more and doc it better.

We need to keep this open and think a bit more on what the right solution is. If you want to do run as root for now and you don't run the runner as a systemd service, then make sure however you launch it, it calls runsvc.sh and not run.sh so it doesn't exit on updates.

@karancode
Copy link

karancode commented Apr 21, 2020

@chrispat @bryanmacfarlane Thanks.
In my case, I am running runner inside containers, so I had no other option but to run it as root.. (I tried to change permissions of files after container actions but that way isn't feasible as there could be many different actions generating multiple files.. I also tried running it as a systemd service but it was just uglier).

I also faced the issue with run.sh exiting on updates(observed with minor version update but not with patch).
Thanks for the tip, I will check for runsvc.sh and how to use it.
PS : If there's any doc please share, would love you contrib. Thanks!

@j3parker
Copy link
Author

j3parker commented Apr 21, 2020

If you are going to use container based actions I would recommend you use a job container as well. Mixing container and host environments does not work very well.

Cool - when you say job containers are you referring to this? I hadn't seen that before... I'll definitely try that!

We need to keep this open and think a bit more on what the right solution is.

Thanks! It looks like container: will mitigate this for us for now. If the runner itself messed around with user namespaces that might be able to solve this (they can be nested) but it might be a bunch of work...

If container: works for us we'd be interested in being able to configure our runners to only accept containerized jobs. I can open a feature request for this after testing it out.

@TingluoHuang TingluoHuang added runner question Further information is requested and removed bug Something isn't working runner labels Jun 8, 2020
@peter-murray
Copy link

Another option here is that when running inside a container that has the user as root, you can use the jobs.<jobid>.container.options directive to provide the --user uid:gid value of the user and group that the self-hosted action runner is running as.

The downside to this is that details of the actions runner environment is starting to leak into the workflows of your projects, which is less than ideal in larger companies.

@kencochrane
Copy link

This is hitting me as well, at first it was easier to avoid, but now that we have more and more actions it is getting harder.

Is there a way to have the runner clean up the work directory after the workflow is finished. If the runner isn’t running as root, then it probably can’t delete the directory. But it would be able to do that if it was running in a container.

Maybe there is a way, to turn on a clean up job that will run after the workflow is complete to delete the files. If that job is run in a container it should have access to delete everything.

This only works if you have Docker installed, but that is fine because I don’t think the issue happens without docker.

I guess I could add this to all of my workflows, but that doesn’t seem ideal, getting it added at the runner level makes things cleaner. Just an idea, not sure if it is a good one, but figured I would add it here and see what people think.

@jef
Copy link

jef commented Jul 28, 2020

For all my private actions, I ended up putting USER 1000:1000 in all the Docker actions. Since there is only ever root and the user I created, the only valid options are 0 and 1000. That way, any files created within the Docker action that are persisted by the self-hosted runner, they have the correct user.

The other solution is to just run the runner as root by setting RUNNER_ALLOW_RUNASROOT=1.

@merou
Copy link

merou commented Aug 3, 2020

in your workflow file use this, e.g:

  build:
    runs-on: self-hosted
    needs: [clone-repository]
    container:
      image: gradle:5.5.1-jdk11
      options: --user 1000
    ...
    ...

@jef
Copy link

jef commented Aug 3, 2020

I suppose that will only work if you're using container vs a Docker action.

I don't believe an action has those concepts unless specifically written by the action. In which case, it would need to get updated in every action that uses Docker (extreme example). Unless there is a universal environment variable that masks files or sets file creation permissions. (I suppose I'm thinking something similar to UMASK here; not sure really.)

@merou
Copy link

merou commented Aug 4, 2020

Yes for Docker actions (private or public) - User will need to modify the corresponding Dockerfile by indeed having USER 1000:1000 like you mentioned earlier.

The nice thing is that github actions are indirectly enforcing good practices - i.e do not run a Container as root.
I personally think that it is the user responsibility to set the right permissions there.

Concerning container to avoid the options: --user 1000 repetition, it would be nice to be able to define it like this:

defaults:
  runs-on: self-hosted
  container-user-id: 1000

@xanantis
Copy link

xanantis commented Aug 8, 2020

I found this temporary workaround for Docker action: Fix for self-hosted runner

P.S: I updated script and moved it to repository README.md

@clementohNZ
Copy link

You might not be able to run with sudo, but you can add user to the root group. That worked for me :)

sudo usermod -a -G root <USER_NAME>

@jef
Copy link

jef commented Aug 12, 2020

You might not be able to run with sudo, but you can add user to the root group. That worked for me :)

sudo usermod -a -G root <USER_NAME>

Strange... I did this before and still gave me problems. I can try again in the future.

Thanks!

@xanantis
Copy link

@jef You may try my solution. It is hacky, but it does work for me.

@xanantis
Copy link

@jef I rewrote my script. Now it is possible to set docker's user option via env vars. You can find more details here: https://github.com/xanantis/docker-file-ownership-fix#to-fix-all-those-above-for-self-hosted-runner

@imurashka
Copy link

Hey! I faced with the same proble. Have you guys found easy solution?

@malkam03
Copy link

Hey! I faced with the same proble. Have you guys found easy solution?

This worked for me...

I had to remove manually the conflicted files, and then adding the user ID of the user we use for automation did the trick.

@peter-murray
Copy link

peter-murray commented Dec 2, 2020

Setting the uid is not always possible depending upon the container, how it was built and internal permissions inside the container. It is a good approach most of the time, but there are edge cases and in some use cases where there are separate teams managing the runners and the workflows complications can arise.

I have created an action that can "reset" permissions on the workspace directories and files that would trip up a consecutive run and break at checkout; https://github.com/peter-murray/reset-workspace-ownership-action

It is still not a perfect solution can can be appended to the end of the workflow with minimal impact and overhead to the workflow until there is a longer term fix available.

@MuchToKnow
Copy link

Also, I don't think we can say must run runner as root since many folks run as systemd service and I don't think that will allow us to run as root.

Ran into this myself today due to running python in docker containers during a test step, creating root-owned pycache files. These files then break the next build when the runner attempts removal during the checkout step like OP.

Are there any issues with running the service as root utilizing the [user] param for install? I'm hosting a runner on ubuntu 20.04.1 and running:

sudo ./svc.sh install root
sudo ./svc.sh start

works well for me as a workaround for now.

@adityasharad
Copy link

+1. This has also come up where users are using github/codeql-action (for Code Scanning) or other Actions that write to runner.temp. In that case it's possible for the Action to write data to the temp directory that fails to be cleaned up at the start of a next run, because the container user doesn't have permission to delete it. Documenting the right practice of getting the users to match would be a good way to help identify and prevent this.

ghost pushed a commit to RevolutionPi/revpi-modbus that referenced this issue May 9, 2022
Uses docker that is pulled from container registry. Due to issue
actions/runner#434 in github actions
there is no clean sleeve at the start of the run, actions do not
use temporary workspaces but employ existing file systems on the
runner's system instead. This can lead to undefined behavior if
no cleanup is executed before checkout. Also mind that there
will be issues with the file system's permission if the docker
container's user does not match the UID of the action runner's UID.
ivan4th added a commit to travelping/upg-vpp that referenced this issue May 20, 2022
sergeymatov pushed a commit to travelping/upg-vpp that referenced this issue May 20, 2022
@dayananda30
Copy link

sudo usermod -a -G root <USER_NAME>

Thank you :-) This worked for me.

@cjproud
Copy link

cjproud commented May 26, 2023

Are people still having this issue? I've been able to fix my issue by doing what was suggested in #434 (comment) but it feels a bit gross to run the service as root.

My issue is that after running a black formatting action (https://github.com/psf/black) there are files leftover in the _actions dir that aren't owned by the runner's user but rather by root:

Access to the path '/home/ubuntu/actions-runner/_work/_actions/psf/black/stable/.black-env/lib/python3.10/site-packages/black-23.3.0.dist-info/INSTALLER' is denied.) (Access to the path '/home/ubuntu/actions-runner/_work/_actions/psf/black/stable/.black-env/pyvenv.cfg' is denied.) 

(Access to the path '/home/ubuntu/actions-runner/_work/_actions/psf/black/stable/.black-env/lib64' is denied.) (Access to the path '/home/ubuntu/actions-runner/_work/_actions/psf/black/stable/.black-env/lib/python3.10/site-packages/mypy_extensions.py' is denied.)

this causes subsequent actions to fail with the above message.

@sammcj
Copy link

sammcj commented May 26, 2023

Yep, still happens with a few different clients of mine.

@Kurt-von-Laven
Copy link

Instead of running the service as root, you can use ScribeMD/rootless-docker to run Docker in rootless mode.

@marc-hb
Copy link

marc-hb commented Jul 16, 2023

Not 100% sure this is relevant sorry but to work around what I think is a similar problem, I wrote a small script that dynamically creates a user inside the container with the ID that matches the user ID outside: https://stackoverflow.com/a/74330112

In that case the problem solved is: sharing inside files outside.

It's ugly but surprisingly effective at solving what seems to be a docker design issue.

bl4ko pushed a commit to bl4ko/homepage that referenced this issue Aug 3, 2023
bl4ko pushed a commit to bl4ko/homepage that referenced this issue Aug 5, 2023
@cjproud
Copy link

cjproud commented Aug 9, 2023

If anyone is having the problem I mentioned above with black, I've actually had a PR just merged that will remove all files created by black during the action 👍

@brian-kiplagat
Copy link

image
facing this issue with a deployment workflow

@brian-kiplagat
Copy link

image facing this issue with a deployment workflow

im on centos 7 as root user

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
question Further information is requested Runner Feature Feature scope to the runner
Projects
None yet
Development

No branches or pull requests