opening this issue as recommended by @toote in #386 (comment)
some explanation about what is going on
> First, when the job is cancelled, the agent running the job sends the whole process stack a stopping and then a termination signal.
the process group receives a SIGTERM immediately (SIGTSTP is never used):
https://github.com/buildkite/agent/blob/b9d4efd1/bootstrap/bootstrap.go#L118
https://github.com/buildkite/agent/blob/b9d4efd1/process/signal.go#L36
> The main container should have received that and stopped itself then.
unfortunately, the process group that is signalled is bash, because that is what is running here
run_docker_compose "${run_params[@]}"
so the docker compose run ... process never receives any signal and never stops
$ bash
$ echo $$
33366
$ sleep 1000
$ pstree -p 33366
bash(33366)───sleep(34280)
$ kill 33366 # nothing happens
$ kill -- -33366 # nothing happens; this is equivalent to the above because sleep is in its own process group
$ kill -s KILL -- -33366 # bash exits
$ ps x | grep 34280
34280 pts/2 S 0:00 sleep 1000
you can see here that even though the bash process group was sent a SIGKILL, sleep is still running. you can see the process group behavior of bash with ps o pid,pgid,sid | grep 34280
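for anyone who wants to reproduce the process group split without an interactive terminal, here is a minimal sketch. it assumes bash and a POSIX ps, and it turns on job control with set -m purely to mimic the interactive shell in the session above; that is an assumption of the sketch, not a claim about how the plugin runs. the variable names (child_pid, shell_pgid, child_pgid) are made up for illustration.

```shell
#!/usr/bin/env bash
# Sketch: compare the process group of a shell with that of a child it starts.
# Assumption: job control (set -m) is enabled to mimic the interactive shell
# from the session above; each job then gets its own process group.
set -m

sleep 1000 &
child_pid=$!

# "ps -o pgid=" prints only the process group ID; tr strips the padding.
shell_pgid=$(ps -o pgid= -p "$$" | tr -d ' ')
child_pgid=$(ps -o pgid= -p "$child_pid" | tr -d ' ')

echo "shell pid=$$ pgid=$shell_pgid"
echo "child pid=$child_pid pgid=$child_pgid"

# Clean up the child directly by PID; a signal aimed at the shell's own
# process group would not be delivered to it.
kill "$child_pid"
```

under these assumptions the child's pgid should match its own pid rather than the shell's pgid, which is why a signal sent to the shell's process group does not reach it.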
so the agent is never able to stop the main/run container when using docker-compose-buildkite-plugin
> If that was not enough, the code just above the lines you added should have also taken care of it by killing all containers associated with the project itself (the main container is part of the project as well).
this I don't have a good understanding of. I agree that that is what should be happening, but that is not what we are seeing, which is why running against my PR causes docker ps to output a container ID and why you see docker stop in the logs (in my PR's description)