cgroups: make sure cgroup still exists after task restart #12875
Conversation
Repro example:

job "restarts" {
  datacenters = ["dc1"]
  type        = "service"

  group "g1" {
    restart {
      attempts = 5
      mode     = "delay"
      delay    = "1s"
      interval = "5s"
    }

    task "t1-raw_exec" {
      driver = "raw_exec"

      config {
        command = "/usr/bin/bash"
        args    = ["-c", "sleep 10"]
      }
    }

    task "t2-exec" {
      driver = "exec"

      config {
        command = "/usr/bin/bash"
        args    = ["-c", "sleep 10"]
      }
    }
  }
}

with fix:
Force-pushed from 3f0b766 to e5e3e99
Running relevant tests locally on a cgroups v2 machine:
test-cgutil.log
This PR modifies raw_exec and exec to ensure the cgroup for a task they are driving still exists during a task restart. These drivers have the same bug but with different root causes.

For raw_exec, we were removing the cgroup in two places: the cpuset manager and the unix containment implementation (the thing that uses the freezer cgroup to clean house). During a task restart, the containment would remove the cgroup, and when the task went to start again the task runner hooks would block waiting for the cgroup to exist, which would never happen, because the cgroup gets created by the cpuset manager, which only runs as an alloc pre-start hook. The fix here is to simply not delete the cgroup in the containment implementation; killing the PIDs is enough, and the removal happens in the cpuset manager later anyway.

For exec, it's the same idea, except that DestroyTask is called on task failure, which in turn calls into libcontainer, which in turn deletes the cgroup. In this case we do not have control over the deletion of the cgroup, so instead we hack the cgroup back into life after the call to DestroyTask.

All of this only applies to cgroups v2.

Fixes #12863

No CL because cgroupsv2 hasn't shipped yet.
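For illustration, here is a minimal Go sketch of the containment-side idea: kill every process listed in a cgroups v2 cgroup's cgroup.procs file while leaving the cgroup directory itself in place for a later cleanup stage. The paths and function names are hypothetical assumptions for the example, not Nomad's actual implementation.

// Sketch only: kill the processes in a cgroups v2 cgroup without
// removing the cgroup directory, so a restarting task runner that
// waits on the cgroup's existence is not left blocked forever.
package main

import (
	"bufio"
	"fmt"
	"os"
	"path/filepath"
	"strconv"
	"syscall"
)

// killCgroupProcs SIGKILLs each PID listed in cgroup.procs but
// deliberately does not rmdir the cgroup; in the scheme described
// above, removal is left to the cpuset manager.
func killCgroupProcs(cgroupDir string) error {
	f, err := os.Open(filepath.Join(cgroupDir, "cgroup.procs"))
	if err != nil {
		return fmt.Errorf("open cgroup.procs: %w", err)
	}
	defer f.Close()

	scanner := bufio.NewScanner(f)
	for scanner.Scan() {
		pid, err := strconv.Atoi(scanner.Text())
		if err != nil {
			continue // skip malformed lines
		}
		// Best effort: the process may already have exited.
		_ = syscall.Kill(pid, syscall.SIGKILL)
	}
	return scanner.Err()
}

func main() {
	// Hypothetical task cgroup path, for demonstration only.
	if err := killCgroupProcs("/sys/fs/cgroup/nomad.slice/example.scope"); err != nil {
		fmt.Fprintln(os.Stderr, err)
	}
}

On kernels 5.14 and newer, writing "1" to the cgroup's cgroup.kill file achieves the same bulk kill in one step; the per-PID loop above works on older cgroups v2 kernels as well.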
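And a hedged sketch of the exec-side workaround: since the cgroup is deleted inside a destroy path the caller does not control (libcontainer, via DestroyTask), the caller simply recreates the directory afterwards so restart hooks that wait on its existence can proceed. Again, the path and helper name are illustrative assumptions, not the PR's actual code.

// Sketch only: after an uncontrollable destroy step has removed the
// task's cgroup, recreate the empty cgroups v2 directory so that
// restart hooks waiting for it to exist can make progress.
package main

import (
	"fmt"
	"os"
)

// ensureCgroupExists recreates the cgroup directory if the destroy
// step removed it. MkdirAll is a no-op when the directory survived.
func ensureCgroupExists(cgroupDir string) error {
	if err := os.MkdirAll(cgroupDir, 0o755); err != nil {
		return fmt.Errorf("recreate cgroup %s: %w", cgroupDir, err)
	}
	return nil
}

func main() {
	const dir = "/sys/fs/cgroup/nomad.slice/example.scope" // hypothetical path
	// ... the destroy call that may delete dir would run here ...
	if err := ensureCgroupExists(dir); err != nil {
		fmt.Fprintln(os.Stderr, err)
	}
}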
Force-pushed from e5e3e99 to 37ffd2f
Will follow up with CI changes to run on ubuntu-22.04 in GHA
LGTM
Does the most recent commit cover #12877 as well, do you think, or is that going to need to be a separate investigation?
Let's keep it open; so far I haven't been able to reproduce it, but my laptop is on an older kernel.
I'm going to lock this pull request because it has been closed for 120 days ⏳. This helps our maintainers find and focus on the active contributions.