Description
Nomad's cgroup-v2 integration has some cgroup-v1-isms. Cgroups v2 changed the filesystem representation and changed the memory metrics that Nomad has relied on, so Nomad reports a 0 memory summary metric across ~all drivers.
First, Nomad's memory reporting relies on cgroup-v1 metrics. Nomad defaults to using RSS as the top-line memory summary value, and reports Kernel Max Usage, Kernel Usage, Max Usage, and RSS, none of which are reported in cgroup v2. You can see the libcontainer reporting difference by comparing the cgroup v1 memory stats with the cgroup v2 ones. This is pretty confusing.
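For reference, in cgroup v2 the per-cgroup `memory.stat` file exposes `anon` and `file` counters instead of v1's `rss` and `cache`. A minimal sketch (not Nomad's actual code) of parsing a v2 `memory.stat` body and deriving v1-style top-line values; the field mapping (`anon` ≈ `rss`, `file` ≈ `cache`) is my assumption for illustration:

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseMemoryStat parses a cgroup-v2 memory.stat file body
// (flat "key value" lines) into a map of counters.
func parseMemoryStat(body string) map[string]uint64 {
	stats := make(map[string]uint64)
	for _, line := range strings.Split(strings.TrimSpace(body), "\n") {
		fields := strings.Fields(line)
		if len(fields) != 2 {
			continue
		}
		if v, err := strconv.ParseUint(fields[1], 10, 64); err == nil {
			stats[fields[0]] = v
		}
	}
	return stats
}

func main() {
	// Sample cgroup-v2 memory.stat content; note there is no "rss",
	// "cache", or "max_usage" key as there was under cgroup v1.
	sample := "anon 1622016\nfile 0\nkernel_stack 73728\nslab 573440\n"
	stats := parseMemoryStat(sample)
	// Assumed v1-style mapping: rss ~ anon, cache ~ file.
	rss := stats["anon"]
	cache := stats["file"]
	fmt.Println(rss, cache)
}
```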
Also, the executor DestroyCgroup method uses the libcontainer cgroup v1 manager. This needs to be updated to account for v2 and, ideally, to select the relevant cgroup backend.
It's not clear what the state of cgroup-v2 adoption is. It seems Fedora and Arch Linux default to it; other distros, like RHEL and Ubuntu, provide it as an option but not the default one.
Sample metrics of cgroup v2
Running on Fedora 33, I see the following stats info:
```
ID                  = 1e2bdcc2-983d-1e0c-d226-95577bffc188
Eval ID             = dae9b0ab-d31a-446b-a9df-5f2cbf37dc53
Name                = memory.cache[0]
Node ID             = f7bf24d9-d3c0-c34e-0b80-1c6a5de7eddf
Node Name           = ip-172-31-74-56.ec2.internal
Job ID              = memory
Job Version         = 1
Client Status       = running
Client Description  = Tasks are running
Desired Status      = run
Desired Description = <none>
Created             = 2021-03-28T17:52:15-04:00
Modified            = 2021-03-28T17:52:33-04:00
Deployment ID       = ff079acc-8f67-41bb-dc67-e5c506e9a795
Deployment Health   = healthy
Evaluated Nodes     = 1
Filtered Nodes      = 0
Exhausted Nodes     = 0
Allocation Time     = 88.646µs
Failures            = 0

Task "redis" is "running"
Task Resources
CPU           Memory        Disk     Addresses
2465/500 MHz  0 B/1000 MiB  300 MiB

Memory Stats
Cache  Kernel Max Usage  Kernel Usage  Max Usage  RSS  Swap  Usage
0 B    0 B               0 B           0 B        0 B  0 B   261 MiB

CPU Stats
Percent  System Mode  Throttled Periods  Throttled Time  User Mode
98.64%   0.00%        0                  0               98.64%

Task Events:
Started At     = 2021-03-28T21:52:22Z
Finished At    = N/A
Total Restarts = 0
Last Restart   = N/A

Recent Events:
Time                       Type        Description
2021-03-28T17:52:22-04:00  Started     Task started by client
2021-03-28T17:52:20-04:00  Task Setup  Building Task Directory
2021-03-28T17:52:15-04:00  Received    Task received by client

Placement Metrics
Node                                  binpack  job-anti-affinity  node-affinity  node-reschedule-penalty  final score
f7bf24d9-d3c0-c34e-0b80-1c6a5de7eddf  0.635    0                  0              0                        0.635
```
Also, here are the Docker memory stats for cgroup v1 and v2.
Cgroup v2
```json
{
  "usage": 2744320,
  "stats": {
    "active_anon": 1757184,
    "active_file": 0,
    "anon": 1622016,
    "anon_thp": 0,
    "file": 0,
    "file_dirty": 0,
    "file_mapped": 0,
    "file_writeback": 0,
    "inactive_anon": 0,
    "inactive_file": 0,
    "kernel_stack": 73728,
    "pgactivate": 0,
    "pgdeactivate": 0,
    "pgfault": 3531,
    "pglazyfree": 0,
    "pglazyfreed": 0,
    "pgmajfault": 0,
    "pgrefill": 0,
    "pgscan": 0,
    "pgsteal": 0,
    "shmem": 0,
    "slab": 573440,
    "slab_reclaimable": 0,
    "slab_unreclaimable": 573440,
    "sock": 0,
    "thp_collapse_alloc": 0,
    "thp_fault_alloc": 0,
    "unevictable": 0,
    "workingset_activate": 0,
    "workingset_nodereclaim": 0,
    "workingset_refault": 0
  },
  "limit": 2036068352
}
```

Cgroup v1

```json
{
  "usage": 6778880,
  "max_usage": 9478144,
  "stats": {
    "active_anon": 1622016,
    "active_file": 2297856,
    "cache": 4055040,
    "dirty": 0,
    "hierarchical_memory_limit": 9223372036854772000,
    "hierarchical_memsw_limit": 0,
    "inactive_anon": 0,
    "inactive_file": 1757184,
    "mapped_file": 2027520,
    "pgfault": 5049,
    "pgmajfault": 33,
    "pgpgin": 5016,
    "pgpgout": 3591,
    "rss": 1626112,
    "rss_huge": 0,
    "total_active_anon": 1622016,
    "total_active_file": 2297856,
    "total_cache": 4055040,
    "total_dirty": 0,
    "total_inactive_anon": 0,
    "total_inactive_file": 1757184,
    "total_mapped_file": 2027520,
    "total_pgfault": 5049,
    "total_pgmajfault": 33,
    "total_pgpgin": 5016,
    "total_pgpgout": 3591,
    "total_rss": 1626112,
    "total_rss_huge": 0,
    "total_unevictable": 0,
    "total_writeback": 0,
    "unevictable": 0,
    "writeback": 0
  },
  "limit": 1026154496
}
```