
Memory stats and freezer management with cgroupv2 #10251

@notnoop


Nomad's cgroup-v2 integration is incomplete, as it still has some cgroup-v1-isms. Cgroups v2 changed the filesystem representation and the memory metrics that Nomad has relied on, so Nomad reports 0 for the memory summary metric across ~all drivers.

First, Nomad memory reporting relies on cgroup-v1 metrics. Nomad defaults to using RSS as the top-line memory summary value, and also reports Kernel Max Usage, Kernel Usage, and Max Usage. None of these, RSS included, are reported in cgroup v2. You can see the difference in libcontainer's reporting by comparing its cgroup v1 memory stats with its cgroup v2 memory stats. The resulting output is pretty confusing.
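To illustrate the gap, here is a minimal Python sketch (not Nomad or libcontainer code) that parses the flat `key value` format of a cgroup `memory.stat` file and builds a v1-style view from v2 counters. The `V1_TO_V2` mapping is my own assumption about the closest counterparts (`anon` for `rss`, `file` for `cache`); the values are only approximately equivalent, and the function names are hypothetical.

```python
def parse_memory_stat(text):
    """Parse the flat 'key value' lines of a cgroup memory.stat file."""
    stats = {}
    for line in text.splitlines():
        parts = line.split()
        # Skip anything that is not exactly "<name> <unsigned integer>".
        if len(parts) == 2 and parts[1].isdigit():
            stats[parts[0]] = int(parts[1])
    return stats


# Assumed (approximate) mapping from v1-style names to v2 counters.
V1_TO_V2 = {"rss": "anon", "cache": "file", "mapped_file": "file_mapped"}


def v1_style_view(v2_stats):
    """Build a v1-looking dict from v2 stats; missing counters become 0."""
    return {v1: v2_stats.get(v2, 0) for v1, v2 in V1_TO_V2.items()}


if __name__ == "__main__":
    sample = "anon 1622016\nfile 0\nfile_mapped 0\nslab 573440\n"
    print(v1_style_view(parse_memory_stat(sample)))
```

Values such as Max Usage have no counterpart in the v2 `memory.stat` output shown in this issue, so a mapping like this can only cover part of what Nomad reports today.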

Also, the executor DestroyCgroup method uses libcontainer cgroup v1. This needs to be updated to account for v2 and ideally select the relevant cgroup backend.
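As a rough sketch of what "select the relevant cgroup backend" could look like, the Python below checks for `cgroup.controllers` at the cgroup mount root (a common heuristic for the v2 unified hierarchy) and then picks the freezer interface to use: `cgroup.freeze` on v2, `freezer.state` on v1. The function names and the `root` parameter are hypothetical, and I'm assuming DestroyCgroup relies on the freezer, as the issue title suggests. This is not how Nomad or libcontainer implement it.

```python
import os


def is_cgroup_v2(root="/sys/fs/cgroup"):
    """Heuristic: the v2 unified hierarchy exposes cgroup.controllers at its root."""
    return os.path.isfile(os.path.join(root, "cgroup.controllers"))


def freeze_request(root="/sys/fs/cgroup"):
    """Return the (file name, value) pair a caller would write to freeze a cgroup."""
    if is_cgroup_v2(root):
        # cgroup v2: write "1" to cgroup.freeze (and "0" to thaw).
        return ("cgroup.freeze", "1")
    # cgroup v1: write "FROZEN" to freezer.state (and "THAWED" to thaw).
    return ("freezer.state", "FROZEN")


if __name__ == "__main__":
    print("cgroup v2:", is_cgroup_v2())
    print("freeze via:", freeze_request())
```

Taking the root as a parameter keeps the check testable against a temporary directory instead of the real `/sys/fs/cgroup`.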

It's not clear what the state of cgroup-v2 adoption is. Fedora and Arch Linux seem to have adopted it. Other distros, like RHEL and Ubuntu, provide it as an option but not as the default.

Sample metrics of cgroup v2

Running on Fedora 33, I see the following stats info:

ID                  = 1e2bdcc2-983d-1e0c-d226-95577bffc188
Eval ID             = dae9b0ab-d31a-446b-a9df-5f2cbf37dc53
Name                = memory.cache[0]
Node ID             = f7bf24d9-d3c0-c34e-0b80-1c6a5de7eddf
Node Name           = ip-172-31-74-56.ec2.internal
Job ID              = memory
Job Version         = 1
Client Status       = running
Client Description  = Tasks are running
Desired Status      = run
Desired Description = <none>
Created             = 2021-03-28T17:52:15-04:00
Modified            = 2021-03-28T17:52:33-04:00
Deployment ID       = ff079acc-8f67-41bb-dc67-e5c506e9a795
Deployment Health   = healthy
Evaluated Nodes     = 1
Filtered Nodes      = 0
Exhausted Nodes     = 0
Allocation Time     = 88.646µs
Failures            = 0

Task "redis" is "running"
Task Resources
CPU           Memory        Disk     Addresses
2465/500 MHz  0 B/1000 MiB  300 MiB

Memory Stats
Cache  Kernel Max Usage  Kernel Usage  Max Usage  RSS  Swap  Usage
0 B    0 B               0 B           0 B        0 B  0 B   261 MiB

CPU Stats
Percent  System Mode  Throttled Periods  Throttled Time  User Mode
98.64%   0.00%        0                  0               98.64%

Task Events:
Started At     = 2021-03-28T21:52:22Z
Finished At    = N/A
Total Restarts = 0
Last Restart   = N/A

Recent Events:
Time                       Type        Description
2021-03-28T17:52:22-04:00  Started     Task started by client
2021-03-28T17:52:20-04:00  Task Setup  Building Task Directory
2021-03-28T17:52:15-04:00  Received    Task received by client

Placement Metrics
Node                                  binpack  job-anti-affinity  node-affinity  node-reschedule-penalty  final score
f7bf24d9-d3c0-c34e-0b80-1c6a5de7eddf  0.635    0                  0              0                        0.635

Also, here are the Docker memory stats for cgroup v1 and v2:

Cgroup v2

{
  "usage": 2744320,
  "stats": {
    "active_anon": 1757184,
    "active_file": 0,
    "anon": 1622016,
    "anon_thp": 0,
    "file": 0,
    "file_dirty": 0,
    "file_mapped": 0,
    "file_writeback": 0,
    "inactive_anon": 0,
    "inactive_file": 0,
    "kernel_stack": 73728,
    "pgactivate": 0,
    "pgdeactivate": 0,
    "pgfault": 3531,
    "pglazyfree": 0,
    "pglazyfreed": 0,
    "pgmajfault": 0,
    "pgrefill": 0,
    "pgscan": 0,
    "pgsteal": 0,
    "shmem": 0,
    "slab": 573440,
    "slab_reclaimable": 0,
    "slab_unreclaimable": 573440,
    "sock": 0,
    "thp_collapse_alloc": 0,
    "thp_fault_alloc": 0,
    "unevictable": 0,
    "workingset_activate": 0,
    "workingset_nodereclaim": 0,
    "workingset_refault": 0
  },
  "limit": 2036068352
}

Cgroup v1

{
  "usage": 6778880,
  "max_usage": 9478144,
  "stats": {
    "active_anon": 1622016,
    "active_file": 2297856,
    "cache": 4055040,
    "dirty": 0,
    "hierarchical_memory_limit": 9223372036854772000,
    "hierarchical_memsw_limit": 0,
    "inactive_anon": 0,
    "inactive_file": 1757184,
    "mapped_file": 2027520,
    "pgfault": 5049,
    "pgmajfault": 33,
    "pgpgin": 5016,
    "pgpgout": 3591,
    "rss": 1626112,
    "rss_huge": 0,
    "total_active_anon": 1622016,
    "total_active_file": 2297856,
    "total_cache": 4055040,
    "total_dirty": 0,
    "total_inactive_anon": 0,
    "total_inactive_file": 1757184,
    "total_mapped_file": 2027520,
    "total_pgfault": 5049,
    "total_pgmajfault": 33,
    "total_pgpgin": 5016,
    "total_pgpgout": 3591,
    "total_rss": 1626112,
    "total_rss_huge": 0,
    "total_unevictable": 0,
    "total_writeback": 0,
    "unevictable": 0,
    "writeback": 0
  },
  "limit": 1026154496
}
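Given the two payloads above, one possible way to pick a non-zero top-line value is to fall back from `rss` to `anon` and finally to `usage`. The Python below is a sketch of that idea only; the function name and the fallback order are my assumptions, not a proposed Nomad change. The sample dicts are trimmed copies of the stats shown above.

```python
def summary_memory(mem):
    """Pick a top-line memory value: rss (v1), else anon (v2), else usage."""
    stats = mem.get("stats", {})
    for key in ("rss", "anon"):
        # A missing or zero counter falls through to the next candidate.
        if stats.get(key):
            return key, stats[key]
    return "usage", mem.get("usage", 0)


# Trimmed copies of the Docker stats from this issue.
CGROUP_V2 = {"usage": 2744320, "stats": {"anon": 1622016, "file": 0}}
CGROUP_V1 = {"usage": 6778880, "stats": {"rss": 1626112, "cache": 4055040}}

if __name__ == "__main__":
    print(summary_memory(CGROUP_V2))  # ('anon', 1622016)
    print(summary_memory(CGROUP_V1))  # ('rss', 1626112)
```

With these samples the two hosts report comparable values (about 1.6 MB each), whereas reading only `rss` yields nothing on the cgroup v2 host.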

Labels: stage/accepted, theme/cgroups, type/bug