Conversation

@ningmingxiao
Contributor

@ningmingxiao ningmingxiao commented Jul 16, 2025

Old kernels reset a task's CPU affinity automatically on a cgroup move, but new kernels remember the affinity that was set before the cgroup move, due to

https://lore.kernel.org/lkml/20220922180041.1768141-1-longman@redhat.com

This is undesirable for containers, because they inherit the systemd affinity when they should really move to the container's cpuset CPUs.

ref: #4041
The patch https://lore.kernel.org/lkml/20231003205735.2921964-1-longman@redhat.com/ was not merged into the upstream kernel, but was merged by Red Hat; I want to find another way to fix this.

@AkihiroSuda
Member

What was the issue?
How can this PR be tested?

Member

@rata rata left a comment

@AkihiroSuda +1, it would be great if we can have some test for this.

@haircommander
Contributor

hmm I wonder if we can gather the cpuset from the container spec rather than inheriting what runc is being called with...

@ningmingxiao
Contributor Author

I tried to get it from the cpuset, but the default is empty.

@ningmingxiao
Contributor Author

ningmingxiao commented Jul 17, 2025

I added an if; it only happens on Linux.

if runtime.GOOS == "linux" {
}

@ningmingxiao ningmingxiao changed the title from "fix:cpu cpu affinity" to "fix: cpu affinity" on Jul 17, 2025
@ningmingxiao ningmingxiao force-pushed the fix_annify branch 2 times, most recently from cab02c9 to d72f7b9 on July 17, 2025 04:41
Member

@rata rata left a comment

Thanks for the PR! I left several comments, but I guess we also want tests in tests/integration :)

break
}
line := string(data)
if line[:3] != "cpu" {
Member

I think the comment that all cpu* lines are at the beginning belongs here.

Member

Why did you mark this as solved? Am I missing something?

if line[:3] != "cpu" {
break
}
if '0' <= line[3] && line[3] <= '9' {
Member

Here we need a comment explaining that there is a plain "cpu" line that we should ignore, and that we only count "cpu" lines followed by a number.

Member

idem

Comment on lines 257 to 269
if err := setAffinityAll(p.pid()); err != nil {
return err
}
// Set final CPU affinity right after the process is moved into container's cgroup.
if err := p.setFinalCPUAffinity(); err != nil {
Member

Sorry if I'm missing something, but why do we need to set the affinity to all CPUs and then to what is in the config? I don't see from the link in the description why this step is needed.

Member

Why was this marked as solved? Did I miss something?

@ningmingxiao ningmingxiao force-pushed the fix_annify branch 2 times, most recently from ec7d0c0 to 2904639 on July 17, 2025 13:04
@rata
Member

rata commented Jul 17, 2025

@ningmingxiao ping when this is ready for another round of reviews (and the tests are green, hopefully :))

@ningmingxiao ningmingxiao force-pushed the fix_annify branch 3 times, most recently from be17d50 to 81e87d8 on July 18, 2025 02:46
cpuset := unix.CPUSet{}
for i := 0; i < int(cpus); i++ {
cpuset.Set(i)
}
Member

Was this tested with nested containers?
cpus here can be different from the number of available CPUs.

Member

Indeed. I think the code in this PR will only work if you use lxcfs with nested containers (which nobody does with runc).

I suspect that we would instead need to parse /proc/self/cgroup and then look at the CPU set in /sys/fs/cgroup/cpuset.cpus.effective (but we would also need to check any parent cgroups if cpuset is not in cgroup.controllers).

Member

Alternatively, would just passing nil as described in MarSik@e6ce3af just work?

I believe this is trying to take advantage of the EINVAL error fallback of __sched_setaffinity (which does reset the affinity back to the cpuset if there is no overlap between the cpuset and the requested affinity), but I'm not sure it actually works. My reading of __set_cpus_allowed_ptr gives me the impression that this shouldn't work, but the linked commit claims it resolves this issue?

Member

@cyphar cyphar Aug 18, 2025

Oh actually, sched_setaffinity silently clamps the cpuset you give based on the cpuset for the task. So this is fine.

However, I would like to know if nil works just as well -- less code is better.

In fact, it might be even simpler to just generate a set of 8192 CPUs and get the kernel to clamp it for us? The kernel automatically clamps the size of the cpumask to nr_cpu_ids internally, so even if you pass a really large mask it will happily ignore the extra bits.

EDIT: Testing this, it seems golang.org/x/sys/unix will silently truncate the cpuset to 1024 CPUs. They have a hardcoded limit of _CPU_SETSIZE.


- name: integration test (systemd driver)
run: |
sudo taskset -pc 0-1 1
Member

Should be moved to the bats script

@rata
Member

rata commented Jul 21, 2025

@ningmingxiao if you can ping here when it's ready for review it would be great. Marking conversations as solved in github doesn't send any notification. And it's hard to know when it's ready for review again if you don't say anything or request another review in github.

@ningmingxiao
Contributor Author

ningmingxiao commented Jul 21, 2025

OK! It can be reviewed now, thanks @rata @AkihiroSuda

Member

@rata rata left a comment

Thanks, left a few more comments. But there are several open comments already, please have a look :)

return nil
}
if err := unix.SchedSetaffinity(p.pid(), aff.Final); err != nil {
if err := unix.SchedSetaffinity(p.pid(), p.config.CPUAffinity.Final); err != nil {
Member

Why is the nil check not needed anymore?

@ningmingxiao ningmingxiao force-pushed the fix_annify branch 3 times, most recently from 3542794 to 5b2dff7 on August 15, 2025 07:25
Signed-off-by: ningmingxiao <ning.mingxiao@zte.com.cn>
@ningmingxiao
Contributor Author

ping @rata

@cyphar
Member

cyphar commented Aug 19, 2025

After looking at this, I realised that there is a much simpler way of doing this -- I've carried a version in #4858.

@ningmingxiao Can you double-check that the patch I've posted resolves your issue as well?

@wwcd

wwcd commented Aug 27, 2025

> After looking at this, I realised that there is a much simpler way of doing this -- I've carried a version in #4858.
>
> @ningmingxiao Can you double-check that the patch I've posted resolves your issue as well?

It works well.
