-
Notifications
You must be signed in to change notification settings - Fork 2.2k
fix: cpu affinity #4815
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix: cpu affinity #4815
Conversation
|
What was the issue? |
rata
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@AkihiroSuda +1, it would be great if we can have some test for this.
|
hmm I wonder if we can gather the cpuset from the container spec rather than inheriting what runc is being called with... |
|
I try to get it form cpuset but default is empty. |
|
I add an if,it only happen on linux. |
cab02c9 to
d72f7b9
Compare
rata
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the PR! I left several comments, but I guess we want also tests in tests/integration :)
| break | ||
| } | ||
| line := string(data) | ||
| if line[:3] != "cpu" { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the comment that all cpu* lines are at the beginning belongs here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why did you mark this as solved? Am I missing something?
| if line[:3] != "cpu" { | ||
| break | ||
| } | ||
| if '0' <= line[3] && line[3] <= '9' { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here we need a commet explaining that there is a "cpu" line that we should ignore and only count cpu lines followed by a number.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
idem
libcontainer/process_linux.go
Outdated
| if err := setAffinityAll(p.pid()); err != nil { | ||
| return err | ||
| } | ||
| // Set final CPU affinity right after the process is moved into container's cgroup. | ||
| if err := p.setFinalCPUAffinity(); err != nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, I ignore this, but why do we need to set the affinity to all cpus and then to what is in the config? I don't follow from the link in the description why this step is needed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why was this marked as solved? Did I miss something?
ec7d0c0 to
2904639
Compare
|
@ningmingxiao ping when this is ready for another round of reviews (and the tests are green, hopefully :)) |
be17d50 to
81e87d8
Compare
| cpuset := unix.CPUSet{} | ||
| for i := 0; i < int(cpus); i++ { | ||
| cpuset.Set(i) | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Was this tested with nested containers?
cpus here can be different from the number of the available cpus
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Indeed. I think the code in this PR will only work if you use lxcfs with nested containers (which nobody does with runc).
I suspect that we would instead need to parse /proc/self/cgroup and then look at the CPU set in /sys/fs/cgroup/cpuset.cpus.effective (but we would also need to check any parent cgroups if cpuset is not in cgroup.controllers).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Alternatively, would just passing nil as described in MarSik@e6ce3af just work?
I believe this is trying to take advantage of the EINVAL error fallback of __sched_setaffinity (which does reset the affnitiy back to the cpuset if there is no overlap between the cpuset and the requested affinity) but I'm not sure it actually works. My reading of __set_cpus_allowed_ptr gives me the impression that this shouldn't work, but the linked commit claims this resolves this issue?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh actually, sched_setaffinity silently clamps the cpuset you give based on the cpuset for the task. So this is fine.
However, I would like to know if nil works just as well -- less code is better.
In fact, it might be even simpler to just generate a set of 8192 CPUs and get the kernel to clamp it for us? The kernel automatically clamps the size of cpumask to nr_cpu_ids internally so even if you give a really large number they will happily ignore it.
EDIT: Testing this, it seems golang.org/x/sys/unix will silently truncate the cpuset to 1024 CPUs. They have a hardcoded limit of _CPU_SETSIZE.
|
|
||
| - name: integration test (systemd driver) | ||
| run: | | ||
| sudo taskset -pc 0-1 1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should be moved to the bats script
|
@ningmingxiao if you can ping here when it's ready for review it would be great. Marking conversations as solved in github doesn't send any notification. And it's hard to know when it's ready for review again if you don't say anything or request another review in github. |
|
ok ! it can be reviewed now, thanks @rata @AkihiroSuda |
rata
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, left a few more comments. But there are several open comments already, please have a look :)
libcontainer/process_linux.go
Outdated
| return nil | ||
| } | ||
| if err := unix.SchedSetaffinity(p.pid(), aff.Final); err != nil { | ||
| if err := unix.SchedSetaffinity(p.pid(), p.config.CPUAffinity.Final); err != nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is the nil check not needed anymore?
3542794 to
5b2dff7
Compare
Signed-off-by: ningmingxiao <ning.mingxiao@zte.com.cn>
5b2dff7 to
99b95b6
Compare
|
ping @rata |
|
After looking at this, I realised that there is a much simpler way of doing this -- I've carried a version in #4858. @ningmingxiao Can you double-check that the patch I've posted resolves your issue as well? |
It works well. |
Old kernels do that automatically, but new kernels remember the affinity that was set before the cgroup move due to
https://lore.kernel.org/lkml/20220922180041.1768141-1-longman@redhat.com
This is undesirable for containers, because they inherit the systemd affinity when the should really move to the container space cpus.
ref:#4041
this patch https://lore.kernel.org/lkml/20231003205735.2921964-1-longman@redhat.com/ doesn't merge in kernel but merged in redhat I want to find another way to fix it.