-
Notifications
You must be signed in to change notification settings - Fork 2.2k
libct: reset CPU affinity by default #4858
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
1e8b14f to
61891c9
Compare
|
FYI, passing |
b0110a8 to
a6dcf85
Compare
|
Strangely, the rootless |
a6dcf85 to
b28358a
Compare
b28358a to
7cd11e6
Compare
tests/integration/helpers.bash
Outdated
| # Wrapper for runc. | ||
| function runc() { | ||
| run __runc "$@" | ||
| # Wrapper around "run2 that logs output to make tests easier to debug. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In fact, bats has --verbose-run now which kind of removes the need for such helper. I haven't tried that though, might not be that useful.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can take a look in a separate PR -- I added this wrapper back in 2016, given all the work since then I'm sure they've finally resolved this (fairly common) pain point by now.
|
There's a kernel patch that would help tremendously: https://lore.kernel.org/lkml/20220922180041.1768141-1-longman@redhat.com. With it, empty cpuset is treated as "set to all available cpus". Alas, it's not merged (but I think it is added to RHEL kernels). It avoids somewhat slow calculations on every exec so the fix is way simpler (see #4041). |
|
@kolyshkin AFAICS, that patchset was merged in Linux 6.2 (torvalds/linux@bf57ae2) -- I linked the merged version of patch 3 in the description (torvalds/linux@da019032819a). As for using the empty set instead, it looks to me that the idea of #4041 was to use AFAICS, using an empty CPU mask will just give you static int __set_cpus_allowed_ptr_locked(struct task_struct *p,
struct affinity_context *ctx,
struct rq *rq,
struct rq_flags *rf)
{
const struct cpumask *cpu_allowed_mask = task_cpu_possible_mask(p);
const struct cpumask *cpu_valid_mask = cpu_active_mask;
bool kthread = p->flags & PF_KTHREAD;
unsigned int dest_cpu;
int ret = 0;
/* ... */
if (!kthread && !cpumask_subset(ctx->new_mask, cpu_allowed_mask)) {
ret = -EINVAL;
goto out;
}
/* ... */
}(I also tested this locally and it doesn't work either, even if you ignore Since That being said, I really don't think this is going to be a performance problem. The overhead we're talking about is 1us (we could cache it with |
|
Another alternative which doesn't require per-architecture code and is 10x faster is: cpuset := unix.CPUSet{}
for i := range bits.UintSize {
cpuset.Set(i)
}
for i := 1; i < len(cpuset); i *= 2 {
copy(cpuset[i:], cpuset[:i])
}Or even better (if you don't mind hacky solutions): cpuset := unix.CPUSet{}
for i := range cpuset {
cpuset[i]-- // underflow to 0xFF..FF
}(Naive is just copying |
7cd11e6 to
e1d8c91
Compare
|
I pushed the hacky underflow solution. |
eda9b22 to
c8e4019
Compare
857e938 to
ae6d229
Compare
"runc" was a special wrapper around bats's "run" which output some very useful diagnostic information to the bats log, but this was not usable for other commands. So let's make it a more generic helper that we can use for other commands. Signed-off-by: Aleksa Sarai <cyphar@cyphar.com>
Sometimes we need to run runc through some wrapper (like nohup), but because "__runc" and "runc" are bash functions in our test suite this doesn't work trivially -- and you cannot just pass "$RUNC" because you you need to set --root for rootless tests. So create a setup_runc_cmdline helper which sets $RUNC_CMDLINE to the beginning cmdline used by __runc (and switch __runc to use that). Signed-off-by: Aleksa Sarai <cyphar@cyphar.com>
In certain deployments, it's possible for runc to be spawned by a
process with a restrictive cpumask (such as from a systemd unit with
CPUAffinity=... configured) which will be inherited by runc and thus the
container process by default.
The cpuset cgroup used to reconfigure the cpumask automatically for
joining processes, but kcommit da019032819a ("sched: Enforce user
requested affinity") changed this behaviour in Linux 6.2.
The solution is to try to emulate the expected behaviour by resetting
our cpumask to correspond with the configured cpuset (in the case of
"runc exec", if the user did not configure an alternative one). Normally
we would have to parse /proc/stat and /sys/fs/cgroup, but luckily
sched_setaffinity(2) will transparently convert an all-set cpumask (even
if it has more entries than the number of CPUs on the system) to the
correct value for our usecase.
For some reason, in our CI it seems that rootless --systemd-cgroup
results in the cpuset (presumably temporarily?) being configured such
that sched_setaffinity(2) will allow the full set of CPUs. For this
particular case, all we care about is that it is different to the
original set, so include some special-casing (but we should probably
investigate this further...).
Reported-by: ningmingxiao <ning.mingxiao@zte.com.cn>
Reported-by: Martin Sivak <msivak@redhat.com>
Reported-by: Peter Hunt <pehunt@redhat.com>
Signed-off-by: Aleksa Sarai <cyphar@cyphar.com>
ae6d229 to
121192a
Compare
kolyshkin
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
|
@cyphar any idea how far out the next 1.3 release is? |
|
We can probably cut a release now, but I also want to get 1.4.0-rc1 ready soon as well. There's also at least one other PR for 1.3.1. |
|
is there also a plan to backport this for a 1.2.7 release? |
|
Awesome! Can we expect to see releases from these branches within the next day or so? |
|
I will be sending release PRs this week. |
This is a much simpler alternative to #4815 and includes more self-explanatory tests.
EDIT: #4041 is a similar approach but AFAICS it actually wouldn't work.
In certain deployments, it's possible for runc to be spawned by a
process with a restrictive cpumask (such as from a systemd unit with
CPUAffinity=...configured) which will be inherited by runc and thus thecontainer process by default.
The cpuset cgroup used to reconfigure the cpumask automatically for
joining processes, but torvalds/linux@da019032819a ("sched: Enforce user
requested affinity") changed this behaviour in Linux 6.2.
The solution is to try to emulate the expected behaviour by resetting
our cpumask to correspond with the configured cpuset (in the case of
runc exec, if the user did not configure an alternative one). Normallywe would have to parse
/proc/statand/sys/fs/cgroup, but luckilysched_setaffinity(2)will transparently convert an all-set cpumask (evenif it has more entries than the number of CPUs on the system) to the
correct value for our usecase.
Closes #4815
Closes #4041
Reported-by: ningmingxiao ning.mingxiao@zte.com.cn
Signed-off-by: Aleksa Sarai cyphar@cyphar.com