tests: bfq: skip tests on misbehaving udev systems #4825

cyphar · 2025-07-31T03:28:30Z

openSUSE has an unfortunate default udev setup which forcefully sets all
loop devices to use the "none" scheduler, even if you manually set it.
As this is a property of the host configuration (and udev is monitoring
from the host) we cannot really change this behaviour from inside our
test container.

So we should just skip the test in this (hopefully unusual) case.
Ideally tools running the test suite should disable this behaviour when
running our test suite.

Fixes #4781
Signed-off-by: Aleksa Sarai <cyphar@cyphar.com>
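
A minimal sketch of the kind of check the description implies, not the code from this patch. It assumes the usual sysfs format of `/sys/block/<dev>/queue/scheduler`, where the active scheduler is wrapped in brackets (for example `mq-deadline kyber [bfq] none`). The helper names `active_scheduler` and `scheduler_sticks` are hypothetical.

```shell
# Hypothetical helper: print the active scheduler from a sysfs-style
# scheduler file, where the active entry is wrapped in [brackets].
active_scheduler() {
	sed -n 's/.*\[\([^]]*\)\].*/\1/p' "$1"
}

# Hypothetical helper: wait for a grace period ($3, default 2 seconds) and
# then succeed only if the scheduler in file $1 is still the one named in $2.
# The wait gives a host-side udev rule time to reset the scheduler.
scheduler_sticks() {
	sleep "${3:-2}"
	[ "$(active_scheduler "$1")" = "$2" ]
}
```

In a bats test this could be used as `scheduler_sticks "/sys/block/loop0/queue/scheduler" bfq || skip "udev reset the scheduler"`, where the loop device name is only an example.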

ricardobranco777 · 2025-07-31T08:05:17Z

This patch seems to work on x86_64 for Tumbleweed but on aarch64 I'm still seeing this:

https://openqa.opensuse.org/tests/5209568/file/runc-runc-root.tap

# runc run -d --console-socket /tmp/bats-run-jbQ2hk/runc.7sPUsq/tty/sock test_dev_weight (status=1):
# time="2025-07-31T03:51:17-04:00" level=error msg="runc run failed: unable to start container process: error during container init: error setting cgroup config for procHooks process: setting device weight \"7:0 444\": write /sys/fs/cgroup/machine.slice/runc-cgroups-integration-test-19843.scope/io.bfq.weight: operation not supported"

On SLES 16.0 I still see it on both arches.

https://openqa.suse.de/tests/18613516/file/runc-runc-root.tap

# runc run -d --console-socket /tmp/bats-run-Hso6jF/runc.E25m9g/tty/sock test_dev_weight (status=1):
# time="2025-07-31T09:53:27+02:00" level=error msg="runc run failed: unable to start container process: error during container init: error setting cgroup config for procHooks process: setting device weight \"7:0 444\": write /sys/fs/cgroup/machine.slice/runc-cgroups-integration-test-24605.scope/io.bfq.weight: operation not supported"
# --- teardown ---
# losetup -d '/dev/loop0'

cyphar · 2025-07-31T08:14:57Z

@ricardobranco777 Did you apply both patches? Unfortunately, there could be a race with udev where it sets the scheduler back after we checked it. Not sure if there is a better solution than modifying the host config, to be honest...

On my Tumbleweed machine, I haven't managed to hit that race yet though...

ricardobranco777 · 2025-07-31T08:17:13Z

> @ricardobranco777 Did you apply both patches?

Yes.

> Unfortunately, there could be a race with udev where it sets the scheduler back after we checked it. Not sure if there is a better solution than modifying the host config, to be honest...

Ok. I'll look into that instead. Thanks!

cyphar · 2025-07-31T08:34:19Z

Does it fail consistently even with the patches applied? The patch should just cause the problematic test to get skipped if udev is silently changing the scheduler...

If you have actual access to the OpenQA box, there is a bpftrace script from the issue that will tell us who is changing the scheduler and when.

I can check the qcows myself later if I have some time.

ricardobranco777 · 2025-07-31T08:42:18Z

> Does it fail consistently even with the patches applied?

No.

> If you have actual access to the OpenQA box, there is a bpftrace script from the issue that will tell us who is changing the scheduler and when.
>
> I can check the qcows myself later if I have some time.

I'm applying the patch to SLES 15-SP4+ & Tumbleweed here:

os-autoinst/os-autoinst-distri-opensuse#22825

cyphar · 2025-08-02T10:01:13Z

@ricardobranco777 Can you try it again with the sleep 2s version of the patch?

If an error occurs during a test which sets up loopback devices, the loopback device is not freed. Since most systems have very conservative limits on the number of loopback devices, re-running a failing test locally to debug it often ends up erroring out due to loopback device exhaustion.

So let's just move the "losetup -d" to teardown, where it belongs.

Signed-off-by: Aleksa Sarai <cyphar@cyphar.com>
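
The pattern this commit message describes can be sketched as follows. This is an illustration under stated assumptions, not the code in `tests/integration/`: the variable `LOOPBACK_DEV`, the function `teardown_loopback`, and the `DETACH_CMD` override (there only so the sketch can be exercised without root) are all hypothetical names.

```shell
# Hypothetical variable: set by a test right after it attaches a loop device.
LOOPBACK_DEV=""

# Hypothetical teardown helper: detach the loop device if one was recorded.
# Because it runs from teardown, the device is freed even when the test body
# fails partway through. DETACH_CMD is an assumption of this sketch; it
# defaults to the real "losetup -d".
teardown_loopback() {
	[ -n "$LOOPBACK_DEV" ] || return 0
	${DETACH_CMD:-losetup -d} "$LOOPBACK_DEV"
	LOOPBACK_DEV=""
}
```

Calling the helper when no device was recorded is a no-op, so it is safe to run from the teardown of tests that never set up a loop device.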

openSUSE has an unfortunate default udev setup which forcefully sets all loop devices to use the "none" scheduler, even if you manually set it. As this is a property of the host configuration (and udev is monitoring from the host) we cannot really change this behaviour from inside our test container.

So we should just skip the test in this (hopefully unusual) case. Ideally tools running the test suite should disable this behaviour when running our test suite.

Signed-off-by: Aleksa Sarai <cyphar@cyphar.com>

ricardobranco777 · 2025-08-02T10:29:06Z

> @ricardobranco777 Can you try it again with the sleep 2s version of the patch?

Sure. I just cloned openQA jobs and the links are available in this PR description.

os-autoinst/os-autoinst-distri-opensuse#22825

ricardobranco777 · 2025-08-02T10:48:31Z

> @ricardobranco777 Can you try it again with the sleep 2s version of the patch?

It works. Now I can unignore cgroups.bats in our tests. Thanks!

cyphar · 2025-08-04T10:20:49Z

/ping @kolyshkin

@ricardobranco777 says the sleep 2s approach you suggested fixes the issue, so this should be good to merge.

ricardobranco777 · 2025-08-04T20:43:03Z

Successfully tested on:

s390x for SLES only (runc 1.2.6)
ppc64le for SLES & Tumbleweed (runc 1.3.0)

kolyshkin

LGTM (nit: second commit description doesn't mention sleep)

kolyshkin · 2025-08-04T23:41:49Z

@rata @lifubang PTAL

rata

Thanks for tackling this @kolyshkin @cyphar !

This LGTM, but left a comment that I think would be slightly better. If you don't agree, feel free to ignore it and merge :)

rata · 2025-08-05T13:15:49Z

tests/integration/cgroups.bats

+	# usually triggered by the "change" event from losetup, we can wait for a
+	# little bit before continuing the test. For more details, see
+	# <https://github.com/opencontainers/runc/issues/4781>.
+	sleep 2s


I'm okay with this. I wonder if doing the sleep only if the distro is suse is better, though.

That way we don't affect any other platform (IIRC github actions is not running suse at all) and we do notice if any other platform has this behavior, and we can decide to skip it in that platform too.
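
A rough sketch of what a distro check like the one suggested here might look like. It is not taken from this PR or from #4838. It assumes the standard `os-release` fields `ID` and `ID_LIKE`, and assumes SUSE systems report `sles` or a value containing `suse` in one of them. The function name `is_suse_like` is hypothetical.

```shell
# Hypothetical helper: succeed if the os-release file ($1, defaulting to
# /etc/os-release) describes a SUSE-family distribution. The function body
# runs in a subshell so the sourced variables do not leak into the caller.
is_suse_like() (
	ID="" ID_LIKE=""
	. "${1:-/etc/os-release}" 2>/dev/null || exit 1
	case " $ID $ID_LIKE " in
	*suse* | *" sles "*) exit 0 ;;
	*) exit 1 ;;
	esac
)
```

The sleep could then be gated with something like `is_suse_like && sleep 2s`, so other platforms are unaffected and any new platform with the same udev behaviour still shows up as a test failure.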

rata · 2025-08-05T13:18:45Z

Oh, auto-merge was enabled :-D

rata · 2025-08-05T15:00:09Z

Created #4838

cyphar mentioned this pull request Jul 31, 2025

cgroups test fails with "io.bfq.weight: operation not supported" #4781

Closed

cyphar commented Jul 31, 2025

cyphar force-pushed the test-bfq-policy branch 2 times, most recently from 46218c8 to 357318f on July 31, 2025 06:13

ricardobranco777 mentioned this pull request Jul 31, 2025

bats/runc: Apply PR#4825 to unignore cgroups test os-autoinst/os-autoinst-distri-opensuse#22825

Merged

kolyshkin reviewed Jul 31, 2025

kolyshkin reviewed Jul 31, 2025

cyphar force-pushed the test-bfq-policy branch from 357318f to d9224e0 on August 2, 2025 10:01

cyphar added 2 commits August 2, 2025 20:01

cyphar force-pushed the test-bfq-policy branch from d9224e0 to e6b4b5a on August 2, 2025 10:01

ricardobranco777 approved these changes Aug 2, 2025

kolyshkin approved these changes Aug 4, 2025

kolyshkin enabled auto-merge August 4, 2025 23:34

rata approved these changes Aug 5, 2025

kolyshkin merged commit 67112aa into opencontainers:main Aug 5, 2025
31 checks passed

rata mentioned this pull request Aug 5, 2025

tests: Only sleep on suse #4838

Closed

cyphar deleted the test-bfq-policy branch August 5, 2025 16:29

kolyshkin added the backport/1.3-done label (a PR in main branch which has been backported to release-1.3) Oct 14, 2025

kolyshkin mentioned this pull request Oct 14, 2025

[1.3] runc update: support per-device weight and iops #4931

Merged