Skip to content

Conversation

@kolyshkin
Copy link
Contributor

@kolyshkin kolyshkin commented Jul 15, 2025

Apparently, with GHA, having a minor of 0 does not always mean it's the
whole device (and not a partition):

 === /proc/partitions (using major: 259) ===
 major minor  #blocks  name

    8       16   78643200 sdb
    8       17   77593583 sdb1
    8       30       4096 sdb14
    8       31     108544 sdb15
  259        0     934912 sdb16
    8        0   78643200 sda
    8        1   78641152 sda1

Let's use lsblk -d to find out whole devices.

This fixes a test case which was added in commit 7696402 (PR #4775).

@kolyshkin kolyshkin changed the title [debug] tests/int/update: fix getting block major Jul 15, 2025
@kolyshkin kolyshkin marked this pull request as ready for review July 15, 2025 02:07
@lifubang
Copy link
Member

lifubang commented Jul 15, 2025

In ubuntu, maybe we can use this script:

lsblk -d | awk '$6 == "disk" {print $2}' | awk -F':' '$2 == 0 {print $1}'

I see this test in all ubuntu are skipped: # skip can't get device major number from /proc/partitions (got )

In my machine:

lifubang@acmcoder ~ $ lsblk -d
NAME
    MAJ:MIN RM  SIZE RO TYPE MOUNTPOINTS
sr0  11:0    1 1024M  0 rom  
vda 253:0    0  256G  0 disk

@tianon
Copy link
Member

tianon commented Jul 15, 2025

My machine has two physical disks and I can confirm that minor is not always zero:

$ lsblk -d
NAME    MAJ:MIN RM   SIZE RO TYPE MOUNTPOINTS
sr0      11:0    1  1024M  0 rom  
nvme0n1 259:0    0 953.9G  0 disk 
nvme1n1 259:4    0 953.9G  0 disk 

However, it also seems that -o MAJ is not universally supported:

$ lsblk -dno MAJ
lsblk: unknown column: MAJ
$ lsblk -dno MAJ:MIN
 11:0  
259:0  
259:4  

Maybe it's a bit much since cut is mostly sufficient, but jq can do this in a way that's pretty clean to read later:

$ lsblk --nodeps --json | jq --raw-output 'first(.blockdevices[] | select(.type == "disk"))["maj:min"] | split(":")[0]'
259
$ lsblk -dno MAJ:MIN,TYPE -E TYPE | grep ' disk$' | cut -d: -f1
259
$ lsblk -dno MAJ:MIN,TYPE -E TYPE | awk -F '[ :]+' '$3 == "disk" { print $1 }'
259

edit: if you want to hyper-minimize flags, you can get away with something like this:

$ lsblk -do TYPE,MAJ:MIN | awk -F '[ :]+' '$1 == "disk" { print $2; exit }'
259

(it feels kind of ironic that lsblk supports filtering natively, but by device major 😂)

@tianon
Copy link
Member

tianon commented Jul 15, 2025

Another alternative could be /proc/devices, but I don't know how reliable the blkext string there is:

$ grep 259 /proc/devices
259 blkext
$ awk '$2 == "blkext" { print $1; exit }' /proc/devices 
259

@tianon
Copy link
Member

tianon commented Jul 15, 2025

Aha, here's a machine with a much wilder array of devices -- happy to do more testing of this one-liner on it, if that's helpful:

$ lsblk -d
NAME    MAJ:MIN RM   SIZE RO TYPE MOUNTPOINTS
sda       8:0    0  16.4T  0 disk 
sdb       8:16   0 953.9G  0 disk 
sdc       8:32   0  12.7T  0 disk 
sdd       8:48   0 476.9G  0 disk 
sde       8:64   0   7.3T  0 disk 
sdf       8:80   0  14.6T  0 disk 
sdg       8:96   0  16.4T  0 disk 
sdh       8:112  0  14.6T  0 disk 
nvme0n1 259:0    0   1.8T  0 disk 
nvme1n1 259:1    0   1.8T  0 disk 

@tianon
Copy link
Member

tianon commented Jul 15, 2025

... oh, looking further down, the rest of this test does assume there exists a minor of zero with the given major, which I guess is probably fair? I don't know how those get assigned, but they appear to be assigned pretty reliably in a decent pattern that always starts at 0 on the very small sample size I've got handy.

@kolyshkin
Copy link
Contributor Author

The assumption that minor is always 0 was incorrect in the test case, I am fixing that.

Wonder if filtering on disk is really needed? @tianon can you test that you can set io limits on your 11:0 (sr0) which is "rom" rather than "disk"?

Assuming it's cgroup v2, this would be something like:

IOMAX=/sys/fs/cgroup/$(cat /proc/self/cgroup | sed 's/^0:://')/io.max
tail -v $IOMAX
DEV=11:0
echo $DEV wiops=1000 > $IOMAX
echo $DEV riops=1000 > $IOMAX
echo $DEV wbps=100000 > $IOMAX
echo $DEV rbps=100000 > $IOMAX
tail -v $IOMAX
# To undo:
echo $DEV wiops=max riops=max rbps=max wbps=max > $IOMAX
tail -v $IOMAX

@kolyshkin kolyshkin force-pushed the debug-ci-failure branch 4 times, most recently from cb3d27f to e346830 Compare July 15, 2025 19:18
@tianon
Copy link
Member

tianon commented Jul 16, 2025

Yep! 🤷 seems like it's fine:

$ sudo systemd-run --scope bash
Running scope as unit: run-r91caaa1102d94a48be548a12cea6a06c.scope
# IOMAX=/sys/fs/cgroup/$(cat /proc/self/cgroup | sed 's/^0:://')/io.max
# tail -v $IOMAX
==> /sys/fs/cgroup//system.slice/run-r91caaa1102d94a48be548a12cea6a06c.scope/io.max <==
# DEV=11:0
# echo $DEV wiops=1000 > $IOMAX
# echo $DEV riops=1000 > $IOMAX
# echo $DEV wbps=100000 > $IOMAX
# echo $DEV rbps=100000 > $IOMAX
# tail -v $IOMAX
==> /sys/fs/cgroup//system.slice/run-r91caaa1102d94a48be548a12cea6a06c.scope/io.max <==
11:0 rbps=100000 wbps=100000 riops=1000 wiops=1000

I tried changing DEV to something obviously bogus like 12345:0 and it failed appropriately with bash: echo: write error: No such device. Even 11:1 instead of 11:0 fails. 👍

@kolyshkin
Copy link
Contributor Author

@lifubang @thaJeztah @AkihiroSuda PTAL

echo "=== lsblk -d ===" >&2
lsblk -d >&2
echo "===" >&2
skip "can't get device from lsblk"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's not skip this test, what do you think?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed, assuming that currently it works.

If it currently works, it's probably better to make it a hard failure, so that we can decide whether to skip or fail when we run into failures (which may be a more specific "skip" for expected conditions).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@thaJeztah
Copy link
Member

FWIW, I recently learned that lsblk for certain parts may depend on systemd to be running. Don't think it's relevant here, but in case we'd want to be running these tests inside a container;

Copy link
Member

@thaJeztah thaJeztah left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, pending @AkihiroSuda's comment above.

Copy link
Member

@rata rata left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks!

As others suggested, let's make the test fail if $dev is empty. So we don't stop testing this inadvertently. But I don't need to review again that small change :)

echo "=== lsblk -d ===" >&2
lsblk -d >&2
echo "===" >&2
skip "can't get device from lsblk"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Apparently, having a minor of 0 does not always mean it's the
whole device (not a partition):

	 === /proc/partitions (using major: 259) ===
	 major minor  #blocks  name

	    8       16   78643200 sdb
	    8       17   77593583 sdb1
	    8       30       4096 sdb14
	    8       31     108544 sdb15
	  259        0     934912 sdb16
	    8        0   78643200 sda
	    8        1   78641152 sda1

Rewrite the test to not assume minor is 0, and use
lsblk -d to find out whole devices.

This fixes a test case which was added in commit 7696402.

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
@kolyshkin
Copy link
Contributor Author

To me, skip seems more appropriate, with the meaning of "conditions to run the test were not met, so the test was not run" rather than "there's a bug in the code the test found".

OTOH I agree that fail will help us debug the issue of getting a device. Also, with skip, the lsblk -d is not really shown.

Fixed.

@kolyshkin kolyshkin enabled auto-merge July 16, 2025 17:31
@kolyshkin kolyshkin merged commit cac7cf6 into opencontainers:main Jul 16, 2025
30 of 31 checks passed
@kolyshkin kolyshkin added the backport/1.3-done A PR in main branch which has been backported to release-1.3 label Oct 14, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area/ci backport/1.3-done A PR in main branch which has been backported to release-1.3

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants