Description
When I launch a KASM container under Sysbox with a GPU shared via --device=/dev/dri/renderD128, sysbox-fs floods its log and CPU usage goes high. I enabled debug logging and I see this:
time="2024-06-01 03:05:54" level=debug msg="Received umount syscall from pid 1098145"
time="2024-06-01 03:05:54" level=debug msg="target: /run/systemd/mount-rootfs/sys/devices/virtual, flags: 0x8, root: /, cwd: /"
time="2024-06-01 03:05:54" level=debug msg="Ignoring unmount of sysbox-fs managed submount at /run/systemd/mount-rootfs/sys/devices/virtual"
time="2024-06-01 03:05:54" level=debug msg="Received umount syscall from pid 1098145"
time="2024-06-01 03:05:54" level=debug msg="target: /run/systemd/mount-rootfs/sys, flags: 0x8, root: /, cwd: /"
time="2024-06-01 03:05:54" level=debug msg="Received umount syscall from pid 1098145"
time="2024-06-01 03:05:54" level=debug msg="target: /run/systemd/mount-rootfs/sys/devices/virtual, flags: 0x8, root: /, cwd: /"
time="2024-06-01 03:05:54" level=debug msg="Ignoring unmount of sysbox-fs managed submount at /run/systemd/mount-rootfs/sys/devices/virtual"
If I restart the sysbox-fs service, the issue goes away temporarily for already-deployed containers (although I can no longer docker exec into the running containers afterwards). But if I deploy a new container, the issue starts again, either because of the shared device or something else (?).
What causes the infinite loop of umount calls on /run/systemd/mount-rootfs/sys/devices/virtual, and why does it go away when sysbox-fs is restarted?
Log File: sysbox-fs.log
(After some research...)
I can see many "Received umount syscall from pid 1092497" entries for different targets, and they seem to complete without problems. I searched for the first occurrence of umount in the log file and traced every umount call from there:
time="2024-06-01 02:57:40" level=debug msg="Received umount syscall from pid 1092497"
time="2024-06-01 02:57:40" level=debug msg="target: /sys/fs/cgroup/unified, flags: 0x8, root: /, cwd: /var/labsdata"
time="2024-06-01 02:57:40" level=debug msg="Received mount syscall from pid 1092497"
time="2024-06-01 02:57:40" level=debug msg="source: cgroup2, target: /sys/fs/cgroup/unified, fstype: cgroup2, flags: 0xe, data: , root: /, cwd: /var/labsdata"
time="2024-06-01 02:57:40" level=debug msg="Received umount syscall from pid 1092497"
time="2024-06-01 02:57:40" level=debug msg="target: /sys/fs/cgroup/unified, flags: 0x8, root: /, cwd: /var/labsdata"
At line 7379 of the log file we can see the first occurrence of a umount call on /run/systemd/mount-rootfs/sys/devices/virtual that gets ignored, and from then on it is an infinite loop. Every container I deploy with a device adds to it, and the log file fills up with these messages; I have to turn off debug logging because it consumes a lot of storage. The loop does not stop, and it only happens when I pass --device=/dev/dri/renderD128. With the little knowledge I have, my understanding is that these endless umount calls are related to the device I passed, which somehow causes an infinite loop.
time="2024-06-01 02:58:30" level=debug msg="Received umount syscall from pid 1098145"
time="2024-06-01 02:58:30" level=debug msg="target: /run/systemd/mount-rootfs/sys/devices/virtual, flags: 0x8, root: /, cwd: /"
time="2024-06-01 02:58:30" level=debug msg="Requested ReadDirAll() on directory /sys/kernel/mm/hugepages (req ID=0x1454)"
time="2024-06-01 02:58:30" level=debug msg="Executing ReadDirAll() for req-id: 0x1454, handler: SysKernel, resource: hugepages"
time="2024-06-01 02:58:30" level=debug msg="Ignoring unmount of sysbox-fs managed submount at /run/systemd/mount-rootfs/sys/devices/virtual"
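To make the loop easier to quantify, here is a minimal Python sketch that tallies umount requests per (pid, target) and counts how many were ignored. The regular expressions are derived only from the debug lines quoted in this issue, and the function names (tally_umounts, summarize) are made up for illustration. It also assumes the "target:" line that follows a "Received umount syscall" line belongs to that request, which may not hold if requests from several pids interleave in the log.

```python
import re
from collections import Counter

# Patterns match only the debug message formats quoted in this issue.
RECV_RE = re.compile(r'msg="Received umount syscall from pid (\d+)"')
# Mount requests log 'msg="source: ..., target: ...' and are not matched here.
TARGET_RE = re.compile(r'msg="target: ([^,"]+), flags: ')
IGNORED_RE = re.compile(
    r'msg="Ignoring unmount of sysbox-fs managed submount at ([^"]+)"'
)


def tally_umounts(lines):
    """Return (requested, ignored) counters for umount activity.

    requested maps (pid, target) to the number of umount requests seen.
    ignored maps target to the number of "Ignoring unmount" messages.
    """
    requested = Counter()
    ignored = Counter()
    pending_pid = None  # pid of the last umount request awaiting its target line
    for line in lines:
        m = RECV_RE.search(line)
        if m:
            pending_pid = m.group(1)
            continue
        m = TARGET_RE.search(line)
        if m and pending_pid is not None:
            requested[(pending_pid, m.group(1))] += 1
            pending_pid = None
            continue
        m = IGNORED_RE.search(line)
        if m:
            ignored[m.group(1)] += 1
    return requested, ignored


def summarize(path, top=10):
    """Print the most frequent umount targets found in the log at path."""
    with open(path, encoding="utf-8", errors="replace") as fh:
        requested, ignored = tally_umounts(fh)
    for (pid, target), count in requested.most_common(top):
        print(f"pid {pid}: {count} umount requests for {target} "
              f"({ignored[target]} ignored messages for this target)")
```

Calling summarize("sysbox-fs.log") on the attached log should show whether a single pid and target account for most of the requests, which is what a loop would look like.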
I went through the code at https://github.com/nestybox/sysbox-fs/blob/master/nsenter/utils.go. It looks like this file could enter a cleanup loop that repeatedly sends unmount calls, which are then ignored by seccomp, as shown in the log, from here: https://github.com/nestybox/sysbox-fs/blob/4c2bc153f33af1bd30a227a14ecfc8174ff280d5/seccomp/umount.go#L128
Can we skip the unmount attempts that are certain to be ignored by seccomp, and so save a lot of CPU? Is my understanding of what is going on correct? If so, how can this issue be solved?