Bad latency reading large files from mounted volumes #6553
I can reproduce this issue, and it doesn't seem to be anything Talos-specific at the moment; it's a regular mount on the host. Also interesting that it only happens the first time: if you run the command once again, it succeeds immediately. It might be something ext4-related. Needs more digging.
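To illustrate that "only the first time" behaviour, the read from the reproducer below can simply be run twice; a sketch of what to expect rather than an exact transcript:

time dd if=/data/zerofile of=/dev/null bs=1K count=1   # first read of the file: long hang inside openat()
time dd if=/data/zerofile of=/dev/null bs=1K count=1   # immediate re-run: completes almost instantly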
Re-doing your reproducer with …
So I think it's something ext4-specific coupled with the slow I/O performance of the QEMU volume.
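As an aside, a quick way to confirm which filesystem actually backs the mounted volume from inside the pod; a sketch, with /data being the mount path used in the reproducer below:

df -T /data          # prints the filesystem type (ext4, xfs, ...) for the mount
mount | grep /data   # fallback if the image's df doesn't support -T; also shows mount options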
okay, I know what this is, and thanks for reporting this bug!
Fixes siderolabs#6553. Talos itself defaults to XFS, so IMA measurements weren't done for Talos' own filesystems. But many other solutions create ext4 filesystems by default, or it might be something mounted by other means. Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
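Given the fix referenced above, a hedged way to double-check that IMA measurement is what forces the whole file to be hashed on first open; this assumes securityfs is mounted at /sys/kernel/security on the node and that talosctl read can reach it:

talosctl read /sys/kernel/security/ima/policy
# the active IMA policy; rules matching reads on the volume's filesystem would explain the behaviour

talosctl read /sys/kernel/security/ima/ascii_runtime_measurements | grep zerofile
# the test file appearing here right after the slow first open means IMA hashed the entire file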
Bug Report
Description
When I attempt to read large files (multiple GB) in volumes mounted into pods on a Talos cluster, the time to read just one byte out of the file is insanely long, and seems to scale with the size of the file. I've watched transfer metrics, and it looks like the entire file is getting read in at open time (!?!). None of the readahead settings I know of look out of whack (read_ahead_kb setting for the device is 128KB, filesystem readaheads look okay, but probably aren't relevant in light of the info in the next paragraph).
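For reference, the per-device readahead mentioned above can be inspected on the node; a sketch, with sdX standing in for whichever device backs the volume:

cat /sys/block/sdX/queue/read_ahead_kb   # 128 here means 128 KiB of readahead
blockdev --getra /dev/sdX                # same setting expressed in 512-byte sectors (256 = 128 KiB)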
So far, it does not matter what filesystem is mounted, or what CSI the mount is using; I have replicated it using RBD, CephFS, geesefs S3, OpenEBS lvm-localpv, and plain old built-in local volumes. hostPath mounts and local volume mounts on the system disk are the only mounts where I haven't been able to replicate this behavior.
I have not been able to replicate this issue when I installed the same version of k3s or k8s via kubeadm in the same environment, which is why I'm thinking this is a Talos issue.
Logs
Haven't found anything in the logs that seems relevant, although when I add an strace to the second dd in the reproducer below, the long hang occurs during the openat() call opening the file from the mounted volume. Let me know if there are any logs you would like me to provide, but hopefully the reproducer is simple enough that you can replicate in your environment.
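For completeness, this is roughly the invocation that pins the hang to openat(); a sketch that assumes strace is installed in the pod image (it isn't in a stock alpine image, so something like apk add strace would be needed first):

strace -T -e trace=openat dd if=/data/zerofile of=/dev/null bs=1K count=1
# -T prints the time spent in each syscall; nearly all of the elapsed time shows up on the openat() of /data/zerofile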
Environment

Reproducer
1. Create the cluster: sudo --preserve-env=HOME talosctl cluster create --provisioner qemu --extra-disks 1 --extra-disks-size 20480
2. Apply the attached pvc.yaml.txt.
3. Apply the attached alpine.yaml.txt.
4. kubectl exec -it pod/alpine -- /bin/sh
5. dd if=/dev/zero of=/data/zerofile bs=4M count=4096 to create a 16GB file.
6. time dd if=/data/zerofile of=/dev/null bs=1K count=1 to read the first KB of that file.
7. The time for the second dd is very long.

Sample reproducer output (on my dev laptop with QEMU VMs running on NVMe)