
Bad latency reading large files from mounted volumes #6553

Closed
evandeaubl opened this issue Nov 26, 2022 · 4 comments · Fixed by #6575

@evandeaubl commented Nov 26, 2022

Bug Report

Description

When I attempt to read large files (multiple GB) from volumes mounted into pods on a Talos cluster, the time to read even one byte of the file is insanely long, and it seems to scale with the size of the file. I've watched transfer metrics, and it looks like the entire file is being read in at open time (!?!). None of the readahead settings I know of look out of whack (the read_ahead_kb setting for the device is 128, i.e. 128 KiB, and filesystem readahead looks okay, but those probably aren't relevant in light of the info in the next paragraph).
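For reference, the per-device readahead can be checked like this (a quick sketch, not taken from the issue; vdb is a placeholder for whichever device backs the volume):

cat /sys/block/vdb/queue/read_ahead_kb   # readahead in KiB, 128 by default
blockdev --getra /dev/vdb                # same value in 512-byte sectors (256 = 128 KiB)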

So far, it does not matter which filesystem is mounted or which CSI driver the mount uses; I have replicated it with RBD, CephFS, geesefs S3, OpenEBS lvm-localpv, and plain old built-in local volumes. hostPath mounts and local volume mounts on the system disk are the only mounts where I have not been able to replicate this behavior.

I have not been able to replicate this issue when I installed the same version of k3s or k8s via kubeadm in the same environment, which is why I'm thinking this is a Talos issue.

Logs

I haven't found anything in the logs that seems relevant, although when I attach strace to the second dd in the reproducer below, the long hang occurs during the openat() call that opens the file from the mounted volume. Let me know if there are any logs you would like me to provide, but hopefully the reproducer is simple enough that you can replicate it in your environment.
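For anyone who wants to repeat that trace, a rough sketch of the strace run inside the pod (the exact invocation isn't shown here, and the alpine image needs strace installed first, e.g. apk add strace):

strace -T -e trace=openat,read \
  dd if=/data/zerofile of=/dev/null bs=1K count=1
# -T prints the time spent in each syscall; the stall shows up on the
# openat() of /data/zerofile rather than on the subsequent read()s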

Environment

  • Talos version: 1.2.7
  • Kubernetes version: 1.25.4
  • Platform: metal

Reproducer

  1. Spin up a QEMU cluster using sudo --preserve-env=HOME talosctl cluster create --provisioner qemu --extra-disks 1 --extra-disks-size 20480.
  2. Deploy the PV/PVC/pod from the attached manifests (an illustrative sketch of such manifests follows the steps below).

pvc.yaml.txt
alpine.yaml.txt

  3. Run kubectl exec -it pod/alpine -- /bin/sh
  4. Run dd if=/dev/zero of=/data/zerofile bs=4M count=4096 to create a 16 GiB file.
  5. Run time dd if=/data/zerofile of=/dev/null bs=1K count=1 to read the first KB of that file.
  6. Observe that the time to run the second dd is very long.
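The attached manifests aren't inlined above, so here is an illustrative sketch of what such a PV/PVC/pod could look like (the names, the /dev/vdb device path, the node name, and the use of a local PersistentVolume are assumptions, not the contents of the attached files):

kubectl apply -f - <<'EOF'
apiVersion: v1
kind: PersistentVolume
metadata:
  name: test-pv
spec:
  capacity:
    storage: 20Gi
  accessModes: ["ReadWriteOnce"]
  persistentVolumeReclaimPolicy: Retain
  storageClassName: local-storage
  local:
    path: /dev/vdb          # the extra disk created by talosctl cluster create (assumed)
    fsType: ext4            # the filesystem the issue reproduces with
  nodeAffinity:
    required:
      nodeSelectorTerms:
        - matchExpressions:
            - key: kubernetes.io/hostname
              operator: In
              values: ["talos-default-worker-1"]   # placeholder node name
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: test-pvc
spec:
  accessModes: ["ReadWriteOnce"]
  storageClassName: local-storage
  resources:
    requests:
      storage: 20Gi
---
apiVersion: v1
kind: Pod
metadata:
  name: alpine
spec:
  containers:
    - name: alpine
      image: alpine:3.17
      command: ["sleep", "1000000"]
      volumeMounts:
        - name: data
          mountPath: /data
  volumes:
    - name: data
      persistentVolumeClaim:
        claimName: test-pvc
EOF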

Sample reproducer output (on my dev laptop with QEMU VMs running on NVMe):

[evan@nitrogen test]$ kubectl exec -it pod/alpine -- /bin/sh
/ # dd if=/dev/zero of=/data/zerofile bs=4M count=4096
4096+0 records in
4096+0 records out
/ # time dd if=/data/zerofile of=/dev/null bs=1K count=1
1+0 records in
1+0 records out
real	0m 46.57s
user	0m 0.00s
sys	0m 46.50s
/ # 
smira self-assigned this Nov 28, 2022
smira added this to the v1.3 milestone Nov 28, 2022
smira (Member) commented Nov 30, 2022

I can reproduce this issue, and it doesn't seem to be anything Talos-specific at the moment; the volume is just a regular mount on the host.

It's also interesting that it only happens the first time: if you run the command once again, it succeeds immediately. It might be something ext4-related. Needs more digging.
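A quick way to check how much of that "only the first read is slow" behaviour is just the page cache (a sketch only; this wasn't run in the thread and it needs a privileged context on the node) is to drop the caches and repeat the read:

sync
echo 3 > /proc/sys/vm/drop_caches   # drop page cache, dentries and inodes
time dd if=/data/zerofile of=/dev/null bs=1K count=1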

smira (Member) commented Nov 30, 2022

While it "hangs", the process is blocked in the Linux kernel:

[screenshot showing the process blocked in the kernel]
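For reference, a couple of ways to see where a task is blocked in the kernel (a sketch assuming a root shell on the node, e.g. via a privileged debug pod; the screenshot above may have been captured differently):

cat /proc/$(pidof dd)/stack    # kernel stack of the stuck dd process (assumes a single dd)
echo w > /proc/sysrq-trigger   # dump all blocked (D state) tasks to the kernel log
dmesg | tail -n 50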

smira (Member) commented Nov 30, 2022

Re-doing your reproducer with fsType: xfs in the PV "fixes" the problem:

# time dd if=/data/zerofile of=/dev/null bs=1K count=1
1+0 records in
1+0 records out
real	0m 0.00s
user	0m 0.00s
sys	0m 0.00s

So I think it's something ext4-specific coupled with the slow I/O performance of the QEMU volume (talosctl cluster create was never optimized for performance; it would probably be much better with .qcow volumes).
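To double-check which filesystem the volume actually mounted as after changing fsType (a sketch, not from the thread):

kubectl exec pod/alpine -- mount | grep /data   # shows the backing device and filesystem type
kubectl exec pod/alpine -- cat /proc/mounts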

smira (Member) commented Dec 1, 2022

Okay, I know what this is, and thanks for reporting this bug!

smira added a commit to smira/talos that referenced this issue Dec 1, 2022
Fixes siderolabs#6553

Talos itself defaults to XFS, so IMA measurements weren't done for Talos'
own filesystems. But many other solutions create ext4 filesystems by
default, or it might be something mounted by other means.

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
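For anyone curious to see IMA at work on a node, the active policy and the list of measured files are exposed via securityfs (a sketch; it assumes a privileged context on the node, securityfs mounted at /sys/kernel/security, and a kernel built with CONFIG_IMA_READ_POLICY for reading the policy back):

cat /sys/kernel/security/ima/policy                        # active IMA policy rules
wc -l /sys/kernel/security/ima/ascii_runtime_measurements  # one line per file measured so far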
smira added further commits to smira/talos with the same message (including cherry-picks of commit d3cf061) that referenced this issue between Dec 1 and Dec 20, 2022.
DJAlPee pushed a commit to DJAlPee/talos that referenced this issue May 22, 2023.