System information
| Type | Version/Name |
|---|---|
| Distribution Name | Ubuntu |
| Distribution Version | 22.04.1 |
| Kernel Version | 5.15.0-48-generic |
| Architecture | x86_64 |
| OpenZFS Version | zfs-2.1.4-0ubuntu0.1 / zfs-kmod-2.1.4-0ubuntu0.1 |
Describe the problem you're observing
I've been using ZFS for the root fs of my desktop, with ARC limited to 2GB, for several months without any issues until now. When I run disk-intensive tasks such as borg-backup, duc, or find, I can see arc_prune and arc_evict working intermittently, with peaks of about 15% of a CPU thread each, and they manage to keep the ARC within limits:
```
$ rg "dnode|arc_meta" /proc/spl/kstat/zfs/arcstats
dnode_size                      4    338218432
arc_meta_used                   4    2096995632
arc_meta_limit                  4    2147483648
arc_dnode_limit                 4    1610612736
arc_meta_max                    4    3871211136
arc_meta_min                    4    16777216
```
And once the tasks are finished they both go to sleep. All normal so far.
But yesterday I decided to try another backup tool, Kopia, and I don't know what it does during a backup that makes the ARC go out of control. Both arc_prune and arc_evict start using 100% of a CPU thread each. Despite that, they don't manage to keep memory use within the limit, and arc_meta_used only settles into some kind of balance at around 3.6GB.
```
$ rg "dnode|arc_meta" /proc/spl/kstat/zfs/arcstats
dnode_size                      4    635616064
arc_meta_used                   4    3652514272
arc_meta_limit                  4    2147483648
arc_dnode_limit                 4    1610612736
arc_meta_max                    4    3871211136
arc_meta_min                    4    16777216
```
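To make the overshoot easier to spot than by comparing raw byte counts, here is a minimal sketch that reads the same kstat file and prints arc_meta_used as a percentage of arc_meta_limit. The function name `arc_meta_report` is made up for illustration, and the optional file argument is only there so it can be run against a saved copy of arcstats.

```shell
#!/bin/sh
# Hypothetical helper: report ARC metadata usage relative to its limit.
# $1 is an optional path to an arcstats dump; defaults to the live kstat file.
arc_meta_report() {
    awk '
        # kstat lines have the form: name  type  value
        $1 == "arc_meta_used"  { used  = $3 }
        $1 == "arc_meta_limit" { limit = $3 }
        END {
            if (limit == 0) { print "arc_meta_limit not found"; exit 1 }
            # Print the raw values as strings to avoid integer-width issues in some awks.
            printf "arc_meta_used=%s arc_meta_limit=%s (%.0f%% of limit)\n",
                   used, limit, 100 * used / limit
        }
    ' "${1:-/proc/spl/kstat/zfs/arcstats}"
}

# Usage: arc_meta_report            (live system)
#        arc_meta_report saved.txt  (saved copy of arcstats)
```

Anything above 100% of the limit corresponds to the state described above, where arc_prune and arc_evict are busy but not catching up.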
But even after Kopia has finished or I have aborted it, the problem continues indefinitely, even though no process is doing disk I/O anymore (iostat and my system's disk LED both show some continuous activity though, so it seems it's not just CPU that they're using).
NOTES:
- Setting the zfs_arc_meta_limit_percent and zfs_arc_dnode_limit_percent to 100 and 75 percent as suggested in 100% CPU load from arc_prune #9966 (related or same issue? Not sure) only delays the problem a few seconds, until the metadata cache reaches the now higher limit and the same thing happens.
- echo 3 > /proc/sys/vm/drop_caches stops it, until I run a backup again.
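For anyone who wants to try the same two workarounds, a rough sketch follows. It assumes the tunables are exposed under /sys/module/zfs/parameters, as they are for the 2.1 module reported here, and that it is run as root. The function name `set_arc_meta_limits` and the `PARAM_DIR` override are illustrative only.

```shell
#!/bin/sh
# Hypothetical helper: write the two metadata tunables mentioned in the notes.
# PARAM_DIR can be overridden to point at a scratch directory for a dry run.
set_arc_meta_limits() {
    param_dir="${PARAM_DIR:-/sys/module/zfs/parameters}"
    # $1: metadata limit as a percentage of the ARC size limit
    echo "$1" > "$param_dir/zfs_arc_meta_limit_percent"
    # $2: dnode limit as a percentage of the metadata limit
    echo "$2" > "$param_dir/zfs_arc_dnode_limit_percent"
}

# Usage (as root), with the values tried above:
#   set_arc_meta_limits 100 75
# Temporary relief, as described in the second note:
#   echo 3 > /proc/sys/vm/drop_caches
```

Neither step is a fix: the first only raises the ceiling, and the second has to be repeated after every backup.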
Describe how to reproduce the problem
Run a Kopia backup of a ZFS filesystem with many files and a low(ish) memory limit for ARC.
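While reproducing, it may help to log arc_meta_used at intervals so the climb past the limit is visible over time. The sketch below is one way to do that; `watch_arc_meta` is a made-up name, and the ARC cap shown in the comment assumes the zfs_arc_max module parameter is writable at runtime (as root).

```shell
#!/bin/sh
# Hypothetical helper: print "<sample index> <arc_meta_used>" a fixed number of times.
# $1: number of samples, $2: seconds between samples, $3: optional arcstats path.
watch_arc_meta() {
    count="$1"
    interval="$2"
    stats="${3:-/proc/spl/kstat/zfs/arcstats}"
    i=0
    while [ "$i" -lt "$count" ]; do
        awk -v n="$i" '$1 == "arc_meta_used" { print n, $3 }' "$stats"
        i=$((i + 1))
        sleep "$interval"
    done
}

# Usage sketch:
#   echo $((2 * 1024 * 1024 * 1024)) > /sys/module/zfs/parameters/zfs_arc_max  # 2 GiB cap
#   watch_arc_meta 60 5 &      # one sample every 5 s for 5 minutes
#   ...start the Kopia backup in another terminal...
```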