arc_prune and arc_evict at 100% even with no disk activity #14005

@luismcv

Description

System information

Type                  Version/Name
Distribution Name     Ubuntu
Distribution Version  22.04.1
Kernel Version        5.15.0-48-generic
Architecture          x86_64
OpenZFS Version       zfs-2.1.4-0ubuntu0.1 / zfs-kmod-2.1.4-0ubuntu0.1

Describe the problem you're observing

I've been using ZFS for the root fs of my desktop, with ARC limited to 2GB, for several months without any issues until now. When I run disk-intensive tasks (borg-backup, duc, find, etc.), I can see arc_prune and arc_evict working intermittently, with peaks of about 15% of a CPU thread each, and they manage to keep the ARC within its limits:

$ rg "dnode|arc_meta" /proc/spl/kstat/zfs/arcstats
dnode_size                      4    338218432
arc_meta_used                   4    2096995632
arc_meta_limit                  4    2147483648
arc_dnode_limit                 4    1610612736
arc_meta_max                    4    3871211136
arc_meta_min                    4    16777216

And once the tasks are finished they both go to sleep. All normal so far.
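For reference, this is roughly how I watch the two threads next to the ARC counters while a backup runs (a minimal sketch; arc_prune and arc_evict are the kernel thread names as they appear on this system, and the kstat path is the same one used above):

$ while sleep 1; do
      top -b -n 1 | grep -E 'arc_(prune|evict)'   # per-thread CPU usage
      rg "dnode_size|arc_meta_used|arc_meta_limit" /proc/spl/kstat/zfs/arcstats
  done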

But yesterday I decided to try another backup tool, Kopia, and something it does while taking a backup makes the ARC go out of control. Both arc_prune and arc_evict start using 100% of a CPU thread each, and despite that they don't manage to keep memory within the limits, only reaching some kind of balance at around 3.6GB:

$ rg "dnode|arc_meta" /proc/spl/kstat/zfs/arcstats
dnode_size                      4    635616064
arc_meta_used                   4    3652514272
arc_meta_limit                  4    2147483648
arc_dnode_limit                 4    1610612736
arc_meta_max                    4    3871211136
arc_meta_min                    4    16777216

But even after Kopia has finished, or I've aborted it, the problem continues indefinitely, even though no process is generating disk activity anymore (iostat and my system's disk LED both still show continuous activity, though, so it seems the two threads aren't using just CPU).
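That residual activity is visible with a plain iostat watch (sda here is a placeholder for the pool's disk):

$ iostat -x 1 sda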

NOTES:

  • Setting zfs_arc_meta_limit_percent and zfs_arc_dnode_limit_percent to 100 and 75 percent, as suggested in 100% CPU load from arc_prune #9966 (related or the same issue? I'm not sure), only delays the problem by a few seconds, until the metadata cache reaches the now-higher limit and the same thing happens.
  • echo 3 > /proc/sys/vm/drop_caches stops it, until I run a backup again. (The exact commands for both notes are sketched below.)
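For anyone trying to reproduce, these are the commands behind the two notes; the paths are the standard module-parameter and procfs files on this kernel, and the percentages are just the values I tried:

# Note 1: raise the metadata/dnode limits at runtime
$ echo 100 | sudo tee /sys/module/zfs/parameters/zfs_arc_meta_limit_percent
$ echo 75 | sudo tee /sys/module/zfs/parameters/zfs_arc_dnode_limit_percent

# Note 2: drop the dentry/inode caches, which stops the spinning
$ echo 3 | sudo tee /proc/sys/vm/drop_caches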

Describe how to reproduce the problem

Run a Kopia backup of a ZFS filesystem with many files and a low(ish) memory limit for ARC.
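Something like the following should trigger it (a sketch, not a tested script: /tank/test and /backup/repo are placeholder paths, and the file count just needs to be large):

# cap ARC at 2 GiB at runtime
$ echo 2147483648 | sudo tee /sys/module/zfs/parameters/zfs_arc_max

# populate a dataset with many small files
$ mkdir -p /tank/test && for i in $(seq 1 1000000); do echo x > /tank/test/f$i; done

# back it up with Kopia, then watch arc_prune/arc_evict in top
$ kopia repository create filesystem --path /backup/repo
$ kopia snapshot create /tank/test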
