Memory Leak (2x ARC) on Kernel 6.x (Debian 12) #15637


Description


System information

Type                  Version/Name
Distribution Name     Debian
Distribution Version  Bookworm (12)
Kernel Version        6.1.0-13-amd64
Architecture          x86_64
OpenZFS Version       zfs-2.1.14-1~bpo12+1

Describe the problem you're observing

Since upgrading to Debian 12, a small system I run as an rsync target and rsnapshot host went from a flat 50% RAM usage (out of 2 GB total) to constantly OOMing, even after an upgrade to 8 GB. I initially raised this as a question in #14986, but there were too many variables in place to precisely pin down the source of the issue (various external processes writing to it, some old zpools which used to have dedup enabled, a ton of snapshots, etc.).

I've now managed to reproduce what I think is some sort of memory leak on a simple/vanilla setup, described below, using only a single data source and rsnapshot. At this point, though, I'm at a loss and can't dig any deeper; any help would be much appreciated.

Given the (relative) simplicity of the setup I managed to replicate this on, I'm really surprised I'm the only one reporting it.

Describe how to reproduce the problem

1 - The source is a sample dataset on a remote host: 100 GB in size, 100k directories, 250k files
2 - The zpool (hosted on a 4 vCPU, 8 GB RAM VM) is relatively simple (zpool create -m /mnt/tank tank /dev/sda), and the filesystem on top of it is created with zfs create -o encryption=on -o keylocation=prompt -o keyformat=passphrase -o compression=on tank/secure (note both compression and encryption are on)
3 - rsnapshot is configured to back up the remote (see "1") to /mnt/tank/secure
4 - The source is never changed across iterations, so the rsync step of rsnapshot doesn't actually sync anything
5 - I run 10 iterations back to back and see the ARC filling up to its target size of 3-4 GB, as one would expect. Additionally, every iteration leaves behind roughly 600 MB of "abandoned" used RAM (see the sketch after this list)
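
For reference, here is a minimal sketch of what I run. The zpool/zfs commands are the ones from step 2; the rsnapshot config path and the "alpha" interval name are placeholders for whatever the local config uses, and the arcstats read is only there to watch the ARC alongside free:

    # One-off setup (same commands as step 2)
    zpool create -m /mnt/tank tank /dev/sda
    zfs create -o encryption=on -o keylocation=prompt -o keyformat=passphrase \
        -o compression=on tank/secure

    # Back-to-back iterations; the source never changes, so rsync copies nothing
    for i in $(seq 1 10); do
        rsnapshot -c /etc/rsnapshot.conf alpha   # config path and interval name are placeholders
        free -m                                  # "used" grows by ~600 MB per pass
        awk '$1 == "size" {print "ARC size:", $3, "bytes"}' /proc/spl/kstat/zfs/arcstats
    done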

Here is what I'm left with:

               total        used        free      shared  buff/cache   available
Mem:            7951        7140         462           0         625         810
Swap:              0           0           0

 Active / Total Objects (% used)    : 13869528 / 15195981 (91.3%)
 Active / Total Slabs (% used)      : 629764 / 629764 (100.0%)
 Active / Total Caches (% used)     : 144 / 207 (69.6%)
 Active / Total Size (% used)       : 5961935.63K / 6291859.64K (94.8%)
 Minimum / Average / Maximum Object : 0.01K / 0.41K / 16.00K

  OBJS ACTIVE  USE OBJ SIZE  SLABS OBJ/SLAB CACHE SIZE NAME                   
1259468 1259468 100%    1.12K  44981       28   1439392K zfs_znode_cache        
1262688 1262688 100%    0.97K  78918       16   1262688K dnode_t                
1614144 1358737  84%    0.50K 100884       16    807072K kmalloc-512            
 43460  43460 100%   16.00K  21730        2    695360K zio_buf_comb_16384     
1387428 1351895  97%    0.38K  66068       21    528544K dmu_buf_impl_t         
2478084 2024503  81%    0.19K 118004       21    472016K dentry                 
 39580  39580 100%    8.00K   9895        4    316640K kmalloc-8k             
1259472 1259472 100%    0.24K  78717       16    314868K sa_cache   
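
(For completeness, the two readings above are plain free -m and slabtop output; on a comparable system they can be captured with something like the following, where the sort key is my assumption:)

    free -m
    slabtop -o --sort=c | head -n 20   # -o: print once and exit; sort by cache size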

The same test, on the same system, with an ext4-over-LUKS destination works as intended and results in only about 600 MB of RAM used in total.
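
(For comparison, a rough sketch of an equivalent ext4-over-LUKS destination on the same single disk is below; the mapper name and mount point are placeholders, not necessarily the exact ones used in the test:)

    cryptsetup luksFormat /dev/sda
    cryptsetup open /dev/sda backup_crypt
    mkfs.ext4 /dev/mapper/backup_crypt
    mount /dev/mapper/backup_crypt /mnt/tank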

Include any warning/errors/backtraces from the system logs

Nothing relevant I can spot.
