Description
System information
Type | Version/Name |
---|---|
Distribution Name | Ubuntu Server |
Distribution Version | 19.10 Eoan Ermine |
Linux Kernel | 5.3.0-13-generic #14-Ubuntu SMP Tue Sep 24 02:46:08 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux |
Architecture | amd64 |
ZFS Version | 0.8.1-1ubuntu11 |
Describe the problem you're observing
After some hours of system uptime, a kworker
process will suck up 100% of a core. The system load averages will climb constantly. This kworker process appears to be ZFS related (as of this writing, the current CPU hog is named kworker/u33:3+flush-zfs-19
).
This system was originally on Ubuntu Server 18.04 with ZFS 0.8.1 that I had compiled some time before. It was at several months of uptime with no issues. Out of nowhere, this issue came up.
Things that have been tried:
- Rebooting several times.
- Upgrading to the Ubuntu HWE kernel
- Updating to the development release of Ubuntu 19.10 (which comes with ZFS 0.8) to rule out problems with the version I compiled.
I've tried everything I can think of, and since various ZFS-related terms come up in the kworker name and messages printed to dmesg, I am left to assume it is a problem in ZFS.
Describe how to reproduce the problem
Unknown. The system this is occurring on has been stable with minimum in the way of changes for about a year. The system is a host to various LXC containers. The root filesystem is ZFS, on a single 500GB Crucial SSD. All data for the LXC containers is stored on an encrypted storage pool. Output of zpool status:
pool: rpool
state: ONLINE
scan: scrub repaired 0B in 0 days 00:02:31 with 0 errors on Mon Oct 7 22:35:32 2019
config:
NAME STATE READ WRITE CKSUM
rpool ONLINE 0 0 0
sda1 ONLINE 0 0 0
errors: No known data errors
pool: storage
state: ONLINE
scan: scrub repaired 0B in 0 days 00:55:30 with 0 errors on Mon Oct 7 23:35:51 2019
config:
NAME STATE READ WRITE CKSUM
storage ONLINE 0 0 0
mirror-0 ONLINE 0 0 0
scsi-35000cca24300d8f4 ONLINE 0 0 0
scsi-35000cca243014464 ONLINE 0 0 0
mirror-1 ONLINE 0 0 0
scsi-35000cca243015924 ONLINE 0 0 0
scsi-35000cca24400a9a4 ONLINE 0 0 0
errors: No known data errors
Include any warning/errors/backtraces from the system logs
Not sure what would be helpful for debugging, but here's some stuff that was printed to dmesg:
[47488.869573] INFO: task pip:19670 blocked for more than 120 seconds.
[47488.869627] Tainted: P O 5.3.0-13-generic #14-Ubuntu
[47488.869670] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[47488.869718] pip D 0 19670 19606 0x00000120
[47488.869722] Call Trace:
[47488.869732] __schedule+0x2b9/0x6c0
[47488.869736] ? bit_wait_io+0x50/0x50
[47488.869738] schedule+0x42/0xb0
[47488.869741] bit_wait+0x11/0x50
[47488.869743] __wait_on_bit+0x33/0xa0
[47488.869749] __inode_wait_for_writeback+0xae/0xe0
[47488.869753] ? var_wake_function+0x30/0x30
[47488.869756] inode_wait_for_writeback+0x27/0x40
[47488.869759] evict+0xbd/0x1b0
[47488.869761] iput+0x141/0x1f0
[47488.869764] do_unlinkat+0x1c5/0x2d0
[47488.869767] __x64_sys_unlinkat+0x38/0x60
[47488.869770] do_syscall_64+0x5a/0x130
[47488.869774] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[47488.869777] RIP: 0033:0x7f7a4ac39d77
[47488.869784] Code: Bad RIP value.
[47488.869786] RSP: 002b:00007fff4c816858 EFLAGS: 00000246 ORIG_RAX: 0000000000000107
[47488.869788] RAX: ffffffffffffffda RBX: 0000000000000005 RCX: 00007f7a4ac39d77
[47488.869790] RDX: 0000000000000000 RSI: 00007f7a44c4bda0 RDI: 0000000000000005
[47488.869791] RBP: 0000555a9b59b240 R08: 0000000000000000 R09: 00007fff4c816510
[47488.869792] R10: 0000000000000000 R11: 0000000000000246 R12: 00007f7a4aad01b0
[47488.869793] R13: 00007f7a4aac8e08 R14: 0000555a9a878240 R15: 0000555a9b9eb460
[47488.869797] INFO: task sync:20009 blocked for more than 120 seconds.
[47488.869840] Tainted: P O 5.3.0-13-generic #14-Ubuntu
[47488.869883] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[47488.869933] sync D 0 20009 20008 0x00004120
[47488.869936] Call Trace:
[47488.869940] __schedule+0x2b9/0x6c0
[47488.869943] schedule+0x42/0xb0
[47488.869945] wb_wait_for_completion+0x64/0x90
[47488.869948] ? wait_woken+0x80/0x80
[47488.869950] sync_inodes_sb+0xd7/0x290
[47488.869954] sync_inodes_one_sb+0x15/0x20
[47488.869957] iterate_supers+0xa3/0x110
[47488.869959] ? page_cache_pipe_buf_steal.cold+0x1c/0x1c
[47488.869962] ksys_sync+0x42/0xb0
[47488.869964] __ia32_sys_sync+0xe/0x20
[47488.869967] do_syscall_64+0x5a/0x130
[47488.869970] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[47488.869972] RIP: 0033:0x7f56f6bd7287
[47488.869976] Code: Bad RIP value.
[47488.869977] RSP: 002b:00007fff2c486078 EFLAGS: 00000246 ORIG_RAX: 00000000000000a2
[47488.869979] RAX: ffffffffffffffda RBX: 00007fff2c4861a8 RCX: 00007f56f6bd7287
[47488.869981] RDX: 00007f56f6eae001 RSI: 0000000000000000 RDI: 00007f56f6c754b0
[47488.869982] RBP: 0000000000000001 R08: 0000000000000000 R09: 0000000000000000
[47488.869984] R10: 000055d597318010 R11: 0000000000000246 R12: 0000000000000001
[47488.869985] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
[47609.702068] INFO: task pip:19670 blocked for more than 241 seconds.
[47609.702114] Tainted: P O 5.3.0-13-generic #14-Ubuntu
[47609.702150] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[47609.702191] pip D 0 19670 19606 0x00000120
[47609.702194] Call Trace:
[47609.702203] __schedule+0x2b9/0x6c0
[47609.702206] ? bit_wait_io+0x50/0x50
[47609.702209] schedule+0x42/0xb0
[47609.702210] bit_wait+0x11/0x50
[47609.702212] __wait_on_bit+0x33/0xa0
[47609.702218] __inode_wait_for_writeback+0xae/0xe0
[47609.702221] ? var_wake_function+0x30/0x30
[47609.702223] inode_wait_for_writeback+0x27/0x40
[47609.702226] evict+0xbd/0x1b0
[47609.702227] iput+0x141/0x1f0
[47609.702230] do_unlinkat+0x1c5/0x2d0
[47609.702232] __x64_sys_unlinkat+0x38/0x60
[47609.702235] do_syscall_64+0x5a/0x130
[47609.702238] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[47609.702240] RIP: 0033:0x7f7a4ac39d77
[47609.702247] Code: Bad RIP value.
[47609.702248] RSP: 002b:00007fff4c816858 EFLAGS: 00000246 ORIG_RAX: 0000000000000107
[47609.702251] RAX: ffffffffffffffda RBX: 0000000000000005 RCX: 00007f7a4ac39d77
[47609.702252] RDX: 0000000000000000 RSI: 00007f7a44c4bda0 RDI: 0000000000000005
[47609.702253] RBP: 0000555a9b59b240 R08: 0000000000000000 R09: 00007fff4c816510
[47609.702254] R10: 0000000000000000 R11: 0000000000000246 R12: 00007f7a4aad01b0
[47609.702255] R13: 00007f7a4aac8e08 R14: 0000555a9a878240 R15: 0000555a9b9eb460
[47609.702258] INFO: task sync:20009 blocked for more than 241 seconds.
[47609.702293] Tainted: P O 5.3.0-13-generic #14-Ubuntu
[47609.702328] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[47609.702368] sync D 0 20009 20008 0x00004120
[47609.702370] Call Trace:
[47609.702373] __schedule+0x2b9/0x6c0
[47609.702376] schedule+0x42/0xb0
[47609.702378] wb_wait_for_completion+0x64/0x90
[47609.702379] ? wait_woken+0x80/0x80
[47609.702381] sync_inodes_sb+0xd7/0x290
[47609.702384] sync_inodes_one_sb+0x15/0x20
[47609.702386] iterate_supers+0xa3/0x110
[47609.702388] ? page_cache_pipe_buf_steal.cold+0x1c/0x1c
[47609.702391] ksys_sync+0x42/0xb0
[47609.702393] __ia32_sys_sync+0xe/0x20
[47609.702395] do_syscall_64+0x5a/0x130
[47609.702398] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[47609.702399] RIP: 0033:0x7f56f6bd7287
[47609.702402] Code: Bad RIP value.
[47609.702403] RSP: 002b:00007fff2c486078 EFLAGS: 00000246 ORIG_RAX: 00000000000000a2
[47609.702405] RAX: ffffffffffffffda RBX: 00007fff2c4861a8 RCX: 00007f56f6bd7287
[47609.702406] RDX: 00007f56f6eae001 RSI: 0000000000000000 RDI: 00007f56f6c754b0
[47609.702407] RBP: 0000000000000001 R08: 0000000000000000 R09: 0000000000000000
[47609.702408] R10: 000055d597318010 R11: 0000000000000246 R12: 0000000000000001
[47609.702409] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
[47730.534430] INFO: task journal-offline:32058 blocked for more than 120 seconds.
[47730.534481] Tainted: P O 5.3.0-13-generic #14-Ubuntu
[47730.534516] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[47730.534558] journal-offline D 0 32058 27128 0x00000120
[47730.534561] Call Trace:
[47730.534570] __schedule+0x2b9/0x6c0
[47730.534573] schedule+0x42/0xb0
[47730.534575] schedule_preempt_disabled+0xe/0x10
[47730.534578] __mutex_lock.isra.0+0x182/0x4f0
[47730.534581] __mutex_lock_slowpath+0x13/0x20
[47730.534583] mutex_lock+0x2e/0x40
[47730.534698] zil_commit_impl+0x148/0xd60 [zfs]
[47730.534766] ? zpl_write_common_iovec+0x100/0x100 [zfs]
[47730.534769] ? _cond_resched+0x19/0x30
[47730.534771] ? mutex_lock+0x13/0x40
[47730.534840] zil_commit+0x40/0x60 [zfs]
[47730.534908] zpl_writepages+0xec/0x180 [zfs]
[47730.534912] do_writepages+0x43/0xd0
[47730.534916] __filemap_fdatawrite_range+0xc4/0x100
[47730.534919] filemap_write_and_wait_range+0x36/0x90
[47730.534987] zpl_fsync+0x3d/0xa0 [zfs]
[47730.534991] vfs_fsync_range+0x49/0x80
[47730.534993] ? __fget_light+0x57/0x70
[47730.534995] do_fsync+0x3d/0x70
[47730.534997] __x64_sys_fsync+0x14/0x20
[47730.535001] do_syscall_64+0x5a/0x130
[47730.535003] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[47730.535006] RIP: 0033:0x7f872e0af237
[47730.535012] Code: Bad RIP value.
[47730.535014] RSP: 002b:00007f8729766cb0 EFLAGS: 00000293 ORIG_RAX: 000000000000004a
[47730.535016] RAX: ffffffffffffffda RBX: 0000000000000014 RCX: 00007f872e0af237
[47730.535017] RDX: 0000000000000000 RSI: 00007f872dcb762c RDI: 0000000000000014
[47730.535018] RBP: 00007f872dcb8da0 R08: 00007f8729767700 R09: 00007f8729767700
[47730.535019] R10: 000000000000062c R11: 0000000000000293 R12: 0000000000000002
[47730.535020] R13: 0000000000000000 R14: 00005647819af200 R15: 00007ffce0398f10
[47730.535120] INFO: task pip:19670 blocked for more than 362 seconds.
[47730.535155] Tainted: P O 5.3.0-13-generic #14-Ubuntu
[47730.535190] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[47730.535230] pip D 0 19670 19606 0x00000120
[47730.535232] Call Trace:
[47730.535235] __schedule+0x2b9/0x6c0
[47730.535238] ? bit_wait_io+0x50/0x50
[47730.535240] schedule+0x42/0xb0
[47730.535242] bit_wait+0x11/0x50
[47730.535243] __wait_on_bit+0x33/0xa0
[47730.535246] __inode_wait_for_writeback+0xae/0xe0
[47730.535249] ? var_wake_function+0x30/0x30
[47730.535252] inode_wait_for_writeback+0x27/0x40
[47730.535254] evict+0xbd/0x1b0
[47730.535256] iput+0x141/0x1f0
[47730.535258] do_unlinkat+0x1c5/0x2d0
[47730.535260] __x64_sys_unlinkat+0x38/0x60
[47730.535262] do_syscall_64+0x5a/0x130
[47730.535265] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[47730.535266] RIP: 0033:0x7f7a4ac39d77
[47730.535270] Code: Bad RIP value.
[47730.535271] RSP: 002b:00007fff4c816858 EFLAGS: 00000246 ORIG_RAX: 0000000000000107
[47730.535273] RAX: ffffffffffffffda RBX: 0000000000000005 RCX: 00007f7a4ac39d77
[47730.535274] RDX: 0000000000000000 RSI: 00007f7a44c4bda0 RDI: 0000000000000005
[47730.535275] RBP: 0000555a9b59b240 R08: 0000000000000000 R09: 00007fff4c816510
[47730.535276] R10: 0000000000000000 R11: 0000000000000246 R12: 00007f7a4aad01b0
[47730.535277] R13: 00007f7a4aac8e08 R14: 0000555a9a878240 R15: 0000555a9b9eb460
[47730.535280] INFO: task sync:20009 blocked for more than 362 seconds.
[47730.535314] Tainted: P O 5.3.0-13-generic #14-Ubuntu
[47730.535349] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[47730.535389] sync D 0 20009 20008 0x00004120
[47730.535391] Call Trace:
[47730.535394] __schedule+0x2b9/0x6c0
[47730.535396] schedule+0x42/0xb0
[47730.535398] wb_wait_for_completion+0x64/0x90
[47730.535400] ? wait_woken+0x80/0x80
[47730.535402] sync_inodes_sb+0xd7/0x290
[47730.535405] sync_inodes_one_sb+0x15/0x20
[47730.535407] iterate_supers+0xa3/0x110
[47730.535409] ? page_cache_pipe_buf_steal.cold+0x1c/0x1c
[47730.535411] ksys_sync+0x42/0xb0
[47730.535413] __ia32_sys_sync+0xe/0x20
[47730.535416] do_syscall_64+0x5a/0x130
[47730.535418] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[47730.535420] RIP: 0033:0x7f56f6bd7287
[47730.535423] Code: Bad RIP value.
[47730.535424] RSP: 002b:00007fff2c486078 EFLAGS: 00000246 ORIG_RAX: 00000000000000a2
[47730.535425] RAX: ffffffffffffffda RBX: 00007fff2c4861a8 RCX: 00007f56f6bd7287
[47730.535427] RDX: 00007f56f6eae001 RSI: 0000000000000000 RDI: 00007f56f6c754b0
[47730.535428] RBP: 0000000000000001 R08: 0000000000000000 R09: 0000000000000000
[47730.535429] R10: 000055d597318010 R11: 0000000000000246 R12: 0000000000000001
[47730.535429] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
[47851.366840] INFO: task journal-offline:32058 blocked for more than 241 seconds.
[47851.366900] Tainted: P O 5.3.0-13-generic #14-Ubuntu
[47851.366944] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[47851.366994] journal-offline D 0 32058 27128 0x00000124
[47851.366998] Call Trace:
[47851.367007] __schedule+0x2b9/0x6c0
[47851.367011] schedule+0x42/0xb0
[47851.367014] schedule_preempt_disabled+0xe/0x10
[47851.367017] __mutex_lock.isra.0+0x182/0x4f0
[47851.367021] __mutex_lock_slowpath+0x13/0x20
[47851.367023] mutex_lock+0x2e/0x40
[47851.367145] zil_commit_impl+0x148/0xd60 [zfs]
[47851.367224] ? zpl_write_common_iovec+0x100/0x100 [zfs]
[47851.367227] ? _cond_resched+0x19/0x30
[47851.367230] ? mutex_lock+0x13/0x40
[47851.367309] zil_commit+0x40/0x60 [zfs]
[47851.367387] zpl_writepages+0xec/0x180 [zfs]
[47851.367392] do_writepages+0x43/0xd0
[47851.367396] __filemap_fdatawrite_range+0xc4/0x100
[47851.367400] filemap_write_and_wait_range+0x36/0x90
[47851.367478] zpl_fsync+0x3d/0xa0 [zfs]
[47851.367483] vfs_fsync_range+0x49/0x80
[47851.367486] ? __fget_light+0x57/0x70
[47851.367489] do_fsync+0x3d/0x70
[47851.367491] __x64_sys_fsync+0x14/0x20
[47851.367495] do_syscall_64+0x5a/0x130
[47851.367498] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[47851.367501] RIP: 0033:0x7f872e0af237
[47851.367508] Code: Bad RIP value.
[47851.367510] RSP: 002b:00007f8729766cb0 EFLAGS: 00000293 ORIG_RAX: 000000000000004a
[47851.367512] RAX: ffffffffffffffda RBX: 0000000000000014 RCX: 00007f872e0af237
[47851.367514] RDX: 0000000000000000 RSI: 00007f872dcb762c RDI: 0000000000000014
[47851.367515] RBP: 00007f872dcb8da0 R08: 00007f8729767700 R09: 00007f8729767700
[47851.367516] R10: 000000000000062c R11: 0000000000000293 R12: 0000000000000002
[47851.367518] R13: 0000000000000000 R14: 00005647819af200 R15: 00007ffce0398f10
[47851.367618] INFO: task pip:19670 blocked for more than 483 seconds.
[47851.367660] Tainted: P O 5.3.0-13-generic #14-Ubuntu
[47851.367704] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[47851.367753] pip D 0 19670 19606 0x00000120
[47851.367756] Call Trace:
[47851.367760] __schedule+0x2b9/0x6c0
[47851.367764] ? bit_wait_io+0x50/0x50
[47851.367766] schedule+0x42/0xb0
[47851.367769] bit_wait+0x11/0x50
[47851.367771] __wait_on_bit+0x33/0xa0
[47851.367775] __inode_wait_for_writeback+0xae/0xe0
[47851.367779] ? var_wake_function+0x30/0x30
[47851.367782] inode_wait_for_writeback+0x27/0x40
[47851.367785] evict+0xbd/0x1b0
[47851.367787] iput+0x141/0x1f0
[47851.367790] do_unlinkat+0x1c5/0x2d0
[47851.367793] __x64_sys_unlinkat+0x38/0x60
[47851.367795] do_syscall_64+0x5a/0x130
[47851.367799] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[47851.367801] RIP: 0033:0x7f7a4ac39d77
[47851.367804] Code: Bad RIP value.
[47851.367806] RSP: 002b:00007fff4c816858 EFLAGS: 00000246 ORIG_RAX: 0000000000000107
[47851.367808] RAX: ffffffffffffffda RBX: 0000000000000005 RCX: 00007f7a4ac39d77
[47851.367809] RDX: 0000000000000000 RSI: 00007f7a44c4bda0 RDI: 0000000000000005
[47851.367811] RBP: 0000555a9b59b240 R08: 0000000000000000 R09: 00007fff4c816510
[47851.367812] R10: 0000000000000000 R11: 0000000000000246 R12: 00007f7a4aad01b0
[47851.367814] R13: 00007f7a4aac8e08 R14: 0000555a9a878240 R15: 0000555a9b9eb460
[47851.367817] INFO: task sync:20009 blocked for more than 483 seconds.
[47851.367859] Tainted: P O 5.3.0-13-generic #14-Ubuntu
[47851.367902] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[47851.367952] sync D 0 20009 20008 0x00004120
[47851.367954] Call Trace:
[47851.367958] __schedule+0x2b9/0x6c0
[47851.367961] schedule+0x42/0xb0
[47851.367963] wb_wait_for_completion+0x64/0x90
[47851.367966] ? wait_woken+0x80/0x80
[47851.367968] sync_inodes_sb+0xd7/0x290
[47851.367972] sync_inodes_one_sb+0x15/0x20
[47851.367975] iterate_supers+0xa3/0x110
[47851.367977] ? page_cache_pipe_buf_steal.cold+0x1c/0x1c
[47851.367980] ksys_sync+0x42/0xb0
[47851.367983] __ia32_sys_sync+0xe/0x20
[47851.367985] do_syscall_64+0x5a/0x130
[47851.367989] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[47851.367991] RIP: 0033:0x7f56f6bd7287
[47851.367994] Code: Bad RIP value.
[47851.367995] RSP: 002b:00007fff2c486078 EFLAGS: 00000246 ORIG_RAX: 00000000000000a2
[47851.367998] RAX: ffffffffffffffda RBX: 00007fff2c4861a8 RCX: 00007f56f6bd7287
[47851.367999] RDX: 00007f56f6eae001 RSI: 0000000000000000 RDI: 00007f56f6c754b0
[47851.368000] RBP: 0000000000000001 R08: 0000000000000000 R09: 0000000000000000
[47851.368002] R10: 000055d597318010 R11: 0000000000000246 R12: 0000000000000001
[47851.368003] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
I can grab any other debugging information that would be helpful, just let me know what commands to run to grab it.
EDIT: Oh, and I suspect it may have SOMETHING to do with one particular LXC container. The container is stored in a ZFS dataset. Inside the container, the only thing running is Nextcloud, installed from Snapcraft.