Skip to content

Latest XDP fixes #27

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed

Conversation

michalQb
Copy link

@michalQb michalQb commented Sep 6, 2024

No description provided.

The commit 35d653a ("idpf: implement XDP_SETUP_PROG in ndo_bpf for
splitq") uses soft_reset to perform a full vport reconfiguration after
changes in XDP setup.
Unfortunately, the soft_reset may fail after attaching the XDP program
to the vport. It can happen when the HW limits resources that can be
allocated to fulfill XDP requirements.
In such a case, before we return an error from XDP_SETUP_PROG, we have
to fully restore the previous vport state, including removing the XDP
program.

In order to remove the already loaded XDP program in case of reset
error, re-implement the error handling path and move some calls to the
XDP callback.

Fixes: 35d653a ("idpf: implement XDP_SETUP_PROG in ndo_bpf for splitq")
Signed-off-by: Michal Kubiak <michal.kubiak@intel.com>
The commit f00334f ("libeth: support native XDP and register memory model")
removed calling of "page_pool_destroy()" from "libeth_rx_fq_destroy()"
and replaced it with the call of "xdp_unreg_page_pool()". There was an
assumption that "page_pool_destroy()" will be called during unregistering
the memory model.
Unfortunately, although "page_pool_destroy()" is called from
"xdp_unreg_mem_model()", it does not perform releasing of all resources.
All page pool resources can be released completely only if the user
reference counter is decremented to zero.

In the libeth scenario, the "user_cnt" page_pool reference counter is
set to 1 during the page_pool creation. Then, it is again incremented to
2 while the memory model is being registered.
Therefore, calling "xdp_unreg_mem_model()" decrements the reference
counter only by one and some page_pool resources are never released.

The page_pool API should be called in a symmetric way, so:
 - each explicit call of "page_pool_create()" should be followed by
   "page_pool_destroy()",
 - each call of "xdp_reg_page_pool()" should be followed by
   "xdp_unreg_page_pool()".

Fix the issue by restoring the call of "page_pool_destroy()" to let the
page_pool to decrement its reference counter back to zero.

Fixes: f00334f ("libeth: support native XDP and register memory model")
Signed-off-by: Michal Kubiak <michal.kubiak@intel.com>
Block the soft reset and vport re-configuration from "idpf_remove()"
context.
The XDP_SETUP_PROG command is normally called while the netdev is being
unregistered. In such a case the IDPF driver should unload the XDP
program and return success. Otherwise, the kernel warning will be shown
in dmesg.

Fixes: 79d940b ("idpf: implement XDP_SETUP_PROG in ndo_bpf for splitq")
Signed-off-by: Michal Kubiak <michal.kubiak@intel.com>
@michalQb michalQb force-pushed the fixes-sept-6-a-idpf-xdp-libie branch from ec0a1ea to c1d1547 Compare September 18, 2024 14:25
@michalQb michalQb changed the title Fix for reset + fix for PP release Latest XDP fixes Sep 18, 2024
Fixes: 98e4247 ("idpf: prepare structures to support xdp")
Fixes: f12c11e ("idpf: add vc functions to manage selected queues")
Signed-off-by: Michal Kubiak <michal.kubiak@intel.com>
Signed-off-by: Michal Kubiak <michal.kubiak@intel.com>
alobakin pushed a commit that referenced this pull request Oct 15, 2024
Wesley reported an issue:

==================================================================
EXT4-fs (dm-5): resizing filesystem from 7168 to 786432 blocks
------------[ cut here ]------------
kernel BUG at fs/ext4/resize.c:324!
CPU: 9 UID: 0 PID: 3576 Comm: resize2fs Not tainted 6.11.0+ #27
RIP: 0010:ext4_resize_fs+0x1212/0x12d0
Call Trace:
 __ext4_ioctl+0x4e0/0x1800
 ext4_ioctl+0x12/0x20
 __x64_sys_ioctl+0x99/0xd0
 x64_sys_call+0x1206/0x20d0
 do_syscall_64+0x72/0x110
 entry_SYSCALL_64_after_hwframe+0x76/0x7e
==================================================================

While reviewing the patch, Honza found that when adjusting resize_bg in
alloc_flex_gd(), it was possible for flex_gd->resize_bg to be bigger than
flexbg_size.

The reproduction of the problem requires the following:

 o_group = flexbg_size * 2 * n;
 o_size = (o_group + 1) * group_size;
 n_group: [o_group + flexbg_size, o_group + flexbg_size * 2)
 o_size = (n_group + 1) * group_size;

Take n=0,flexbg_size=16 as an example:

              last:15
|o---------------|--------------n-|
o_group:0    resize to      n_group:30

The corresponding reproducer is:

img=test.img
rm -f $img
truncate -s 600M $img
mkfs.ext4 -F $img -b 1024 -G 16 8M
dev=`losetup -f --show $img`
mkdir -p /tmp/test
mount $dev /tmp/test
resize2fs $dev 248M

Delete the problematic plus 1 to fix the issue, and add a WARN_ON_ONCE()
to prevent the issue from happening again.

[ Note: another reproucer which this commit fixes is:

  img=test.img
  rm -f $img
  truncate -s 25MiB $img
  mkfs.ext4 -b 4096 -E nodiscard,lazy_itable_init=0,lazy_journal_init=0 $img
  truncate -s 3GiB $img
  dev=`losetup -f --show $img`
  mkdir -p /tmp/test
  mount $dev /tmp/test
  resize2fs $dev 3G
  umount $dev
  losetup -d $dev

  -- TYT ]

Reported-by: Wesley Hershberger <wesley.hershberger@canonical.com>
Closes: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2081231
Reported-by: Stéphane Graber <stgraber@stgraber.org>
Closes: https://lore.kernel.org/all/20240925143325.518508-1-aleksandr.mikhalitsyn@canonical.com/
Tested-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com>
Tested-by: Eric Sandeen <sandeen@redhat.com>
Fixes: 665d3e0 ("ext4: reduce unnecessary memory allocation in alloc_flex_gd()")
Cc: stable@vger.kernel.org
Signed-off-by: Baokun Li <libaokun1@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Link: https://patch.msgid.link/20240927133329.1015041-1-libaokun@huaweicloud.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
@alobakin alobakin closed this Jan 30, 2025
alobakin pushed a commit that referenced this pull request Mar 14, 2025
The bnxt_rx_pkt() updates ip_summed value at the end if checksum offload
is enabled.
When the XDP-MB program is attached and it returns XDP_PASS, the
bnxt_xdp_build_skb() is called to update skb_shared_info.
The main purpose of bnxt_xdp_build_skb() is to update skb_shared_info,
but it updates ip_summed value too if checksum offload is enabled.
This is actually duplicate work.

When the bnxt_rx_pkt() updates ip_summed value, it checks if ip_summed
is CHECKSUM_NONE or not.
It means that ip_summed should be CHECKSUM_NONE at this moment.
But ip_summed may already be updated to CHECKSUM_UNNECESSARY in the
XDP-MB-PASS path.
So the by skb_checksum_none_assert() WARNS about it.

This is duplicate work and updating ip_summed in the
bnxt_xdp_build_skb() is not needed.

Splat looks like:
WARNING: CPU: 3 PID: 5782 at ./include/linux/skbuff.h:5155 bnxt_rx_pkt+0x479b/0x7610 [bnxt_en]
Modules linked in: bnxt_re bnxt_en rdma_ucm rdma_cm iw_cm ib_cm ib_uverbs veth xt_nat xt_tcpudp xt_conntrack nft_chain_nat xt_MASQUERADE nf_]
CPU: 3 UID: 0 PID: 5782 Comm: socat Tainted: G        W          6.14.0-rc4+ #27
Tainted: [W]=WARN
Hardware name: ASUS System Product Name/PRIME Z690-P D4, BIOS 0603 11/01/2021
RIP: 0010:bnxt_rx_pkt+0x479b/0x7610 [bnxt_en]
Code: 54 24 0c 4c 89 f1 4c 89 ff c1 ea 1f ff d3 0f 1f 00 49 89 c6 48 85 c0 0f 84 4c e5 ff ff 48 89 c7 e8 ca 3d a0 c8 e9 8f f4 ff ff <0f> 0b f
RSP: 0018:ffff88881ba09928 EFLAGS: 00010202
RAX: 0000000000000000 RBX: 00000000c7590303 RCX: 0000000000000000
RDX: 1ffff1104e7d1610 RSI: 0000000000000001 RDI: ffff8881c91300b8
RBP: ffff88881ba09b28 R08: ffff888273e8b0d0 R09: ffff888273e8b070
R10: ffff888273e8b010 R11: ffff888278b0f000 R12: ffff888273e8b080
R13: ffff8881c9130e00 R14: ffff8881505d3800 R15: ffff888273e8b000
FS:  00007f5a2e7be080(0000) GS:ffff88881ba00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007fff2e708ff8 CR3: 000000013e3b0000 CR4: 00000000007506f0
PKRU: 55555554
Call Trace:
 <IRQ>
 ? __warn+0xcd/0x2f0
 ? bnxt_rx_pkt+0x479b/0x7610
 ? report_bug+0x326/0x3c0
 ? handle_bug+0x53/0xa0
 ? exc_invalid_op+0x14/0x50
 ? asm_exc_invalid_op+0x16/0x20
 ? bnxt_rx_pkt+0x479b/0x7610
 ? bnxt_rx_pkt+0x3e41/0x7610
 ? __pfx_bnxt_rx_pkt+0x10/0x10
 ? napi_complete_done+0x2cf/0x7d0
 __bnxt_poll_work+0x4e8/0x1220
 ? __pfx___bnxt_poll_work+0x10/0x10
 ? __pfx_mark_lock.part.0+0x10/0x10
 bnxt_poll_p5+0x36a/0xfa0
 ? __pfx_bnxt_poll_p5+0x10/0x10
 __napi_poll.constprop.0+0xa0/0x440
 net_rx_action+0x899/0xd00
...

Following ping.py patch adds xdp-mb-pass case. so ping.py is going
to be able to reproduce this issue.

Fixes: 1dc4c55 ("bnxt: adding bnxt_xdp_build_skb to build skb from multibuffer xdp_buff")
Signed-off-by: Taehee Yoo <ap420073@gmail.com>
Reviewed-by: Somnath Kotur <somnath.kotur@broadcom.com>
Link: https://patch.msgid.link/20250309134219.91670-5-ap420073@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
alobakin pushed a commit that referenced this pull request Mar 28, 2025
In recent kernels, there are lockdep splats around the
struct request_queue::io_lockdep_map, similar to [1], but they
typically don't show up until reclaim with writeback happens.

Having multiple kernel versions released with a known risc of kernel
deadlock during reclaim writeback should IMHO be addressed and
backported to -stable with the highest priority.

In order to have these lockdep splats show up earlier,
preferrably during system initialization, prime the
struct request_queue::io_lockdep_map as GFP_KERNEL reclaim-
tainted. This will instead lead to lockdep splats looking similar
to [2], but without the need for reclaim + writeback
happening.

[1]:
[  189.762244] ======================================================
[  189.762432] WARNING: possible circular locking dependency detected
[  189.762441] 6.14.0-rc6-xe+ #6 Tainted: G     U
[  189.762450] ------------------------------------------------------
[  189.762459] kswapd0/119 is trying to acquire lock:
[  189.762467] ffff888110ceb710 (&q->q_usage_counter(io)#26){++++}-{0:0}, at: __submit_bio+0x76/0x230
[  189.762485]
               but task is already holding lock:
[  189.762494] ffffffff834c97c0 (fs_reclaim){+.+.}-{0:0}, at: balance_pgdat+0xbe/0xb00
[  189.762507]
               which lock already depends on the new lock.

[  189.762519]
               the existing dependency chain (in reverse order) is:
[  189.762529]
               -> #2 (fs_reclaim){+.+.}-{0:0}:
[  189.762540]        fs_reclaim_acquire+0xc5/0x100
[  189.762548]        kmem_cache_alloc_lru_noprof+0x4a/0x480
[  189.762558]        alloc_inode+0xaa/0xe0
[  189.762566]        iget_locked+0x157/0x330
[  189.762573]        kernfs_get_inode+0x1b/0x110
[  189.762582]        kernfs_get_tree+0x1b0/0x2e0
[  189.762590]        sysfs_get_tree+0x1f/0x60
[  189.762597]        vfs_get_tree+0x2a/0xf0
[  189.762605]        path_mount+0x4cd/0xc00
[  189.762613]        __x64_sys_mount+0x119/0x150
[  189.762621]        x64_sys_call+0x14f2/0x2310
[  189.762630]        do_syscall_64+0x91/0x180
[  189.762637]        entry_SYSCALL_64_after_hwframe+0x76/0x7e
[  189.762647]
               -> #1 (&root->kernfs_rwsem){++++}-{3:3}:
[  189.762659]        down_write+0x3e/0xf0
[  189.762667]        kernfs_remove+0x32/0x60
[  189.762676]        sysfs_remove_dir+0x4f/0x60
[  189.762685]        __kobject_del+0x33/0xa0
[  189.762709]        kobject_del+0x13/0x30
[  189.762716]        elv_unregister_queue+0x52/0x80
[  189.762725]        elevator_switch+0x68/0x360
[  189.762733]        elv_iosched_store+0x14b/0x1b0
[  189.762756]        queue_attr_store+0x181/0x1e0
[  189.762765]        sysfs_kf_write+0x49/0x80
[  189.762773]        kernfs_fop_write_iter+0x17d/0x250
[  189.762781]        vfs_write+0x281/0x540
[  189.762790]        ksys_write+0x72/0xf0
[  189.762798]        __x64_sys_write+0x19/0x30
[  189.762807]        x64_sys_call+0x2a3/0x2310
[  189.762815]        do_syscall_64+0x91/0x180
[  189.762823]        entry_SYSCALL_64_after_hwframe+0x76/0x7e
[  189.762833]
               -> #0 (&q->q_usage_counter(io)#26){++++}-{0:0}:
[  189.762845]        __lock_acquire+0x1525/0x2760
[  189.762854]        lock_acquire+0xca/0x310
[  189.762861]        blk_mq_submit_bio+0x8a2/0xba0
[  189.762870]        __submit_bio+0x76/0x230
[  189.762878]        submit_bio_noacct_nocheck+0x323/0x430
[  189.762888]        submit_bio_noacct+0x2cc/0x620
[  189.762896]        submit_bio+0x38/0x110
[  189.762904]        __swap_writepage+0xf5/0x380
[  189.762912]        swap_writepage+0x3c7/0x600
[  189.762920]        shmem_writepage+0x3da/0x4f0
[  189.762929]        pageout+0x13f/0x310
[  189.762937]        shrink_folio_list+0x61c/0xf60
[  189.763261]        evict_folios+0x378/0xcd0
[  189.763584]        try_to_shrink_lruvec+0x1b0/0x360
[  189.763946]        shrink_one+0x10e/0x200
[  189.764266]        shrink_node+0xc02/0x1490
[  189.764586]        balance_pgdat+0x563/0xb00
[  189.764934]        kswapd+0x1e8/0x430
[  189.765249]        kthread+0x10b/0x260
[  189.765559]        ret_from_fork+0x44/0x70
[  189.765889]        ret_from_fork_asm+0x1a/0x30
[  189.766198]
               other info that might help us debug this:

[  189.767089] Chain exists of:
                 &q->q_usage_counter(io)#26 --> &root->kernfs_rwsem --> fs_reclaim

[  189.767971]  Possible unsafe locking scenario:

[  189.768555]        CPU0                    CPU1
[  189.768849]        ----                    ----
[  189.769136]   lock(fs_reclaim);
[  189.769421]                                lock(&root->kernfs_rwsem);
[  189.769714]                                lock(fs_reclaim);
[  189.770016]   rlock(&q->q_usage_counter(io)#26);
[  189.770305]
                *** DEADLOCK ***

[  189.771167] 1 lock held by kswapd0/119:
[  189.771453]  #0: ffffffff834c97c0 (fs_reclaim){+.+.}-{0:0}, at: balance_pgdat+0xbe/0xb00
[  189.771770]
               stack backtrace:
[  189.772351] CPU: 4 UID: 0 PID: 119 Comm: kswapd0 Tainted: G     U             6.14.0-rc6-xe+ #6
[  189.772353] Tainted: [U]=USER
[  189.772354] Hardware name: ASUS System Product Name/PRIME B560M-A AC, BIOS 2001 02/01/2023
[  189.772354] Call Trace:
[  189.772355]  <TASK>
[  189.772356]  dump_stack_lvl+0x6e/0xa0
[  189.772359]  dump_stack+0x10/0x18
[  189.772360]  print_circular_bug.cold+0x17a/0x1b7
[  189.772363]  check_noncircular+0x13a/0x150
[  189.772365]  ? __pfx_stack_trace_consume_entry+0x10/0x10
[  189.772368]  __lock_acquire+0x1525/0x2760
[  189.772368]  ? ret_from_fork_asm+0x1a/0x30
[  189.772371]  lock_acquire+0xca/0x310
[  189.772372]  ? __submit_bio+0x76/0x230
[  189.772375]  ? lock_release+0xd5/0x2c0
[  189.772376]  blk_mq_submit_bio+0x8a2/0xba0
[  189.772378]  ? __submit_bio+0x76/0x230
[  189.772380]  __submit_bio+0x76/0x230
[  189.772382]  ? trace_hardirqs_on+0x1e/0xe0
[  189.772384]  submit_bio_noacct_nocheck+0x323/0x430
[  189.772386]  ? submit_bio_noacct_nocheck+0x323/0x430
[  189.772387]  ? __might_sleep+0x58/0xa0
[  189.772390]  submit_bio_noacct+0x2cc/0x620
[  189.772391]  ? count_memcg_events+0x68/0x90
[  189.772393]  submit_bio+0x38/0x110
[  189.772395]  __swap_writepage+0xf5/0x380
[  189.772396]  swap_writepage+0x3c7/0x600
[  189.772397]  shmem_writepage+0x3da/0x4f0
[  189.772401]  pageout+0x13f/0x310
[  189.772406]  shrink_folio_list+0x61c/0xf60
[  189.772409]  ? isolate_folios+0xe80/0x16b0
[  189.772410]  ? mark_held_locks+0x46/0x90
[  189.772412]  evict_folios+0x378/0xcd0
[  189.772414]  ? evict_folios+0x34a/0xcd0
[  189.772415]  ? lock_is_held_type+0xa3/0x130
[  189.772417]  try_to_shrink_lruvec+0x1b0/0x360
[  189.772420]  shrink_one+0x10e/0x200
[  189.772421]  shrink_node+0xc02/0x1490
[  189.772423]  ? shrink_node+0xa08/0x1490
[  189.772424]  ? shrink_node+0xbd8/0x1490
[  189.772425]  ? mem_cgroup_iter+0x366/0x480
[  189.772427]  balance_pgdat+0x563/0xb00
[  189.772428]  ? balance_pgdat+0x563/0xb00
[  189.772430]  ? trace_hardirqs_on+0x1e/0xe0
[  189.772431]  ? finish_task_switch.isra.0+0xcb/0x330
[  189.772433]  ? __switch_to_asm+0x33/0x70
[  189.772437]  kswapd+0x1e8/0x430
[  189.772438]  ? __pfx_autoremove_wake_function+0x10/0x10
[  189.772440]  ? __pfx_kswapd+0x10/0x10
[  189.772441]  kthread+0x10b/0x260
[  189.772443]  ? __pfx_kthread+0x10/0x10
[  189.772444]  ret_from_fork+0x44/0x70
[  189.772446]  ? __pfx_kthread+0x10/0x10
[  189.772447]  ret_from_fork_asm+0x1a/0x30
[  189.772450]  </TASK>

[2]:
[    8.760253] ======================================================
[    8.760254] WARNING: possible circular locking dependency detected
[    8.760255] 6.14.0-rc6-xe+ #7 Tainted: G     U
[    8.760256] ------------------------------------------------------
[    8.760257] (udev-worker)/674 is trying to acquire lock:
[    8.760259] ffff888100e39148 (&root->kernfs_rwsem){++++}-{3:3}, at: kernfs_remove+0x32/0x60
[    8.760265]
               but task is already holding lock:
[    8.760266] ffff888110dc7680 (&q->q_usage_counter(io)#27){++++}-{0:0}, at: blk_mq_freeze_queue_nomemsave+0x12/0x30
[    8.760272]
               which lock already depends on the new lock.

[    8.760272]
               the existing dependency chain (in reverse order) is:
[    8.760273]
               -> #2 (&q->q_usage_counter(io)#27){++++}-{0:0}:
[    8.760276]        blk_alloc_queue+0x30a/0x350
[    8.760279]        blk_mq_alloc_queue+0x6b/0xe0
[    8.760281]        scsi_alloc_sdev+0x276/0x3c0
[    8.760284]        scsi_probe_and_add_lun+0x22a/0x440
[    8.760286]        __scsi_scan_target+0x109/0x230
[    8.760288]        scsi_scan_channel+0x65/0xc0
[    8.760290]        scsi_scan_host_selected+0xff/0x140
[    8.760292]        do_scsi_scan_host+0xa7/0xc0
[    8.760293]        do_scan_async+0x1c/0x160
[    8.760295]        async_run_entry_fn+0x32/0x150
[    8.760299]        process_one_work+0x224/0x5f0
[    8.760302]        worker_thread+0x1d4/0x3e0
[    8.760304]        kthread+0x10b/0x260
[    8.760306]        ret_from_fork+0x44/0x70
[    8.760309]        ret_from_fork_asm+0x1a/0x30
[    8.760312]
               -> #1 (fs_reclaim){+.+.}-{0:0}:
[    8.760315]        fs_reclaim_acquire+0xc5/0x100
[    8.760317]        kmem_cache_alloc_lru_noprof+0x4a/0x480
[    8.760319]        alloc_inode+0xaa/0xe0
[    8.760322]        iget_locked+0x157/0x330
[    8.760323]        kernfs_get_inode+0x1b/0x110
[    8.760325]        kernfs_get_tree+0x1b0/0x2e0
[    8.760327]        sysfs_get_tree+0x1f/0x60
[    8.760329]        vfs_get_tree+0x2a/0xf0
[    8.760332]        path_mount+0x4cd/0xc00
[    8.760334]        __x64_sys_mount+0x119/0x150
[    8.760336]        x64_sys_call+0x14f2/0x2310
[    8.760338]        do_syscall_64+0x91/0x180
[    8.760340]        entry_SYSCALL_64_after_hwframe+0x76/0x7e
[    8.760342]
               -> #0 (&root->kernfs_rwsem){++++}-{3:3}:
[    8.760345]        __lock_acquire+0x1525/0x2760
[    8.760347]        lock_acquire+0xca/0x310
[    8.760348]        down_write+0x3e/0xf0
[    8.760350]        kernfs_remove+0x32/0x60
[    8.760351]        sysfs_remove_dir+0x4f/0x60
[    8.760353]        __kobject_del+0x33/0xa0
[    8.760355]        kobject_del+0x13/0x30
[    8.760356]        elv_unregister_queue+0x52/0x80
[    8.760358]        elevator_switch+0x68/0x360
[    8.760360]        elv_iosched_store+0x14b/0x1b0
[    8.760362]        queue_attr_store+0x181/0x1e0
[    8.760364]        sysfs_kf_write+0x49/0x80
[    8.760366]        kernfs_fop_write_iter+0x17d/0x250
[    8.760367]        vfs_write+0x281/0x540
[    8.760370]        ksys_write+0x72/0xf0
[    8.760372]        __x64_sys_write+0x19/0x30
[    8.760374]        x64_sys_call+0x2a3/0x2310
[    8.760376]        do_syscall_64+0x91/0x180
[    8.760377]        entry_SYSCALL_64_after_hwframe+0x76/0x7e
[    8.760380]
               other info that might help us debug this:

[    8.760380] Chain exists of:
                 &root->kernfs_rwsem --> fs_reclaim --> &q->q_usage_counter(io)#27

[    8.760384]  Possible unsafe locking scenario:

[    8.760384]        CPU0                    CPU1
[    8.760385]        ----                    ----
[    8.760385]   lock(&q->q_usage_counter(io)#27);
[    8.760387]                                lock(fs_reclaim);
[    8.760388]                                lock(&q->q_usage_counter(io)#27);
[    8.760390]   lock(&root->kernfs_rwsem);
[    8.760391]
                *** DEADLOCK ***

[    8.760391] 6 locks held by (udev-worker)/674:
[    8.760392]  #0: ffff8881209ac420 (sb_writers#4){.+.+}-{0:0}, at: ksys_write+0x72/0xf0
[    8.760398]  #1: ffff88810c80f488 (&of->mutex#2){+.+.}-{3:3}, at: kernfs_fop_write_iter+0x136/0x250
[    8.760402]  #2: ffff888125d1d330 (kn->active#101){.+.+}-{0:0}, at: kernfs_fop_write_iter+0x13f/0x250
[    8.760406]  #3: ffff888110dc7bb0 (&q->sysfs_lock){+.+.}-{3:3}, at: queue_attr_store+0x148/0x1e0
[    8.760411]  #4: ffff888110dc7680 (&q->q_usage_counter(io)#27){++++}-{0:0}, at: blk_mq_freeze_queue_nomemsave+0x12/0x30
[    8.760416]  #5: ffff888110dc76b8 (&q->q_usage_counter(queue)#27){++++}-{0:0}, at: blk_mq_freeze_queue_nomemsave+0x12/0x30
[    8.760421]
               stack backtrace:
[    8.760422] CPU: 7 UID: 0 PID: 674 Comm: (udev-worker) Tainted: G     U             6.14.0-rc6-xe+ #7
[    8.760424] Tainted: [U]=USER
[    8.760425] Hardware name: ASUS System Product Name/PRIME B560M-A AC, BIOS 2001 02/01/2023
[    8.760426] Call Trace:
[    8.760427]  <TASK>
[    8.760428]  dump_stack_lvl+0x6e/0xa0
[    8.760431]  dump_stack+0x10/0x18
[    8.760433]  print_circular_bug.cold+0x17a/0x1b7
[    8.760437]  check_noncircular+0x13a/0x150
[    8.760441]  ? save_trace+0x54/0x360
[    8.760445]  __lock_acquire+0x1525/0x2760
[    8.760446]  ? irqentry_exit+0x3a/0xb0
[    8.760448]  ? sysvec_apic_timer_interrupt+0x57/0xc0
[    8.760452]  lock_acquire+0xca/0x310
[    8.760453]  ? kernfs_remove+0x32/0x60
[    8.760457]  down_write+0x3e/0xf0
[    8.760459]  ? kernfs_remove+0x32/0x60
[    8.760460]  kernfs_remove+0x32/0x60
[    8.760462]  sysfs_remove_dir+0x4f/0x60
[    8.760464]  __kobject_del+0x33/0xa0
[    8.760466]  kobject_del+0x13/0x30
[    8.760467]  elv_unregister_queue+0x52/0x80
[    8.760470]  elevator_switch+0x68/0x360
[    8.760472]  elv_iosched_store+0x14b/0x1b0
[    8.760475]  queue_attr_store+0x181/0x1e0
[    8.760479]  ? lock_acquire+0xca/0x310
[    8.760480]  ? kernfs_fop_write_iter+0x13f/0x250
[    8.760482]  ? lock_is_held_type+0xa3/0x130
[    8.760485]  sysfs_kf_write+0x49/0x80
[    8.760487]  kernfs_fop_write_iter+0x17d/0x250
[    8.760489]  vfs_write+0x281/0x540
[    8.760494]  ksys_write+0x72/0xf0
[    8.760497]  __x64_sys_write+0x19/0x30
[    8.760499]  x64_sys_call+0x2a3/0x2310
[    8.760502]  do_syscall_64+0x91/0x180
[    8.760504]  ? trace_hardirqs_off+0x5d/0xe0
[    8.760506]  ? handle_softirqs+0x479/0x4d0
[    8.760508]  ? hrtimer_interrupt+0x13f/0x280
[    8.760511]  ? irqentry_exit_to_user_mode+0x8b/0x260
[    8.760513]  ? clear_bhb_loop+0x15/0x70
[    8.760515]  ? clear_bhb_loop+0x15/0x70
[    8.760516]  ? clear_bhb_loop+0x15/0x70
[    8.760518]  entry_SYSCALL_64_after_hwframe+0x76/0x7e
[    8.760520] RIP: 0033:0x7aa3bf2f5504
[    8.760522] Code: c7 00 16 00 00 00 b8 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 f3 0f 1e fa 80 3d c5 8b 10 00 00 74 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 55 48 89 e5 48 83 ec 20 48 89
[    8.760523] RSP: 002b:00007ffc1e3697d8 EFLAGS: 00000202 ORIG_RAX: 0000000000000001
[    8.760526] RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 00007aa3bf2f5504
[    8.760527] RDX: 0000000000000003 RSI: 00007ffc1e369ae0 RDI: 000000000000001c
[    8.760528] RBP: 00007ffc1e369800 R08: 00007aa3bf3f51c8 R09: 00007ffc1e3698b0
[    8.760528] R10: 0000000000000000 R11: 0000000000000202 R12: 0000000000000003
[    8.760529] R13: 00007ffc1e369ae0 R14: 0000613ccf21f2f0 R15: 00007aa3bf3f4e80
[    8.760533]  </TASK>

v2:
- Update a code comment to increase readability (Ming Lei).

Cc: Jens Axboe <axboe@kernel.dk>
Cc: linux-block@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Cc: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Link: https://lore.kernel.org/r/20250318095548.5187-1-thomas.hellstrom@linux.intel.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants