-
Notifications
You must be signed in to change notification settings - Fork 24
Create meson-g12b-ugoos-am6b-plus-by-vuong-vn.dts #1
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
Thanks, but:
I suggest you identify differences between the existing AM6 device and AM6B+ device, then create a new/clean device-tree file based on the existing w400 dtsi (the approach used in the existing AM6 dts) to avoid duplication. NB: I will be happy to backport a device-tree that has been submitted to the kernel. I have no interest in adding devices to my tree that will not be upstreamed (as I do not have the device, and cannot test it). |
[ Upstream commit 3c46372 ] This lockdep splat says it better than I could: ================================ WARNING: inconsistent lock state 6.2.0-rc2-07010-ga9b9500ffaac-dirty torvalds#967 Not tainted -------------------------------- inconsistent {IN-SOFTIRQ-W} -> {SOFTIRQ-ON-W} usage. kworker/1:3/179 [HC0[0]:SC0[0]:HE1:SE1] takes: ffff3ec4036ce098 (_xmit_ETHER#2){+.?.}-{3:3}, at: netif_freeze_queues+0x5c/0xc0 {IN-SOFTIRQ-W} state was registered at: _raw_spin_lock+0x5c/0xc0 sch_direct_xmit+0x148/0x37c __dev_queue_xmit+0x528/0x111c ip6_finish_output2+0x5ec/0xb7c ip6_finish_output+0x240/0x3f0 ip6_output+0x78/0x360 ndisc_send_skb+0x33c/0x85c ndisc_send_rs+0x54/0x12c addrconf_rs_timer+0x154/0x260 call_timer_fn+0xb8/0x3a0 __run_timers.part.0+0x214/0x26c run_timer_softirq+0x3c/0x74 __do_softirq+0x14c/0x5d8 ____do_softirq+0x10/0x20 call_on_irq_stack+0x2c/0x5c do_softirq_own_stack+0x1c/0x30 __irq_exit_rcu+0x168/0x1a0 irq_exit_rcu+0x10/0x40 el1_interrupt+0x38/0x64 irq event stamp: 7825 hardirqs last enabled at (7825): [<ffffdf1f7200cae4>] exit_to_kernel_mode+0x34/0x130 hardirqs last disabled at (7823): [<ffffdf1f708105f0>] __do_softirq+0x550/0x5d8 softirqs last enabled at (7824): [<ffffdf1f7081050c>] __do_softirq+0x46c/0x5d8 softirqs last disabled at (7811): [<ffffdf1f708166e0>] ____do_softirq+0x10/0x20 other info that might help us debug this: Possible unsafe locking scenario: CPU0 ---- lock(_xmit_ETHER#2); <Interrupt> lock(_xmit_ETHER#2); *** DEADLOCK *** 3 locks held by kworker/1:3/179: #0: ffff3ec400004748 ((wq_completion)events){+.+.}-{0:0}, at: process_one_work+0x1f4/0x6c0 #1: ffff80000a0bbdc8 ((work_completion)(&priv->tx_onestep_tstamp)){+.+.}-{0:0}, at: process_one_work+0x1f4/0x6c0 #2: ffff3ec4036cd438 (&dev->tx_global_lock){+.+.}-{3:3}, at: netif_tx_lock+0x1c/0x34 Workqueue: events enetc_tx_onestep_tstamp Call trace: print_usage_bug.part.0+0x208/0x22c mark_lock+0x7f0/0x8b0 __lock_acquire+0x7c4/0x1ce0 lock_acquire.part.0+0xe0/0x220 lock_acquire+0x68/0x84 _raw_spin_lock+0x5c/0xc0 netif_freeze_queues+0x5c/0xc0 netif_tx_lock+0x24/0x34 enetc_tx_onestep_tstamp+0x20/0x100 process_one_work+0x28c/0x6c0 worker_thread+0x74/0x450 kthread+0x118/0x11c but I'll say it anyway: the enetc_tx_onestep_tstamp() work item runs in process context, therefore with softirqs enabled (i.o.w., it can be interrupted by a softirq). If we hold the netif_tx_lock() when there is an interrupt, and the NET_TX softirq then gets scheduled, this will take the netif_tx_lock() a second time and deadlock the kernel. To solve this, use netif_tx_lock_bh(), which blocks softirqs from running. Fixes: 7294380 ("enetc: support PTP Sync packet one-step timestamping") Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Alexander Duyck <alexanderduyck@fb.com> Link: https://lore.kernel.org/r/20230112105440.1786799-1-vladimir.oltean@nxp.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 241f519 ] This attempts to avoid circular locking dependency between sock_lock and hdev_lock: WARNING: possible circular locking dependency detected 6.0.0-rc7-03728-g18dd8ab0a783 #3 Not tainted ------------------------------------------------------ kworker/u3:2/53 is trying to acquire lock: ffff888000254130 (sk_lock-AF_BLUETOOTH-BTPROTO_ISO){+.+.}-{0:0}, at: iso_conn_del+0xbd/0x1d0 but task is already holding lock: ffffffff9f39a080 (hci_cb_list_lock){+.+.}-{3:3}, at: hci_le_cis_estabilished_evt+0x1b5/0x500 which lock already depends on the new lock. the existing dependency chain (in reverse order) is: -> #2 (hci_cb_list_lock){+.+.}-{3:3}: __mutex_lock+0x10e/0xfe0 hci_le_remote_feat_complete_evt+0x17f/0x320 hci_event_packet+0x39c/0x7d0 hci_rx_work+0x2bf/0x950 process_one_work+0x569/0x980 worker_thread+0x2a3/0x6f0 kthread+0x153/0x180 ret_from_fork+0x22/0x30 -> #1 (&hdev->lock){+.+.}-{3:3}: __mutex_lock+0x10e/0xfe0 iso_connect_cis+0x6f/0x5a0 iso_sock_connect+0x1af/0x710 __sys_connect+0x17e/0x1b0 __x64_sys_connect+0x37/0x50 do_syscall_64+0x43/0x90 entry_SYSCALL_64_after_hwframe+0x62/0xcc -> #0 (sk_lock-AF_BLUETOOTH-BTPROTO_ISO){+.+.}-{0:0}: __lock_acquire+0x1b51/0x33d0 lock_acquire+0x16f/0x3b0 lock_sock_nested+0x32/0x80 iso_conn_del+0xbd/0x1d0 iso_connect_cfm+0x226/0x680 hci_le_cis_estabilished_evt+0x1ed/0x500 hci_event_packet+0x39c/0x7d0 hci_rx_work+0x2bf/0x950 process_one_work+0x569/0x980 worker_thread+0x2a3/0x6f0 kthread+0x153/0x180 ret_from_fork+0x22/0x30 other info that might help us debug this: Chain exists of: sk_lock-AF_BLUETOOTH-BTPROTO_ISO --> &hdev->lock --> hci_cb_list_lock Possible unsafe locking scenario: CPU0 CPU1 ---- ---- lock(hci_cb_list_lock); lock(&hdev->lock); lock(hci_cb_list_lock); lock(sk_lock-AF_BLUETOOTH-BTPROTO_ISO); *** DEADLOCK *** 4 locks held by kworker/u3:2/53: #0: ffff8880021d9130 ((wq_completion)hci0#2){+.+.}-{0:0}, at: process_one_work+0x4ad/0x980 #1: ffff888002387de0 ((work_completion)(&hdev->rx_work)){+.+.}-{0:0}, at: process_one_work+0x4ad/0x980 #2: ffff888001ac0070 (&hdev->lock){+.+.}-{3:3}, at: hci_le_cis_estabilished_evt+0xc3/0x500 #3: ffffffff9f39a080 (hci_cb_list_lock){+.+.}-{3:3}, at: hci_le_cis_estabilished_evt+0x1b5/0x500 Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com> Stable-dep-of: 6a5ad25 ("Bluetooth: ISO: Fix possible circular locking dependency") Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 6a5ad25 ] This attempts to fix the following trace: kworker/u3:1/184 is trying to acquire lock: ffff888001888130 (sk_lock-AF_BLUETOOTH-BTPROTO_ISO){+.+.}-{0:0}, at: iso_connect_cfm+0x2de/0x690 but task is already holding lock: ffff8880028d1c20 (&conn->lock){+.+.}-{2:2}, at: iso_connect_cfm+0x265/0x690 which lock already depends on the new lock. the existing dependency chain (in reverse order) is: -> #1 (&conn->lock){+.+.}-{2:2}: lock_acquire+0x176/0x3d0 _raw_spin_lock+0x2a/0x40 __iso_sock_close+0x1dd/0x4f0 iso_sock_release+0xa0/0x1b0 sock_close+0x5e/0x120 __fput+0x102/0x410 task_work_run+0xf1/0x160 exit_to_user_mode_prepare+0x170/0x180 syscall_exit_to_user_mode+0x19/0x50 do_syscall_64+0x4e/0x90 entry_SYSCALL_64_after_hwframe+0x62/0xcc -> #0 (sk_lock-AF_BLUETOOTH-BTPROTO_ISO){+.+.}-{0:0}: check_prev_add+0xfc/0x1190 __lock_acquire+0x1e27/0x2750 lock_acquire+0x176/0x3d0 lock_sock_nested+0x32/0x80 iso_connect_cfm+0x2de/0x690 hci_cc_le_setup_iso_path+0x195/0x340 hci_cmd_complete_evt+0x1ae/0x500 hci_event_packet+0x38e/0x7c0 hci_rx_work+0x34c/0x980 process_one_work+0x5a5/0x9a0 worker_thread+0x89/0x6f0 kthread+0x14e/0x180 ret_from_fork+0x22/0x30 other info that might help us debug this: Possible unsafe locking scenario: CPU0 CPU1 ---- ---- lock(&conn->lock); lock(sk_lock-AF_BLUETOOTH-BTPROTO_ISO); lock(&conn->lock); lock(sk_lock-AF_BLUETOOTH-BTPROTO_ISO); *** DEADLOCK *** Fixes: ccf74f2 ("Bluetooth: Add BTPROTO_ISO socket type") Fixes: f764a6c ("Bluetooth: ISO: Add broadcast support") Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit e9d50f7 ] This fixes the following trace caused by attempting to lock cmd_sync_work_lock while holding the rcu_read_lock: kworker/u3:2/212 is trying to lock: ffff888002600910 (&hdev->cmd_sync_work_lock){+.+.}-{3:3}, at: hci_cmd_sync_queue+0xad/0x140 other info that might help us debug this: context-{4:4} 4 locks held by kworker/u3:2/212: #0: ffff8880028c6530 ((wq_completion)hci0#2){+.+.}-{0:0}, at: process_one_work+0x4dc/0x9a0 #1: ffff888001aafde0 ((work_completion)(&hdev->rx_work)){+.+.}-{0:0}, at: process_one_work+0x4dc/0x9a0 #2: ffff888002600070 (&hdev->lock){+.+.}-{3:3}, at: hci_cc_le_set_cig_params+0x64/0x4f0 #3: ffffffffa5994b00 (rcu_read_lock){....}-{1:2}, at: hci_cc_le_set_cig_params+0x2f9/0x4f0 Fixes: 26afbd8 ("Bluetooth: Add initial implementation of CIS connections") Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 5aa5610 ] The cited commit changed class of tc_ht internal mutex in order to avoid false lock dependency with fs_core node and flow_table hash table structures. However, hash table implementation internally also includes a workqueue task with its own lockdep map which causes similar bogus lockdep splat[0]. Fix it by also adding dedicated class for hash table workqueue work structure of tc_ht. [0]: [ 1139.672465] ====================================================== [ 1139.673552] WARNING: possible circular locking dependency detected [ 1139.674635] 6.1.0_for_upstream_debug_2022_12_12_17_02 #1 Not tainted [ 1139.675734] ------------------------------------------------------ [ 1139.676801] modprobe/5998 is trying to acquire lock: [ 1139.677726] ffff88811e7b93b8 (&node->lock){++++}-{3:3}, at: down_write_ref_node+0x7c/0xe0 [mlx5_core] [ 1139.679662] but task is already holding lock: [ 1139.680703] ffff88813c1f96a0 (&tc_ht_lock_key){+.+.}-{3:3}, at: rhashtable_free_and_destroy+0x38/0x6f0 [ 1139.682223] which lock already depends on the new lock. [ 1139.683640] the existing dependency chain (in reverse order) is: [ 1139.684887] -> #2 (&tc_ht_lock_key){+.+.}-{3:3}: [ 1139.685975] __mutex_lock+0x12c/0x14b0 [ 1139.686659] rht_deferred_worker+0x35/0x1540 [ 1139.687405] process_one_work+0x7c2/0x1310 [ 1139.688134] worker_thread+0x59d/0xec0 [ 1139.688820] kthread+0x28f/0x330 [ 1139.689444] ret_from_fork+0x1f/0x30 [ 1139.690106] -> #1 ((work_completion)(&ht->run_work)){+.+.}-{0:0}: [ 1139.691250] __flush_work+0xe8/0x900 [ 1139.691915] __cancel_work_timer+0x2ca/0x3f0 [ 1139.692655] rhashtable_free_and_destroy+0x22/0x6f0 [ 1139.693472] del_sw_flow_table+0x22/0xb0 [mlx5_core] [ 1139.694592] tree_put_node+0x24c/0x450 [mlx5_core] [ 1139.695686] tree_remove_node+0x6e/0x100 [mlx5_core] [ 1139.696803] mlx5_destroy_flow_table+0x187/0x690 [mlx5_core] [ 1139.698017] mlx5e_tc_nic_cleanup+0x2f8/0x400 [mlx5_core] [ 1139.699217] mlx5e_cleanup_nic_rx+0x2b/0x210 [mlx5_core] [ 1139.700397] mlx5e_detach_netdev+0x19d/0x2b0 [mlx5_core] [ 1139.701571] mlx5e_suspend+0xdb/0x140 [mlx5_core] [ 1139.702665] mlx5e_remove+0x89/0x190 [mlx5_core] [ 1139.703756] auxiliary_bus_remove+0x52/0x70 [ 1139.704492] device_release_driver_internal+0x3c1/0x600 [ 1139.705360] bus_remove_device+0x2a5/0x560 [ 1139.706080] device_del+0x492/0xb80 [ 1139.706724] mlx5_rescan_drivers_locked+0x194/0x6a0 [mlx5_core] [ 1139.707961] mlx5_unregister_device+0x7a/0xa0 [mlx5_core] [ 1139.709138] mlx5_uninit_one+0x5f/0x160 [mlx5_core] [ 1139.710252] remove_one+0xd1/0x160 [mlx5_core] [ 1139.711297] pci_device_remove+0x96/0x1c0 [ 1139.722721] device_release_driver_internal+0x3c1/0x600 [ 1139.723590] unbind_store+0x1b1/0x200 [ 1139.724259] kernfs_fop_write_iter+0x348/0x520 [ 1139.725019] vfs_write+0x7b2/0xbf0 [ 1139.725658] ksys_write+0xf3/0x1d0 [ 1139.726292] do_syscall_64+0x3d/0x90 [ 1139.726942] entry_SYSCALL_64_after_hwframe+0x46/0xb0 [ 1139.727769] -> #0 (&node->lock){++++}-{3:3}: [ 1139.728698] __lock_acquire+0x2cf5/0x62f0 [ 1139.729415] lock_acquire+0x1c1/0x540 [ 1139.730076] down_write+0x8e/0x1f0 [ 1139.730709] down_write_ref_node+0x7c/0xe0 [mlx5_core] [ 1139.731841] mlx5_del_flow_rules+0x6f/0x610 [mlx5_core] [ 1139.732982] __mlx5_eswitch_del_rule+0xdd/0x560 [mlx5_core] [ 1139.734207] mlx5_eswitch_del_offloaded_rule+0x14/0x20 [mlx5_core] [ 1139.735491] mlx5e_tc_rule_unoffload+0x104/0x2b0 [mlx5_core] [ 1139.736716] mlx5e_tc_unoffload_fdb_rules+0x10c/0x1f0 [mlx5_core] [ 1139.738007] mlx5e_tc_del_fdb_flow+0xc3c/0xfa0 [mlx5_core] [ 1139.739213] mlx5e_tc_del_flow+0x146/0xa20 [mlx5_core] [ 1139.740377] _mlx5e_tc_del_flow+0x38/0x60 [mlx5_core] [ 1139.741534] rhashtable_free_and_destroy+0x3be/0x6f0 [ 1139.742351] mlx5e_tc_ht_cleanup+0x1b/0x30 [mlx5_core] [ 1139.743512] mlx5e_cleanup_rep_tx+0x4a/0xe0 [mlx5_core] [ 1139.744683] mlx5e_detach_netdev+0x1ca/0x2b0 [mlx5_core] [ 1139.745860] mlx5e_netdev_change_profile+0xd9/0x1c0 [mlx5_core] [ 1139.747098] mlx5e_netdev_attach_nic_profile+0x1b/0x30 [mlx5_core] [ 1139.748372] mlx5e_vport_rep_unload+0x16a/0x1b0 [mlx5_core] [ 1139.749590] __esw_offloads_unload_rep+0xb1/0xd0 [mlx5_core] [ 1139.750813] mlx5_eswitch_unregister_vport_reps+0x409/0x5f0 [mlx5_core] [ 1139.752147] mlx5e_rep_remove+0x62/0x80 [mlx5_core] [ 1139.753293] auxiliary_bus_remove+0x52/0x70 [ 1139.754028] device_release_driver_internal+0x3c1/0x600 [ 1139.754885] driver_detach+0xc1/0x180 [ 1139.755553] bus_remove_driver+0xef/0x2e0 [ 1139.756260] auxiliary_driver_unregister+0x16/0x50 [ 1139.757059] mlx5e_rep_cleanup+0x19/0x30 [mlx5_core] [ 1139.758207] mlx5e_cleanup+0x12/0x30 [mlx5_core] [ 1139.759295] mlx5_cleanup+0xc/0x49 [mlx5_core] [ 1139.760384] __x64_sys_delete_module+0x2b5/0x450 [ 1139.761166] do_syscall_64+0x3d/0x90 [ 1139.761827] entry_SYSCALL_64_after_hwframe+0x46/0xb0 [ 1139.762663] other info that might help us debug this: [ 1139.763925] Chain exists of: &node->lock --> (work_completion)(&ht->run_work) --> &tc_ht_lock_key [ 1139.765743] Possible unsafe locking scenario: [ 1139.766688] CPU0 CPU1 [ 1139.767399] ---- ---- [ 1139.768111] lock(&tc_ht_lock_key); [ 1139.768704] lock((work_completion)(&ht->run_work)); [ 1139.769869] lock(&tc_ht_lock_key); [ 1139.770770] lock(&node->lock); [ 1139.771326] *** DEADLOCK *** [ 1139.772345] 2 locks held by modprobe/5998: [ 1139.772994] #0: ffff88813c1ff0e8 (&dev->mutex){....}-{3:3}, at: device_release_driver_internal+0x8d/0x600 [ 1139.774399] #1: ffff88813c1f96a0 (&tc_ht_lock_key){+.+.}-{3:3}, at: rhashtable_free_and_destroy+0x38/0x6f0 [ 1139.775822] stack backtrace: [ 1139.776579] CPU: 3 PID: 5998 Comm: modprobe Not tainted 6.1.0_for_upstream_debug_2022_12_12_17_02 #1 [ 1139.777935] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014 [ 1139.779529] Call Trace: [ 1139.779992] <TASK> [ 1139.780409] dump_stack_lvl+0x57/0x7d [ 1139.781015] check_noncircular+0x278/0x300 [ 1139.781687] ? print_circular_bug+0x460/0x460 [ 1139.782381] ? rcu_read_lock_sched_held+0x3f/0x70 [ 1139.783121] ? lock_release+0x487/0x7c0 [ 1139.783759] ? orc_find.part.0+0x1f1/0x330 [ 1139.784423] ? mark_lock.part.0+0xef/0x2fc0 [ 1139.785091] __lock_acquire+0x2cf5/0x62f0 [ 1139.785754] ? register_lock_class+0x18e0/0x18e0 [ 1139.786483] lock_acquire+0x1c1/0x540 [ 1139.787093] ? down_write_ref_node+0x7c/0xe0 [mlx5_core] [ 1139.788195] ? lockdep_hardirqs_on_prepare+0x3f0/0x3f0 [ 1139.788978] ? register_lock_class+0x18e0/0x18e0 [ 1139.789715] down_write+0x8e/0x1f0 [ 1139.790292] ? down_write_ref_node+0x7c/0xe0 [mlx5_core] [ 1139.791380] ? down_write_killable+0x220/0x220 [ 1139.792080] ? find_held_lock+0x2d/0x110 [ 1139.792713] down_write_ref_node+0x7c/0xe0 [mlx5_core] [ 1139.793795] mlx5_del_flow_rules+0x6f/0x610 [mlx5_core] [ 1139.794879] __mlx5_eswitch_del_rule+0xdd/0x560 [mlx5_core] [ 1139.796032] ? __esw_offloads_unload_rep+0xd0/0xd0 [mlx5_core] [ 1139.797227] ? xa_load+0x11a/0x200 [ 1139.797800] ? __xa_clear_mark+0xf0/0xf0 [ 1139.798438] mlx5_eswitch_del_offloaded_rule+0x14/0x20 [mlx5_core] [ 1139.799660] mlx5e_tc_rule_unoffload+0x104/0x2b0 [mlx5_core] [ 1139.800821] mlx5e_tc_unoffload_fdb_rules+0x10c/0x1f0 [mlx5_core] [ 1139.802049] ? mlx5_eswitch_get_uplink_priv+0x25/0x80 [mlx5_core] [ 1139.803260] mlx5e_tc_del_fdb_flow+0xc3c/0xfa0 [mlx5_core] [ 1139.804398] ? __cancel_work_timer+0x1c2/0x3f0 [ 1139.805099] ? mlx5e_tc_unoffload_from_slow_path+0x460/0x460 [mlx5_core] [ 1139.806387] mlx5e_tc_del_flow+0x146/0xa20 [mlx5_core] [ 1139.807481] _mlx5e_tc_del_flow+0x38/0x60 [mlx5_core] [ 1139.808564] rhashtable_free_and_destroy+0x3be/0x6f0 [ 1139.809336] ? mlx5e_tc_del_flow+0xa20/0xa20 [mlx5_core] [ 1139.809336] ? mlx5e_tc_del_flow+0xa20/0xa20 [mlx5_core] [ 1139.810455] mlx5e_tc_ht_cleanup+0x1b/0x30 [mlx5_core] [ 1139.811552] mlx5e_cleanup_rep_tx+0x4a/0xe0 [mlx5_core] [ 1139.812655] mlx5e_detach_netdev+0x1ca/0x2b0 [mlx5_core] [ 1139.813768] mlx5e_netdev_change_profile+0xd9/0x1c0 [mlx5_core] [ 1139.814952] mlx5e_netdev_attach_nic_profile+0x1b/0x30 [mlx5_core] [ 1139.816166] mlx5e_vport_rep_unload+0x16a/0x1b0 [mlx5_core] [ 1139.817336] __esw_offloads_unload_rep+0xb1/0xd0 [mlx5_core] [ 1139.818507] mlx5_eswitch_unregister_vport_reps+0x409/0x5f0 [mlx5_core] [ 1139.819788] ? mlx5_eswitch_uplink_get_proto_dev+0x30/0x30 [mlx5_core] [ 1139.821051] ? kernfs_find_ns+0x137/0x310 [ 1139.821705] mlx5e_rep_remove+0x62/0x80 [mlx5_core] [ 1139.822778] auxiliary_bus_remove+0x52/0x70 [ 1139.823449] device_release_driver_internal+0x3c1/0x600 [ 1139.824240] driver_detach+0xc1/0x180 [ 1139.824842] bus_remove_driver+0xef/0x2e0 [ 1139.825504] auxiliary_driver_unregister+0x16/0x50 [ 1139.826245] mlx5e_rep_cleanup+0x19/0x30 [mlx5_core] [ 1139.827322] mlx5e_cleanup+0x12/0x30 [mlx5_core] [ 1139.828345] mlx5_cleanup+0xc/0x49 [mlx5_core] [ 1139.829382] __x64_sys_delete_module+0x2b5/0x450 [ 1139.830119] ? module_flags+0x300/0x300 [ 1139.830750] ? task_work_func_match+0x50/0x50 [ 1139.831440] ? task_work_cancel+0x20/0x20 [ 1139.832088] ? lockdep_hardirqs_on_prepare+0x273/0x3f0 [ 1139.832873] ? syscall_enter_from_user_mode+0x1d/0x50 [ 1139.833661] ? trace_hardirqs_on+0x2d/0x100 [ 1139.834328] do_syscall_64+0x3d/0x90 [ 1139.834922] entry_SYSCALL_64_after_hwframe+0x46/0xb0 [ 1139.835700] RIP: 0033:0x7f153e71288b [ 1139.836302] Code: 73 01 c3 48 8b 0d 9d 75 0e 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa b8 b0 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 6d 75 0e 00 f7 d8 64 89 01 48 [ 1139.838866] RSP: 002b:00007ffe0a3ed938 EFLAGS: 00000206 ORIG_RAX: 00000000000000b0 [ 1139.840020] RAX: ffffffffffffffda RBX: 0000564c2cbf8220 RCX: 00007f153e71288b [ 1139.841043] RDX: 0000000000000000 RSI: 0000000000000800 RDI: 0000564c2cbf8288 [ 1139.842072] RBP: 0000564c2cbf8220 R08: 0000000000000000 R09: 0000000000000000 [ 1139.843094] R10: 00007f153e7a3ac0 R11: 0000000000000206 R12: 0000564c2cbf8288 [ 1139.844118] R13: 0000000000000000 R14: 0000564c2cbf7ae8 R15: 00007ffe0a3efcb8 Fixes: 9ba3333 ("net/mlx5e: Avoid false lock depenency warning on tc_ht") Signed-off-by: Vlad Buslov <vladbu@nvidia.com> Reviewed-by: Eli Cohen <elic@nvidia.com> Reviewed-by: Roi Dayan <roid@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit f3dc1b3 ] The first time dma_chan_get() is called for a channel the channel client_count is incorrectly incremented twice for public channels, first in balance_ref_count(), and again prior to returning. This results in an incorrect client count which will lead to the channel resources not being freed when they should be. A simple test of repeated module load and unload of async_tx on a Dell Power Edge R7425 also shows this resulting in a kref underflow warning. [ 124.329662] async_tx: api initialized (async) [ 129.000627] async_tx: api initialized (async) [ 130.047839] ------------[ cut here ]------------ [ 130.052472] refcount_t: underflow; use-after-free. [ 130.057279] WARNING: CPU: 3 PID: 19364 at lib/refcount.c:28 refcount_warn_saturate+0xba/0x110 [ 130.065811] Modules linked in: async_tx(-) rfkill intel_rapl_msr intel_rapl_common amd64_edac edac_mce_amd ipmi_ssif kvm_amd dcdbas kvm mgag200 drm_shmem_helper acpi_ipmi irqbypass drm_kms_helper ipmi_si syscopyarea sysfillrect rapl pcspkr ipmi_devintf sysimgblt fb_sys_fops k10temp i2c_piix4 ipmi_msghandler acpi_power_meter acpi_cpufreq vfat fat drm fuse xfs libcrc32c sd_mod t10_pi sg ahci crct10dif_pclmul libahci crc32_pclmul crc32c_intel ghash_clmulni_intel igb megaraid_sas i40e libata i2c_algo_bit ccp sp5100_tco dca dm_mirror dm_region_hash dm_log dm_mod [last unloaded: async_tx] [ 130.117361] CPU: 3 PID: 19364 Comm: modprobe Kdump: loaded Not tainted 5.14.0-185.el9.x86_64 #1 [ 130.126091] Hardware name: Dell Inc. PowerEdge R7425/02MJ3T, BIOS 1.18.0 01/17/2022 [ 130.133806] RIP: 0010:refcount_warn_saturate+0xba/0x110 [ 130.139041] Code: 01 01 e8 6d bd 55 00 0f 0b e9 72 9d 8a 00 80 3d 26 18 9c 01 00 75 85 48 c7 c7 f8 a3 03 9d c6 05 16 18 9c 01 01 e8 4a bd 55 00 <0f> 0b e9 4f 9d 8a 00 80 3d 01 18 9c 01 00 0f 85 5e ff ff ff 48 c7 [ 130.157807] RSP: 0018:ffffbf98898afe68 EFLAGS: 00010286 [ 130.163036] RAX: 0000000000000000 RBX: ffff9da06028e598 RCX: 0000000000000000 [ 130.170172] RDX: ffff9daf9de26480 RSI: ffff9daf9de198a0 RDI: ffff9daf9de198a0 [ 130.177316] RBP: ffff9da7cddf3970 R08: 0000000000000000 R09: 00000000ffff7fff [ 130.184459] R10: ffffbf98898afd00 R11: ffffffff9d9e8c28 R12: ffff9da7cddf1970 [ 130.191596] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000 [ 130.198739] FS: 00007f646435c740(0000) GS:ffff9daf9de00000(0000) knlGS:0000000000000000 [ 130.206832] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 130.212586] CR2: 00007f6463b214f0 CR3: 00000008ab98c000 CR4: 00000000003506e0 [ 130.219729] Call Trace: [ 130.222192] <TASK> [ 130.224305] dma_chan_put+0x10d/0x110 [ 130.227988] dmaengine_put+0x7a/0xa0 [ 130.231575] __do_sys_delete_module.constprop.0+0x178/0x280 [ 130.237157] ? syscall_trace_enter.constprop.0+0x145/0x1d0 [ 130.242652] do_syscall_64+0x5c/0x90 [ 130.246240] ? exc_page_fault+0x62/0x150 [ 130.250178] entry_SYSCALL_64_after_hwframe+0x63/0xcd [ 130.255243] RIP: 0033:0x7f6463a3f5ab [ 130.258830] Code: 73 01 c3 48 8b 0d 75 a8 1b 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa b8 b0 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 45 a8 1b 00 f7 d8 64 89 01 48 [ 130.277591] RSP: 002b:00007fff22f972c8 EFLAGS: 00000206 ORIG_RAX: 00000000000000b0 [ 130.285164] RAX: ffffffffffffffda RBX: 000055b6786edd40 RCX: 00007f6463a3f5ab [ 130.292303] RDX: 0000000000000000 RSI: 0000000000000800 RDI: 000055b6786edda8 [ 130.299443] RBP: 000055b6786edd40 R08: 0000000000000000 R09: 0000000000000000 [ 130.306584] R10: 00007f6463b9eac0 R11: 0000000000000206 R12: 000055b6786edda8 [ 130.313731] R13: 0000000000000000 R14: 000055b6786edda8 R15: 00007fff22f995f8 [ 130.320875] </TASK> [ 130.323081] ---[ end trace eff7156d56b5cf25 ]--- cat /sys/class/dma/dma0chan*/in_use would get the wrong result. 2 2 2 Fixes: d2f4f99 ("dmaengine: Rework dma_chan_get") Signed-off-by: Koba Ko <koba.ko@canonical.com> Reviewed-by: Jie Hai <haijie1@huawei.com> Test-by: Jie Hai <haijie1@huawei.com> Reviewed-by: Jerry Snitselaar <jsnitsel@redhat.com> Reviewed-by: Dave Jiang <dave.jiang@intel.com> Tested-by: Joel Savitz <jsavitz@redhat.com> Link: https://lore.kernel.org/r/20221201030050.978595-1-koba.ko@canonical.com Signed-off-by: Vinod Koul <vkoul@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
commit ba81043 upstream. There is a lock inversion and rwsem read-lock recursion in the devfreq target callback which can lead to deadlocks. Specifically, ufshcd_devfreq_scale() already holds a clk_scaling_lock read lock when toggling the write booster, which involves taking the dev_cmd mutex before taking another clk_scaling_lock read lock. This can lead to a deadlock if another thread: 1) tries to acquire the dev_cmd and clk_scaling locks in the correct order, or 2) takes a clk_scaling write lock before the attempt to take the clk_scaling read lock a second time. Fix this by dropping the clk_scaling_lock before toggling the write booster as was done before commit 0e9d4ca ("scsi: ufs: Protect some contexts from unexpected clock scaling"). While the devfreq callbacks are already serialised, add a second serialising mutex to handle the unlikely case where a callback triggered through the devfreq sysfs interface is racing with a request to disable clock scaling through the UFS controller 'clkscale_enable' sysfs attribute. This could otherwise lead to the write booster being left disabled after having disabled clock scaling. Also take the new mutex in ufshcd_clk_scaling_allow() to make sure that any pending write booster update has completed on return. Note that this currently only affects Qualcomm platforms since commit 87bd050 ("scsi: ufs: core: Allow host driver to disable wb toggling during clock scaling"). The lock inversion (i.e. 1 above) was reported by lockdep as: ====================================================== WARNING: possible circular locking dependency detected 6.1.0-next-20221216 torvalds#211 Not tainted ------------------------------------------------------ kworker/u16:2/71 is trying to acquire lock: ffff076280ba98a0 (&hba->dev_cmd.lock){+.+.}-{3:3}, at: ufshcd_query_flag+0x50/0x1c0 but task is already holding lock: ffff076280ba9cf0 (&hba->clk_scaling_lock){++++}-{3:3}, at: ufshcd_devfreq_scale+0x2b8/0x380 which lock already depends on the new lock. [ +0.011606] the existing dependency chain (in reverse order) is: -> #1 (&hba->clk_scaling_lock){++++}-{3:3}: lock_acquire+0x68/0x90 down_read+0x58/0x80 ufshcd_exec_dev_cmd+0x70/0x2c0 ufshcd_verify_dev_init+0x68/0x170 ufshcd_probe_hba+0x398/0x1180 ufshcd_async_scan+0x30/0x320 async_run_entry_fn+0x34/0x150 process_one_work+0x288/0x6c0 worker_thread+0x74/0x450 kthread+0x118/0x120 ret_from_fork+0x10/0x20 -> #0 (&hba->dev_cmd.lock){+.+.}-{3:3}: __lock_acquire+0x12a0/0x2240 lock_acquire.part.0+0xcc/0x220 lock_acquire+0x68/0x90 __mutex_lock+0x98/0x430 mutex_lock_nested+0x2c/0x40 ufshcd_query_flag+0x50/0x1c0 ufshcd_query_flag_retry+0x64/0x100 ufshcd_wb_toggle+0x5c/0x120 ufshcd_devfreq_scale+0x2c4/0x380 ufshcd_devfreq_target+0xf4/0x230 devfreq_set_target+0x84/0x2f0 devfreq_update_target+0xc4/0xf0 devfreq_monitor+0x38/0x1f0 process_one_work+0x288/0x6c0 worker_thread+0x74/0x450 kthread+0x118/0x120 ret_from_fork+0x10/0x20 other info that might help us debug this: Possible unsafe locking scenario: CPU0 CPU1 ---- ---- lock(&hba->clk_scaling_lock); lock(&hba->dev_cmd.lock); lock(&hba->clk_scaling_lock); lock(&hba->dev_cmd.lock); *** DEADLOCK *** Fixes: 0e9d4ca ("scsi: ufs: Protect some contexts from unexpected clock scaling") Cc: stable@vger.kernel.org # 5.12 Cc: Can Guo <quic_cang@quicinc.com> Tested-by: Andrew Halaney <ahalaney@redhat.com> Signed-off-by: Johan Hovold <johan+linaro@kernel.org> Reviewed-by: Bart Van Assche <bvanassche@acm.org> Link: https://lore.kernel.org/r/20230116161201.16923-1-johan+linaro@kernel.org Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit e38553b ] The page_pool_release_page was used when freeing rx buffers, and this function just unmaps the page (if mapped) and does not recycle the page. So after hundreds of down/up the eth0, the system will out of memory. For more details, please refer to the following reproduce steps and bug logs. To solve this issue and refer to the doc of page pool, the page_pool_put_full_page should be used to replace page_pool_release_page. Because this API will try to recycle the page if the page refcnt equal to 1. After testing 20000 times, the issue can not be reproduced anymore (about testing 391 times the issue will occur on i.MX8MN-EVK before). Reproduce steps: Create the test script and run the script. The script content is as follows: LOOPS=20000 i=1 while [ $i -le $LOOPS ] do echo "TINFO:ENET $curface up and down test $i times" org_macaddr=$(cat /sys/class/net/eth0/address) ifconfig eth0 down ifconfig eth0 hw ether $org_macaddr up i=$(expr $i + 1) done sleep 5 if cat /sys/class/net/eth0/operstate | grep 'up';then echo "TEST PASS" else echo "TEST FAIL" fi Bug detail logs: TINFO:ENET up and down test 391 times [ 850.471205] Qualcomm Atheros AR8031/AR8033 30be0000.ethernet-1:00: attached PHY driver (mii_bus:phy_addr=30be0000.ethernet-1:00, irq=POLL) [ 853.535318] IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready [ 853.541694] fec 30be0000.ethernet eth0: Link is Up - 1Gbps/Full - flow control rx/tx [ 870.590531] page_pool_release_retry() stalled pool shutdown 199 inflight 60 sec [ 931.006557] page_pool_release_retry() stalled pool shutdown 199 inflight 120 sec TINFO:ENET up and down test 392 times [ 991.426544] page_pool_release_retry() stalled pool shutdown 192 inflight 181 sec [ 1051.838531] page_pool_release_retry() stalled pool shutdown 170 inflight 241 sec [ 1093.751217] Qualcomm Atheros AR8031/AR8033 30be0000.ethernet-1:00: attached PHY driver (mii_bus:phy_addr=30be0000.ethernet-1:00, irq=POLL) [ 1096.446520] page_pool_release_retry() stalled pool shutdown 308 inflight 60 sec [ 1096.831245] fec 30be0000.ethernet eth0: Link is Up - 1Gbps/Full - flow control rx/tx [ 1096.839092] IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready [ 1112.254526] page_pool_release_retry() stalled pool shutdown 103 inflight 302 sec [ 1156.862533] page_pool_release_retry() stalled pool shutdown 308 inflight 120 sec [ 1172.674516] page_pool_release_retry() stalled pool shutdown 103 inflight 362 sec [ 1217.278532] page_pool_release_retry() stalled pool shutdown 308 inflight 181 sec TINFO:ENET up and down test 393 times [ 1233.086535] page_pool_release_retry() stalled pool shutdown 103 inflight 422 sec [ 1277.698513] page_pool_release_retry() stalled pool shutdown 308 inflight 241 sec [ 1293.502525] page_pool_release_retry() stalled pool shutdown 86 inflight 483 sec [ 1338.110518] page_pool_release_retry() stalled pool shutdown 308 inflight 302 sec [ 1353.918540] page_pool_release_retry() stalled pool shutdown 32 inflight 543 sec [ 1361.179205] Qualcomm Atheros AR8031/AR8033 30be0000.ethernet-1:00: attached PHY driver (mii_bus:phy_addr=30be0000.ethernet-1:00, irq=POLL) [ 1364.255298] fec 30be0000.ethernet eth0: Link is Up - 1Gbps/Full - flow control rx/tx [ 1364.263189] IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready [ 1371.998532] page_pool_release_retry() stalled pool shutdown 310 inflight 60 sec [ 1398.530542] page_pool_release_retry() stalled pool shutdown 308 inflight 362 sec [ 1414.334539] page_pool_release_retry() stalled pool shutdown 16 inflight 604 sec [ 1432.414520] page_pool_release_retry() stalled pool shutdown 310 inflight 120 sec [ 1458.942523] page_pool_release_retry() stalled pool shutdown 308 inflight 422 sec [ 1474.750521] page_pool_release_retry() stalled pool shutdown 16 inflight 664 sec TINFO:ENET up and down test 394 times [ 1492.830522] page_pool_release_retry() stalled pool shutdown 310 inflight 181 sec [ 1519.358519] page_pool_release_retry() stalled pool shutdown 308 inflight 483 sec [ 1535.166545] page_pool_release_retry() stalled pool shutdown 2 inflight 724 sec [ 1537.090278] eth_test2.sh invoked oom-killer: gfp_mask=0x400dc0(GFP_KERNEL_ACCOUNT|__GFP_ZERO), order=0, oom_score_adj=0 [ 1537.101192] CPU: 3 PID: 2379 Comm: eth_test2.sh Tainted: G C 6.1.1+g56321e101aca #1 [ 1537.110249] Hardware name: NXP i.MX8MNano EVK board (DT) [ 1537.115561] Call trace: [ 1537.118005] dump_backtrace.part.0+0xe0/0xf0 [ 1537.122289] show_stack+0x18/0x40 [ 1537.125608] dump_stack_lvl+0x64/0x80 [ 1537.129276] dump_stack+0x18/0x34 [ 1537.132592] dump_header+0x44/0x208 [ 1537.136083] oom_kill_process+0x2b4/0x2c0 [ 1537.140097] out_of_memory+0xe4/0x594 [ 1537.143766] __alloc_pages+0xb68/0xd00 [ 1537.147521] alloc_pages+0xac/0x160 [ 1537.151013] __get_free_pages+0x14/0x40 [ 1537.154851] pgd_alloc+0x1c/0x30 [ 1537.158082] mm_init+0xf8/0x1d0 [ 1537.161228] mm_alloc+0x48/0x60 [ 1537.164368] alloc_bprm+0x7c/0x240 [ 1537.167777] do_execveat_common.isra.0+0x70/0x240 [ 1537.172486] __arm64_sys_execve+0x40/0x54 [ 1537.176502] invoke_syscall+0x48/0x114 [ 1537.180255] el0_svc_common.constprop.0+0xcc/0xec [ 1537.184964] do_el0_svc+0x2c/0xd0 [ 1537.188280] el0_svc+0x2c/0x84 [ 1537.191340] el0t_64_sync_handler+0xf4/0x120 [ 1537.195613] el0t_64_sync+0x18c/0x190 [ 1537.199334] Mem-Info: [ 1537.201620] active_anon:342 inactive_anon:10343 isolated_anon:0 [ 1537.201620] active_file:54 inactive_file:112 isolated_file:0 [ 1537.201620] unevictable:0 dirty:0 writeback:0 [ 1537.201620] slab_reclaimable:2620 slab_unreclaimable:7076 [ 1537.201620] mapped:1489 shmem:2473 pagetables:466 [ 1537.201620] sec_pagetables:0 bounce:0 [ 1537.201620] kernel_misc_reclaimable:0 [ 1537.201620] free:136672 free_pcp:96 free_cma:129241 [ 1537.240419] Node 0 active_anon:1368kB inactive_anon:41372kB active_file:216kB inactive_file:5052kB unevictable:0kB isolated(anon):0kB isolated(file):0kB s [ 1537.271422] Node 0 DMA free:541636kB boost:0kB min:30000kB low:37500kB high:45000kB reserved_highatomic:0KB active_anon:1368kB inactive_anon:41372kB actiB [ 1537.300219] lowmem_reserve[]: 0 0 0 0 [ 1537.303929] Node 0 DMA: 1015*4kB (UMEC) 743*8kB (UMEC) 417*16kB (UMEC) 235*32kB (UMEC) 116*64kB (UMEC) 25*128kB (UMEC) 4*256kB (UC) 2*512kB (UC) 0*1024kBB [ 1537.323938] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB [ 1537.332708] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=32768kB [ 1537.341292] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB [ 1537.349776] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=64kB [ 1537.358087] 2939 total pagecache pages [ 1537.361876] 0 pages in swap cache [ 1537.365229] Free swap = 0kB [ 1537.368147] Total swap = 0kB [ 1537.371065] 516096 pages RAM [ 1537.373959] 0 pages HighMem/MovableOnly [ 1537.377834] 17302 pages reserved [ 1537.381103] 163840 pages cma reserved [ 1537.384809] 0 pages hwpoisoned [ 1537.387902] Tasks state (memory values in pages): [ 1537.392652] [ pid ] uid tgid total_vm rss pgtables_bytes swapents oom_score_adj name [ 1537.401356] [ 201] 993 201 1130 72 45056 0 0 rpcbind [ 1537.409772] [ 202] 0 202 4529 1640 77824 0 -250 systemd-journal [ 1537.418861] [ 222] 0 222 4691 801 69632 0 -1000 systemd-udevd [ 1537.427787] [ 248] 994 248 20914 130 65536 0 0 systemd-timesyn [ 1537.436884] [ 497] 0 497 620 31 49152 0 0 atd [ 1537.444938] [ 500] 0 500 854 77 53248 0 0 crond [ 1537.453165] [ 503] 997 503 1470 160 49152 0 -900 dbus-daemon [ 1537.461908] [ 505] 0 505 633 24 40960 0 0 firmwared [ 1537.470491] [ 513] 0 513 2507 180 61440 0 0 ofonod [ 1537.478800] [ 514] 990 514 69640 137 81920 0 0 parsec [ 1537.487120] [ 533] 0 533 599 39 40960 0 0 syslogd [ 1537.495518] [ 534] 0 534 4546 148 65536 0 0 systemd-logind [ 1537.504560] [ 535] 0 535 690 24 45056 0 0 tee-supplicant [ 1537.513564] [ 540] 996 540 2769 168 61440 0 0 systemd-network [ 1537.522680] [ 566] 0 566 3878 228 77824 0 0 connmand [ 1537.531168] [ 645] 998 645 1538 133 57344 0 0 avahi-daemon [ 1537.540004] [ 646] 998 646 1461 64 57344 0 0 avahi-daemon [ 1537.548846] [ 648] 992 648 781 41 45056 0 0 rpc.statd [ 1537.557415] [ 650] 64371 650 590 23 45056 0 0 ninfod [ 1537.565754] [ 653] 61563 653 555 24 45056 0 0 rdisc [ 1537.573971] [ 655] 0 655 374569 2999 290816 0 -999 containerd [ 1537.582621] [ 658] 0 658 1311 20 49152 0 0 agetty [ 1537.590922] [ 663] 0 663 1529 97 49152 0 0 login [ 1537.599138] [ 666] 0 666 3430 202 69632 0 0 wpa_supplicant [ 1537.608147] [ 667] 0 667 2344 96 61440 0 0 systemd-userdbd [ 1537.617240] [ 677] 0 677 2964 314 65536 0 100 systemd [ 1537.625651] [ 679] 0 679 3720 646 73728 0 100 (sd-pam) [ 1537.634138] [ 687] 0 687 1289 403 45056 0 0 sh [ 1537.642108] [ 789] 0 789 970 93 45056 0 0 eth_test2.sh [ 1537.650955] [ 2355] 0 2355 2346 94 61440 0 0 systemd-userwor [ 1537.660046] [ 2356] 0 2356 2346 94 61440 0 0 systemd-userwor [ 1537.669137] [ 2358] 0 2358 2346 95 57344 0 0 systemd-userwor [ 1537.678258] [ 2379] 0 2379 970 93 45056 0 0 eth_test2.sh [ 1537.687098] oom-kill:constraint=CONSTRAINT_NONE,nodemask=(null),cpuset=/,mems_allowed=0,global_oom,task_memcg=/user.slice/user-0.slice/user@0.service,tas0 [ 1537.703009] Out of memory: Killed process 679 ((sd-pam)) total-vm:14880kB, anon-rss:2584kB, file-rss:0kB, shmem-rss:0kB, UID:0 pgtables:72kB oom_score_ad0 [ 1553.246526] page_pool_release_retry() stalled pool shutdown 310 inflight 241 sec Fixes: 95698ff ("net: fec: using page pool to manage RX buffers") Signed-off-by: Wei Fang <wei.fang@nxp.com> Reviewed-by: shenwei wang <Shenwei.wang@nxp.com> Reviewed-by: Jesse Brandeburg <jesse.brandeburg@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit ca02549 ] Set kprobe at 'jalr 1140(ra)' of vfs_write results in the following crash: [ 32.092235] Unable to handle kernel access to user memory without uaccess routines at virtual address 00aaaaaad77b1170 [ 32.093115] Oops [#1] [ 32.093251] Modules linked in: [ 32.093626] CPU: 0 PID: 135 Comm: ftracetest Not tainted 6.2.0-rc2-00013-gb0aa5e5df0cb-dirty #16 [ 32.093985] Hardware name: riscv-virtio,qemu (DT) [ 32.094280] epc : ksys_read+0x88/0xd6 [ 32.094855] ra : ksys_read+0xc0/0xd6 [ 32.095016] epc : ffffffff801cda80 ra : ffffffff801cdab8 sp : ff20000000d7bdc0 [ 32.095227] gp : ffffffff80f14000 tp : ff60000080f9cb40 t0 : ffffffff80f13e80 [ 32.095500] t1 : ffffffff8000c29c t2 : ffffffff800dbc54 s0 : ff20000000d7be60 [ 32.095716] s1 : 0000000000000000 a0 : ffffffff805a64ae a1 : ffffffff80a83708 [ 32.095921] a2 : ffffffff80f160a0 a3 : 0000000000000000 a4 : f229b0afdb165300 [ 32.096171] a5 : f229b0afdb165300 a6 : ffffffff80eeebd0 a7 : 00000000000003ff [ 32.096411] s2 : ff6000007ff76800 s3 : fffffffffffffff7 s4 : 00aaaaaad77b1170 [ 32.096638] s5 : ffffffff80f160a0 s6 : ff6000007ff76800 s7 : 0000000000000030 [ 32.096865] s8 : 00ffffffc3d97be0 s9 : 0000000000000007 s10: 00aaaaaad77c9410 [ 32.097092] s11: 0000000000000000 t3 : ffffffff80f13e48 t4 : ffffffff8000c29c [ 32.097317] t5 : ffffffff8000c29c t6 : ffffffff800dbc54 [ 32.097505] status: 0000000200000120 badaddr: 00aaaaaad77b1170 cause: 000000000000000d [ 32.098011] [<ffffffff801cdb72>] ksys_write+0x6c/0xd6 [ 32.098222] [<ffffffff801cdc06>] sys_write+0x2a/0x38 [ 32.098405] [<ffffffff80003c76>] ret_from_syscall+0x0/0x2 Since the rs1 and rd might be the same one, such as 'jalr 1140(ra)', hence it requires obtaining the target address from rs1 followed by updating rd. Fixes: c22b0bc ("riscv: Add kprobes supported") Signed-off-by: Liao Chang <liaochang1@huawei.com> Reviewed-by: Guo Ren <guoren@kernel.org> Link: https://lore.kernel.org/r/20230116064342.2092136-1-liaochang1@huawei.com [Palmer: Pick Guo's cleanup] Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
commit fe0ba8c upstream. Commit f1e5250 ("x86/boot: Skip realmode init code when running as Xen PV guest") missed one code path accessing real_mode_header, leading to dereferencing NULL when suspending the system under Xen: [ 348.284004] PM: suspend entry (deep) [ 348.289532] Filesystems sync: 0.005 seconds [ 348.291545] Freezing user space processes ... (elapsed 0.000 seconds) done. [ 348.292457] OOM killer disabled. [ 348.292462] Freezing remaining freezable tasks ... (elapsed 0.104 seconds) done. [ 348.396612] printk: Suspending console(s) (use no_console_suspend to debug) [ 348.749228] PM: suspend devices took 0.352 seconds [ 348.769713] ACPI: EC: interrupt blocked [ 348.816077] BUG: kernel NULL pointer dereference, address: 000000000000001c [ 348.816080] #PF: supervisor read access in kernel mode [ 348.816081] #PF: error_code(0x0000) - not-present page [ 348.816083] PGD 0 P4D 0 [ 348.816086] Oops: 0000 [#1] PREEMPT SMP NOPTI [ 348.816089] CPU: 0 PID: 6764 Comm: systemd-sleep Not tainted 6.1.3-1.fc32.qubes.x86_64 #1 [ 348.816092] Hardware name: Star Labs StarBook/StarBook, BIOS 8.01 07/03/2022 [ 348.816093] RIP: e030:acpi_get_wakeup_address+0xc/0x20 Fix that by adding an optional acpi callback allowing to skip setting the wakeup address, as in the Xen PV case this will be handled by the hypervisor anyway. Fixes: f1e5250 ("x86/boot: Skip realmode init code when running as Xen PV guest") Reported-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com> Signed-off-by: Juergen Gross <jgross@suse.com> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Link: https://lore.kernel.org/all/20230117155724.22940-1-jgross%40suse.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
…lume [ Upstream commit 8226e37 ] The freeing of relinquished volume will wake up the pending volume acquisition by using wake_up_bit(), however it is mismatched with wait_var_event() used in fscache_wait_on_volume_collision() and it will never wake up the waiter in the wait-queue because these two functions operate on different wait-queues. According to the implementation in fscache_wait_on_volume_collision(), if the wake-up of pending acquisition is delayed longer than 20 seconds (e.g., due to the delay of on-demand fd closing), the first wait_var_event_timeout() will timeout and the following wait_var_event() will hang forever as shown below: FS-Cache: Potential volume collision new=00000024 old=00000022 ...... INFO: task mount:1148 blocked for more than 122 seconds. Not tainted 6.1.0-rc6+ #1 task:mount state:D stack:0 pid:1148 ppid:1 Call Trace: <TASK> __schedule+0x2f6/0xb80 schedule+0x67/0xe0 fscache_wait_on_volume_collision.cold+0x80/0x82 __fscache_acquire_volume+0x40d/0x4e0 erofs_fscache_register_volume+0x51/0xe0 [erofs] erofs_fscache_register_fs+0x19c/0x240 [erofs] erofs_fc_fill_super+0x746/0xaf0 [erofs] vfs_get_super+0x7d/0x100 get_tree_nodev+0x16/0x20 erofs_fc_get_tree+0x20/0x30 [erofs] vfs_get_tree+0x24/0xb0 path_mount+0x2fa/0xa90 do_mount+0x7c/0xa0 __x64_sys_mount+0x8b/0xe0 do_syscall_64+0x30/0x60 entry_SYSCALL_64_after_hwframe+0x46/0xb0 Considering that wake_up_bit() is more selective, so fix it by using wait_on_bit() instead of wait_var_event() to wait for the freeing of relinquished volume. In addition because waitqueue_active() is used in wake_up_bit() and clear_bit() doesn't imply any memory barrier, use clear_and_wake_up_bit() to add the missing memory barrier between cursor->flags and waitqueue_active(). Fixes: 62ab633 ("fscache: Implement volume registration") Reviewed-by: Jingbo Xu <jefflexu@linux.alibaba.com> Signed-off-by: Hou Tao <houtao1@huawei.com> Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Link: https://lore.kernel.org/r/20230113115211.2895845-2-houtao@huaweicloud.com/ # v3 Signed-off-by: Sasha Levin <sashal@kernel.org>
… UAF [ Upstream commit 226fae1 ] After a call to console_unlock() in vcs_read() the vc_data struct can be freed by vc_deallocate(). Because of that, the struct vc_data pointer load must be done at the top of while loop in vcs_read() to avoid a UAF when vcs_size() is called. Syzkaller reported a UAF in vcs_size(). BUG: KASAN: use-after-free in vcs_size (drivers/tty/vt/vc_screen.c:215) Read of size 4 at addr ffff8881137479a8 by task 4a005ed81e27e65/1537 CPU: 0 PID: 1537 Comm: 4a005ed81e27e65 Not tainted 6.2.0-rc5 #1 Hardware name: Red Hat KVM, BIOS 1.15.0-2.module Call Trace: <TASK> __asan_report_load4_noabort (mm/kasan/report_generic.c:350) vcs_size (drivers/tty/vt/vc_screen.c:215) vcs_read (drivers/tty/vt/vc_screen.c:415) vfs_read (fs/read_write.c:468 fs/read_write.c:450) ... </TASK> Allocated by task 1191: ... kmalloc_trace (mm/slab_common.c:1069) vc_allocate (./include/linux/slab.h:580 ./include/linux/slab.h:720 drivers/tty/vt/vt.c:1128 drivers/tty/vt/vt.c:1108) con_install (drivers/tty/vt/vt.c:3383) tty_init_dev (drivers/tty/tty_io.c:1301 drivers/tty/tty_io.c:1413 drivers/tty/tty_io.c:1390) tty_open (drivers/tty/tty_io.c:2080 drivers/tty/tty_io.c:2126) chrdev_open (fs/char_dev.c:415) do_dentry_open (fs/open.c:883) vfs_open (fs/open.c:1014) ... Freed by task 1548: ... kfree (mm/slab_common.c:1021) vc_port_destruct (drivers/tty/vt/vt.c:1094) tty_port_destructor (drivers/tty/tty_port.c:296) tty_port_put (drivers/tty/tty_port.c:312) vt_disallocate_all (drivers/tty/vt/vt_ioctl.c:662 (discriminator 2)) vt_ioctl (drivers/tty/vt/vt_ioctl.c:903) tty_ioctl (drivers/tty/tty_io.c:2776) ... The buggy address belongs to the object at ffff888113747800 which belongs to the cache kmalloc-1k of size 1024 The buggy address is located 424 bytes inside of 1024-byte region [ffff888113747800, ffff888113747c00) The buggy address belongs to the physical page: page:00000000b3fe6c7c refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x113740 head:00000000b3fe6c7c order:3 compound_mapcount:0 subpages_mapcount:0 compound_pincount:0 anon flags: 0x17ffffc0010200(slab|head|node=0|zone=2|lastcpupid=0x1fffff) raw: 0017ffffc0010200 ffff888100042dc0 0000000000000000 dead000000000001 raw: 0000000000000000 0000000000100010 00000001ffffffff 0000000000000000 page dumped because: kasan: bad access detected Memory state around the buggy address: ffff888113747880: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb ffff888113747900: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb > ffff888113747980: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb ^ ffff888113747a00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb ffff888113747a80: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb ================================================================== Disabling lock debugging due to kernel taint Fixes: ac751ef ("console: rename acquire/release_console_sem() to console_lock/unlock()") Reported-by: syzkaller <syzkaller@googlegroups.com> Suggested-by: Jiri Slaby <jirislaby@kernel.org> Signed-off-by: George Kennedy <george.kennedy@oracle.com> Link: https://lore.kernel.org/r/1674577014-12374-1-git-send-email-george.kennedy@oracle.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
commit 98d0219 upstream. If a relocatable kernel is loaded at an address that is not 2MB aligned and told not to relocate to zero, the kernel can crash due to mark_rodata_ro() incorrectly changing some read-write data to read-only. Scenarios where the misalignment can occur are when the kernel is loaded by kdump or using the RELOCATABLE_TEST config option. Example crash with the kernel loaded at 5MB: Run /sbin/init as init process BUG: Unable to handle kernel data access on write at 0xc000000000452000 Faulting instruction address: 0xc0000000005b6730 Oops: Kernel access of bad area, sig: 11 [#1] LE PAGE_SIZE=64K MMU=Radix SMP NR_CPUS=2048 NUMA pSeries CPU: 1 PID: 1 Comm: init Not tainted 6.2.0-rc1-00011-g349188be4841 torvalds#166 Hardware name: IBM pSeries (emulated by qemu) POWER9 (raw) 0x4e1202 0xf000005 of:SLOF,git-5b4c5a hv:linux,kvm pSeries NIP: c0000000005b6730 LR: c000000000ae9ab8 CTR: 0000000000000380 REGS: c000000004503250 TRAP: 0300 Not tainted (6.2.0-rc1-00011-g349188be4841) MSR: 8000000000009033 <SF,EE,ME,IR,DR,RI,LE> CR: 44288480 XER: 00000000 CFAR: c0000000005b66ec DAR: c000000000452000 DSISR: 0a000000 IRQMASK: 0 ... NIP memset+0x68/0x104 LR zero_user_segments.constprop.0+0xa8/0xf0 Call Trace: ext4_mpage_readpages+0x7f8/0x830 ext4_readahead+0x48/0x60 read_pages+0xb8/0x380 page_cache_ra_unbounded+0x19c/0x250 filemap_fault+0x58c/0xae0 __do_fault+0x60/0x100 __handle_mm_fault+0x1230/0x1a40 handle_mm_fault+0x120/0x300 ___do_page_fault+0x20c/0xa80 do_page_fault+0x30/0xc0 data_access_common_virt+0x210/0x220 This happens because mark_rodata_ro() tries to change permissions on the range _stext..__end_rodata, but _stext sits in the middle of the 2MB page from 4MB to 6MB: radix-mmu: Mapped 0x0000000000000000-0x0000000000200000 with 2.00 MiB pages (exec) radix-mmu: Mapped 0x0000000000200000-0x0000000000400000 with 2.00 MiB pages radix-mmu: Mapped 0x0000000000400000-0x0000000002400000 with 2.00 MiB pages (exec) The logic that changes the permissions assumes the linear mapping was split correctly at boot, so it marks the entire 2MB page read-only. That leads to the write fault above. To fix it, the boot time mapping logic needs to consider that if the kernel is running at a non-zero address then _stext is a boundary where it must split the mapping. That leads to the mapping being split correctly, allowing the rodata permission change to take happen correctly, with no spillover: radix-mmu: Mapped 0x0000000000000000-0x0000000000200000 with 2.00 MiB pages (exec) radix-mmu: Mapped 0x0000000000200000-0x0000000000400000 with 2.00 MiB pages radix-mmu: Mapped 0x0000000000400000-0x0000000000500000 with 64.0 KiB pages radix-mmu: Mapped 0x0000000000500000-0x0000000000600000 with 64.0 KiB pages (exec) radix-mmu: Mapped 0x0000000000600000-0x0000000002400000 with 2.00 MiB pages (exec) If the kernel is loaded at a 2MB aligned address, the mapping continues to use 2MB pages as before: radix-mmu: Mapped 0x0000000000000000-0x0000000000200000 with 2.00 MiB pages (exec) radix-mmu: Mapped 0x0000000000200000-0x0000000000400000 with 2.00 MiB pages radix-mmu: Mapped 0x0000000000400000-0x0000000002c00000 with 2.00 MiB pages (exec) radix-mmu: Mapped 0x0000000002c00000-0x0000000100000000 with 2.00 MiB pages Fixes: c55d7b5 ("powerpc: Remove STRICT_KERNEL_RWX incompatibility with RELOCATABLE") Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20230110124753.1325426-1-mpe@ellerman.id.au Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit ad53db4 upstream. The recent commit 76d588d ("powerpc/imc-pmu: Fix use of mutex in IRQs disabled section") fixed warnings (and possible deadlocks) in the IMC PMU driver by converting the locking to use spinlocks. It also converted the init-time nest_init_lock to a spinlock, even though it's not used at runtime in IRQ disabled sections or while holding other spinlocks. This leads to warnings such as: BUG: sleeping function called from invalid context at include/linux/percpu-rwsem.h:49 in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 1, name: swapper/0 preempt_count: 1, expected: 0 CPU: 7 PID: 1 Comm: swapper/0 Not tainted 6.2.0-rc2-14719-gf12cd06109f4-dirty #1 Hardware name: Mambo,Simulated-System POWER9 0x4e1203 opal:v6.6.6 PowerNV Call Trace: dump_stack_lvl+0x74/0xa8 (unreliable) __might_resched+0x178/0x1a0 __cpuhp_setup_state+0x64/0x1e0 init_imc_pmu+0xe48/0x1250 opal_imc_counters_probe+0x30c/0x6a0 platform_probe+0x78/0x110 really_probe+0x104/0x420 __driver_probe_device+0xb0/0x170 driver_probe_device+0x58/0x180 __driver_attach+0xd8/0x250 bus_for_each_dev+0xb4/0x140 driver_attach+0x34/0x50 bus_add_driver+0x1e8/0x2d0 driver_register+0xb4/0x1c0 __platform_driver_register+0x38/0x50 opal_imc_driver_init+0x2c/0x40 do_one_initcall+0x80/0x360 kernel_init_freeable+0x310/0x3b8 kernel_init+0x30/0x1a0 ret_from_kernel_thread+0x5c/0x64 Fix it by converting nest_init_lock back to a mutex, so that we can call sleeping functions while holding it. There is no interaction between nest_init_lock and the runtime spinlocks used by the actual PMU routines. Fixes: 76d588d ("powerpc/imc-pmu: Fix use of mutex in IRQs disabled section") Tested-by: Kajol Jain<kjain@linux.ibm.com> Reviewed-by: Kajol Jain<kjain@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20230130014401.540543-1-mpe@ellerman.id.au Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit e632291 ] The cited commit creates child PKEY interfaces over netlink will multiple tx and rx queues, but some devices doesn't support more than 1 tx and 1 rx queues. This causes to a crash when traffic is sent over the PKEY interface due to the parent having a single queue but the child having multiple queues. This patch fixes the number of queues to 1 for legacy IPoIB at the earliest possible point in time. BUG: kernel NULL pointer dereference, address: 000000000000036b PGD 0 P4D 0 Oops: 0000 [#1] SMP CPU: 4 PID: 209665 Comm: python3 Not tainted 6.1.0_for_upstream_min_debug_2022_12_12_17_02 #1 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014 RIP: 0010:kmem_cache_alloc+0xcb/0x450 Code: ce 7e 49 8b 50 08 49 83 78 10 00 4d 8b 28 0f 84 cb 02 00 00 4d 85 ed 0f 84 c2 02 00 00 41 8b 44 24 28 48 8d 4a 01 49 8b 3c 24 <49> 8b 5c 05 00 4c 89 e8 65 48 0f c7 0f 0f 94 c0 84 c0 74 b8 41 8b RSP: 0018:ffff88822acbbab8 EFLAGS: 00010202 RAX: 0000000000000070 RBX: ffff8881c28e3e00 RCX: 00000000064f8dae RDX: 00000000064f8dad RSI: 0000000000000a20 RDI: 0000000000030d00 RBP: 0000000000000a20 R08: ffff8882f5d30d00 R09: ffff888104032f40 R10: ffff88810fade828 R11: 736f6d6570736575 R12: ffff88810081c000 R13: 00000000000002fb R14: ffffffff817fc865 R15: 0000000000000000 FS: 00007f9324ff9700(0000) GS:ffff8882f5d00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 000000000000036b CR3: 00000001125af004 CR4: 0000000000370ea0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: <TASK> skb_clone+0x55/0xd0 ip6_finish_output2+0x3fe/0x690 ip6_finish_output+0xfa/0x310 ip6_send_skb+0x1e/0x60 udp_v6_send_skb+0x1e5/0x420 udpv6_sendmsg+0xb3c/0xe60 ? ip_mc_finish_output+0x180/0x180 ? __switch_to_asm+0x3a/0x60 ? __switch_to_asm+0x34/0x60 sock_sendmsg+0x33/0x40 __sys_sendto+0x103/0x160 ? _copy_to_user+0x21/0x30 ? kvm_clock_get_cycles+0xd/0x10 ? ktime_get_ts64+0x49/0xe0 __x64_sys_sendto+0x25/0x30 do_syscall_64+0x3d/0x90 entry_SYSCALL_64_after_hwframe+0x46/0xb0 RIP: 0033:0x7f9374f1ed14 Code: 42 41 f8 ff 44 8b 4c 24 2c 4c 8b 44 24 20 89 c5 44 8b 54 24 28 48 8b 54 24 18 b8 2c 00 00 00 48 8b 74 24 10 8b 7c 24 08 0f 05 <48> 3d 00 f0 ff ff 77 34 89 ef 48 89 44 24 08 e8 68 41 f8 ff 48 8b RSP: 002b:00007f9324ff7bd0 EFLAGS: 00000293 ORIG_RAX: 000000000000002c RAX: ffffffffffffffda RBX: 00007f9324ff7cc8 RCX: 00007f9374f1ed14 RDX: 00000000000002fb RSI: 00007f93000052f0 RDI: 0000000000000030 RBP: 0000000000000000 R08: 00007f9324ff7d40 R09: 000000000000001c R10: 0000000000000000 R11: 0000000000000293 R12: 0000000000000000 R13: 000000012a05f200 R14: 0000000000000001 R15: 00007f9374d57bdc </TASK> Fixes: dbc94a0 ("IB/IPoIB: Fix queue count inconsistency for PKEY child interfaces") Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com> Link: https://lore.kernel.org/r/95eb6b74c7cf49fa46281f9d056d685c9fa11d38.1674584576.git.leon@kernel.org Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 4d159f7 ] When both ice and the irdma driver are loaded, a warning in check_flush_dependency is being triggered. This is due to ice driver workqueue being allocated with the WQ_MEM_RECLAIM flag and the irdma one is not. According to kernel documentation, this flag should be set if the workqueue will be involved in the kernel's memory reclamation flow. Since it is not, there is no need for the ice driver's WQ to have this flag set so remove it. Example trace: [ +0.000004] workqueue: WQ_MEM_RECLAIM ice:ice_service_task [ice] is flushing !WQ_MEM_RECLAIM infiniband:0x0 [ +0.000139] WARNING: CPU: 0 PID: 728 at kernel/workqueue.c:2632 check_flush_dependency+0x178/0x1a0 [ +0.000011] Modules linked in: bonding tls xt_CHECKSUM xt_MASQUERADE xt_conntrack ipt_REJECT nf_reject_ipv4 nft_compat nft_cha in_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 nf_tables nfnetlink bridge stp llc rfkill vfat fat intel_rapl_msr intel _rapl_common isst_if_common skx_edac nfit libnvdimm x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm irqbypass crct1 0dif_pclmul crc32_pclmul ghash_clmulni_intel rapl intel_cstate rpcrdma sunrpc rdma_ucm ib_srpt ib_isert iscsi_target_mod target_ core_mod ib_iser libiscsi scsi_transport_iscsi rdma_cm ib_cm iw_cm iTCO_wdt iTCO_vendor_support ipmi_ssif irdma mei_me ib_uverbs ib_core intel_uncore joydev pcspkr i2c_i801 acpi_ipmi mei lpc_ich i2c_smbus intel_pch_thermal ioatdma ipmi_si acpi_power_meter acpi_pad xfs libcrc32c sd_mod t10_pi crc64_rocksoft crc64 sg ahci ixgbe libahci ice i40e igb crc32c_intel mdio i2c_algo_bit liba ta dca wmi dm_mirror dm_region_hash dm_log dm_mod ipmi_devintf ipmi_msghandler fuse [ +0.000161] [last unloaded: bonding] [ +0.000006] CPU: 0 PID: 728 Comm: kworker/0:2 Tainted: G S 6.2.0-rc2_next-queue-13jan-00458-gc20aabd57164 #1 [ +0.000006] Hardware name: Intel Corporation S2600WFT/S2600WFT, BIOS SE5C620.86B.02.01.0010.010620200716 01/06/2020 [ +0.000003] Workqueue: ice ice_service_task [ice] [ +0.000127] RIP: 0010:check_flush_dependency+0x178/0x1a0 [ +0.000005] Code: 89 8e 02 01 e8 49 3d 40 00 49 8b 55 18 48 8d 8d d0 00 00 00 48 8d b3 d0 00 00 00 4d 89 e0 48 c7 c7 e0 3b 08 9f e8 bb d3 07 01 <0f> 0b e9 be fe ff ff 80 3d 24 89 8e 02 00 0f 85 6b ff ff ff e9 06 [ +0.000004] RSP: 0018:ffff88810a39f990 EFLAGS: 00010282 [ +0.000005] RAX: 0000000000000000 RBX: ffff888141bc2400 RCX: 0000000000000000 [ +0.000004] RDX: 0000000000000001 RSI: dffffc0000000000 RDI: ffffffffa1213a80 [ +0.000003] RBP: ffff888194bf3400 R08: ffffed117b306112 R09: ffffed117b306112 [ +0.000003] R10: ffff888bd983088b R11: ffffed117b306111 R12: 0000000000000000 [ +0.000003] R13: ffff888111f84d00 R14: ffff88810a3943ac R15: ffff888194bf3400 [ +0.000004] FS: 0000000000000000(0000) GS:ffff888bd9800000(0000) knlGS:0000000000000000 [ +0.000003] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ +0.000003] CR2: 000056035b208b60 CR3: 000000017795e005 CR4: 00000000007706f0 [ +0.000003] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ +0.000003] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ +0.000002] PKRU: 55555554 [ +0.000003] Call Trace: [ +0.000002] <TASK> [ +0.000003] __flush_workqueue+0x203/0x840 [ +0.000006] ? mutex_unlock+0x84/0xd0 [ +0.000008] ? __pfx_mutex_unlock+0x10/0x10 [ +0.000004] ? __pfx___flush_workqueue+0x10/0x10 [ +0.000006] ? mutex_lock+0xa3/0xf0 [ +0.000005] ib_cache_cleanup_one+0x39/0x190 [ib_core] [ +0.000174] __ib_unregister_device+0x84/0xf0 [ib_core] [ +0.000094] ib_unregister_device+0x25/0x30 [ib_core] [ +0.000093] irdma_ib_unregister_device+0x97/0xc0 [irdma] [ +0.000064] ? __pfx_irdma_ib_unregister_device+0x10/0x10 [irdma] [ +0.000059] ? up_write+0x5c/0x90 [ +0.000005] irdma_remove+0x36/0x90 [irdma] [ +0.000062] auxiliary_bus_remove+0x32/0x50 [ +0.000007] device_release_driver_internal+0xfa/0x1c0 [ +0.000005] bus_remove_device+0x18a/0x260 [ +0.000007] device_del+0x2e5/0x650 [ +0.000005] ? __pfx_device_del+0x10/0x10 [ +0.000003] ? mutex_unlock+0x84/0xd0 [ +0.000004] ? __pfx_mutex_unlock+0x10/0x10 [ +0.000004] ? _raw_spin_unlock+0x18/0x40 [ +0.000005] ice_unplug_aux_dev+0x52/0x70 [ice] [ +0.000160] ice_service_task+0x1309/0x14f0 [ice] [ +0.000134] ? __pfx___schedule+0x10/0x10 [ +0.000006] process_one_work+0x3b1/0x6c0 [ +0.000008] worker_thread+0x69/0x670 [ +0.000005] ? __kthread_parkme+0xec/0x110 [ +0.000007] ? __pfx_worker_thread+0x10/0x10 [ +0.000005] kthread+0x17f/0x1b0 [ +0.000005] ? __pfx_kthread+0x10/0x10 [ +0.000004] ret_from_fork+0x29/0x50 [ +0.000009] </TASK> Fixes: 940b61a ("ice: Initialize PF and setup miscellaneous interrupt") Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com> Signed-off-by: Marcin Szycik <marcin.szycik@linux.intel.com> Tested-by: Jakub Andrysiak <jakub.andrysiak@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
commit 462a8e0 upstream. When we upgraded our kernel, we started seeing some page corruption like the following consistently: BUG: Bad page state in process ganesha.nfsd pfn:1304ca page:0000000022261c55 refcount:0 mapcount:-128 mapping:0000000000000000 index:0x0 pfn:0x1304ca flags: 0x17ffffc0000000() raw: 0017ffffc0000000 ffff8a513ffd4c98 ffffeee24b35ec08 0000000000000000 raw: 0000000000000000 0000000000000001 00000000ffffff7f 0000000000000000 page dumped because: nonzero mapcount CPU: 0 PID: 15567 Comm: ganesha.nfsd Kdump: loaded Tainted: P B O 5.10.158-1.nutanix.20221209.el7.x86_64 #1 Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 04/05/2016 Call Trace: dump_stack+0x74/0x96 bad_page.cold+0x63/0x94 check_new_page_bad+0x6d/0x80 rmqueue+0x46e/0x970 get_page_from_freelist+0xcb/0x3f0 ? _cond_resched+0x19/0x40 __alloc_pages_nodemask+0x164/0x300 alloc_pages_current+0x87/0xf0 skb_page_frag_refill+0x84/0x110 ... Sometimes, it would also show up as corruption in the free list pointer and cause crashes. After bisecting the issue, we found the issue started from commit e320d30 ("mm/page_alloc.c: fix freeing non-compound pages"): if (put_page_testzero(page)) free_the_page(page, order); else if (!PageHead(page)) while (order-- > 0) free_the_page(page + (1 << order), order); So the problem is the check PageHead is racy because at this point we already dropped our reference to the page. So even if we came in with compound page, the page can already be freed and PageHead can return false and we will end up freeing all the tail pages causing double free. Fixes: e320d30 ("mm/page_alloc.c: fix freeing non-compound pages") Link: https://lore.kernel.org/lkml/BYAPR02MB448855960A9656EEA81141FC94D99@BYAPR02MB4488.namprd02.prod.outlook.com/ Cc: Andrew Morton <akpm@linux-foundation.org> Cc: stable@vger.kernel.org Signed-off-by: Chunwei Chen <david.chen@nutanix.com> Reviewed-by: Vlastimil Babka <vbabka@suse.cz> Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 5ad7bbf upstream. Currently amdgpu calls drm_sched_fini() from the fence driver sw fini routine - such function is expected to be called only after the respective init function - drm_sched_init() - was executed successfully. Happens that we faced a driver probe failure in the Steam Deck recently, and the function drm_sched_fini() was called even without its counter-part had been previously called, causing the following oops: amdgpu: probe of 0000:04:00.0 failed with error -110 BUG: kernel NULL pointer dereference, address: 0000000000000090 PGD 0 P4D 0 Oops: 0002 [#1] PREEMPT SMP NOPTI CPU: 0 PID: 609 Comm: systemd-udevd Not tainted 6.2.0-rc3-gpiccoli torvalds#338 Hardware name: Valve Jupiter/Jupiter, BIOS F7A0113 11/04/2022 RIP: 0010:drm_sched_fini+0x84/0xa0 [gpu_sched] [...] Call Trace: <TASK> amdgpu_fence_driver_sw_fini+0xc8/0xd0 [amdgpu] amdgpu_device_fini_sw+0x2b/0x3b0 [amdgpu] amdgpu_driver_release_kms+0x16/0x30 [amdgpu] devm_drm_dev_init_release+0x49/0x70 [...] To prevent that, check if the drm_sched was properly initialized for a given ring before calling its fini counter-part. Notice ideally we'd use sched.ready for that; such field is set as the latest thing on drm_sched_init(). But amdgpu seems to "override" the meaning of such field - in the above oops for example, it was a GFX ring causing the crash, and the sched.ready field was set to true in the ring init routine, regardless of the state of the DRM scheduler. Hence, we ended-up using sched.ops as per Christian's suggestion [0], and also removed the no_scheduler check [1]. [0] https://lore.kernel.org/amd-gfx/984ee981-2906-0eaf-ccec-9f80975cb136@amd.com/ [1] https://lore.kernel.org/amd-gfx/cd0e2994-f85f-d837-609f-7056d5fb7231@amd.com/ Fixes: 067f44c ("drm/amdgpu: avoid over-handle of fence driver fini in s3 test (v2)") Suggested-by: Christian König <christian.koenig@amd.com> Cc: Guchun Chen <guchun.chen@amd.com> Cc: Luben Tuikov <luben.tuikov@amd.com> Cc: Mario Limonciello <mario.limonciello@amd.com> Reviewed-by: Luben Tuikov <luben.tuikov@amd.com> Signed-off-by: Guilherme G. Piccoli <gpiccoli@igalia.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 55d77ba upstream. On powerpc64, you can build a kernel with KASAN as soon as you build it with RADIX MMU support. However if the CPU doesn't have RADIX MMU, KASAN isn't enabled at init and the following Oops is encountered. [ 0.000000][ T0] KASAN not enabled as it requires radix! [ 4.484295][ T26] BUG: Unable to handle kernel data access at 0xc00e000000804a04 [ 4.485270][ T26] Faulting instruction address: 0xc00000000062ec6c [ 4.485748][ T26] Oops: Kernel access of bad area, sig: 11 [#1] [ 4.485920][ T26] BE PAGE_SIZE=64K MMU=Hash SMP NR_CPUS=2048 NUMA pSeries [ 4.486259][ T26] Modules linked in: [ 4.486637][ T26] CPU: 0 PID: 26 Comm: kworker/u2:2 Not tainted 6.2.0-rc3-02590-gf8a023b0a805 torvalds#249 [ 4.486907][ T26] Hardware name: IBM pSeries (emulated by qemu) POWER9 (raw) 0x4e1200 0xf000005 of:SLOF,HEAD pSeries [ 4.487445][ T26] Workqueue: eval_map_wq .tracer_init_tracefs_work_func [ 4.488744][ T26] NIP: c00000000062ec6c LR: c00000000062bb84 CTR: c0000000002ebcd0 [ 4.488867][ T26] REGS: c0000000049175c0 TRAP: 0380 Not tainted (6.2.0-rc3-02590-gf8a023b0a805) [ 4.489028][ T26] MSR: 8000000002009032 <SF,VEC,EE,ME,IR,DR,RI> CR: 44002808 XER: 00000000 [ 4.489584][ T26] CFAR: c00000000062bb80 IRQMASK: 0 [ 4.489584][ T26] GPR00: c0000000005624d4 c000000004917860 c000000001cfc000 1800000000804a04 [ 4.489584][ T26] GPR04: c0000000003a2650 0000000000000cc0 c00000000000d3d8 c00000000000d3d8 [ 4.489584][ T26] GPR08: c0000000049175b0 a80e000000000000 0000000000000000 0000000017d78400 [ 4.489584][ T26] GPR12: 0000000044002204 c000000003790000 c00000000435003c c0000000043f1c40 [ 4.489584][ T26] GPR16: c0000000043f1c68 c0000000043501a0 c000000002106138 c0000000043f1c08 [ 4.489584][ T26] GPR20: c0000000043f1c10 c0000000043f1c20 c000000004146c40 c000000002fdb7f8 [ 4.489584][ T26] GPR24: c000000002fdb834 c000000003685e00 c000000004025030 c000000003522e90 [ 4.489584][ T26] GPR28: 0000000000000cc0 c0000000003a2650 c000000004025020 c000000004025020 [ 4.491201][ T26] NIP [c00000000062ec6c] .kasan_byte_accessible+0xc/0x20 [ 4.491430][ T26] LR [c00000000062bb84] .__kasan_check_byte+0x24/0x90 [ 4.491767][ T26] Call Trace: [ 4.491941][ T26] [c000000004917860] [c00000000062ae70] .__kasan_kmalloc+0xc0/0x110 (unreliable) [ 4.492270][ T26] [c0000000049178f0] [c0000000005624d4] .krealloc+0x54/0x1c0 [ 4.492453][ T26] [c000000004917990] [c0000000003a2650] .create_trace_option_files+0x280/0x530 [ 4.492613][ T26] [c000000004917a90] [c000000002050d90] .tracer_init_tracefs_work_func+0x274/0x2c0 [ 4.492771][ T26] [c000000004917b40] [c0000000001f9948] .process_one_work+0x578/0x9f0 [ 4.492927][ T26] [c000000004917c30] [c0000000001f9ebc] .worker_thread+0xfc/0x950 [ 4.493084][ T26] [c000000004917d60] [c00000000020be84] .kthread+0x1a4/0x1b0 [ 4.493232][ T26] [c000000004917e10] [c00000000000d3d8] .ret_from_kernel_thread+0x58/0x60 [ 4.495642][ T26] Code: 60000000 7cc802a6 38a00000 4bfffc78 60000000 7cc802a6 38a00001 4bfffc68 60000000 3d20a80e 7863e8c2 792907c6 <7c6348ae> 20630007 78630fe0 68630001 [ 4.496704][ T26] ---[ end trace 0000000000000000 ]--- The Oops is due to kasan_byte_accessible() not checking the readiness of KASAN. Add missing call to kasan_arch_is_ready() and bail out when not ready. The same problem is observed with ____kasan_kfree_large() so fix it the same. Also, as KASAN is not available and no shadow area is allocated for linear memory mapping, there is no point in allocating shadow mem for vmalloc memory as shown below in /sys/kernel/debug/kernel_page_tables ---[ kasan shadow mem start ]--- 0xc00f000000000000-0xc00f00000006ffff 0x00000000040f0000 448K r w pte valid present dirty accessed 0xc00f000000860000-0xc00f00000086ffff 0x000000000ac10000 64K r w pte valid present dirty accessed 0xc00f3ffffffe0000-0xc00f3fffffffffff 0x0000000004d10000 128K r w pte valid present dirty accessed ---[ kasan shadow mem end ]--- So, also verify KASAN readiness before allocating and poisoning shadow mem for VMAs. Link: https://lkml.kernel.org/r/150768c55722311699fdcf8f5379e8256749f47d.1674716617.git.christophe.leroy@csgroup.eu Fixes: 41b7a34 ("powerpc: Book3S 64-bit outline-only KASAN support") Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Reported-by: Nathan Lynch <nathanl@linux.ibm.com> Suggested-by: Michael Ellerman <mpe@ellerman.id.au> Cc: Alexander Potapenko <glider@google.com> Cc: Andrey Konovalov <andreyknvl@gmail.com> Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Vincenzo Frascino <vincenzo.frascino@arm.com> Cc: <stable@vger.kernel.org> [5.19+] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 2558b80 upstream. syzbot found arm64 builds would crash in sock_recv_mark() when CONFIG_HARDENED_USERCOPY=y x86 and powerpc are not detecting the issue because they define user_access_begin. This will be handled in a different patch, because a check_object_size() is missing. Only data from skb->cb[] can be copied directly to/from user space, as explained in commit 79a8a64 ("net: Whitelist the skbuff_head_cache "cb" field") syzbot report was: usercopy: Kernel memory exposure attempt detected from SLUB object 'skbuff_head_cache' (offset 168, size 4)! ------------[ cut here ]------------ kernel BUG at mm/usercopy.c:102 ! Internal error: Oops - BUG: 00000000f2000800 [#1] PREEMPT SMP Modules linked in: CPU: 0 PID: 4410 Comm: syz-executor533 Not tainted 6.2.0-rc7-syzkaller-17907-g2d3827b3f393 #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/21/2023 pstate: 60400005 (nZCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--) pc : usercopy_abort+0x90/0x94 mm/usercopy.c:90 lr : usercopy_abort+0x90/0x94 mm/usercopy.c:90 sp : ffff80000fb9b9a0 x29: ffff80000fb9b9b0 x28: ffff0000c6073400 x27: 0000000020001a00 x26: 0000000000000014 x25: ffff80000cf52000 x24: fffffc0000000000 x23: 05ffc00000000200 x22: fffffc000324bf80 x21: ffff0000c92fe1a8 x20: 0000000000000001 x19: 0000000000000004 x18: 0000000000000000 x17: 656a626f2042554c x16: ffff0000c6073dd0 x15: ffff80000dbd2118 x14: ffff0000c6073400 x13: 00000000ffffffff x12: ffff0000c6073400 x11: ff808000081bbb4c x10: 0000000000000000 x9 : 7b0572d7cc0ccf00 x8 : 7b0572d7cc0ccf00 x7 : ffff80000bf650d4 x6 : 0000000000000000 x5 : 0000000000000001 x4 : 0000000000000001 x3 : 0000000000000000 x2 : ffff0001fefbff08 x1 : 0000000100000000 x0 : 000000000000006c Call trace: usercopy_abort+0x90/0x94 mm/usercopy.c:90 __check_heap_object+0xa8/0x100 mm/slub.c:4761 check_heap_object mm/usercopy.c:196 [inline] __check_object_size+0x208/0x6b8 mm/usercopy.c:251 check_object_size include/linux/thread_info.h:199 [inline] __copy_to_user include/linux/uaccess.h:115 [inline] put_cmsg+0x408/0x464 net/core/scm.c:238 sock_recv_mark net/socket.c:975 [inline] __sock_recv_cmsgs+0x1fc/0x248 net/socket.c:984 sock_recv_cmsgs include/net/sock.h:2728 [inline] packet_recvmsg+0x2d8/0x678 net/packet/af_packet.c:3482 ____sys_recvmsg+0x110/0x3a0 ___sys_recvmsg net/socket.c:2737 [inline] __sys_recvmsg+0x194/0x210 net/socket.c:2767 __do_sys_recvmsg net/socket.c:2777 [inline] __se_sys_recvmsg net/socket.c:2774 [inline] __arm64_sys_recvmsg+0x2c/0x3c net/socket.c:2774 __invoke_syscall arch/arm64/kernel/syscall.c:38 [inline] invoke_syscall+0x64/0x178 arch/arm64/kernel/syscall.c:52 el0_svc_common+0xbc/0x180 arch/arm64/kernel/syscall.c:142 do_el0_svc+0x48/0x110 arch/arm64/kernel/syscall.c:193 el0_svc+0x58/0x14c arch/arm64/kernel/entry-common.c:637 el0t_64_sync_handler+0x84/0xf0 arch/arm64/kernel/entry-common.c:655 el0t_64_sync+0x190/0x194 arch/arm64/kernel/entry.S:591 Code: 91388800 aa0903e1 f90003e8 94e6d752 (d4210000) Fixes: 6fd1d51 ("net: SO_RCVMARK socket option for SO_MARK with recvmsg()") Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Erin MacNeil <lnx.erin@gmail.com> Reviewed-by: Alexander Lobakin <alexandr.lobakin@intel.com> Link: https://lore.kernel.org/r/20230213160059.3829741-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 42018a3 ] Syzkaller found an issue where a handle greater than 16 bits would trigger a null-ptr-deref in the imperfect hash area update. general protection fault, probably for non-canonical address 0xdffffc0000000015: 0000 [#1] PREEMPT SMP KASAN KASAN: null-ptr-deref in range [0x00000000000000a8-0x00000000000000af] CPU: 0 PID: 5070 Comm: syz-executor456 Not tainted 6.2.0-rc7-syzkaller-00112-gc68f345b7c42 #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/21/2023 RIP: 0010:tcindex_set_parms+0x1a6a/0x2990 net/sched/cls_tcindex.c:509 Code: 01 e9 e9 fe ff ff 4c 8b bd 28 fe ff ff e8 0e 57 7d f9 48 8d bb a8 00 00 00 48 b8 00 00 00 00 00 fc ff df 48 89 fa 48 c1 ea 03 <80> 3c 02 00 0f 85 94 0c 00 00 48 8b 85 f8 fd ff ff 48 8b 9b a8 00 RSP: 0018:ffffc90003d3ef88 EFLAGS: 00010202 RAX: dffffc0000000000 RBX: 0000000000000000 RCX: 0000000000000000 RDX: 0000000000000015 RSI: ffffffff8803a102 RDI: 00000000000000a8 RBP: ffffc90003d3f1d8 R08: 0000000000000001 R09: 0000000000000000 R10: 0000000000000001 R11: 0000000000000000 R12: ffff88801e2b10a8 R13: dffffc0000000000 R14: 0000000000030000 R15: ffff888017b3be00 FS: 00005555569af300(0000) GS:ffff8880b9800000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 000056041c6d2000 CR3: 000000002bfca000 CR4: 00000000003506f0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: <TASK> tcindex_change+0x1ea/0x320 net/sched/cls_tcindex.c:572 tc_new_tfilter+0x96e/0x2220 net/sched/cls_api.c:2155 rtnetlink_rcv_msg+0x959/0xca0 net/core/rtnetlink.c:6132 netlink_rcv_skb+0x165/0x440 net/netlink/af_netlink.c:2574 netlink_unicast_kernel net/netlink/af_netlink.c:1339 [inline] netlink_unicast+0x547/0x7f0 net/netlink/af_netlink.c:1365 netlink_sendmsg+0x91b/0xe10 net/netlink/af_netlink.c:1942 sock_sendmsg_nosec net/socket.c:714 [inline] sock_sendmsg+0xd3/0x120 net/socket.c:734 ____sys_sendmsg+0x334/0x8c0 net/socket.c:2476 ___sys_sendmsg+0x110/0x1b0 net/socket.c:2530 __sys_sendmmsg+0x18f/0x460 net/socket.c:2616 __do_sys_sendmmsg net/socket.c:2645 [inline] __se_sys_sendmmsg net/socket.c:2642 [inline] __x64_sys_sendmmsg+0x9d/0x100 net/socket.c:2642 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x39/0xb0 arch/x86/entry/common.c:80 Fixes: ee05917 ("net/sched: tcindex: update imperfect hash filters respecting rcu") Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: Pedro Tammela <pctammela@mojatatu.com> Reported-by: syzbot <syzkaller@googlegroups.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
There is a lock inversion and rwsem read-lock recursion in the devfreq
target callback which can lead to deadlocks.
Specifically, ufshcd_devfreq_scale() already holds a clk_scaling_lock
read lock when toggling the write booster, which involves taking the
dev_cmd mutex before taking another clk_scaling_lock read lock.
This can lead to a deadlock if another thread:
1) tries to acquire the dev_cmd and clk_scaling locks in the correct
order, or
2) takes a clk_scaling write lock before the attempt to take the
clk_scaling read lock a second time.
Fix this by dropping the clk_scaling_lock before toggling the write booster
as was done before commit 0e9d4ca ("scsi: ufs: Protect some contexts
from unexpected clock scaling").
While the devfreq callbacks are already serialised, add a second
serialising mutex to handle the unlikely case where a callback triggered
through the devfreq sysfs interface is racing with a request to disable
clock scaling through the UFS controller 'clkscale_enable' sysfs
attribute. This could otherwise lead to the write booster being left
disabled after having disabled clock scaling.
Also take the new mutex in ufshcd_clk_scaling_allow() to make sure that any
pending write booster update has completed on return.
Note that this currently only affects Qualcomm platforms since commit
87bd050 ("scsi: ufs: core: Allow host driver to disable wb toggling
during clock scaling").
The lock inversion (i.e. 1 above) was reported by lockdep as:
======================================================
WARNING: possible circular locking dependency detected
6.1.0-next-20221216 torvalds#211 Not tainted
------------------------------------------------------
kworker/u16:2/71 is trying to acquire lock:
ffff076280ba98a0 (&hba->dev_cmd.lock){+.+.}-{3:3}, at: ufshcd_query_flag+0x50/0x1c0
but task is already holding lock:
ffff076280ba9cf0 (&hba->clk_scaling_lock){++++}-{3:3}, at: ufshcd_devfreq_scale+0x2b8/0x380
which lock already depends on the new lock.
[ +0.011606]
the existing dependency chain (in reverse order) is:
-> #1 (&hba->clk_scaling_lock){++++}-{3:3}:
lock_acquire+0x68/0x90
down_read+0x58/0x80
ufshcd_exec_dev_cmd+0x70/0x2c0
ufshcd_verify_dev_init+0x68/0x170
ufshcd_probe_hba+0x398/0x1180
ufshcd_async_scan+0x30/0x320
async_run_entry_fn+0x34/0x150
process_one_work+0x288/0x6c0
worker_thread+0x74/0x450
kthread+0x118/0x120
ret_from_fork+0x10/0x20
-> #0 (&hba->dev_cmd.lock){+.+.}-{3:3}:
__lock_acquire+0x12a0/0x2240
lock_acquire.part.0+0xcc/0x220
lock_acquire+0x68/0x90
__mutex_lock+0x98/0x430
mutex_lock_nested+0x2c/0x40
ufshcd_query_flag+0x50/0x1c0
ufshcd_query_flag_retry+0x64/0x100
ufshcd_wb_toggle+0x5c/0x120
ufshcd_devfreq_scale+0x2c4/0x380
ufshcd_devfreq_target+0xf4/0x230
devfreq_set_target+0x84/0x2f0
devfreq_update_target+0xc4/0xf0
devfreq_monitor+0x38/0x1f0
process_one_work+0x288/0x6c0
worker_thread+0x74/0x450
kthread+0x118/0x120
ret_from_fork+0x10/0x20
other info that might help us debug this:
Possible unsafe locking scenario:
CPU0 CPU1
---- ----
lock(&hba->clk_scaling_lock);
lock(&hba->dev_cmd.lock);
lock(&hba->clk_scaling_lock);
lock(&hba->dev_cmd.lock);
*** DEADLOCK ***
Fixes: 0e9d4ca ("scsi: ufs: Protect some contexts from unexpected clock scaling")
Cc: stable@vger.kernel.org # 5.12
Cc: Can Guo <quic_cang@quicinc.com>
Tested-by: Andrew Halaney <ahalaney@redhat.com>
Signed-off-by: Johan Hovold <johan+linaro@kernel.org>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Link: https://lore.kernel.org/r/20230116161201.16923-1-johan+linaro@kernel.org
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Commit f1e5250 ("x86/boot: Skip realmode init code when running as Xen PV guest") missed one code path accessing real_mode_header, leading to dereferencing NULL when suspending the system under Xen: [ 348.284004] PM: suspend entry (deep) [ 348.289532] Filesystems sync: 0.005 seconds [ 348.291545] Freezing user space processes ... (elapsed 0.000 seconds) done. [ 348.292457] OOM killer disabled. [ 348.292462] Freezing remaining freezable tasks ... (elapsed 0.104 seconds) done. [ 348.396612] printk: Suspending console(s) (use no_console_suspend to debug) [ 348.749228] PM: suspend devices took 0.352 seconds [ 348.769713] ACPI: EC: interrupt blocked [ 348.816077] BUG: kernel NULL pointer dereference, address: 000000000000001c [ 348.816080] #PF: supervisor read access in kernel mode [ 348.816081] #PF: error_code(0x0000) - not-present page [ 348.816083] PGD 0 P4D 0 [ 348.816086] Oops: 0000 [#1] PREEMPT SMP NOPTI [ 348.816089] CPU: 0 PID: 6764 Comm: systemd-sleep Not tainted 6.1.3-1.fc32.qubes.x86_64 #1 [ 348.816092] Hardware name: Star Labs StarBook/StarBook, BIOS 8.01 07/03/2022 [ 348.816093] RIP: e030:acpi_get_wakeup_address+0xc/0x20 Fix that by adding an optional acpi callback allowing to skip setting the wakeup address, as in the Xen PV case this will be handled by the hypervisor anyway. Fixes: f1e5250 ("x86/boot: Skip realmode init code when running as Xen PV guest") Reported-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com> Signed-off-by: Juergen Gross <jgross@suse.com> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Link: https://lore.kernel.org/all/20230117155724.22940-1-jgross%40suse.com
The page_pool_release_page was used when freeing rx buffers, and this
function just unmaps the page (if mapped) and does not recycle the page.
So after hundreds of down/up the eth0, the system will out of memory.
For more details, please refer to the following reproduce steps and
bug logs. To solve this issue and refer to the doc of page pool, the
page_pool_put_full_page should be used to replace page_pool_release_page.
Because this API will try to recycle the page if the page refcnt equal to
1. After testing 20000 times, the issue can not be reproduced anymore
(about testing 391 times the issue will occur on i.MX8MN-EVK before).
Reproduce steps:
Create the test script and run the script. The script content is as
follows:
LOOPS=20000
i=1
while [ $i -le $LOOPS ]
do
echo "TINFO:ENET $curface up and down test $i times"
org_macaddr=$(cat /sys/class/net/eth0/address)
ifconfig eth0 down
ifconfig eth0 hw ether $org_macaddr up
i=$(expr $i + 1)
done
sleep 5
if cat /sys/class/net/eth0/operstate | grep 'up';then
echo "TEST PASS"
else
echo "TEST FAIL"
fi
Bug detail logs:
TINFO:ENET up and down test 391 times
[ 850.471205] Qualcomm Atheros AR8031/AR8033 30be0000.ethernet-1:00: attached PHY driver (mii_bus:phy_addr=30be0000.ethernet-1:00, irq=POLL)
[ 853.535318] IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
[ 853.541694] fec 30be0000.ethernet eth0: Link is Up - 1Gbps/Full - flow control rx/tx
[ 870.590531] page_pool_release_retry() stalled pool shutdown 199 inflight 60 sec
[ 931.006557] page_pool_release_retry() stalled pool shutdown 199 inflight 120 sec
TINFO:ENET up and down test 392 times
[ 991.426544] page_pool_release_retry() stalled pool shutdown 192 inflight 181 sec
[ 1051.838531] page_pool_release_retry() stalled pool shutdown 170 inflight 241 sec
[ 1093.751217] Qualcomm Atheros AR8031/AR8033 30be0000.ethernet-1:00: attached PHY driver (mii_bus:phy_addr=30be0000.ethernet-1:00, irq=POLL)
[ 1096.446520] page_pool_release_retry() stalled pool shutdown 308 inflight 60 sec
[ 1096.831245] fec 30be0000.ethernet eth0: Link is Up - 1Gbps/Full - flow control rx/tx
[ 1096.839092] IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
[ 1112.254526] page_pool_release_retry() stalled pool shutdown 103 inflight 302 sec
[ 1156.862533] page_pool_release_retry() stalled pool shutdown 308 inflight 120 sec
[ 1172.674516] page_pool_release_retry() stalled pool shutdown 103 inflight 362 sec
[ 1217.278532] page_pool_release_retry() stalled pool shutdown 308 inflight 181 sec
TINFO:ENET up and down test 393 times
[ 1233.086535] page_pool_release_retry() stalled pool shutdown 103 inflight 422 sec
[ 1277.698513] page_pool_release_retry() stalled pool shutdown 308 inflight 241 sec
[ 1293.502525] page_pool_release_retry() stalled pool shutdown 86 inflight 483 sec
[ 1338.110518] page_pool_release_retry() stalled pool shutdown 308 inflight 302 sec
[ 1353.918540] page_pool_release_retry() stalled pool shutdown 32 inflight 543 sec
[ 1361.179205] Qualcomm Atheros AR8031/AR8033 30be0000.ethernet-1:00: attached PHY driver (mii_bus:phy_addr=30be0000.ethernet-1:00, irq=POLL)
[ 1364.255298] fec 30be0000.ethernet eth0: Link is Up - 1Gbps/Full - flow control rx/tx
[ 1364.263189] IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
[ 1371.998532] page_pool_release_retry() stalled pool shutdown 310 inflight 60 sec
[ 1398.530542] page_pool_release_retry() stalled pool shutdown 308 inflight 362 sec
[ 1414.334539] page_pool_release_retry() stalled pool shutdown 16 inflight 604 sec
[ 1432.414520] page_pool_release_retry() stalled pool shutdown 310 inflight 120 sec
[ 1458.942523] page_pool_release_retry() stalled pool shutdown 308 inflight 422 sec
[ 1474.750521] page_pool_release_retry() stalled pool shutdown 16 inflight 664 sec
TINFO:ENET up and down test 394 times
[ 1492.830522] page_pool_release_retry() stalled pool shutdown 310 inflight 181 sec
[ 1519.358519] page_pool_release_retry() stalled pool shutdown 308 inflight 483 sec
[ 1535.166545] page_pool_release_retry() stalled pool shutdown 2 inflight 724 sec
[ 1537.090278] eth_test2.sh invoked oom-killer: gfp_mask=0x400dc0(GFP_KERNEL_ACCOUNT|__GFP_ZERO), order=0, oom_score_adj=0
[ 1537.101192] CPU: 3 PID: 2379 Comm: eth_test2.sh Tainted: G C 6.1.1+g56321e101aca #1
[ 1537.110249] Hardware name: NXP i.MX8MNano EVK board (DT)
[ 1537.115561] Call trace:
[ 1537.118005] dump_backtrace.part.0+0xe0/0xf0
[ 1537.122289] show_stack+0x18/0x40
[ 1537.125608] dump_stack_lvl+0x64/0x80
[ 1537.129276] dump_stack+0x18/0x34
[ 1537.132592] dump_header+0x44/0x208
[ 1537.136083] oom_kill_process+0x2b4/0x2c0
[ 1537.140097] out_of_memory+0xe4/0x594
[ 1537.143766] __alloc_pages+0xb68/0xd00
[ 1537.147521] alloc_pages+0xac/0x160
[ 1537.151013] __get_free_pages+0x14/0x40
[ 1537.154851] pgd_alloc+0x1c/0x30
[ 1537.158082] mm_init+0xf8/0x1d0
[ 1537.161228] mm_alloc+0x48/0x60
[ 1537.164368] alloc_bprm+0x7c/0x240
[ 1537.167777] do_execveat_common.isra.0+0x70/0x240
[ 1537.172486] __arm64_sys_execve+0x40/0x54
[ 1537.176502] invoke_syscall+0x48/0x114
[ 1537.180255] el0_svc_common.constprop.0+0xcc/0xec
[ 1537.184964] do_el0_svc+0x2c/0xd0
[ 1537.188280] el0_svc+0x2c/0x84
[ 1537.191340] el0t_64_sync_handler+0xf4/0x120
[ 1537.195613] el0t_64_sync+0x18c/0x190
[ 1537.199334] Mem-Info:
[ 1537.201620] active_anon:342 inactive_anon:10343 isolated_anon:0
[ 1537.201620] active_file:54 inactive_file:112 isolated_file:0
[ 1537.201620] unevictable:0 dirty:0 writeback:0
[ 1537.201620] slab_reclaimable:2620 slab_unreclaimable:7076
[ 1537.201620] mapped:1489 shmem:2473 pagetables:466
[ 1537.201620] sec_pagetables:0 bounce:0
[ 1537.201620] kernel_misc_reclaimable:0
[ 1537.201620] free:136672 free_pcp:96 free_cma:129241
[ 1537.240419] Node 0 active_anon:1368kB inactive_anon:41372kB active_file:216kB inactive_file:5052kB unevictable:0kB isolated(anon):0kB isolated(file):0kB s
[ 1537.271422] Node 0 DMA free:541636kB boost:0kB min:30000kB low:37500kB high:45000kB reserved_highatomic:0KB active_anon:1368kB inactive_anon:41372kB actiB
[ 1537.300219] lowmem_reserve[]: 0 0 0 0
[ 1537.303929] Node 0 DMA: 1015*4kB (UMEC) 743*8kB (UMEC) 417*16kB (UMEC) 235*32kB (UMEC) 116*64kB (UMEC) 25*128kB (UMEC) 4*256kB (UC) 2*512kB (UC) 0*1024kBB
[ 1537.323938] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB
[ 1537.332708] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=32768kB
[ 1537.341292] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
[ 1537.349776] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=64kB
[ 1537.358087] 2939 total pagecache pages
[ 1537.361876] 0 pages in swap cache
[ 1537.365229] Free swap = 0kB
[ 1537.368147] Total swap = 0kB
[ 1537.371065] 516096 pages RAM
[ 1537.373959] 0 pages HighMem/MovableOnly
[ 1537.377834] 17302 pages reserved
[ 1537.381103] 163840 pages cma reserved
[ 1537.384809] 0 pages hwpoisoned
[ 1537.387902] Tasks state (memory values in pages):
[ 1537.392652] [ pid ] uid tgid total_vm rss pgtables_bytes swapents oom_score_adj name
[ 1537.401356] [ 201] 993 201 1130 72 45056 0 0 rpcbind
[ 1537.409772] [ 202] 0 202 4529 1640 77824 0 -250 systemd-journal
[ 1537.418861] [ 222] 0 222 4691 801 69632 0 -1000 systemd-udevd
[ 1537.427787] [ 248] 994 248 20914 130 65536 0 0 systemd-timesyn
[ 1537.436884] [ 497] 0 497 620 31 49152 0 0 atd
[ 1537.444938] [ 500] 0 500 854 77 53248 0 0 crond
[ 1537.453165] [ 503] 997 503 1470 160 49152 0 -900 dbus-daemon
[ 1537.461908] [ 505] 0 505 633 24 40960 0 0 firmwared
[ 1537.470491] [ 513] 0 513 2507 180 61440 0 0 ofonod
[ 1537.478800] [ 514] 990 514 69640 137 81920 0 0 parsec
[ 1537.487120] [ 533] 0 533 599 39 40960 0 0 syslogd
[ 1537.495518] [ 534] 0 534 4546 148 65536 0 0 systemd-logind
[ 1537.504560] [ 535] 0 535 690 24 45056 0 0 tee-supplicant
[ 1537.513564] [ 540] 996 540 2769 168 61440 0 0 systemd-network
[ 1537.522680] [ 566] 0 566 3878 228 77824 0 0 connmand
[ 1537.531168] [ 645] 998 645 1538 133 57344 0 0 avahi-daemon
[ 1537.540004] [ 646] 998 646 1461 64 57344 0 0 avahi-daemon
[ 1537.548846] [ 648] 992 648 781 41 45056 0 0 rpc.statd
[ 1537.557415] [ 650] 64371 650 590 23 45056 0 0 ninfod
[ 1537.565754] [ 653] 61563 653 555 24 45056 0 0 rdisc
[ 1537.573971] [ 655] 0 655 374569 2999 290816 0 -999 containerd
[ 1537.582621] [ 658] 0 658 1311 20 49152 0 0 agetty
[ 1537.590922] [ 663] 0 663 1529 97 49152 0 0 login
[ 1537.599138] [ 666] 0 666 3430 202 69632 0 0 wpa_supplicant
[ 1537.608147] [ 667] 0 667 2344 96 61440 0 0 systemd-userdbd
[ 1537.617240] [ 677] 0 677 2964 314 65536 0 100 systemd
[ 1537.625651] [ 679] 0 679 3720 646 73728 0 100 (sd-pam)
[ 1537.634138] [ 687] 0 687 1289 403 45056 0 0 sh
[ 1537.642108] [ 789] 0 789 970 93 45056 0 0 eth_test2.sh
[ 1537.650955] [ 2355] 0 2355 2346 94 61440 0 0 systemd-userwor
[ 1537.660046] [ 2356] 0 2356 2346 94 61440 0 0 systemd-userwor
[ 1537.669137] [ 2358] 0 2358 2346 95 57344 0 0 systemd-userwor
[ 1537.678258] [ 2379] 0 2379 970 93 45056 0 0 eth_test2.sh
[ 1537.687098] oom-kill:constraint=CONSTRAINT_NONE,nodemask=(null),cpuset=/,mems_allowed=0,global_oom,task_memcg=/user.slice/user-0.slice/user@0.service,tas0
[ 1537.703009] Out of memory: Killed process 679 ((sd-pam)) total-vm:14880kB, anon-rss:2584kB, file-rss:0kB, shmem-rss:0kB, UID:0 pgtables:72kB oom_score_ad0
[ 1553.246526] page_pool_release_retry() stalled pool shutdown 310 inflight 241 sec
Fixes: 95698ff ("net: fec: using page pool to manage RX buffers")
Signed-off-by: Wei Fang <wei.fang@nxp.com>
Reviewed-by: shenwei wang <Shenwei.wang@nxp.com>
Reviewed-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jakub Sitnicki says: ==================== This patch set addresses the syzbot report in [1]. Patch #1 has been suggested by Eric [2]. I extended it to cover the rest of sock_map proto callbacks. Otherwise we would still overflow the stack. Patch #2 contains the actual fix and bug analysis. Patches #3 & #4 add coverage to selftests to trigger the bug. [1] https://lore.kernel.org/all/00000000000073b14905ef2e7401@google.com/ [2] https://lore.kernel.org/all/CANn89iK2UN1FmdUcH12fv_xiZkv2G+Nskvmq7fG6aA_6VKRf6g@mail.gmail.com/ --- v1 -> v2: v1: https://lore.kernel.org/r/20230113-sockmap-fix-v1-0-d3cad092ee10@cloudflare.com [v1 didn't hit bpf@ ML by mistake] * pull in Eric's patch to protect against recursion loop bugs (Eric) * add a macro helper to check if pointer is inside a memory range (Eric) ==================== Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Set kprobe at 'jalr 1140(ra)' of vfs_write results in the following crash: [ 32.092235] Unable to handle kernel access to user memory without uaccess routines at virtual address 00aaaaaad77b1170 [ 32.093115] Oops [#1] [ 32.093251] Modules linked in: [ 32.093626] CPU: 0 PID: 135 Comm: ftracetest Not tainted 6.2.0-rc2-00013-gb0aa5e5df0cb-dirty #16 [ 32.093985] Hardware name: riscv-virtio,qemu (DT) [ 32.094280] epc : ksys_read+0x88/0xd6 [ 32.094855] ra : ksys_read+0xc0/0xd6 [ 32.095016] epc : ffffffff801cda80 ra : ffffffff801cdab8 sp : ff20000000d7bdc0 [ 32.095227] gp : ffffffff80f14000 tp : ff60000080f9cb40 t0 : ffffffff80f13e80 [ 32.095500] t1 : ffffffff8000c29c t2 : ffffffff800dbc54 s0 : ff20000000d7be60 [ 32.095716] s1 : 0000000000000000 a0 : ffffffff805a64ae a1 : ffffffff80a83708 [ 32.095921] a2 : ffffffff80f160a0 a3 : 0000000000000000 a4 : f229b0afdb165300 [ 32.096171] a5 : f229b0afdb165300 a6 : ffffffff80eeebd0 a7 : 00000000000003ff [ 32.096411] s2 : ff6000007ff76800 s3 : fffffffffffffff7 s4 : 00aaaaaad77b1170 [ 32.096638] s5 : ffffffff80f160a0 s6 : ff6000007ff76800 s7 : 0000000000000030 [ 32.096865] s8 : 00ffffffc3d97be0 s9 : 0000000000000007 s10: 00aaaaaad77c9410 [ 32.097092] s11: 0000000000000000 t3 : ffffffff80f13e48 t4 : ffffffff8000c29c [ 32.097317] t5 : ffffffff8000c29c t6 : ffffffff800dbc54 [ 32.097505] status: 0000000200000120 badaddr: 00aaaaaad77b1170 cause: 000000000000000d [ 32.098011] [<ffffffff801cdb72>] ksys_write+0x6c/0xd6 [ 32.098222] [<ffffffff801cdc06>] sys_write+0x2a/0x38 [ 32.098405] [<ffffffff80003c76>] ret_from_syscall+0x0/0x2 Since the rs1 and rd might be the same one, such as 'jalr 1140(ra)', hence it requires obtaining the target address from rs1 followed by updating rd. Fixes: c22b0bc ("riscv: Add kprobes supported") Signed-off-by: Liao Chang <liaochang1@huawei.com> Reviewed-by: Guo Ren <guoren@kernel.org> Link: https://lore.kernel.org/r/20230116064342.2092136-1-liaochang1@huawei.com [Palmer: Pick Guo's cleanup] Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
Replace the hack added by commit f958bd2 ("KVM: x86: Fix potential put_fpu() w/o load_fpu() on MPX platform") with a more robust approach of unloading+reloading guest FPU state based on whether or not the vCPU's FPU is currently in-use, i.e. currently loaded. This fixes a bug on hosts that support CET but not MPX, where kvm_arch_vcpu_ioctl_get_mpstate() neglects to load FPU state (it only checks for MPX support) and leads to KVM attempting to put FPU state due to kvm_apic_accept_events() triggering INIT emulation. E.g. on a host with CET but not MPX, syzkaller+KASAN generates: Oops: general protection fault, probably for non-canonical address 0xdffffc0000000004: 0000 [#1] SMP KASAN NOPTI KASAN: null-ptr-deref in range [0x0000000000000020-0x0000000000000027] CPU: 211 UID: 0 PID: 20451 Comm: syz.9.26 Tainted: G S 6.18.0-smp-DEV #7 NONE Tainted: [S]=CPU_OUT_OF_SPEC Hardware name: Google Izumi/izumi, BIOS 0.20250729.1-0 07/29/2025 RIP: 0010:fpu_swap_kvm_fpstate+0x3ce/0x610 ../arch/x86/kernel/fpu/core.c:377 RSP: 0018:ff1100410c167cc0 EFLAGS: 00010202 RAX: 0000000000000004 RBX: 0000000000000020 RCX: 00000000000001aa RDX: 00000000000001ab RSI: ffffffff817bb960 RDI: 0000000022600000 RBP: dffffc0000000000 R08: ff110040d23c8007 R09: 1fe220081a479000 R10: dffffc0000000000 R11: ffe21c081a479001 R12: ff110040d23c8d98 R13: 00000000fffdc578 R14: 0000000000000000 R15: ff110040d23c8d90 FS: 00007f86dd1876c0(0000) GS:ff11007fc969b000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007f86dd186fa8 CR3: 00000040d1dfa003 CR4: 0000000000f73ef0 PKRU: 80000000 Call Trace: <TASK> kvm_vcpu_reset+0x80d/0x12c0 ../arch/x86/kvm/x86.c:11818 kvm_apic_accept_events+0x1cb/0x500 ../arch/x86/kvm/lapic.c:3489 kvm_arch_vcpu_ioctl_get_mpstate+0xd0/0x4e0 ../arch/x86/kvm/x86.c:12145 kvm_vcpu_ioctl+0x5e2/0xed0 ../virt/kvm/kvm_main.c:4539 __se_sys_ioctl+0x11d/0x1b0 ../fs/ioctl.c:51 do_syscall_x64 ../arch/x86/entry/syscall_64.c:63 [inline] do_syscall_64+0x6e/0x940 ../arch/x86/entry/syscall_64.c:94 entry_SYSCALL_64_after_hwframe+0x76/0x7e RIP: 0033:0x7f86de71d9c9 </TASK> with a very simple reproducer: r0 = openat$kvm(0xffffffffffffff9c, &(0x7f0000000000), 0x80b00, 0x0) r1 = ioctl$KVM_CREATE_VM(r0, 0xae01, 0x0) ioctl$KVM_CREATE_IRQCHIP(r1, 0xae60) r2 = ioctl$KVM_CREATE_VCPU(r1, 0xae41, 0x0) ioctl$KVM_SET_IRQCHIP(r1, 0x8208ae63, ...) ioctl$KVM_GET_MP_STATE(r2, 0x8004ae98, &(0x7f00000000c0)) Alternatively, the MPX hack in GET_MP_STATE could be extended to cover CET, but from a "don't break existing functionality" perspective, that isn't any less risky than peeking at the state of in_use, and it's far less robust for a long term solution (as evidenced by this bug). Reported-by: Alexander Potapenko <glider@google.com> Fixes: 69cc3e8 ("KVM: x86: Add XSS support for CET_KERNEL and CET_USER") Reviewed-by: Yao Yuan <yaoyuan@linux.alibaba.com> Reviewed-by: Chao Gao <chao.gao@intel.com> Link: https://patch.msgid.link/20251030185802.3375059-2-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
Use a raw spinlock for vcpu_svm.ir_list_lock as the lock can be taken during schedule() via kvm_sched_out() => __avic_vcpu_put(), and "normal" spinlocks are sleepable locks when PREEMPT_RT=y. This fixes the following lockdep warning: ============================= [ BUG: Invalid wait context ] 6.12.0-146.1640_2124176644.el10.x86_64+debug #1 Not tainted ----------------------------- qemu-kvm/38299 is trying to lock: ff11000239725600 (&svm->ir_list_lock){....}-{3:3}, at: __avic_vcpu_put+0xfd/0x300 [kvm_amd] other info that might help us debug this: context-{5:5} 2 locks held by qemu-kvm/38299: #0: ff11000239723ba8 (&vcpu->mutex){+.+.}-{4:4}, at: kvm_vcpu_ioctl+0x240/0xe00 [kvm] #1: ff11000b906056d8 (&rq->__lock){-.-.}-{2:2}, at: raw_spin_rq_lock_nested+0x2e/0x130 stack backtrace: CPU: 1 UID: 0 PID: 38299 Comm: qemu-kvm Kdump: loaded Not tainted 6.12.0-146.1640_2124176644.el10.x86_64+debug #1 PREEMPT(voluntary) Hardware name: AMD Corporation QUARTZ/QUARTZ, BIOS RQZ100AB 09/14/2023 Call Trace: <TASK> dump_stack_lvl+0x6f/0xb0 __lock_acquire+0x921/0xb80 lock_acquire.part.0+0xbe/0x270 _raw_spin_lock_irqsave+0x46/0x90 __avic_vcpu_put+0xfd/0x300 [kvm_amd] svm_vcpu_put+0xfa/0x130 [kvm_amd] kvm_arch_vcpu_put+0x48c/0x790 [kvm] kvm_sched_out+0x161/0x1c0 [kvm] prepare_task_switch+0x36b/0xf60 __schedule+0x4f7/0x1890 schedule+0xd4/0x260 xfer_to_guest_mode_handle_work+0x54/0xc0 vcpu_run+0x69a/0xa70 [kvm] kvm_arch_vcpu_ioctl_run+0xdc0/0x17e0 [kvm] kvm_vcpu_ioctl+0x39f/0xe00 [kvm] Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com> Link: https://patch.msgid.link/20251030194130.307900-1-mlevitsk@redhat.com [sean: massage changelog] Signed-off-by: Sean Christopherson <seanjc@google.com>
…or slabobj_ext When alloc_slab_obj_exts() fails and then later succeeds in allocating a slab extension vector, it calls handle_failed_objexts_alloc() to mark all objects in the vector as empty. As a result all objects in this slab (slabA) will have their extensions set to CODETAG_EMPTY. Later on if this slabA is used to allocate a slabobj_ext vector for another slab (slabB), we end up with the slabB->obj_exts pointing to a slabobj_ext vector that itself has a non-NULL slabobj_ext equal to CODETAG_EMPTY. When slabB gets freed, free_slab_obj_exts() is called to free slabB->obj_exts vector. free_slab_obj_exts() calls mark_objexts_empty(slabB->obj_exts) which will generate a warning because it expects slabobj_ext vectors to have a NULL obj_ext, not CODETAG_EMPTY. Modify mark_objexts_empty() to skip the warning and setting the obj_ext value if it's already set to CODETAG_EMPTY. To quickly detect this WARN, I modified the code from WARN_ON(slab_exts[offs].ref.ct) to BUG_ON(slab_exts[offs].ref.ct == 1); We then obtained this message: [21630.898561] ------------[ cut here ]------------ [21630.898596] kernel BUG at mm/slub.c:2050! [21630.898611] Internal error: Oops - BUG: 00000000f2000800 [#1] SMP [21630.900372] Modules linked in: squashfs isofs vfio_iommu_type1 vhost_vsock vfio vhost_net vmw_vsock_virtio_transport_common vhost tap vhost_iotlb iommufd vsock binfmt_misc nfsv3 nfs_acl nfs lockd grace netfs tls rds dns_resolver tun brd overlay ntfs3 exfat btrfs blake2b_generic xor xor_neon raid6_pq loop sctp ip6_udp_tunnel udp_tunnel nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 nf_tables rfkill ip_set sunrpc vfat fat joydev sg sch_fq_codel nfnetlink virtio_gpu sr_mod cdrom drm_client_lib virtio_dma_buf drm_shmem_helper drm_kms_helper drm ghash_ce backlight virtio_net virtio_blk virtio_scsi net_failover virtio_console failover virtio_mmio dm_mirror dm_region_hash dm_log dm_multipath dm_mod fuse i2c_dev virtio_pci virtio_pci_legacy_dev virtio_pci_modern_dev virtio virtio_ring autofs4 aes_neon_bs aes_ce_blk [last unloaded: hwpoison_inject] [21630.909177] CPU: 3 UID: 0 PID: 3787 Comm: kylin-process-m Kdump: loaded Tainted: G W 6.18.0-rc1+ torvalds#74 PREEMPT(voluntary) [21630.910495] Tainted: [W]=WARN [21630.910867] Hardware name: QEMU KVM Virtual Machine, BIOS unknown 2/2/2022 [21630.911625] pstate: 80400005 (Nzcv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--) [21630.912392] pc : __free_slab+0x228/0x250 [21630.912868] lr : __free_slab+0x18c/0x250[21630.913334] sp : ffff8000a02f73e0 [21630.913830] x29: ffff8000a02f73e0 x28: fffffdffc43fc800 x27: ffff0000c0011c40 [21630.914677] x26: ffff0000c000cac0 x25: ffff00010fe5e5f0 x24: ffff000102199b40 [21630.915469] x23: 0000000000000003 x22: 0000000000000003 x21: ffff0000c0011c40 [21630.916259] x20: fffffdffc4086600 x19: fffffdffc43fc800 x18: 0000000000000000 [21630.917048] x17: 0000000000000000 x16: 0000000000000000 x15: 0000000000000000 [21630.917837] x14: 0000000000000000 x13: 0000000000000000 x12: ffff70001405ee66 [21630.918640] x11: 1ffff0001405ee65 x10: ffff70001405ee65 x9 : ffff800080a295dc [21630.919442] x8 : ffff8000a02f7330 x7 : 0000000000000000 x6 : 0000000000003000 [21630.920232] x5 : 0000000024924925 x4 : 0000000000000001 x3 : 0000000000000007 [21630.921021] x2 : 0000000000001b40 x1 : 000000000000001f x0 : 0000000000000001 [21630.921810] Call trace: [21630.922130] __free_slab+0x228/0x250 (P) [21630.922669] free_slab+0x38/0x118 [21630.923079] free_to_partial_list+0x1d4/0x340 [21630.923591] __slab_free+0x24c/0x348 [21630.924024] ___cache_free+0xf0/0x110 [21630.924468] qlist_free_all+0x78/0x130 [21630.924922] kasan_quarantine_reduce+0x114/0x148 [21630.925525] __kasan_slab_alloc+0x7c/0xb0 [21630.926006] kmem_cache_alloc_noprof+0x164/0x5c8 [21630.926699] __alloc_object+0x44/0x1f8 [21630.927153] __create_object+0x34/0xc8 [21630.927604] kmemleak_alloc+0xb8/0xd8 [21630.928052] kmem_cache_alloc_noprof+0x368/0x5c8 [21630.928606] getname_flags.part.0+0xa4/0x610 [21630.929112] getname_flags+0x80/0xd8 [21630.929557] vfs_fstatat+0xc8/0xe0 [21630.929975] __do_sys_newfstatat+0xa0/0x100 [21630.930469] __arm64_sys_newfstatat+0x90/0xd8 [21630.931046] invoke_syscall+0xd4/0x258 [21630.931685] el0_svc_common.constprop.0+0xb4/0x240 [21630.932467] do_el0_svc+0x48/0x68 [21630.932972] el0_svc+0x40/0xe0 [21630.933472] el0t_64_sync_handler+0xa0/0xe8 [21630.934151] el0t_64_sync+0x1ac/0x1b0 [21630.934923] Code: aa1803e0 97ffef2b a9446bf9 17ffff9c (d4210000) [21630.936461] SMP: stopping secondary CPUs [21630.939550] Starting crashdump kernel... [21630.940108] Bye! Link: https://lkml.kernel.org/r/20251029014317.1533488-1-hao.ge@linux.dev Fixes: 09c4656 ("codetag: debug: introduce OBJEXTS_ALLOC_FAIL to mark failed slab_ext allocations") Signed-off-by: Hao Ge <gehao@kylinos.cn> Reviewed-by: Suren Baghdasaryan <surenb@google.com> Cc: Christoph Lameter (Ampere) <cl@gentwo.org> Cc: David Rientjes <rientjes@google.com> Cc: gehao <gehao@kylinos.cn> Cc: Roman Gushchin <roman.gushchin@linux.dev> Cc: Shakeel Butt <shakeel.butt@linux.dev> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Typically copynotify stateid is freed either when parent's stateid is being close/freed or in nfsd4_laundromat if the stateid hasn't been used in a lease period. However, in case when the server got an OPEN (which created a parent stateid), followed by a COPY_NOTIFY using that stateid, followed by a client reboot. New client instance while doing CREATE_SESSION would force expire previous state of this client. It leads to the open state being freed thru release_openowner-> nfs4_free_ol_stateid() and it finds that it still has copynotify stateid associated with it. We currently print a warning and is triggerred WARNING: CPU: 1 PID: 8858 at fs/nfsd/nfs4state.c:1550 nfs4_free_ol_stateid+0xb0/0x100 [nfsd] This patch, instead, frees the associated copynotify stateid here. If the parent stateid is freed (without freeing the copynotify stateids associated with it), it leads to the list corruption when laundromat ends up freeing the copynotify state later. [ 1626.839430] Internal error: Oops - BUG: 00000000f2000800 [#1] SMP [ 1626.842828] Modules linked in: nfnetlink_queue nfnetlink_log bluetooth cfg80211 rpcrdma rdma_cm iw_cm ib_cm ib_core nfsd nfs_acl lockd grace nfs_localio ext4 crc16 mbcache jbd2 overlay uinput snd_seq_dummy snd_hrtimer qrtr rfkill vfat fat uvcvideo snd_hda_codec_generic videobuf2_vmalloc videobuf2_memops snd_hda_intel uvc snd_intel_dspcfg videobuf2_v4l2 videobuf2_common snd_hda_codec snd_hda_core videodev snd_hwdep snd_seq mc snd_seq_device snd_pcm snd_timer snd soundcore sg loop auth_rpcgss vsock_loopback vmw_vsock_virtio_transport_common vmw_vsock_vmci_transport vmw_vmci vsock xfs 8021q garp stp llc mrp nvme ghash_ce e1000e nvme_core sr_mod nvme_keyring nvme_auth cdrom vmwgfx drm_ttm_helper ttm sunrpc dm_mirror dm_region_hash dm_log iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi fuse dm_multipath dm_mod nfnetlink [ 1626.855594] CPU: 2 UID: 0 PID: 199 Comm: kworker/u24:33 Kdump: loaded Tainted: G B W 6.17.0-rc7+ #22 PREEMPT(voluntary) [ 1626.857075] Tainted: [B]=BAD_PAGE, [W]=WARN [ 1626.857573] Hardware name: VMware, Inc. VMware20,1/VBSA, BIOS VMW201.00V.24006586.BA64.2406042154 06/04/2024 [ 1626.858724] Workqueue: nfsd4 laundromat_main [nfsd] [ 1626.859304] pstate: 6140000 (nZCv daif +PAN -UAO -TCO +DIT -SSBS BTYPE=--) [ 1626.860010] pc : __list_del_entry_valid_or_report+0x148/0x200 [ 1626.860601] lr : __list_del_entry_valid_or_report+0x148/0x200 [ 1626.861182] sp : ffff8000881d7a40 [ 1626.861521] x29: ffff8000881d7a40 x28: 0000000000000018 x27: ffff0000c2a98200 [ 1626.862260] x26: 0000000000000600 x25: 0000000000000000 x24: ffff8000881d7b20 [ 1626.862986] x23: ffff0000c2a981e8 x22: 1fffe00012410e7d x21: ffff0000920873e8 [ 1626.863701] x20: ffff0000920873e8 x19: ffff000086f22998 x18: 0000000000000000 [ 1626.864421] x17: 20747562202c3839 x16: 3932326636383030 x15: 3030666666662065 [ 1626.865092] x14: 6220646c756f6873 x13: 0000000000000001 x12: ffff60004fd9e4a3 [ 1626.865713] x11: 1fffe0004fd9e4a2 x10: ffff60004fd9e4a2 x9 : dfff800000000000 [ 1626.866320] x8 : 00009fffb0261b5e x7 : ffff00027ecf2513 x6 : 0000000000000001 [ 1626.866938] x5 : ffff00027ecf2510 x4 : ffff60004fd9e4a3 x3 : 0000000000000000 [ 1626.867553] x2 : 0000000000000000 x1 : ffff000096069640 x0 : 000000000000006d [ 1626.868167] Call trace: [ 1626.868382] __list_del_entry_valid_or_report+0x148/0x200 (P) [ 1626.868876] _free_cpntf_state_locked+0xd0/0x268 [nfsd] [ 1626.869368] nfs4_laundromat+0x6f8/0x1058 [nfsd] [ 1626.869813] laundromat_main+0x24/0x60 [nfsd] [ 1626.870231] process_one_work+0x584/0x1050 [ 1626.870595] worker_thread+0x4c4/0xc60 [ 1626.870893] kthread+0x2f8/0x398 [ 1626.871146] ret_from_fork+0x10/0x20 [ 1626.871422] Code: aa1303e1 aa1403e3 910e8000 97bc55d7 (d4210000) [ 1626.871892] SMP: stopping secondary CPUs Reported-by: rtm@csail.mit.edu Closes: https://lore.kernel.org/linux-nfs/d8f064c1-a26f-4eed-b4f0-1f7f608f415f@oracle.com/T/#t Fixes: 624322f ("NFSD add COPY_NOTIFY operation") Cc: stable@vger.kernel.org Signed-off-by: Olga Kornievskaia <okorniev@redhat.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
When freeing indexed arrays, the corresponding free function should
be called for each entry of the indexed array. For example, for
for 'struct tc_act_attrs' 'tc_act_attrs_free(...)' needs to be called
for each entry.
Previously, memory leaks were reported when enabling the ASAN
analyzer.
=================================================================
==874==ERROR: LeakSanitizer: detected memory leaks
Direct leak of 24 byte(s) in 1 object(s) allocated from:
#0 0x7f221fd20cb5 in malloc ./debug/gcc/gcc/libsanitizer/asan/asan_malloc_linux.cpp:67
#1 0x55c98db048af in tc_act_attrs_set_options_vlan_parms ../generated/tc-user.h:2813
#2 0x55c98db048af in main ./linux/tools/net/ynl/samples/tc-filter-add.c:71
Direct leak of 24 byte(s) in 1 object(s) allocated from:
#0 0x7f221fd20cb5 in malloc ./debug/gcc/gcc/libsanitizer/asan/asan_malloc_linux.cpp:67
#1 0x55c98db04a93 in tc_act_attrs_set_options_vlan_parms ../generated/tc-user.h:2813
#2 0x55c98db04a93 in main ./linux/tools/net/ynl/samples/tc-filter-add.c:74
Direct leak of 10 byte(s) in 2 object(s) allocated from:
#0 0x7f221fd20cb5 in malloc ./debug/gcc/gcc/libsanitizer/asan/asan_malloc_linux.cpp:67
#1 0x55c98db0527d in tc_act_attrs_set_kind ../generated/tc-user.h:1622
SUMMARY: AddressSanitizer: 58 byte(s) leaked in 4 allocation(s).
The following diff illustrates the changes introduced compared to the
previous version of the code.
void tc_flower_attrs_free(struct tc_flower_attrs *obj)
{
+ unsigned int i;
+
free(obj->indev);
+ for (i = 0; i < obj->_count.act; i++)
+ tc_act_attrs_free(&obj->act[i]);
free(obj->act);
free(obj->key_eth_dst);
free(obj->key_eth_dst_mask);
Signed-off-by: Zahari Doychev <zahari.doychev@linux.com>
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Link: https://patch.msgid.link/20251106151529.453026-3-zahari.doychev@linux.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
When crashkernel is configured with a high reservation, shrinking its value below the low crashkernel reservation causes two issues: 1. Invalid crashkernel resource objects 2. Kernel crash if crashkernel shrinking is done twice For example, with crashkernel=200M,high, the kernel reserves 200MB of high memory and some default low memory (say 256MB). The reservation appears as: cat /proc/iomem | grep -i crash af000000-beffffff : Crash kernel 433000000-43f7fffff : Crash kernel If crashkernel is then shrunk to 50MB (echo 52428800 > /sys/kernel/kexec_crash_size), /proc/iomem still shows 256MB reserved: af000000-beffffff : Crash kernel Instead, it should show 50MB: af000000-b21fffff : Crash kernel Further shrinking crashkernel to 40MB causes a kernel crash with the following trace (x86): BUG: kernel NULL pointer dereference, address: 0000000000000038 PGD 0 P4D 0 Oops: 0000 [#1] PREEMPT SMP NOPTI <snip...> Call Trace: <TASK> ? __die_body.cold+0x19/0x27 ? page_fault_oops+0x15a/0x2f0 ? search_module_extables+0x19/0x60 ? search_bpf_extables+0x5f/0x80 ? exc_page_fault+0x7e/0x180 ? asm_exc_page_fault+0x26/0x30 ? __release_resource+0xd/0xb0 release_resource+0x26/0x40 __crash_shrink_memory+0xe5/0x110 crash_shrink_memory+0x12a/0x190 kexec_crash_size_store+0x41/0x80 kernfs_fop_write_iter+0x141/0x1f0 vfs_write+0x294/0x460 ksys_write+0x6d/0xf0 <snip...> This happens because __crash_shrink_memory()/kernel/crash_core.c incorrectly updates the crashk_res resource object even when crashk_low_res should be updated. Fix this by ensuring the correct crashkernel resource object is updated when shrinking crashkernel memory. Link: https://lkml.kernel.org/r/20251101193741.289252-1-sourabhjain@linux.ibm.com Fixes: 16c6006 ("kexec: enable kexec_crash_size to support two crash kernel regions") Signed-off-by: Sourabh Jain <sourabhjain@linux.ibm.com> Acked-by: Baoquan He <bhe@redhat.com> Cc: Zhen Lei <thunder.leizhen@huawei.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
When emulating an nvme device on qemu with both logical_block_size and physical_block_size set to 8 KiB, but without format, a kernel panic was triggered during the early boot stage while attempting to mount a vfat filesystem. [95553.682035] EXT4-fs (nvme0n1): unable to set blocksize [95553.684326] EXT4-fs (nvme0n1): unable to set blocksize [95553.686501] EXT4-fs (nvme0n1): unable to set blocksize [95553.696448] ISOFS: unsupported/invalid hardware sector size 8192 [95553.697117] ------------[ cut here ]------------ [95553.697567] kernel BUG at fs/buffer.c:1582! [95553.697984] Oops: invalid opcode: 0000 [#1] SMP NOPTI [95553.698602] CPU: 0 UID: 0 PID: 7212 Comm: mount Kdump: loaded Not tainted 6.18.0-rc2+ torvalds#38 PREEMPT(voluntary) [95553.699511] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.3-0-ga6ed6b701f0a-prebuilt.qemu.org 04/01/2014 [95553.700534] RIP: 0010:folio_alloc_buffers+0x1bb/0x1c0 [95553.701018] Code: 48 8b 15 e8 93 18 02 65 48 89 35 e0 93 18 02 48 83 c4 10 5b 41 5c 41 5d 41 5e 41 5f 5d 31 d2 31 c9 31 f6 31 ff c3 cc cc cc cc <0f> 0b 90 66 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 0f [95553.702648] RSP: 0018:ffffd1b0c676f990 EFLAGS: 00010246 [95553.703132] RAX: ffff8cfc4176d820 RBX: 0000000000508c48 RCX: 0000000000000001 [95553.703805] RDX: 0000000000002000 RSI: 0000000000000000 RDI: 0000000000000000 [95553.704481] RBP: ffffd1b0c676f9c8 R08: 0000000000000000 R09: 0000000000000000 [95553.705148] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000001 [95553.705816] R13: 0000000000002000 R14: fffff8bc8257e800 R15: 0000000000000000 [95553.706483] FS: 000072ee77315840(0000) GS:ffff8cfdd2c8d000(0000) knlGS:0000000000000000 [95553.707248] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [95553.707782] CR2: 00007d8f2a9e5a20 CR3: 0000000039d0c006 CR4: 0000000000772ef0 [95553.708439] PKRU: 55555554 [95553.708734] Call Trace: [95553.709015] <TASK> [95553.709266] __getblk_slow+0xd2/0x230 [95553.709641] ? find_get_block_common+0x8b/0x530 [95553.710084] bdev_getblk+0x77/0xa0 [95553.710449] __bread_gfp+0x22/0x140 [95553.710810] fat_fill_super+0x23a/0xfc0 [95553.711216] ? __pfx_setup+0x10/0x10 [95553.711580] ? __pfx_vfat_fill_super+0x10/0x10 [95553.712014] vfat_fill_super+0x15/0x30 [95553.712401] get_tree_bdev_flags+0x141/0x1e0 [95553.712817] get_tree_bdev+0x10/0x20 [95553.713177] vfat_get_tree+0x15/0x20 [95553.713550] vfs_get_tree+0x2a/0x100 [95553.713910] vfs_cmd_create+0x62/0xf0 [95553.714273] __do_sys_fsconfig+0x4e7/0x660 [95553.714669] __x64_sys_fsconfig+0x20/0x40 [95553.715062] x64_sys_call+0x21ee/0x26a0 [95553.715453] do_syscall_64+0x80/0x670 [95553.715816] ? __fs_parse+0x65/0x1e0 [95553.716172] ? fat_parse_param+0x103/0x4b0 [95553.716587] ? vfs_parse_fs_param_source+0x21/0xa0 [95553.717034] ? __do_sys_fsconfig+0x3d9/0x660 [95553.717548] ? __x64_sys_fsconfig+0x20/0x40 [95553.717957] ? x64_sys_call+0x21ee/0x26a0 [95553.718360] ? do_syscall_64+0xb8/0x670 [95553.718734] ? __x64_sys_fsconfig+0x20/0x40 [95553.719141] ? x64_sys_call+0x21ee/0x26a0 [95553.719545] ? do_syscall_64+0xb8/0x670 [95553.719922] ? x64_sys_call+0x1405/0x26a0 [95553.720317] ? do_syscall_64+0xb8/0x670 [95553.720702] ? __x64_sys_close+0x3e/0x90 [95553.721080] ? x64_sys_call+0x1b5e/0x26a0 [95553.721478] ? do_syscall_64+0xb8/0x670 [95553.721841] ? irqentry_exit+0x43/0x50 [95553.722211] ? exc_page_fault+0x90/0x1b0 [95553.722681] entry_SYSCALL_64_after_hwframe+0x76/0x7e [95553.723166] RIP: 0033:0x72ee774f3afe [95553.723562] Code: 73 01 c3 48 8b 0d 0a 33 0f 00 f7 d8 64 89 01 48 83 c8 ff c3 0f 1f 84 00 00 00 00 00 f3 0f 1e fa 49 89 ca b8 af 01 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d da 32 0f 00 f7 d8 64 89 01 48 [95553.725188] RSP: 002b:00007ffe97148978 EFLAGS: 00000246 ORIG_RAX: 00000000000001af [95553.725892] RAX: ffffffffffffffda RBX: 00005dcfe53d0080 RCX: 000072ee774f3afe [95553.726526] RDX: 0000000000000000 RSI: 0000000000000006 RDI: 0000000000000003 [95553.727176] RBP: 00007ffe97148ac0 R08: 0000000000000000 R09: 000072ee775e7ac0 [95553.727818] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000 [95553.728459] R13: 00005dcfe53d04b0 R14: 000072ee77670b00 R15: 00005dcfe53d1a28 [95553.729086] </TASK> The panic occurs as follows: 1. logical_block_size is 8KiB, causing {struct super_block *sb}->s_blocksize is initialized to 0. vfat_fill_super - fat_fill_super - sb_min_blocksize - sb_set_blocksize //return 0 when size is 8KiB. 2. __bread_gfp is called with size == 0, causing folio_alloc_buffers() to compute an offset equal to folio_size(folio), which triggers a BUG_ON. fat_fill_super - sb_bread - __bread_gfp // size == {struct super_block *sb}->s_blocksize == 0 - bdev_getblk - __getblk_slow - grow_buffers - grow_dev_folio - folio_alloc_buffers // size == 0 - folio_set_bh //offset == folio_size(folio) and panic To fix this issue, add proper return value checks for sb_min_blocksize(). Cc: stable@vger.kernel.org # v6.15 Fixes: a64e5a5 ("bdev: add back PAGE_SIZE block size validation for sb_set_blocksize()") Reviewed-by: Matthew Wilcox <willy@infradead.org> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Yongpeng Yang <yangyongpeng@xiaomi.com> Link: https://patch.msgid.link/20251104125009.2111925-2-yangyongpeng.storage@gmail.com Acked-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp> Signed-off-by: Christian Brauner <brauner@kernel.org>
If the allocation of tl_hba->sh fails in tcm_loop_driver_probe() and we attempt to dereference it in tcm_loop_tpg_address_show() we will get a segfault, see below for an example. So, check tl_hba->sh before dereferencing it. Unable to allocate struct scsi_host BUG: kernel NULL pointer dereference, address: 0000000000000194 #PF: supervisor read access in kernel mode #PF: error_code(0x0000) - not-present page PGD 0 P4D 0 Oops: 0000 [#1] PREEMPT SMP NOPTI CPU: 1 PID: 8356 Comm: tokio-runtime-w Not tainted 6.6.104.2-4.azl3 #1 Hardware name: Microsoft Corporation Virtual Machine/Virtual Machine, BIOS Hyper-V UEFI Release v4.1 09/28/2024 RIP: 0010:tcm_loop_tpg_address_show+0x2e/0x50 [tcm_loop] ... Call Trace: <TASK> configfs_read_iter+0x12d/0x1d0 [configfs] vfs_read+0x1b5/0x300 ksys_read+0x6f/0xf0 ... Cc: stable@vger.kernel.org Fixes: 2628b35 ("tcm_loop: Show address of tpg in configfs") Signed-off-by: Hamza Mahfooz <hamzamahfooz@linux.microsoft.com> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Reviewed-by: Allen Pais <apais@linux.microsoft.com> Link: https://patch.msgid.link/1762370746-6304-1-git-send-email-hamzamahfooz@linux.microsoft.com Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
The kernel test has reported: BUG: unable to handle page fault for address: fffba000 #PF: supervisor write access in kernel mode #PF: error_code(0x0002) - not-present page *pde = 03171067 *pte = 00000000 Oops: Oops: 0002 [#1] CPU: 0 UID: 0 PID: 1 Comm: swapper/0 Tainted: G T 6.18.0-rc2-00031-gec7f31b2a2d3 #1 NONE a1d066dfe789f54bc7645c7989957d2bdee593ca Tainted: [T]=RANDSTRUCT Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.3-debian-1.16.3-2 04/01/2014 EIP: memset (arch/x86/include/asm/string_32.h:168 arch/x86/lib/memcpy_32.c:17) Code: a5 8b 4d f4 83 e1 03 74 02 f3 a4 83 c4 04 5e 5f 5d 2e e9 73 41 01 00 90 90 90 3e 8d 74 26 00 55 89 e5 57 56 89 c6 89 d0 89 f7 <f3> aa 89 f0 5e 5f 5d 2e e9 53 41 01 00 cc cc cc 55 89 e5 53 57 56 EAX: 0000006b EBX: 00000015 ECX: 001fefff EDX: 0000006b ESI: fffb9000 EDI: fffba000 EBP: c611fbf0 ESP: c611fbe8 DS: 007b ES: 007b FS: 0000 GS: 0000 SS: 0068 EFLAGS: 00010287 CR0: 80050033 CR2: fffba000 CR3: 0316e000 CR4: 00040690 Call Trace: poison_element (mm/mempool.c:83 mm/mempool.c:102) mempool_init_node (mm/mempool.c:142 mm/mempool.c:226) mempool_init_noprof (mm/mempool.c:250 (discriminator 1)) ? mempool_alloc_pages (mm/mempool.c:640) bio_integrity_initfn (block/bio-integrity.c:483 (discriminator 8)) ? mempool_alloc_pages (mm/mempool.c:640) do_one_initcall (init/main.c:1283) Christoph found out this is due to the poisoning code not dealing properly with CONFIG_HIGHMEM because only the first page is mapped but then the whole potentially high-order page is accessed. We could give up on HIGHMEM here, but it's straightforward to fix this with a loop that's mapping, poisoning or checking and unmapping individual pages. Reported-by: kernel test robot <oliver.sang@intel.com> Closes: https://lore.kernel.org/oe-lkp/202511111411.9ebfa1ba-lkp@intel.com Analyzed-by: Christoph Hellwig <hch@lst.de> Fixes: bdfedb7 ("mm, mempool: poison elements backed by slab allocator") Cc: stable@vger.kernel.org Tested-by: kernel test robot <oliver.sang@intel.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Link: https://patch.msgid.link/20251113-mempool-poison-v1-1-233b3ef984c3@suse.cz Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
The validation of the set(nsh(...)) action is completely wrong. It runs through the nsh_key_put_from_nlattr() function that is the same function that validates NSH keys for the flow match and the push_nsh() action. However, the set(nsh(...)) has a very different memory layout. Nested attributes in there are doubled in size in case of the masked set(). That makes proper validation impossible. There is also confusion in the code between the 'masked' flag, that says that the nested attributes are doubled in size containing both the value and the mask, and the 'is_mask' that says that the value we're parsing is the mask. This is causing kernel crash on trying to write into mask part of the match with SW_FLOW_KEY_PUT() during validation, while validate_nsh() doesn't allocate any memory for it: BUG: kernel NULL pointer dereference, address: 0000000000000018 #PF: supervisor read access in kernel mode #PF: error_code(0x0000) - not-present page PGD 1c2383067 P4D 1c2383067 PUD 20b703067 PMD 0 Oops: Oops: 0000 [#1] SMP NOPTI CPU: 8 UID: 0 Kdump: loaded Not tainted 6.17.0-rc4+ torvalds#107 PREEMPT(voluntary) RIP: 0010:nsh_key_put_from_nlattr+0x19d/0x610 [openvswitch] Call Trace: <TASK> validate_nsh+0x60/0x90 [openvswitch] validate_set.constprop.0+0x270/0x3c0 [openvswitch] __ovs_nla_copy_actions+0x477/0x860 [openvswitch] ovs_nla_copy_actions+0x8d/0x100 [openvswitch] ovs_packet_cmd_execute+0x1cc/0x310 [openvswitch] genl_family_rcv_msg_doit+0xdb/0x130 genl_family_rcv_msg+0x14b/0x220 genl_rcv_msg+0x47/0xa0 netlink_rcv_skb+0x53/0x100 genl_rcv+0x24/0x40 netlink_unicast+0x280/0x3b0 netlink_sendmsg+0x1f7/0x430 ____sys_sendmsg+0x36b/0x3a0 ___sys_sendmsg+0x87/0xd0 __sys_sendmsg+0x6d/0xd0 do_syscall_64+0x7b/0x2c0 entry_SYSCALL_64_after_hwframe+0x76/0x7e The third issue with this process is that while trying to convert the non-masked set into masked one, validate_set() copies and doubles the size of the OVS_KEY_ATTR_NSH as if it didn't have any nested attributes. It should be copying each nested attribute and doubling them in size independently. And the process must be properly reversed during the conversion back from masked to a non-masked variant during the flow dump. In the end, the only two outcomes of trying to use this action are either validation failure or a kernel crash. And if somehow someone manages to install a flow with such an action, it will most definitely not do what it is supposed to, since all the keys and the masks are mixed up. Fixing all the issues is a complex task as it requires re-writing most of the validation code. Given that and the fact that this functionality never worked since introduction, let's just remove it altogether. It's better to re-introduce it later with a proper implementation instead of trying to fix it in stable releases. Fixes: b2d0f5d ("openvswitch: enable NSH support") Reported-by: Junvy Yang <zhuque@tencent.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org> Acked-by: Eelco Chaudron <echaudro@redhat.com> Reviewed-by: Aaron Conole <aconole@redhat.com> Link: https://patch.msgid.link/20251112112246.95064-1-i.maximets@ovn.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
nvme_fc_delete_assocation() waits for pending I/O to complete before returning, and an error can cause ->ioerr_work to be queued after cancel_work_sync() had been called. Move the call to cancel_work_sync() to be after nvme_fc_delete_association() to ensure ->ioerr_work is not running when the nvme_fc_ctrl object is freed. Otherwise the following can occur: [ 1135.911754] list_del corruption, ff2d24c8093f31f8->next is NULL [ 1135.917705] ------------[ cut here ]------------ [ 1135.922336] kernel BUG at lib/list_debug.c:52! [ 1135.926784] Oops: invalid opcode: 0000 [#1] SMP NOPTI [ 1135.931851] CPU: 48 UID: 0 PID: 726 Comm: kworker/u449:23 Kdump: loaded Not tainted 6.12.0 #1 PREEMPT(voluntary) [ 1135.943490] Hardware name: Dell Inc. PowerEdge R660/0HGTK9, BIOS 2.5.4 01/16/2025 [ 1135.950969] Workqueue: 0x0 (nvme-wq) [ 1135.954673] RIP: 0010:__list_del_entry_valid_or_report.cold+0xf/0x6f [ 1135.961041] Code: c7 c7 98 68 72 94 e8 26 45 fe ff 0f 0b 48 c7 c7 70 68 72 94 e8 18 45 fe ff 0f 0b 48 89 fe 48 c7 c7 80 69 72 94 e8 07 45 fe ff <0f> 0b 48 89 d1 48 c7 c7 a0 6a 72 94 48 89 c2 e8 f3 44 fe ff 0f 0b [ 1135.979788] RSP: 0018:ff579b19482d3e50 EFLAGS: 00010046 [ 1135.985015] RAX: 0000000000000033 RBX: ff2d24c8093f31f0 RCX: 0000000000000000 [ 1135.992148] RDX: 0000000000000000 RSI: ff2d24d6bfa1d0c0 RDI: ff2d24d6bfa1d0c0 [ 1135.999278] RBP: ff2d24c8093f31f8 R08: 0000000000000000 R09: ffffffff951e2b08 [ 1136.006413] R10: ffffffff95122ac8 R11: 0000000000000003 R12: ff2d24c78697c100 [ 1136.013546] R13: fffffffffffffff8 R14: 0000000000000000 R15: ff2d24c78697c0c0 [ 1136.020677] FS: 0000000000000000(0000) GS:ff2d24d6bfa00000(0000) knlGS:0000000000000000 [ 1136.028765] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 1136.034510] CR2: 00007fd207f90b80 CR3: 000000163ea22003 CR4: 0000000000f73ef0 [ 1136.041641] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 1136.048776] DR3: 0000000000000000 DR6: 00000000fffe07f0 DR7: 0000000000000400 [ 1136.055910] PKRU: 55555554 [ 1136.058623] Call Trace: [ 1136.061074] <TASK> [ 1136.063179] ? show_trace_log_lvl+0x1b0/0x2f0 [ 1136.067540] ? show_trace_log_lvl+0x1b0/0x2f0 [ 1136.071898] ? move_linked_works+0x4a/0xa0 [ 1136.075998] ? __list_del_entry_valid_or_report.cold+0xf/0x6f [ 1136.081744] ? __die_body.cold+0x8/0x12 [ 1136.085584] ? die+0x2e/0x50 [ 1136.088469] ? do_trap+0xca/0x110 [ 1136.091789] ? do_error_trap+0x65/0x80 [ 1136.095543] ? __list_del_entry_valid_or_report.cold+0xf/0x6f [ 1136.101289] ? exc_invalid_op+0x50/0x70 [ 1136.105127] ? __list_del_entry_valid_or_report.cold+0xf/0x6f [ 1136.110874] ? asm_exc_invalid_op+0x1a/0x20 [ 1136.115059] ? __list_del_entry_valid_or_report.cold+0xf/0x6f [ 1136.120806] move_linked_works+0x4a/0xa0 [ 1136.124733] worker_thread+0x216/0x3a0 [ 1136.128485] ? __pfx_worker_thread+0x10/0x10 [ 1136.132758] kthread+0xfa/0x240 [ 1136.135904] ? __pfx_kthread+0x10/0x10 [ 1136.139657] ret_from_fork+0x31/0x50 [ 1136.143236] ? __pfx_kthread+0x10/0x10 [ 1136.146988] ret_from_fork_asm+0x1a/0x30 [ 1136.150915] </TASK> Fixes: 19fce04 ("nvme-fc: avoid calling _nvme_fc_abort_outstanding_ios from interrupt context") Cc: stable@vger.kernel.org Tested-by: Marco Patalano <mpatalan@redhat.com> Reviewed-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: Ewan D. Milne <emilne@redhat.com> Signed-off-by: Keith Busch <kbusch@kernel.org>
The function devl_rate_nodes_destroy is documented to "Unset parent for all rate objects". However, it was only calling the driver-specific `rate_leaf_parent_set` or `rate_node_parent_set` ops and decrementing the parent's refcount, without actually setting the `devlink_rate->parent` pointer to NULL. This leaves a dangling pointer in the `devlink_rate` struct, which cause refcount error in netdevsim[1] and mlx5[2]. In addition, this is inconsistent with the behavior of `devlink_nl_rate_parent_node_set`, where the parent pointer is correctly cleared. This patch fixes the issue by explicitly setting `devlink_rate->parent` to NULL after notifying the driver, thus fulfilling the function's documented behavior for all rate objects. [1] repro steps: echo 1 > /sys/bus/netdevsim/new_device devlink dev eswitch set netdevsim/netdevsim1 mode switchdev echo 1 > /sys/bus/netdevsim/devices/netdevsim1/sriov_numvfs devlink port function rate add netdevsim/netdevsim1/test_node devlink port function rate set netdevsim/netdevsim1/128 parent test_node echo 1 > /sys/bus/netdevsim/del_device dmesg: refcount_t: decrement hit 0; leaking memory. WARNING: CPU: 8 PID: 1530 at lib/refcount.c:31 refcount_warn_saturate+0x42/0xe0 CPU: 8 UID: 0 PID: 1530 Comm: bash Not tainted 6.18.0-rc4+ #1 NONE Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org 04/01/2014 RIP: 0010:refcount_warn_saturate+0x42/0xe0 Call Trace: <TASK> devl_rate_leaf_destroy+0x8d/0x90 __nsim_dev_port_del+0x6c/0x70 [netdevsim] nsim_dev_reload_destroy+0x11c/0x140 [netdevsim] nsim_drv_remove+0x2b/0xb0 [netdevsim] device_release_driver_internal+0x194/0x1f0 bus_remove_device+0xc6/0x130 device_del+0x159/0x3c0 device_unregister+0x1a/0x60 del_device_store+0x111/0x170 [netdevsim] kernfs_fop_write_iter+0x12e/0x1e0 vfs_write+0x215/0x3d0 ksys_write+0x5f/0xd0 do_syscall_64+0x55/0x10f0 entry_SYSCALL_64_after_hwframe+0x4b/0x53 [2] devlink dev eswitch set pci/0000:08:00.0 mode switchdev devlink port add pci/0000:08:00.0 flavour pcisf pfnum 0 sfnum 1000 devlink port function rate add pci/0000:08:00.0/group1 devlink port function rate set pci/0000:08:00.0/32768 parent group1 modprobe -r mlx5_ib mlx5_fwctl mlx5_core dmesg: refcount_t: decrement hit 0; leaking memory. WARNING: CPU: 7 PID: 16151 at lib/refcount.c:31 refcount_warn_saturate+0x42/0xe0 CPU: 7 UID: 0 PID: 16151 Comm: bash Not tainted 6.17.0-rc7_for_upstream_min_debug_2025_10_02_12_44 #1 NONE Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.16.3-0-ga6ed6b701f0a-prebuilt.qemu.org 04/01/2014 RIP: 0010:refcount_warn_saturate+0x42/0xe0 Call Trace: <TASK> devl_rate_leaf_destroy+0x8d/0x90 mlx5_esw_offloads_devlink_port_unregister+0x33/0x60 [mlx5_core] mlx5_esw_offloads_unload_rep+0x3f/0x50 [mlx5_core] mlx5_eswitch_unload_sf_vport+0x40/0x90 [mlx5_core] mlx5_sf_esw_event+0xc4/0x120 [mlx5_core] notifier_call_chain+0x33/0xa0 blocking_notifier_call_chain+0x3b/0x50 mlx5_eswitch_disable_locked+0x50/0x110 [mlx5_core] mlx5_eswitch_disable+0x63/0x90 [mlx5_core] mlx5_unload+0x1d/0x170 [mlx5_core] mlx5_uninit_one+0xa2/0x130 [mlx5_core] remove_one+0x78/0xd0 [mlx5_core] pci_device_remove+0x39/0xa0 device_release_driver_internal+0x194/0x1f0 unbind_store+0x99/0xa0 kernfs_fop_write_iter+0x12e/0x1e0 vfs_write+0x215/0x3d0 ksys_write+0x5f/0xd0 do_syscall_64+0x53/0x1f0 entry_SYSCALL_64_after_hwframe+0x4b/0x53 Fixes: d755598 ("devlink: Allow setting parent node of rate objects") Signed-off-by: Shay Drory <shayd@nvidia.com> Reviewed-by: Carolina Jubran <cjubran@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Link: https://patch.msgid.link/1763381149-1234377-1-git-send-email-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The mlx5_irq_alloc() function can inadvertently free the entire rmap and end up in a crash[1] when the other threads tries to access this, when request_irq() fails due to exhausted IRQ vectors. This commit modifies the cleanup to remove only the specific IRQ mapping that was just added. This prevents removal of other valid mappings and ensures precise cleanup of the failed IRQ allocation's associated glue object. Note: This error is observed when both fwctl and rds configs are enabled. [1] mlx5_core 0000:05:00.0: Successfully registered panic handler for port 1 mlx5_core 0000:05:00.0: mlx5_irq_alloc:293:(pid 66740): Failed to request irq. err = -28 infiniband mlx5_0: mlx5_ib_test_wc:290:(pid 66740): Error -28 while trying to test write-combining support mlx5_core 0000:05:00.0: Successfully unregistered panic handler for port 1 mlx5_core 0000:06:00.0: Successfully registered panic handler for port 1 mlx5_core 0000:06:00.0: mlx5_irq_alloc:293:(pid 66740): Failed to request irq. err = -28 infiniband mlx5_0: mlx5_ib_test_wc:290:(pid 66740): Error -28 while trying to test write-combining support mlx5_core 0000:06:00.0: Successfully unregistered panic handler for port 1 mlx5_core 0000:03:00.0: mlx5_irq_alloc:293:(pid 28895): Failed to request irq. err = -28 mlx5_core 0000:05:00.0: mlx5_irq_alloc:293:(pid 28895): Failed to request irq. err = -28 general protection fault, probably for non-canonical address 0xe277a58fde16f291: 0000 [#1] SMP NOPTI RIP: 0010:free_irq_cpu_rmap+0x23/0x7d Call Trace: <TASK> ? show_trace_log_lvl+0x1d6/0x2f9 ? show_trace_log_lvl+0x1d6/0x2f9 ? mlx5_irq_alloc.cold+0x5d/0xf3 [mlx5_core] ? __die_body.cold+0x8/0xa ? die_addr+0x39/0x53 ? exc_general_protection+0x1c4/0x3e9 ? dev_vprintk_emit+0x5f/0x90 ? asm_exc_general_protection+0x22/0x27 ? free_irq_cpu_rmap+0x23/0x7d mlx5_irq_alloc.cold+0x5d/0xf3 [mlx5_core] irq_pool_request_vector+0x7d/0x90 [mlx5_core] mlx5_irq_request+0x2e/0xe0 [mlx5_core] mlx5_irq_request_vector+0xad/0xf7 [mlx5_core] comp_irq_request_pci+0x64/0xf0 [mlx5_core] create_comp_eq+0x71/0x385 [mlx5_core] ? mlx5e_open_xdpsq+0x11c/0x230 [mlx5_core] mlx5_comp_eqn_get+0x72/0x90 [mlx5_core] ? xas_load+0x8/0x91 mlx5_comp_irqn_get+0x40/0x90 [mlx5_core] mlx5e_open_channel+0x7d/0x3c7 [mlx5_core] mlx5e_open_channels+0xad/0x250 [mlx5_core] mlx5e_open_locked+0x3e/0x110 [mlx5_core] mlx5e_open+0x23/0x70 [mlx5_core] __dev_open+0xf1/0x1a5 __dev_change_flags+0x1e1/0x249 dev_change_flags+0x21/0x5c do_setlink+0x28b/0xcc4 ? __nla_parse+0x22/0x3d ? inet6_validate_link_af+0x6b/0x108 ? cpumask_next+0x1f/0x35 ? __snmp6_fill_stats64.constprop.0+0x66/0x107 ? __nla_validate_parse+0x48/0x1e6 __rtnl_newlink+0x5ff/0xa57 ? kmem_cache_alloc_trace+0x164/0x2ce rtnl_newlink+0x44/0x6e rtnetlink_rcv_msg+0x2bb/0x362 ? __netlink_sendskb+0x4c/0x6c ? netlink_unicast+0x28f/0x2ce ? rtnl_calcit.isra.0+0x150/0x146 netlink_rcv_skb+0x5f/0x112 netlink_unicast+0x213/0x2ce netlink_sendmsg+0x24f/0x4d9 __sock_sendmsg+0x65/0x6a ____sys_sendmsg+0x28f/0x2c9 ? import_iovec+0x17/0x2b ___sys_sendmsg+0x97/0xe0 __sys_sendmsg+0x81/0xd8 do_syscall_64+0x35/0x87 entry_SYSCALL_64_after_hwframe+0x6e/0x0 RIP: 0033:0x7fc328603727 Code: c3 66 90 41 54 41 89 d4 55 48 89 f5 53 89 fb 48 83 ec 10 e8 0b ed ff ff 44 89 e2 48 89 ee 89 df 41 89 c0 b8 2e 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 35 44 89 c7 48 89 44 24 08 e8 44 ed ff ff 48 RSP: 002b:00007ffe8eb3f1a0 EFLAGS: 00000293 ORIG_RAX: 000000000000002e RAX: ffffffffffffffda RBX: 000000000000000d RCX: 00007fc328603727 RDX: 0000000000000000 RSI: 00007ffe8eb3f1f0 RDI: 000000000000000d RBP: 00007ffe8eb3f1f0 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000293 R12: 0000000000000000 R13: 0000000000000000 R14: 00007ffe8eb3f3c8 R15: 00007ffe8eb3f3bc </TASK> ---[ end trace f43ce73c3c2b13a2 ]--- RIP: 0010:free_irq_cpu_rmap+0x23/0x7d Code: 0f 1f 80 00 00 00 00 48 85 ff 74 6b 55 48 89 fd 53 66 83 7f 06 00 74 24 31 db 48 8b 55 08 0f b7 c3 48 8b 04 c2 48 85 c0 74 09 <8b> 38 31 f6 e8 c4 0a b8 ff 83 c3 01 66 3b 5d 06 72 de b8 ff ff ff RSP: 0018:ff384881640eaca0 EFLAGS: 00010282 RAX: e277a58fde16f291 RBX: 0000000000000000 RCX: 0000000000000000 RDX: ff2335e2e20b3600 RSI: 0000000000000000 RDI: ff2335e2e20b3400 RBP: ff2335e2e20b3400 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: 00000000ffffffe4 R12: ff384881640ead88 R13: ff2335c3760751e0 R14: ff2335e2e1672200 R15: ff2335c3760751f8 FS: 00007fc32ac22480(0000) GS:ff2335e2d6e00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007f651ab54000 CR3: 00000029f1206003 CR4: 0000000000771ef0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 PKRU: 55555554 Kernel panic - not syncing: Fatal exception Kernel Offset: 0x1dc00000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff) kvm-guest: disable async PF for cpu 0 Fixes: 3354822 ("net/mlx5: Use dynamic msix vectors allocation") Signed-off-by: Mohith Kumar Thummaluru<mohith.k.kumar.thummaluru@oracle.com> Tested-by: Mohith Kumar Thummaluru<mohith.k.kumar.thummaluru@oracle.com> Reviewed-by: Moshe Shemesh <moshe@nvidia.com> Reviewed-by: Shay Drori <shayd@nvidia.com> Signed-off-by: Pradyumn Rahar <pradyumn.rahar@oracle.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Link: https://patch.msgid.link/1763381768-1234998-1-git-send-email-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
…3576 When I added command queue engine (CQE) support for the Rockchip eMMC controller, I missed that RK3576 has a separate platform data struct. While things are working fine on RK3588 (I tested the ROCK 5B) and the suspend issue is fixed on the RK3576 (I tested the Sige5), this results in stability issues. By also adding the necessary hooks for the RK3576 platform the following problems can be avoided: [ 15.606895] mmc0: running CQE recovery [ 15.616189] mmc0: running CQE recovery [...] [ 25.911484] mmc0: running CQE recovery [ 25.926305] mmc0: running CQE recovery [ 25.927468] mmc0: running CQE recovery [...] [ 26.255719] mmc0: running CQE recovery [ 26.257162] ------------[ cut here ]------------ [ 26.257581] mmc0: cqhci: spurious TCN for tag 31 [ 26.258034] WARNING: CPU: 0 PID: 0 at drivers/mmc/host/cqhci-core.c:796 cqhci_irq+0x440/0x68c [ 26.263786] CPU: 0 UID: 0 PID: 0 Comm: swapper/0 Not tainted 6.18.0-rc6-gd984ebbf0d15 #1 PREEMPT [ 26.264561] Hardware name: ArmSoM Sige5 (DT) [...] [ 26.272748] Call trace: [ 26.272964] cqhci_irq+0x440/0x68c (P) [ 26.273296] dwcmshc_cqe_irq_handler+0x54/0x88 [ 26.273689] sdhci_irq+0xbc/0x1200 [ 26.273991] __handle_irq_event_percpu+0x54/0x1d0 [...] Note that the above problems do not necessarily happen with every boot. Reported-by: Adrian Hunter <adrian.hunter@intel.com> Closes: https://lore.kernel.org/linux-rockchip/01949bc9-4873-498b-ac7d-f008393ccc4c@intel.com/ Fixes: fda1e0a ("mmc: sdhci-of-dwcmshc: Add command queue support for rockchip SOCs") Signed-off-by: Sebastian Reichel <sebastian.reichel@collabora.com>
…3576 When I added command queue engine (CQE) support for the Rockchip eMMC controller, I missed that RK3576 has a separate platform data struct. While things are working fine on RK3588 (I tested the ROCK 5B) and the suspend issue is fixed on the RK3576 (I tested the Sige5), this results in stability issues. By also adding the necessary hooks for the RK3576 platform the following problems can be avoided: [ 15.606895] mmc0: running CQE recovery [ 15.616189] mmc0: running CQE recovery [...] [ 25.911484] mmc0: running CQE recovery [ 25.926305] mmc0: running CQE recovery [ 25.927468] mmc0: running CQE recovery [...] [ 26.255719] mmc0: running CQE recovery [ 26.257162] ------------[ cut here ]------------ [ 26.257581] mmc0: cqhci: spurious TCN for tag 31 [ 26.258034] WARNING: CPU: 0 PID: 0 at drivers/mmc/host/cqhci-core.c:796 cqhci_irq+0x440/0x68c [ 26.263786] CPU: 0 UID: 0 PID: 0 Comm: swapper/0 Not tainted 6.18.0-rc6-gd984ebbf0d15 #1 PREEMPT [ 26.264561] Hardware name: ArmSoM Sige5 (DT) [...] [ 26.272748] Call trace: [ 26.272964] cqhci_irq+0x440/0x68c (P) [ 26.273296] dwcmshc_cqe_irq_handler+0x54/0x88 [ 26.273689] sdhci_irq+0xbc/0x1200 [ 26.273991] __handle_irq_event_percpu+0x54/0x1d0 [...] Note that the above problems do not necessarily happen with every boot. Reported-by: Adrian Hunter <adrian.hunter@intel.com> Closes: https://lore.kernel.org/linux-rockchip/01949bc9-4873-498b-ac7d-f008393ccc4c@intel.com/ Fixes: fda1e0a ("mmc: sdhci-of-dwcmshc: Add command queue support for rockchip SOCs") Signed-off-by: Sebastian Reichel <sebastian.reichel@collabora.com>
…3576 When I added command queue engine (CQE) support for the Rockchip eMMC controller, I missed that RK3576 has a separate platform data struct. While things are working fine on RK3588 (I tested the ROCK 5B) and the suspend issue is fixed on the RK3576 (I tested the Sige5), this results in stability issues. By also adding the necessary hooks for the RK3576 platform the following problems can be avoided: [ 15.606895] mmc0: running CQE recovery [ 15.616189] mmc0: running CQE recovery [...] [ 25.911484] mmc0: running CQE recovery [ 25.926305] mmc0: running CQE recovery [ 25.927468] mmc0: running CQE recovery [...] [ 26.255719] mmc0: running CQE recovery [ 26.257162] ------------[ cut here ]------------ [ 26.257581] mmc0: cqhci: spurious TCN for tag 31 [ 26.258034] WARNING: CPU: 0 PID: 0 at drivers/mmc/host/cqhci-core.c:796 cqhci_irq+0x440/0x68c [ 26.263786] CPU: 0 UID: 0 PID: 0 Comm: swapper/0 Not tainted 6.18.0-rc6-gd984ebbf0d15 #1 PREEMPT [ 26.264561] Hardware name: ArmSoM Sige5 (DT) [...] [ 26.272748] Call trace: [ 26.272964] cqhci_irq+0x440/0x68c (P) [ 26.273296] dwcmshc_cqe_irq_handler+0x54/0x88 [ 26.273689] sdhci_irq+0xbc/0x1200 [ 26.273991] __handle_irq_event_percpu+0x54/0x1d0 [...] Note that the above problems do not necessarily happen with every boot. Reported-by: Adrian Hunter <adrian.hunter@intel.com> Closes: https://lore.kernel.org/linux-rockchip/01949bc9-4873-498b-ac7d-f008393ccc4c@intel.com/ Fixes: fda1e0a ("mmc: sdhci-of-dwcmshc: Add command queue support for rockchip SOCs") Signed-off-by: Sebastian Reichel <sebastian.reichel@collabora.com>
…3576 When I added command queue engine (CQE) support for the Rockchip eMMC controller, I missed that RK3576 has a separate platform data struct. While things are working fine on RK3588 (I tested the ROCK 5B) and the suspend issue is fixed on the RK3576 (I tested the Sige5), this results in stability issues. By also adding the necessary hooks for the RK3576 platform the following problems can be avoided: [ 15.606895] mmc0: running CQE recovery [ 15.616189] mmc0: running CQE recovery [...] [ 25.911484] mmc0: running CQE recovery [ 25.926305] mmc0: running CQE recovery [ 25.927468] mmc0: running CQE recovery [...] [ 26.255719] mmc0: running CQE recovery [ 26.257162] ------------[ cut here ]------------ [ 26.257581] mmc0: cqhci: spurious TCN for tag 31 [ 26.258034] WARNING: CPU: 0 PID: 0 at drivers/mmc/host/cqhci-core.c:796 cqhci_irq+0x440/0x68c [ 26.263786] CPU: 0 UID: 0 PID: 0 Comm: swapper/0 Not tainted 6.18.0-rc6-gd984ebbf0d15 #1 PREEMPT [ 26.264561] Hardware name: ArmSoM Sige5 (DT) [...] [ 26.272748] Call trace: [ 26.272964] cqhci_irq+0x440/0x68c (P) [ 26.273296] dwcmshc_cqe_irq_handler+0x54/0x88 [ 26.273689] sdhci_irq+0xbc/0x1200 [ 26.273991] __handle_irq_event_percpu+0x54/0x1d0 [...] Note that the above problems do not necessarily happen with every boot. Reported-by: Adrian Hunter <adrian.hunter@intel.com> Closes: https://lore.kernel.org/linux-rockchip/01949bc9-4873-498b-ac7d-f008393ccc4c@intel.com/ Fixes: fda1e0a ("mmc: sdhci-of-dwcmshc: Add command queue support for rockchip SOCs") Signed-off-by: Sebastian Reichel <sebastian.reichel@collabora.com>
A synchronous external abort occurs on the Renesas RZ/G3S SoC if unbind is
executed after the configuration sequence described above:
modprobe usb_f_ecm
modprobe libcomposite
modprobe configfs
cd /sys/kernel/config/usb_gadget
mkdir -p g1
cd g1
echo "0x1d6b" > idVendor
echo "0x0104" > idProduct
mkdir -p strings/0x409
echo "0123456789" > strings/0x409/serialnumber
echo "Renesas." > strings/0x409/manufacturer
echo "Ethernet Gadget" > strings/0x409/product
mkdir -p functions/ecm.usb0
mkdir -p configs/c.1
mkdir -p configs/c.1/strings/0x409
echo "ECM" > configs/c.1/strings/0x409/configuration
if [ ! -L configs/c.1/ecm.usb0 ]; then
ln -s functions/ecm.usb0 configs/c.1
fi
echo 11e20000.usb > UDC
echo 11e20000.usb > /sys/bus/platform/drivers/renesas_usbhs/unbind
The displayed trace is as follows:
Internal error: synchronous external abort: 0000000096000010 [#1] SMP
CPU: 0 UID: 0 PID: 188 Comm: sh Tainted: G M 6.17.0-rc7-next-20250922-00010-g41050493b2bd torvalds#55 PREEMPT
Tainted: [M]=MACHINE_CHECK
Hardware name: Renesas SMARC EVK version 2 based on r9a08g045s33 (DT)
pstate: 604000c5 (nZCv daIF +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
pc : usbhs_sys_function_pullup+0x10/0x40 [renesas_usbhs]
lr : usbhsg_update_pullup+0x3c/0x68 [renesas_usbhs]
sp : ffff8000838b3920
x29: ffff8000838b3920 x28: ffff00000d585780 x27: 0000000000000000
x26: 0000000000000000 x25: 0000000000000000 x24: ffff00000c3e3810
x23: ffff00000d5e5c80 x22: ffff00000d5e5d40 x21: 0000000000000000
x20: 0000000000000000 x19: ffff00000d5e5c80 x18: 0000000000000020
x17: 2e30303230316531 x16: 312d7968703a7968 x15: 3d454d414e5f4344
x14: 000000000000002c x13: 0000000000000000 x12: 0000000000000000
x11: ffff00000f358f38 x10: ffff00000f358db0 x9 : ffff00000b41f418
x8 : 0101010101010101 x7 : 7f7f7f7f7f7f7f7f x6 : fefefeff6364626d
x5 : 8080808000000000 x4 : 000000004b5ccb9d x3 : 0000000000000000
x2 : 0000000000000000 x1 : ffff800083790000 x0 : ffff00000d5e5c80
Call trace:
usbhs_sys_function_pullup+0x10/0x40 [renesas_usbhs] (P)
usbhsg_pullup+0x4c/0x7c [renesas_usbhs]
usb_gadget_disconnect_locked+0x48/0xd4
gadget_unbind_driver+0x44/0x114
device_remove+0x4c/0x80
device_release_driver_internal+0x1c8/0x224
device_release_driver+0x18/0x24
bus_remove_device+0xcc/0x10c
device_del+0x14c/0x404
usb_del_gadget+0x88/0xc0
usb_del_gadget_udc+0x18/0x30
usbhs_mod_gadget_remove+0x24/0x44 [renesas_usbhs]
usbhs_mod_remove+0x20/0x30 [renesas_usbhs]
usbhs_remove+0x98/0xdc [renesas_usbhs]
platform_remove+0x20/0x30
device_remove+0x4c/0x80
device_release_driver_internal+0x1c8/0x224
device_driver_detach+0x18/0x24
unbind_store+0xb4/0xb8
drv_attr_store+0x24/0x38
sysfs_kf_write+0x7c/0x94
kernfs_fop_write_iter+0x128/0x1b8
vfs_write+0x2ac/0x350
ksys_write+0x68/0xfc
__arm64_sys_write+0x1c/0x28
invoke_syscall+0x48/0x110
el0_svc_common.constprop.0+0xc0/0xe0
do_el0_svc+0x1c/0x28
el0_svc+0x34/0xf0
el0t_64_sync_handler+0xa0/0xe4
el0t_64_sync+0x198/0x19c
Code: 7100003f 1a9f07e1 531c6c22 f9400001 (79400021)
---[ end trace 0000000000000000 ]---
note: sh[188] exited with irqs disabled
note: sh[188] exited with preempt_count 1
The issue occurs because usbhs_sys_function_pullup(), which accesses the IP
registers, is executed after the USBHS clocks have been disabled. The
problem is reproducible on the Renesas RZ/G3S SoC starting with the
addition of module stop in the clock enable/disable APIs. With module stop
functionality enabled, a bus error is expected if a master accesses a
module whose clock has been stopped and module stop activated.
Disable the IP clocks at the end of remove.
Cc: stable <stable@kernel.org>
Fixes: f1407d5 ("usb: renesas_usbhs: Add Renesas USBHS common code")
Signed-off-by: Claudiu Beznea <claudiu.beznea.uj@bp.renesas.com>
Link: https://patch.msgid.link/20251027140741.557198-1-claudiu.beznea.uj@bp.renesas.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
This reverts commit d02c2e4. Mauro reports that this breaks APEI notifications on his QEMU setup because the "reserved for firmware" region still needs to be writable by Linux in order to signal _back_ to the firmware after processing the reported error: | {1}[Hardware Error]: Hardware error from APEI Generic Hardware Error Source: 1 | ... | [Firmware Warn]: GHES: Unhandled processor error type 0x02: cache error | Unable to handle kernel write to read-only memory at virtual address ffff800080035018 | Mem abort info: | ESR = 0x000000009600004f | EC = 0x25: DABT (current EL), IL = 32 bits | SET = 0, FnV = 0 | EA = 0, S1PTW = 0 | FSC = 0x0f: level 3 permission fault | Data abort info: | ISV = 0, ISS = 0x0000004f, ISS2 = 0x00000000 | CM = 0, WnR = 1, TnD = 0, TagAccess = 0 | GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0 | swapper pgtable: 4k pages, 52-bit VAs, pgdp=00000000505d7000 | pgd=10000000510bc003, p4d=1000000100229403, pud=100000010022a403, pmd=100000010022b403, pte=0060000139b90483 | Internal error: Oops: 000000009600004f [#1] SMP For now, revert the offending commit. We can presumably switch back to PAGE_KERNEL when bringing this back in the future. Link: https://lore.kernel.org/r/20251121224611.07efa95a@foz.lan Reported-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org> Signed-off-by: Will Deacon <will@kernel.org>
Since commit 30f241f ("xsk: Fix immature cq descriptor production"), the descriptor number is stored in skb control block and xsk_cq_submit_addr_locked() relies on it to put the umem addrs onto pool's completion queue. skb control block shouldn't be used for this purpose as after transmit xsk doesn't have control over it and other subsystems could use it. This leads to the following kernel panic due to a NULL pointer dereference. BUG: kernel NULL pointer dereference, address: 0000000000000000 #PF: supervisor read access in kernel mode #PF: error_code(0x0000) - not-present page PGD 0 P4D 0 Oops: Oops: 0000 [#1] SMP NOPTI CPU: 2 UID: 1 PID: 927 Comm: p4xsk.bin Not tainted 6.16.12+deb14-cloud-amd64 #1 PREEMPT(lazy) Debian 6.16.12-1 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.17.0-debian-1.17.0-1 04/01/2014 RIP: 0010:xsk_destruct_skb+0xd0/0x180 [...] Call Trace: <IRQ> ? napi_complete_done+0x7a/0x1a0 ip_rcv_core+0x1bb/0x340 ip_rcv+0x30/0x1f0 __netif_receive_skb_one_core+0x85/0xa0 process_backlog+0x87/0x130 __napi_poll+0x28/0x180 net_rx_action+0x339/0x420 handle_softirqs+0xdc/0x320 ? handle_edge_irq+0x90/0x1e0 do_softirq.part.0+0x3b/0x60 </IRQ> <TASK> __local_bh_enable_ip+0x60/0x70 __dev_direct_xmit+0x14e/0x1f0 __xsk_generic_xmit+0x482/0xb70 ? __remove_hrtimer+0x41/0xa0 ? __xsk_generic_xmit+0x51/0xb70 ? _raw_spin_unlock_irqrestore+0xe/0x40 xsk_sendmsg+0xda/0x1c0 __sys_sendto+0x1ee/0x200 __x64_sys_sendto+0x24/0x30 do_syscall_64+0x84/0x2f0 ? __pfx_pollwake+0x10/0x10 ? __rseq_handle_notify_resume+0xad/0x4c0 ? restore_fpregs_from_fpstate+0x3c/0x90 ? switch_fpu_return+0x5b/0xe0 ? do_syscall_64+0x204/0x2f0 ? do_syscall_64+0x204/0x2f0 ? do_syscall_64+0x204/0x2f0 entry_SYSCALL_64_after_hwframe+0x76/0x7e </TASK> [...] Kernel panic - not syncing: Fatal exception in interrupt Kernel Offset: 0x1c000000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff) Instead use the skb destructor_arg pointer along with pointer tagging. As pointers are always aligned to 8B, use the bottom bit to indicate whether this a single address or an allocated struct containing several addresses. Fixes: 30f241f ("xsk: Fix immature cq descriptor production") Closes: https://lore.kernel.org/netdev/0435b904-f44f-48f8-afb0-68868474bf1c@nop.hu/ Suggested-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Fernando Fernandez Mancera <fmancera@suse.de> Reviewed-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Reviewed-by: Jason Xing <kerneljasonxing@gmail.com> Link: https://patch.msgid.link/20251124171409.3845-1-fmancera@suse.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>
[WHAT] IGT kms_cursor_legacy's long-nonblocking-modeset-vs-cursor-atomic fails with NULL pointer dereference. This can be reproduced with both an eDP panel and a DP monitors connected. BUG: kernel NULL pointer dereference, address: 0000000000000000 #PF: supervisor read access in kernel mode #PF: error_code(0x0000) - not-present page PGD 0 P4D 0 Oops: Oops: 0000 [#1] SMP NOPTI CPU: 13 UID: 0 PID: 2960 Comm: kms_cursor_lega Not tainted 6.16.0-99-custom #8 PREEMPT(voluntary) Hardware name: AMD ........ RIP: 0010:dc_stream_get_scanoutpos+0x34/0x130 [amdgpu] Code: 57 4d 89 c7 41 56 49 89 ce 41 55 49 89 d5 41 54 49 89 fc 53 48 83 ec 18 48 8b 87 a0 64 00 00 48 89 75 d0 48 c7 c6 e0 41 30 c2 <48> 8b 38 48 8b 9f 68 06 00 00 e8 8d d7 fd ff 31 c0 48 81 c3 e0 02 RSP: 0018:ffffd0f3c2bd7608 EFLAGS: 00010292 RAX: 0000000000000000 RBX: 0000000000000000 RCX: ffffd0f3c2bd7668 RDX: ffffd0f3c2bd7664 RSI: ffffffffc23041e0 RDI: ffff8b32494b8000 RBP: ffffd0f3c2bd7648 R08: ffffd0f3c2bd766c R09: ffffd0f3c2bd7760 R10: ffffd0f3c2bd7820 R11: 0000000000000000 R12: ffff8b32494b8000 R13: ffffd0f3c2bd7664 R14: ffffd0f3c2bd7668 R15: ffffd0f3c2bd766c FS: 000071f631b68700(0000) GS:ffff8b399f114000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000000 CR3: 00000001b8105000 CR4: 0000000000f50ef0 PKRU: 55555554 Call Trace: <TASK> dm_crtc_get_scanoutpos+0xd7/0x180 [amdgpu] amdgpu_display_get_crtc_scanoutpos+0x86/0x1c0 [amdgpu] ? __pfx_amdgpu_crtc_get_scanout_position+0x10/0x10[amdgpu] amdgpu_crtc_get_scanout_position+0x27/0x50 [amdgpu] drm_crtc_vblank_helper_get_vblank_timestamp_internal+0xf7/0x400 drm_crtc_vblank_helper_get_vblank_timestamp+0x1c/0x30 drm_crtc_get_last_vbltimestamp+0x55/0x90 drm_crtc_next_vblank_start+0x45/0xa0 drm_atomic_helper_wait_for_fences+0x81/0x1f0 ... Cc: Mario Limonciello <mario.limonciello@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Aurabindo Pillai <aurabindo.pillai@amd.com> Signed-off-by: Alex Hung <alex.hung@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 621e55f) Cc: stable@vger.kernel.org
The crash in process_v2_sparse_read() for fscrypt-encrypted directories has been reported. Issue takes place for Ceph msgr2 protocol in secure mode. It can be reproduced by the steps: sudo mount -t ceph :/ /mnt/cephfs/ -o name=admin,fs=cephfs,ms_mode=secure (1) mkdir /mnt/cephfs/fscrypt-test-3 (2) cp area_decrypted.tar /mnt/cephfs/fscrypt-test-3 (3) fscrypt encrypt --source=raw_key --key=./my.key /mnt/cephfs/fscrypt-test-3 (4) fscrypt lock /mnt/cephfs/fscrypt-test-3 (5) fscrypt unlock --key=my.key /mnt/cephfs/fscrypt-test-3 (6) cat /mnt/cephfs/fscrypt-test-3/area_decrypted.tar (7) Issue has been triggered [ 408.072247] ------------[ cut here ]------------ [ 408.072251] WARNING: CPU: 1 PID: 392 at net/ceph/messenger_v2.c:865 ceph_con_v2_try_read+0x4b39/0x72f0 [ 408.072267] Modules linked in: intel_rapl_msr intel_rapl_common intel_uncore_frequency_common intel_pmc_core pmt_telemetry pmt_discovery pmt_class intel_pmc_ssram_telemetry intel_vsec kvm_intel joydev kvm irqbypass polyval_clmulni ghash_clmulni_intel aesni_intel rapl input_leds psmouse serio_raw i2c_piix4 vga16fb bochs vgastate i2c_smbus floppy mac_hid qemu_fw_cfg pata_acpi sch_fq_codel rbd msr parport_pc ppdev lp parport efi_pstore [ 408.072304] CPU: 1 UID: 0 PID: 392 Comm: kworker/1:3 Not tainted 6.17.0-rc7+ [ 408.072307] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.17.0-5.fc42 04/01/2014 [ 408.072310] Workqueue: ceph-msgr ceph_con_workfn [ 408.072314] RIP: 0010:ceph_con_v2_try_read+0x4b39/0x72f0 [ 408.072317] Code: c7 c1 20 f0 d4 ae 50 31 d2 48 c7 c6 60 27 d5 ae 48 c7 c7 f8 8e 6f b0 68 60 38 d5 ae e8 00 47 61 fe 48 83 c4 18 e9 ac fc ff ff <0f> 0b e9 06 fe ff ff 4c 8b 9d 98 fd ff ff 0f 84 64 e7 ff ff 89 85 [ 408.072319] RSP: 0018:ffff88811c3e7a30 EFLAGS: 00010246 [ 408.072322] RAX: ffffed1024874c6f RBX: ffffea00042c2b40 RCX: 0000000000000f38 [ 408.072324] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000 [ 408.072325] RBP: ffff88811c3e7ca8 R08: 0000000000000000 R09: 00000000000000c8 [ 408.072326] R10: 00000000000000c8 R11: 0000000000000000 R12: 00000000000000c8 [ 408.072327] R13: dffffc0000000000 R14: ffff8881243a6030 R15: 0000000000003000 [ 408.072329] FS: 0000000000000000(0000) GS:ffff88823eadf000(0000) knlGS:0000000000000000 [ 408.072331] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 408.072332] CR2: 000000c0003c6000 CR3: 000000010c106005 CR4: 0000000000772ef0 [ 408.072336] PKRU: 55555554 [ 408.072337] Call Trace: [ 408.072338] <TASK> [ 408.072340] ? sched_clock_noinstr+0x9/0x10 [ 408.072344] ? __pfx_ceph_con_v2_try_read+0x10/0x10 [ 408.072347] ? _raw_spin_unlock+0xe/0x40 [ 408.072349] ? finish_task_switch.isra.0+0x15d/0x830 [ 408.072353] ? __kasan_check_write+0x14/0x30 [ 408.072357] ? mutex_lock+0x84/0xe0 [ 408.072359] ? __pfx_mutex_lock+0x10/0x10 [ 408.072361] ceph_con_workfn+0x27e/0x10e0 [ 408.072364] ? metric_delayed_work+0x311/0x2c50 [ 408.072367] process_one_work+0x611/0xe20 [ 408.072371] ? __kasan_check_write+0x14/0x30 [ 408.072373] worker_thread+0x7e3/0x1580 [ 408.072375] ? __pfx__raw_spin_lock_irqsave+0x10/0x10 [ 408.072378] ? __pfx_worker_thread+0x10/0x10 [ 408.072381] kthread+0x381/0x7a0 [ 408.072383] ? __pfx__raw_spin_lock_irq+0x10/0x10 [ 408.072385] ? __pfx_kthread+0x10/0x10 [ 408.072387] ? __kasan_check_write+0x14/0x30 [ 408.072389] ? recalc_sigpending+0x160/0x220 [ 408.072392] ? _raw_spin_unlock_irq+0xe/0x50 [ 408.072394] ? calculate_sigpending+0x78/0xb0 [ 408.072395] ? __pfx_kthread+0x10/0x10 [ 408.072397] ret_from_fork+0x2b6/0x380 [ 408.072400] ? __pfx_kthread+0x10/0x10 [ 408.072402] ret_from_fork_asm+0x1a/0x30 [ 408.072406] </TASK> [ 408.072407] ---[ end trace 0000000000000000 ]--- [ 408.072418] Oops: general protection fault, probably for non-canonical address 0xdffffc0000000000: 0000 [#1] SMP KASAN NOPTI [ 408.072984] KASAN: null-ptr-deref in range [0x0000000000000000- 0x0000000000000007] [ 408.073350] CPU: 1 UID: 0 PID: 392 Comm: kworker/1:3 Tainted: G W 6.17.0-rc7+ #1 PREEMPT(voluntary) [ 408.073886] Tainted: [W]=WARN [ 408.074042] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.17.0-5.fc42 04/01/2014 [ 408.074468] Workqueue: ceph-msgr ceph_con_workfn [ 408.074694] RIP: 0010:ceph_msg_data_advance+0x79/0x1a80 [ 408.074976] Code: fc ff df 49 8d 77 08 48 c1 ee 03 80 3c 16 00 0f 85 07 11 00 00 48 ba 00 00 00 00 00 fc ff df 49 8b 5f 08 48 89 de 48 c1 ee 03 <0f> b6 14 16 84 d2 74 09 80 fa 03 0f 8e 0f 0e 00 00 8b 13 83 fa 03 [ 408.075884] RSP: 0018:ffff88811c3e7990 EFLAGS: 00010246 [ 408.076305] RAX: ffff8881243a6388 RBX: 0000000000000000 RCX: 0000000000000000 [ 408.076909] RDX: dffffc0000000000 RSI: 0000000000000000 RDI: ffff8881243a6378 [ 408.077466] RBP: ffff88811c3e7a20 R08: 0000000000000000 R09: 00000000000000c8 [ 408.078034] R10: ffff8881243a6388 R11: 0000000000000000 R12: ffffed1024874c71 [ 408.078575] R13: dffffc0000000000 R14: ffff8881243a6030 R15: ffff8881243a6378 [ 408.079159] FS: 0000000000000000(0000) GS:ffff88823eadf000(0000) knlGS:0000000000000000 [ 408.079736] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 408.080039] CR2: 000000c0003c6000 CR3: 000000010c106005 CR4: 0000000000772ef0 [ 408.080376] PKRU: 55555554 [ 408.080513] Call Trace: [ 408.080630] <TASK> [ 408.080729] ceph_con_v2_try_read+0x49b9/0x72f0 [ 408.081115] ? __pfx_ceph_con_v2_try_read+0x10/0x10 [ 408.081348] ? _raw_spin_unlock+0xe/0x40 [ 408.081538] ? finish_task_switch.isra.0+0x15d/0x830 [ 408.081768] ? __kasan_check_write+0x14/0x30 [ 408.081986] ? mutex_lock+0x84/0xe0 [ 408.082160] ? __pfx_mutex_lock+0x10/0x10 [ 408.082343] ceph_con_workfn+0x27e/0x10e0 [ 408.082529] ? metric_delayed_work+0x311/0x2c50 [ 408.082737] process_one_work+0x611/0xe20 [ 408.082948] ? __kasan_check_write+0x14/0x30 [ 408.083156] worker_thread+0x7e3/0x1580 [ 408.083331] ? __pfx__raw_spin_lock_irqsave+0x10/0x10 [ 408.083557] ? __pfx_worker_thread+0x10/0x10 [ 408.083751] kthread+0x381/0x7a0 [ 408.083922] ? __pfx__raw_spin_lock_irq+0x10/0x10 [ 408.084139] ? __pfx_kthread+0x10/0x10 [ 408.084310] ? __kasan_check_write+0x14/0x30 [ 408.084510] ? recalc_sigpending+0x160/0x220 [ 408.084708] ? _raw_spin_unlock_irq+0xe/0x50 [ 408.084917] ? calculate_sigpending+0x78/0xb0 [ 408.085138] ? __pfx_kthread+0x10/0x10 [ 408.085335] ret_from_fork+0x2b6/0x380 [ 408.085525] ? __pfx_kthread+0x10/0x10 [ 408.085720] ret_from_fork_asm+0x1a/0x30 [ 408.085922] </TASK> [ 408.086036] Modules linked in: intel_rapl_msr intel_rapl_common intel_uncore_frequency_common intel_pmc_core pmt_telemetry pmt_discovery pmt_class intel_pmc_ssram_telemetry intel_vsec kvm_intel joydev kvm irqbypass polyval_clmulni ghash_clmulni_intel aesni_intel rapl input_leds psmouse serio_raw i2c_piix4 vga16fb bochs vgastate i2c_smbus floppy mac_hid qemu_fw_cfg pata_acpi sch_fq_codel rbd msr parport_pc ppdev lp parport efi_pstore [ 408.087778] ---[ end trace 0000000000000000 ]--- [ 408.088007] RIP: 0010:ceph_msg_data_advance+0x79/0x1a80 [ 408.088260] Code: fc ff df 49 8d 77 08 48 c1 ee 03 80 3c 16 00 0f 85 07 11 00 00 48 ba 00 00 00 00 00 fc ff df 49 8b 5f 08 48 89 de 48 c1 ee 03 <0f> b6 14 16 84 d2 74 09 80 fa 03 0f 8e 0f 0e 00 00 8b 13 83 fa 03 [ 408.089118] RSP: 0018:ffff88811c3e7990 EFLAGS: 00010246 [ 408.089357] RAX: ffff8881243a6388 RBX: 0000000000000000 RCX: 0000000000000000 [ 408.089678] RDX: dffffc0000000000 RSI: 0000000000000000 RDI: ffff8881243a6378 [ 408.090020] RBP: ffff88811c3e7a20 R08: 0000000000000000 R09: 00000000000000c8 [ 408.090360] R10: ffff8881243a6388 R11: 0000000000000000 R12: ffffed1024874c71 [ 408.090687] R13: dffffc0000000000 R14: ffff8881243a6030 R15: ffff8881243a6378 [ 408.091035] FS: 0000000000000000(0000) GS:ffff88823eadf000(0000) knlGS:0000000000000000 [ 408.091452] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 408.092015] CR2: 000000c0003c6000 CR3: 000000010c106005 CR4: 0000000000772ef0 [ 408.092530] PKRU: 55555554 [ 417.112915] ================================================================== [ 417.113491] BUG: KASAN: slab-use-after-free in __mutex_lock.constprop.0+0x1522/0x1610 [ 417.114014] Read of size 4 at addr ffff888124870034 by task kworker/2:0/4951 [ 417.114587] CPU: 2 UID: 0 PID: 4951 Comm: kworker/2:0 Tainted: G D W 6.17.0-rc7+ #1 PREEMPT(voluntary) [ 417.114592] Tainted: [D]=DIE, [W]=WARN [ 417.114593] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.17.0-5.fc42 04/01/2014 [ 417.114596] Workqueue: events handle_timeout [ 417.114601] Call Trace: [ 417.114602] <TASK> [ 417.114604] dump_stack_lvl+0x5c/0x90 [ 417.114610] print_report+0x171/0x4dc [ 417.114613] ? __pfx__raw_spin_lock_irqsave+0x10/0x10 [ 417.114617] ? kasan_complete_mode_report_info+0x80/0x220 [ 417.114621] kasan_report+0xbd/0x100 [ 417.114625] ? __mutex_lock.constprop.0+0x1522/0x1610 [ 417.114628] ? __mutex_lock.constprop.0+0x1522/0x1610 [ 417.114630] __asan_report_load4_noabort+0x14/0x30 [ 417.114633] __mutex_lock.constprop.0+0x1522/0x1610 [ 417.114635] ? queue_con_delay+0x8d/0x200 [ 417.114638] ? __pfx___mutex_lock.constprop.0+0x10/0x10 [ 417.114641] ? __send_subscribe+0x529/0xb20 [ 417.114644] __mutex_lock_slowpath+0x13/0x20 [ 417.114646] mutex_lock+0xd4/0xe0 [ 417.114649] ? __pfx_mutex_lock+0x10/0x10 [ 417.114652] ? ceph_monc_renew_subs+0x2a/0x40 [ 417.114654] ceph_con_keepalive+0x22/0x110 [ 417.114656] handle_timeout+0x6b3/0x11d0 [ 417.114659] ? _raw_spin_unlock_irq+0xe/0x50 [ 417.114662] ? __pfx_handle_timeout+0x10/0x10 [ 417.114664] ? queue_delayed_work_on+0x8e/0xa0 [ 417.114669] process_one_work+0x611/0xe20 [ 417.114672] ? __kasan_check_write+0x14/0x30 [ 417.114676] worker_thread+0x7e3/0x1580 [ 417.114678] ? __pfx__raw_spin_lock_irqsave+0x10/0x10 [ 417.114682] ? __pfx_sched_setscheduler_nocheck+0x10/0x10 [ 417.114687] ? __pfx_worker_thread+0x10/0x10 [ 417.114689] kthread+0x381/0x7a0 [ 417.114692] ? __pfx__raw_spin_lock_irq+0x10/0x10 [ 417.114694] ? __pfx_kthread+0x10/0x10 [ 417.114697] ? __kasan_check_write+0x14/0x30 [ 417.114699] ? recalc_sigpending+0x160/0x220 [ 417.114703] ? _raw_spin_unlock_irq+0xe/0x50 [ 417.114705] ? calculate_sigpending+0x78/0xb0 [ 417.114707] ? __pfx_kthread+0x10/0x10 [ 417.114710] ret_from_fork+0x2b6/0x380 [ 417.114713] ? __pfx_kthread+0x10/0x10 [ 417.114715] ret_from_fork_asm+0x1a/0x30 [ 417.114720] </TASK> [ 417.125171] Allocated by task 2: [ 417.125333] kasan_save_stack+0x26/0x60 [ 417.125522] kasan_save_track+0x14/0x40 [ 417.125742] kasan_save_alloc_info+0x39/0x60 [ 417.125945] __kasan_slab_alloc+0x8b/0xb0 [ 417.126133] kmem_cache_alloc_node_noprof+0x13b/0x460 [ 417.126381] copy_process+0x320/0x6250 [ 417.126595] kernel_clone+0xb7/0x840 [ 417.126792] kernel_thread+0xd6/0x120 [ 417.126995] kthreadd+0x85c/0xbe0 [ 417.127176] ret_from_fork+0x2b6/0x380 [ 417.127378] ret_from_fork_asm+0x1a/0x30 [ 417.127692] Freed by task 0: [ 417.127851] kasan_save_stack+0x26/0x60 [ 417.128057] kasan_save_track+0x14/0x40 [ 417.128267] kasan_save_free_info+0x3b/0x60 [ 417.128491] __kasan_slab_free+0x6c/0xa0 [ 417.128708] kmem_cache_free+0x182/0x550 [ 417.128906] free_task+0xeb/0x140 [ 417.129070] __put_task_struct+0x1d2/0x4f0 [ 417.129259] __put_task_struct_rcu_cb+0x15/0x20 [ 417.129480] rcu_do_batch+0x3d3/0xe70 [ 417.129681] rcu_core+0x549/0xb30 [ 417.129839] rcu_core_si+0xe/0x20 [ 417.130005] handle_softirqs+0x160/0x570 [ 417.130190] __irq_exit_rcu+0x189/0x1e0 [ 417.130369] irq_exit_rcu+0xe/0x20 [ 417.130531] sysvec_apic_timer_interrupt+0x9f/0xd0 [ 417.130768] asm_sysvec_apic_timer_interrupt+0x1b/0x20 [ 417.131082] Last potentially related work creation: [ 417.131305] kasan_save_stack+0x26/0x60 [ 417.131484] kasan_record_aux_stack+0xae/0xd0 [ 417.131695] __call_rcu_common+0xcd/0x14b0 [ 417.131909] call_rcu+0x31/0x50 [ 417.132071] delayed_put_task_struct+0x128/0x190 [ 417.132295] rcu_do_batch+0x3d3/0xe70 [ 417.132478] rcu_core+0x549/0xb30 [ 417.132658] rcu_core_si+0xe/0x20 [ 417.132808] handle_softirqs+0x160/0x570 [ 417.132993] __irq_exit_rcu+0x189/0x1e0 [ 417.133181] irq_exit_rcu+0xe/0x20 [ 417.133353] sysvec_apic_timer_interrupt+0x9f/0xd0 [ 417.133584] asm_sysvec_apic_timer_interrupt+0x1b/0x20 [ 417.133921] Second to last potentially related work creation: [ 417.134183] kasan_save_stack+0x26/0x60 [ 417.134362] kasan_record_aux_stack+0xae/0xd0 [ 417.134566] __call_rcu_common+0xcd/0x14b0 [ 417.134782] call_rcu+0x31/0x50 [ 417.134929] put_task_struct_rcu_user+0x58/0xb0 [ 417.135143] finish_task_switch.isra.0+0x5d3/0x830 [ 417.135366] __schedule+0xd30/0x5100 [ 417.135534] schedule_idle+0x5a/0x90 [ 417.135712] do_idle+0x25f/0x410 [ 417.135871] cpu_startup_entry+0x53/0x70 [ 417.136053] start_secondary+0x216/0x2c0 [ 417.136233] common_startup_64+0x13e/0x141 [ 417.136894] The buggy address belongs to the object at ffff888124870000 which belongs to the cache task_struct of size 10504 [ 417.138122] The buggy address is located 52 bytes inside of freed 10504-byte region [ffff888124870000, ffff888124872908) [ 417.139465] The buggy address belongs to the physical page: [ 417.140016] page: refcount:0 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x124870 [ 417.140789] head: order:3 mapcount:0 entire_mapcount:0 nr_pages_mapped:0 pincount:0 [ 417.141519] memcg:ffff88811aa20e01 [ 417.141874] anon flags: 0x17ffffc0000040(head|node=0|zone=2|lastcpupid=0x1fffff) [ 417.142600] page_type: f5(slab) [ 417.142922] raw: 0017ffffc0000040 ffff88810094f040 0000000000000000 dead000000000001 [ 417.143554] raw: 0000000000000000 0000000000030003 00000000f5000000 ffff88811aa20e01 [ 417.143954] head: 0017ffffc0000040 ffff88810094f040 0000000000000000 dead000000000001 [ 417.144329] head: 0000000000000000 0000000000030003 00000000f5000000 ffff88811aa20e01 [ 417.144710] head: 0017ffffc0000003 ffffea0004921c01 00000000ffffffff 00000000ffffffff [ 417.145106] head: ffffffffffffffff 0000000000000000 00000000ffffffff 0000000000000008 [ 417.145485] page dumped because: kasan: bad access detected [ 417.145859] Memory state around the buggy address: [ 417.146094] ffff88812486ff00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc [ 417.146439] ffff88812486ff80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc [ 417.146791] >ffff888124870000: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb [ 417.147145] ^ [ 417.147387] ffff888124870080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb [ 417.147751] ffff888124870100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb [ 417.148123] ================================================================== First of all, we have warning in get_bvec_at() because cursor->total_resid contains zero value. And, finally, we have crash in ceph_msg_data_advance() because cursor->data is NULL. It means that get_bvec_at() receives not initialized ceph_msg_data_cursor structure because data is NULL and total_resid contains zero. Moreover, we don't have likewise issue for the case of Ceph msgr1 protocol because ceph_msg_data_cursor_init() has been called before reading sparse data. This patch adds calling of ceph_msg_data_cursor_init() in the beginning of process_v2_sparse_read() with the goal to guarantee that logic of reading sparse data works correctly for the case of Ceph msgr2 protocol. Cc: stable@vger.kernel.org Link: https://tracker.ceph.com/issues/73152 Signed-off-by: Viacheslav Dubeyko <Slava.Dubeyko@ibm.com> Reviewed-by: Ilya Dryomov <idryomov@gmail.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
…ptcp_do_fastclose(). syzbot reported divide-by-zero in __tcp_select_window() by MPTCP socket. [0] We had a similar issue for the bare TCP and fixed in commit 499350a ("tcp: initialize rcv_mss to TCP_MIN_MSS instead of 0"). Let's apply the same fix to mptcp_do_fastclose(). [0]: Oops: divide error: 0000 [#1] SMP KASAN PTI CPU: 0 UID: 0 PID: 6068 Comm: syz.0.17 Not tainted syzkaller #0 PREEMPT(full) Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 10/25/2025 RIP: 0010:__tcp_select_window+0x824/0x1320 net/ipv4/tcp_output.c:3336 Code: ff ff ff 44 89 f1 d3 e0 89 c1 f7 d1 41 01 cc 41 21 c4 e9 a9 00 00 00 e8 ca 49 01 f8 e9 9c 00 00 00 e8 c0 49 01 f8 44 89 e0 99 <f7> 7c 24 1c 41 29 d4 48 bb 00 00 00 00 00 fc ff df e9 80 00 00 00 RSP: 0018:ffffc90003017640 EFLAGS: 00010293 RAX: 0000000000000000 RBX: 0000000000000000 RCX: ffff88807b469e40 RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000 RBP: ffffc90003017730 R08: ffff888033268143 R09: 1ffff1100664d028 R10: dffffc0000000000 R11: ffffed100664d029 R12: 0000000000000000 R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000 FS: 000055557faa0500(0000) GS:ffff888126135000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007f64a1912ff8 CR3: 0000000072122000 CR4: 00000000003526f0 Call Trace: <TASK> tcp_select_window net/ipv4/tcp_output.c:281 [inline] __tcp_transmit_skb+0xbc7/0x3aa0 net/ipv4/tcp_output.c:1568 tcp_transmit_skb net/ipv4/tcp_output.c:1649 [inline] tcp_send_active_reset+0x2d1/0x5b0 net/ipv4/tcp_output.c:3836 mptcp_do_fastclose+0x27e/0x380 net/mptcp/protocol.c:2793 mptcp_disconnect+0x238/0x710 net/mptcp/protocol.c:3253 mptcp_sendmsg_fastopen+0x2f8/0x580 net/mptcp/protocol.c:1776 mptcp_sendmsg+0x1774/0x1980 net/mptcp/protocol.c:1855 sock_sendmsg_nosec net/socket.c:727 [inline] __sock_sendmsg+0xe5/0x270 net/socket.c:742 __sys_sendto+0x3bd/0x520 net/socket.c:2244 __do_sys_sendto net/socket.c:2251 [inline] __se_sys_sendto net/socket.c:2247 [inline] __x64_sys_sendto+0xde/0x100 net/socket.c:2247 do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline] do_syscall_64+0xfa/0xfa0 arch/x86/entry/syscall_64.c:94 entry_SYSCALL_64_after_hwframe+0x77/0x7f RIP: 0033:0x7f66e998f749 Code: ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 a8 ff ff ff f7 d8 64 89 01 48 RSP: 002b:00007ffff9acedb8 EFLAGS: 00000246 ORIG_RAX: 000000000000002c RAX: ffffffffffffffda RBX: 00007f66e9be5fa0 RCX: 00007f66e998f749 RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000003 RBP: 00007ffff9acee10 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000001 R13: 00007f66e9be5fa0 R14: 00007f66e9be5fa0 R15: 0000000000000006 </TASK> Fixes: ae15506 ("mptcp: fix duplicate reset on fastclose") Reported-by: syzbot+3a92d359bc2ec6255a33@syzkaller.appspotmail.com Closes: https://lore.kernel.org/netdev/69260882.a70a0220.d98e3.00b4.GAE@google.com/ Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com> Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Cc: stable@vger.kernel.org Link: https://patch.msgid.link/20251125195331.309558-1-kuniyu@google.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
…3576 When I added command queue engine (CQE) support for the Rockchip eMMC controller, I missed that RK3576 has a separate platform data struct. While things are working fine on RK3588 (I tested the ROCK 5B) and the suspend issue is fixed on the RK3576 (I tested the Sige5), this results in stability issues. By also adding the necessary hooks for the RK3576 platform the following problems can be avoided: [ 15.606895] mmc0: running CQE recovery [ 15.616189] mmc0: running CQE recovery [...] [ 25.911484] mmc0: running CQE recovery [ 25.926305] mmc0: running CQE recovery [ 25.927468] mmc0: running CQE recovery [...] [ 26.255719] mmc0: running CQE recovery [ 26.257162] ------------[ cut here ]------------ [ 26.257581] mmc0: cqhci: spurious TCN for tag 31 [ 26.258034] WARNING: CPU: 0 PID: 0 at drivers/mmc/host/cqhci-core.c:796 cqhci_irq+0x440/0x68c [ 26.263786] CPU: 0 UID: 0 PID: 0 Comm: swapper/0 Not tainted 6.18.0-rc6-gd984ebbf0d15 #1 PREEMPT [ 26.264561] Hardware name: ArmSoM Sige5 (DT) [...] [ 26.272748] Call trace: [ 26.272964] cqhci_irq+0x440/0x68c (P) [ 26.273296] dwcmshc_cqe_irq_handler+0x54/0x88 [ 26.273689] sdhci_irq+0xbc/0x1200 [ 26.273991] __handle_irq_event_percpu+0x54/0x1d0 [...] Note that the above problems do not necessarily happen with every boot. Reported-by: Adrian Hunter <adrian.hunter@intel.com> Closes: https://lore.kernel.org/linux-rockchip/01949bc9-4873-498b-ac7d-f008393ccc4c@intel.com/ Fixes: fda1e0a ("mmc: sdhci-of-dwcmshc: Add command queue support for rockchip SOCs") Signed-off-by: Sebastian Reichel <sebastian.reichel@collabora.com>
…em corrupted commit 986835b upstream. There's issue when file system corrupted: ------------[ cut here ]------------ kernel BUG at fs/jbd2/transaction.c:1289! Oops: invalid opcode: 0000 [#1] SMP KASAN PTI CPU: 5 UID: 0 PID: 2031 Comm: mkdir Not tainted 6.18.0-rc1-next RIP: 0010:jbd2_journal_get_create_access+0x3b6/0x4d0 RSP: 0018:ffff888117aafa30 EFLAGS: 00010202 RAX: 0000000000000000 RBX: ffff88811a86b000 RCX: ffffffff89a63534 RDX: 1ffff110200ec602 RSI: 0000000000000004 RDI: ffff888100763010 RBP: ffff888100763000 R08: 0000000000000001 R09: ffff888100763028 R10: 0000000000000003 R11: 0000000000000000 R12: 0000000000000000 R13: ffff88812c432000 R14: ffff88812c608000 R15: ffff888120bfc000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007f91d6970c99 CR3: 00000001159c4000 CR4: 00000000000006f0 Call Trace: <TASK> __ext4_journal_get_create_access+0x42/0x170 ext4_getblk+0x319/0x6f0 ext4_bread+0x11/0x100 ext4_append+0x1e6/0x4a0 ext4_init_new_dir+0x145/0x1d0 ext4_mkdir+0x326/0x920 vfs_mkdir+0x45c/0x740 do_mkdirat+0x234/0x2f0 __x64_sys_mkdir+0xd6/0x120 do_syscall_64+0x5f/0xfa0 entry_SYSCALL_64_after_hwframe+0x76/0x7e The above issue occurs with us in errors=continue mode when accompanied by storage failures. There have been many inconsistencies in the file system data. In the case of file system data inconsistency, for example, if the block bitmap of a referenced block is not set, it can lead to the situation where a block being committed is allocated and used again. As a result, the following condition will not be satisfied then trigger BUG_ON. Of course, it is entirely possible to construct a problematic image that can trigger this BUG_ON through specific operations. In fact, I have constructed such an image and easily reproduced this issue. Therefore, J_ASSERT() holds true only under ideal conditions, but it may not necessarily be satisfied in exceptional scenarios. Using J_ASSERT() directly in abnormal situations would cause the system to crash, which is clearly not what we want. So here we directly trigger a JBD abort instead of immediately invoking BUG_ON. Fixes: 470decc ("[PATCH] jbd2: initial copy of files from jbd") Signed-off-by: Ye Bin <yebin10@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Message-ID: <20251025072657.307851-1-yebin@huaweicloud.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: stable@kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 0cd8fee upstream. Fix a race between inline data destruction and block mapping. The function ext4_destroy_inline_data_nolock() changes the inode data layout by clearing EXT4_INODE_INLINE_DATA and setting EXT4_INODE_EXTENTS. At the same time, another thread may execute ext4_map_blocks(), which tests EXT4_INODE_EXTENTS to decide whether to call ext4_ext_map_blocks() or ext4_ind_map_blocks(). Without i_data_sem protection, ext4_ind_map_blocks() may receive inode with EXT4_INODE_EXTENTS flag and triggering assert. kernel BUG at fs/ext4/indirect.c:546! EXT4-fs (loop2): unmounting filesystem. invalid opcode: 0000 [#1] PREEMPT SMP KASAN NOPTI Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.12.0-1 04/01/2014 RIP: 0010:ext4_ind_map_blocks.cold+0x2b/0x5a fs/ext4/indirect.c:546 Call Trace: <TASK> ext4_map_blocks+0xb9b/0x16f0 fs/ext4/inode.c:681 _ext4_get_block+0x242/0x590 fs/ext4/inode.c:822 ext4_block_write_begin+0x48b/0x12c0 fs/ext4/inode.c:1124 ext4_write_begin+0x598/0xef0 fs/ext4/inode.c:1255 ext4_da_write_begin+0x21e/0x9c0 fs/ext4/inode.c:3000 generic_perform_write+0x259/0x5d0 mm/filemap.c:3846 ext4_buffered_write_iter+0x15b/0x470 fs/ext4/file.c:285 ext4_file_write_iter+0x8e0/0x17f0 fs/ext4/file.c:679 call_write_iter include/linux/fs.h:2271 [inline] do_iter_readv_writev+0x212/0x3c0 fs/read_write.c:735 do_iter_write+0x186/0x710 fs/read_write.c:861 vfs_iter_write+0x70/0xa0 fs/read_write.c:902 iter_file_splice_write+0x73b/0xc90 fs/splice.c:685 do_splice_from fs/splice.c:763 [inline] direct_splice_actor+0x10f/0x170 fs/splice.c:950 splice_direct_to_actor+0x33a/0xa10 fs/splice.c:896 do_splice_direct+0x1a9/0x280 fs/splice.c:1002 do_sendfile+0xb13/0x12c0 fs/read_write.c:1255 __do_sys_sendfile64 fs/read_write.c:1323 [inline] __se_sys_sendfile64 fs/read_write.c:1309 [inline] __x64_sys_sendfile64+0x1cf/0x210 fs/read_write.c:1309 do_syscall_x64 arch/x86/entry/common.c:51 [inline] do_syscall_64+0x35/0x80 arch/x86/entry/common.c:81 entry_SYSCALL_64_after_hwframe+0x6e/0xd8 Fixes: c755e25 ("ext4: fix deadlock between inline_data and ext4_expand_extra_isize_ea()") Cc: stable@vger.kernel.org # v4.11+ Signed-off-by: Alexey Nepomnyashih <sdl@nppct.ru> Message-ID: <20251104093326.697381-1-sdl@nppct.ru> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 3e0ae02 upstream. Rust Binder contains the following unsafe operation: // SAFETY: A `NodeDeath` is never inserted into the death list // of any node other than its owner, so it is either in this // death list or in no death list. unsafe { node_inner.death_list.remove(self) }; This operation is unsafe because when touching the prev/next pointers of a list element, we have to ensure that no other thread is also touching them in parallel. If the node is present in the list that `remove` is called on, then that is fine because we have exclusive access to that list. If the node is not in any list, then it's also ok. But if it's present in a different list that may be accessed in parallel, then that may be a data race on the prev/next pointers. And unfortunately that is exactly what is happening here. In Node::release, we: 1. Take the lock. 2. Move all items to a local list on the stack. 3. Drop the lock. 4. Iterate the local list on the stack. Combined with threads using the unsafe remove method on the original list, this leads to memory corruption of the prev/next pointers. This leads to crashes like this one: Unable to handle kernel paging request at virtual address 000bb9841bcac70e Mem abort info: ESR = 0x0000000096000044 EC = 0x25: DABT (current EL), IL = 32 bits SET = 0, FnV = 0 EA = 0, S1PTW = 0 FSC = 0x04: level 0 translation fault Data abort info: ISV = 0, ISS = 0x00000044, ISS2 = 0x00000000 CM = 0, WnR = 1, TnD = 0, TagAccess = 0 GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0 [000bb9841bcac70e] address between user and kernel address ranges Internal error: Oops: 0000000096000044 [#1] PREEMPT SMP google-cdd 538c004.gcdd: context saved(CPU:1) item - log_kevents is disabled Modules linked in: ... rust_binder CPU: 1 UID: 0 PID: 2092 Comm: kworker/1:178 Tainted: G S W OE 6.12.52-android16-5-g98debd5df505-4k #1 f94a6367396c5488d635708e43ee0c888d230b0b Tainted: [S]=CPU_OUT_OF_SPEC, [W]=WARN, [O]=OOT_MODULE, [E]=UNSIGNED_MODULE Hardware name: MUSTANG PVT 1.0 based on LGA (DT) Workqueue: events _RNvXs6_NtCsdfZWD8DztAw_6kernel9workqueueINtNtNtB7_4sync3arc3ArcNtNtCs8QPsHWIn21X_16rust_binder_main7process7ProcessEINtB5_15WorkItemPointerKy0_E3runB13_ [rust_binder] pstate: 23400005 (nzCv daif +PAN -UAO +TCO +DIT -SSBS BTYPE=--) pc : _RNvXs3_NtCs8QPsHWIn21X_16rust_binder_main7processNtB5_7ProcessNtNtCsdfZWD8DztAw_6kernel9workqueue8WorkItem3run+0x450/0x11f8 [rust_binder] lr : _RNvXs3_NtCs8QPsHWIn21X_16rust_binder_main7processNtB5_7ProcessNtNtCsdfZWD8DztAw_6kernel9workqueue8WorkItem3run+0x464/0x11f8 [rust_binder] sp : ffffffc09b433ac0 x29: ffffffc09b433d30 x28: ffffff8821690000 x27: ffffffd40cbaa448 x26: ffffff8821690000 x25: 00000000ffffffff x24: ffffff88d0376578 x23: 0000000000000001 x22: ffffffc09b433c78 x21: ffffff88e8f9bf40 x20: ffffff88e8f9bf40 x19: ffffff882692b000 x18: ffffffd40f10bf00 x17: 00000000c006287d x16: 00000000c006287d x15: 00000000000003b0 x14: 0000000000000100 x13: 000000201cb79ae0 x12: fffffffffffffff0 x11: 0000000000000000 x10: 0000000000000001 x9 : 0000000000000000 x8 : b80bb9841bcac706 x7 : 0000000000000001 x6 : fffffffebee63f30 x5 : 0000000000000000 x4 : 0000000000000001 x3 : 0000000000000000 x2 : 0000000000004c31 x1 : ffffff88216900c0 x0 : ffffff88e8f9bf00 Call trace: _RNvXs3_NtCs8QPsHWIn21X_16rust_binder_main7processNtB5_7ProcessNtNtCsdfZWD8DztAw_6kernel9workqueue8WorkItem3run+0x450/0x11f8 [rust_binder bbc172b53665bbc815363b22e97e3f7e3fe971fc] process_scheduled_works+0x1c4/0x45c worker_thread+0x32c/0x3e8 kthread+0x11c/0x1c8 ret_from_fork+0x10/0x20 Code: 94218d85 b4000155 a94026a8 d10102a0 (f9000509) ---[ end trace 0000000000000000 ]--- Thus, modify Node::release to pop items directly off the original list. Cc: stable@vger.kernel.org Fixes: eafedbc ("rust_binder: add Rust Binder driver") Signed-off-by: Alice Ryhl <aliceryhl@google.com> Acked-by: Miguel Ojeda <ojeda@kernel.org> Link: https://patch.msgid.link/20251111-binder-fix-list-remove-v1-1-8ed14a0da63d@google.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit a51f025 upstream. Syzbot identified an issue [1] in pcl818_ai_cancel(), which stems from the fact that in case of early device detach via pcl818_detach(), subdevice dev->read_subdev may not have initialized its pointer to &struct comedi_async as intended. Thus, any such dereferencing of &s->async->cmd will lead to general protection fault and kernel crash. Mitigate this problem by removing a call to pcl818_ai_cancel() from pcl818_detach() altogether. This way, if the subdevice setups its support for async commands, everything async-related will be handled via subdevice's own ->cancel() function in comedi_device_detach_locked() even before pcl818_detach(). If no support for asynchronous commands is provided, there is no need to cancel anything either. [1] Syzbot crash: Oops: general protection fault, probably for non-canonical address 0xdffffc0000000005: 0000 [#1] SMP KASAN PTI KASAN: null-ptr-deref in range [0x0000000000000028-0x000000000000002f] CPU: 1 UID: 0 PID: 6050 Comm: syz.0.18 Not tainted syzkaller #0 PREEMPT(full) Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 08/18/2025 RIP: 0010:pcl818_ai_cancel+0x69/0x3f0 drivers/comedi/drivers/pcl818.c:762 ... Call Trace: <TASK> pcl818_detach+0x66/0xd0 drivers/comedi/drivers/pcl818.c:1115 comedi_device_detach_locked+0x178/0x750 drivers/comedi/drivers.c:207 do_devconfig_ioctl drivers/comedi/comedi_fops.c:848 [inline] comedi_unlocked_ioctl+0xcde/0x1020 drivers/comedi/comedi_fops.c:2178 vfs_ioctl fs/ioctl.c:51 [inline] __do_sys_ioctl fs/ioctl.c:597 [inline] ... Reported-by: syzbot+fce5d9d5bd067d6fbe9b@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=fce5d9d5bd067d6fbe9b Fixes: 00aba6e ("staging: comedi: pcl818: remove 'neverending_ai' from private data") Cc: stable <stable@kernel.org> Signed-off-by: Nikita Zhandarovich <n.zhandarovich@fintech.ru> Reviewed-by: Ian Abbott <abbotti@mev.co.uk> Link: https://patch.msgid.link/20251023141457.398685-1-n.zhandarovich@fintech.ru Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
…3576 When I added command queue engine (CQE) support for the Rockchip eMMC controller, I missed that RK3576 has a separate platform data struct. While things are working fine on RK3588 (I tested the ROCK 5B) and the suspend issue is fixed on the RK3576 (I tested the Sige5), this results in stability issues. By also adding the necessary hooks for the RK3576 platform the following problems can be avoided: [ 15.606895] mmc0: running CQE recovery [ 15.616189] mmc0: running CQE recovery [...] [ 25.911484] mmc0: running CQE recovery [ 25.926305] mmc0: running CQE recovery [ 25.927468] mmc0: running CQE recovery [...] [ 26.255719] mmc0: running CQE recovery [ 26.257162] ------------[ cut here ]------------ [ 26.257581] mmc0: cqhci: spurious TCN for tag 31 [ 26.258034] WARNING: CPU: 0 PID: 0 at drivers/mmc/host/cqhci-core.c:796 cqhci_irq+0x440/0x68c [ 26.263786] CPU: 0 UID: 0 PID: 0 Comm: swapper/0 Not tainted 6.18.0-rc6-gd984ebbf0d15 #1 PREEMPT [ 26.264561] Hardware name: ArmSoM Sige5 (DT) [...] [ 26.272748] Call trace: [ 26.272964] cqhci_irq+0x440/0x68c (P) [ 26.273296] dwcmshc_cqe_irq_handler+0x54/0x88 [ 26.273689] sdhci_irq+0xbc/0x1200 [ 26.273991] __handle_irq_event_percpu+0x54/0x1d0 [...] Note that the above problems do not necessarily happen with every boot. Reported-by: Adrian Hunter <adrian.hunter@intel.com> Closes: https://lore.kernel.org/linux-rockchip/01949bc9-4873-498b-ac7d-f008393ccc4c@intel.com/ Fixes: fda1e0a ("mmc: sdhci-of-dwcmshc: Add command queue support for rockchip SOCs") Signed-off-by: Sebastian Reichel <sebastian.reichel@collabora.com>
…3576 When I added command queue engine (CQE) support for the Rockchip eMMC controller, I missed that RK3576 has a separate platform data struct. While things are working fine on RK3588 (I tested the ROCK 5B) and the suspend issue is fixed on the RK3576 (I tested the Sige5), this results in stability issues. By also adding the necessary hooks for the RK3576 platform the following problems can be avoided: [ 15.606895] mmc0: running CQE recovery [ 15.616189] mmc0: running CQE recovery [...] [ 25.911484] mmc0: running CQE recovery [ 25.926305] mmc0: running CQE recovery [ 25.927468] mmc0: running CQE recovery [...] [ 26.255719] mmc0: running CQE recovery [ 26.257162] ------------[ cut here ]------------ [ 26.257581] mmc0: cqhci: spurious TCN for tag 31 [ 26.258034] WARNING: CPU: 0 PID: 0 at drivers/mmc/host/cqhci-core.c:796 cqhci_irq+0x440/0x68c [ 26.263786] CPU: 0 UID: 0 PID: 0 Comm: swapper/0 Not tainted 6.18.0-rc6-gd984ebbf0d15 #1 PREEMPT [ 26.264561] Hardware name: ArmSoM Sige5 (DT) [...] [ 26.272748] Call trace: [ 26.272964] cqhci_irq+0x440/0x68c (P) [ 26.273296] dwcmshc_cqe_irq_handler+0x54/0x88 [ 26.273689] sdhci_irq+0xbc/0x1200 [ 26.273991] __handle_irq_event_percpu+0x54/0x1d0 [...] Note that the above problems do not necessarily happen with every boot. Reported-by: Adrian Hunter <adrian.hunter@intel.com> Closes: https://lore.kernel.org/linux-rockchip/01949bc9-4873-498b-ac7d-f008393ccc4c@intel.com/ Fixes: fda1e0a ("mmc: sdhci-of-dwcmshc: Add command queue support for rockchip SOCs") Signed-off-by: Sebastian Reichel <sebastian.reichel@collabora.com>
No description provided.