forked from torvalds/linux
-
Notifications
You must be signed in to change notification settings - Fork 99
Fix typo in t8103-j293.dts and t8103-j313.dts #16
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Closed
Conversation
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Replace line 45 'secify' with 'specify' in both files. Signed-off-by: steven1lung <1030steven@gmail.com>
svenpeter42
pushed a commit
that referenced
this pull request
Feb 26, 2022
When bringing down the netdevice or system shutdown, a panic can be triggered while accessing the sysfs path because the device is already removed. [ 755.549084] mlx5_core 0000:12:00.1: Shutdown was called [ 756.404455] mlx5_core 0000:12:00.0: Shutdown was called ... [ 757.937260] BUG: unable to handle kernel NULL pointer dereference at (null) [ 758.031397] IP: [<ffffffff8ee11acb>] dma_pool_alloc+0x1ab/0x280 crash> bt ... PID: 12649 TASK: ffff8924108f2100 CPU: 1 COMMAND: "amsd" ... #9 [ffff89240e1a38b0] page_fault at ffffffff8f38c778 [exception RIP: dma_pool_alloc+0x1ab] RIP: ffffffff8ee11acb RSP: ffff89240e1a3968 RFLAGS: 00010046 RAX: 0000000000000246 RBX: ffff89243d874100 RCX: 0000000000001000 RDX: 0000000000000000 RSI: 0000000000000246 RDI: ffff89243d874090 RBP: ffff89240e1a39c0 R8: 000000000001f080 R9: ffff8905ffc03c00 R10: ffffffffc04680d4 R11: ffffffff8edde9fd R12: 00000000000080d0 R13: ffff89243d874090 R14: ffff89243d874080 R15: 0000000000000000 ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018 #10 [ffff89240e1a39c8] mlx5_alloc_cmd_msg at ffffffffc04680f3 [mlx5_core] #11 [ffff89240e1a3a18] cmd_exec at ffffffffc046ad62 [mlx5_core] #12 [ffff89240e1a3ab8] mlx5_cmd_exec at ffffffffc046b4fb [mlx5_core] #13 [ffff89240e1a3ae8] mlx5_core_access_reg at ffffffffc0475434 [mlx5_core] #14 [ffff89240e1a3b40] mlx5e_get_fec_caps at ffffffffc04a7348 [mlx5_core] #15 [ffff89240e1a3bb0] get_fec_supported_advertised at ffffffffc04992bf [mlx5_core] #16 [ffff89240e1a3c08] mlx5e_get_link_ksettings at ffffffffc049ab36 [mlx5_core] #17 [ffff89240e1a3ce8] __ethtool_get_link_ksettings at ffffffff8f25db46 #18 [ffff89240e1a3d48] speed_show at ffffffff8f277208 #19 [ffff89240e1a3dd8] dev_attr_show at ffffffff8f0b70e3 #20 [ffff89240e1a3df8] sysfs_kf_seq_show at ffffffff8eedbedf #21 [ffff89240e1a3e18] kernfs_seq_show at ffffffff8eeda596 #22 [ffff89240e1a3e28] seq_read at ffffffff8ee76d10 #23 [ffff89240e1a3e98] kernfs_fop_read at ffffffff8eedaef5 #24 [ffff89240e1a3ed8] vfs_read at ffffffff8ee4e3ff #25 [ffff89240e1a3f08] sys_read at ffffffff8ee4f27f #26 [ffff89240e1a3f50] system_call_fastpath at ffffffff8f395f92 crash> net_device.state ffff89443b0c0000 state = 0x5 (__LINK_STATE_START| __LINK_STATE_NOCARRIER) To prevent this scenario, we also make sure that the netdevice is present. Signed-off-by: suresh kumar <suresh2514@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
svenpeter42
pushed a commit
that referenced
this pull request
Feb 26, 2022
In current async pagefault logic, when a page is ready, KVM relies on kvm_arch_can_dequeue_async_page_present() to determine whether to deliver a READY event to the Guest. This function test token value of struct kvm_vcpu_pv_apf_data, which must be reset to zero by Guest kernel when a READY event is finished by Guest. If value is zero meaning that a READY event is done, so the KVM can deliver another. But the kvm_arch_setup_async_pf() may produce a valid token with zero value, which is confused with previous mention and may lead the loss of this READY event. This bug may cause task blocked forever in Guest: INFO: task stress:7532 blocked for more than 1254 seconds. Not tainted 5.10.0 #16 "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. task:stress state:D stack: 0 pid: 7532 ppid: 1409 flags:0x00000080 Call Trace: __schedule+0x1e7/0x650 schedule+0x46/0xb0 kvm_async_pf_task_wait_schedule+0xad/0xe0 ? exit_to_user_mode_prepare+0x60/0x70 __kvm_handle_async_pf+0x4f/0xb0 ? asm_exc_page_fault+0x8/0x30 exc_page_fault+0x6f/0x110 ? asm_exc_page_fault+0x8/0x30 asm_exc_page_fault+0x1e/0x30 RIP: 0033:0x402d00 RSP: 002b:00007ffd31912500 EFLAGS: 00010206 RAX: 0000000000071000 RBX: ffffffffffffffff RCX: 00000000021a32b0 RDX: 000000000007d011 RSI: 000000000007d000 RDI: 00000000021262b0 RBP: 00000000021262b0 R08: 0000000000000003 R09: 0000000000000086 R10: 00000000000000eb R11: 00007fefbdf2baa0 R12: 0000000000000000 R13: 0000000000000002 R14: 000000000007d000 R15: 0000000000001000 Signed-off-by: Liang Zhang <zhangliang5@huawei.com> Message-Id: <20220222031239.1076682-1-zhangliang5@huawei.com> Cc: stable@vger.kernel.org Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
marcan
pushed a commit
that referenced
this pull request
Dec 12, 2022
We use "unsigned long" to store a PFN in the kernel and phys_addr_t to store a physical address. On a 64bit system, both are 64bit wide. However, on a 32bit system, the latter might be 64bit wide. This is, for example, the case on x86 with PAE: phys_addr_t and PTEs are 64bit wide, while "unsigned long" only spans 32bit. The current definition of SWP_PFN_BITS without MAX_PHYSMEM_BITS misses that case, and assumes that the maximum PFN is limited by an 32bit phys_addr_t. This implies, that SWP_PFN_BITS will currently only be able to cover 4 GiB - 1 on any 32bit system with 4k page size, which is wrong. Let's rely on the number of bits in phys_addr_t instead, but make sure to not exceed the maximum swap offset, to not make the BUILD_BUG_ON() in is_pfn_swap_entry() unhappy. Note that swp_entry_t is effectively an unsigned long and the maximum swap offset shares that value with the swap type. For example, on an 8 GiB x86 PAE system with a kernel config based on Debian 11.5 (-> CONFIG_FLATMEM=y, CONFIG_X86_PAE=y), we will currently fail removing migration entries (remove_migration_ptes()), because mm/page_vma_mapped.c:check_pte() will fail to identify a PFN match as swp_offset_pfn() wrongly masks off PFN bits. For example, split_huge_page_to_list()->...->remap_page() will leave migration entries in place and continue to unlock the page. Later, when we stumble over these migration entries (e.g., via /proc/self/pagemap), pfn_swap_entry_to_page() will BUG_ON() because these migration entries shouldn't exist anymore and the page was unlocked. [ 33.067591] kernel BUG at include/linux/swapops.h:497! [ 33.067597] invalid opcode: 0000 [#1] PREEMPT SMP NOPTI [ 33.067602] CPU: 3 PID: 742 Comm: cow Tainted: G E 6.1.0-rc8+ #16 [ 33.067605] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.0-1.fc36 04/01/2014 [ 33.067606] EIP: pagemap_pmd_range+0x644/0x650 [ 33.067612] Code: 00 00 00 00 66 90 89 ce b9 00 f0 ff ff e9 ff fb ff ff 89 d8 31 db e8 48 c6 52 00 e9 23 fb ff ff e8 61 83 56 00 e9 b6 fe ff ff <0f> 0b bf 00 f0 ff ff e9 38 fa ff ff 3e 8d 74 26 00 55 89 e5 57 31 [ 33.067615] EAX: ee394000 EBX: 00000002 ECX: ee394000 EDX: 00000000 [ 33.067617] ESI: c1b0ded4 EDI: 00024a00 EBP: c1b0ddb4 ESP: c1b0dd68 [ 33.067619] DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068 EFLAGS: 00010246 [ 33.067624] CR0: 80050033 CR2: b7a00000 CR3: 01bbbd20 CR4: 00350ef0 [ 33.067625] Call Trace: [ 33.067628] ? madvise_free_pte_range+0x720/0x720 [ 33.067632] ? smaps_pte_range+0x4b0/0x4b0 [ 33.067634] walk_pgd_range+0x325/0x720 [ 33.067637] ? mt_find+0x1d6/0x3a0 [ 33.067641] ? mt_find+0x1d6/0x3a0 [ 33.067643] __walk_page_range+0x164/0x170 [ 33.067646] walk_page_range+0xf9/0x170 [ 33.067648] ? __kmem_cache_alloc_node+0x2a8/0x340 [ 33.067653] pagemap_read+0x124/0x280 [ 33.067658] ? default_llseek+0x101/0x160 [ 33.067662] ? smaps_account+0x1d0/0x1d0 [ 33.067664] vfs_read+0x90/0x290 [ 33.067667] ? do_madvise.part.0+0x24b/0x390 [ 33.067669] ? debug_smp_processor_id+0x12/0x20 [ 33.067673] ksys_pread64+0x58/0x90 [ 33.067675] __ia32_sys_ia32_pread64+0x1b/0x20 [ 33.067680] __do_fast_syscall_32+0x4c/0xc0 [ 33.067683] do_fast_syscall_32+0x29/0x60 [ 33.067686] do_SYSENTER_32+0x15/0x20 [ 33.067689] entry_SYSENTER_32+0x98/0xf1 Decrease the indentation level of SWP_PFN_BITS and SWP_PFN_MASK to keep it readable and consistent. [david@redhat.com: rely on sizeof(phys_addr_t) and min_t() instead] Link: https://lkml.kernel.org/r/20221206105737.69478-1-david@redhat.com [david@redhat.com: use "int" for comparison, as we're only comparing numbers < 64] Link: https://lkml.kernel.org/r/1f157500-2676-7cef-a84e-9224ed64e540@redhat.com Link: https://lkml.kernel.org/r/20221205150857.167583-1-david@redhat.com Fixes: 0d206b5 ("mm/swap: add swp_offset_pfn() to fetch PFN from swap entry") Signed-off-by: David Hildenbrand <david@redhat.com> Acked-by: Peter Xu <peterx@redhat.com> Reviewed-by: Yang Shi <shy828301@gmail.com> Cc: Hugh Dickins <hughd@google.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
asahilina
pushed a commit
that referenced
this pull request
Feb 15, 2023
Set kprobe at 'jalr 1140(ra)' of vfs_write results in the following crash: [ 32.092235] Unable to handle kernel access to user memory without uaccess routines at virtual address 00aaaaaad77b1170 [ 32.093115] Oops [#1] [ 32.093251] Modules linked in: [ 32.093626] CPU: 0 PID: 135 Comm: ftracetest Not tainted 6.2.0-rc2-00013-gb0aa5e5df0cb-dirty #16 [ 32.093985] Hardware name: riscv-virtio,qemu (DT) [ 32.094280] epc : ksys_read+0x88/0xd6 [ 32.094855] ra : ksys_read+0xc0/0xd6 [ 32.095016] epc : ffffffff801cda80 ra : ffffffff801cdab8 sp : ff20000000d7bdc0 [ 32.095227] gp : ffffffff80f14000 tp : ff60000080f9cb40 t0 : ffffffff80f13e80 [ 32.095500] t1 : ffffffff8000c29c t2 : ffffffff800dbc54 s0 : ff20000000d7be60 [ 32.095716] s1 : 0000000000000000 a0 : ffffffff805a64ae a1 : ffffffff80a83708 [ 32.095921] a2 : ffffffff80f160a0 a3 : 0000000000000000 a4 : f229b0afdb165300 [ 32.096171] a5 : f229b0afdb165300 a6 : ffffffff80eeebd0 a7 : 00000000000003ff [ 32.096411] s2 : ff6000007ff76800 s3 : fffffffffffffff7 s4 : 00aaaaaad77b1170 [ 32.096638] s5 : ffffffff80f160a0 s6 : ff6000007ff76800 s7 : 0000000000000030 [ 32.096865] s8 : 00ffffffc3d97be0 s9 : 0000000000000007 s10: 00aaaaaad77c9410 [ 32.097092] s11: 0000000000000000 t3 : ffffffff80f13e48 t4 : ffffffff8000c29c [ 32.097317] t5 : ffffffff8000c29c t6 : ffffffff800dbc54 [ 32.097505] status: 0000000200000120 badaddr: 00aaaaaad77b1170 cause: 000000000000000d [ 32.098011] [<ffffffff801cdb72>] ksys_write+0x6c/0xd6 [ 32.098222] [<ffffffff801cdc06>] sys_write+0x2a/0x38 [ 32.098405] [<ffffffff80003c76>] ret_from_syscall+0x0/0x2 Since the rs1 and rd might be the same one, such as 'jalr 1140(ra)', hence it requires obtaining the target address from rs1 followed by updating rd. Fixes: c22b0bc ("riscv: Add kprobes supported") Signed-off-by: Liao Chang <liaochang1@huawei.com> Reviewed-by: Guo Ren <guoren@kernel.org> Link: https://lore.kernel.org/r/20230116064342.2092136-1-liaochang1@huawei.com [Palmer: Pick Guo's cleanup] Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
37 tasks
marcan
pushed a commit
that referenced
this pull request
Mar 28, 2023
When a system with E810 with existing VFs gets rebooted the following hang may be observed. Pid 1 is hung in iavf_remove(), part of a network driver: PID: 1 TASK: ffff965400e5a340 CPU: 24 COMMAND: "systemd-shutdow" #0 [ffffaad04005fa50] __schedule at ffffffff8b3239cb #1 [ffffaad04005fae8] schedule at ffffffff8b323e2d #2 [ffffaad04005fb00] schedule_hrtimeout_range_clock at ffffffff8b32cebc #3 [ffffaad04005fb80] usleep_range_state at ffffffff8b32c930 #4 [ffffaad04005fbb0] iavf_remove at ffffffffc12b9b4c [iavf] #5 [ffffaad04005fbf0] pci_device_remove at ffffffff8add7513 #6 [ffffaad04005fc10] device_release_driver_internal at ffffffff8af08baa #7 [ffffaad04005fc40] pci_stop_bus_device at ffffffff8adcc5fc #8 [ffffaad04005fc60] pci_stop_and_remove_bus_device at ffffffff8adcc81e #9 [ffffaad04005fc70] pci_iov_remove_virtfn at ffffffff8adf9429 #10 [ffffaad04005fca8] sriov_disable at ffffffff8adf98e4 #11 [ffffaad04005fcc8] ice_free_vfs at ffffffffc04bb2c8 [ice] #12 [ffffaad04005fd10] ice_remove at ffffffffc04778fe [ice] #13 [ffffaad04005fd38] ice_shutdown at ffffffffc0477946 [ice] #14 [ffffaad04005fd50] pci_device_shutdown at ffffffff8add58f1 #15 [ffffaad04005fd70] device_shutdown at ffffffff8af05386 #16 [ffffaad04005fd98] kernel_restart at ffffffff8a92a870 #17 [ffffaad04005fda8] __do_sys_reboot at ffffffff8a92abd6 #18 [ffffaad04005fee0] do_syscall_64 at ffffffff8b317159 #19 [ffffaad04005ff08] __context_tracking_enter at ffffffff8b31b6fc #20 [ffffaad04005ff18] syscall_exit_to_user_mode at ffffffff8b31b50d #21 [ffffaad04005ff28] do_syscall_64 at ffffffff8b317169 #22 [ffffaad04005ff50] entry_SYSCALL_64_after_hwframe at ffffffff8b40009b RIP: 00007f1baa5c13d7 RSP: 00007fffbcc55a98 RFLAGS: 00000202 RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f1baa5c13d7 RDX: 0000000001234567 RSI: 0000000028121969 RDI: 00000000fee1dead RBP: 00007fffbcc55ca0 R8: 0000000000000000 R9: 00007fffbcc54e90 R10: 00007fffbcc55050 R11: 0000000000000202 R12: 0000000000000005 R13: 0000000000000000 R14: 00007fffbcc55af0 R15: 0000000000000000 ORIG_RAX: 00000000000000a9 CS: 0033 SS: 002b During reboot all drivers PM shutdown callbacks are invoked. In iavf_shutdown() the adapter state is changed to __IAVF_REMOVE. In ice_shutdown() the call chain above is executed, which at some point calls iavf_remove(). However iavf_remove() expects the VF to be in one of the states __IAVF_RUNNING, __IAVF_DOWN or __IAVF_INIT_FAILED. If that's not the case it sleeps forever. So if iavf_shutdown() gets invoked before iavf_remove() the system will hang indefinitely because the adapter is already in state __IAVF_REMOVE. Fix this by returning from iavf_remove() if the state is __IAVF_REMOVE, as we already went through iavf_shutdown(). Fixes: 9745780 ("iavf: Add waiting so the port is initialized in remove") Fixes: a841733 ("iavf: Fix race condition between iavf_shutdown and iavf_remove") Reported-by: Marius Cornea <mcornea@redhat.com> Signed-off-by: Stefan Assmann <sassmann@kpanic.de> Reviewed-by: Michal Kubiak <michal.kubiak@intel.com> Tested-by: Rafal Romanowski <rafal.romanowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
marcan
pushed a commit
that referenced
this pull request
Apr 24, 2023
…r_poll_disable during suspend When drivers call drm_kms_helper_poll_disable from their device suspend implementation without enabled output polling before, following warning will be reported,due to work->func not be initialized: [ 55.141361] WARNING: CPU: 3 PID: 372 at kernel/workqueue.c:3066 __flush_work+0x22f/0x240 [ 55.141382] Modules linked in: nls_iso8859_1 snd_hda_codec_generic ledtrig_audio snd_hda_intel snd_intel_dspcfg snd_intel_sdw_acpi snd_hda_codec snd_hda_core snd_hwdep snd_pcm snd_seq_midi snd_seq_midi_event snd_rawmidi snd_seq intel_rapl_msr intel_rapl_common bochs drm_vram_helper drm_ttm_helper snd_seq_device nfit ttm crct10dif_pclmul snd_timer ghash_clmulni_intel binfmt_misc sha512_ssse3 aesni_intel drm_kms_helper joydev input_leds syscopyarea crypto_simd snd cryptd sysfillrect sysimgblt mac_hid serio_raw soundcore qemu_fw_cfg sch_fq_codel msr parport_pc ppdev lp parport drm ramoops reed_solomon pstore_blk pstore_zone efi_pstore virtio_rng ip_tables x_tables autofs4 hid_generic usbhid hid ahci virtio_net i2c_i801 crc32_pclmul psmouse virtio_scsi libahci i2c_smbus lpc_ich xhci_pci net_failover virtio_blk xhci_pci_renesas failover [ 55.141430] CPU: 3 PID: 372 Comm: kworker/u16:9 Not tainted 6.2.0-rc6+ #16 [ 55.141433] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org 04/01/2014 [ 55.141435] Workqueue: events_unbound async_run_entry_fn [ 55.141441] RIP: 0010:__flush_work+0x22f/0x240 [ 55.141444] Code: 8b 43 28 48 8b 53 30 89 c1 e9 f9 fe ff ff 4c 89 f7 e8 b5 95 d9 00 e8 00 53 08 00 45 31 ff e9 11 ff ff ff 0f 0b e9 0a ff ff ff <0f> 0b 45 31 ff e9 00 ff ff ff e8 e2 54 d8 00 66 90 90 90 90 90 90 [ 55.141446] RSP: 0018:ff59221940833c18 EFLAGS: 00010246 [ 55.141449] RAX: 0000000000000000 RBX: 0000000000000000 RCX: ffffffff9b72bcbe [ 55.141450] RDX: 0000000000000001 RSI: 0000000000000001 RDI: ff3ea01e4265e330 [ 55.141451] RBP: ff59221940833c90 R08: 0000000000000000 R09: 8080808080808080 [ 55.141453] R10: ff3ea01e42b3caf4 R11: 000000000000000f R12: ff3ea01e4265e330 [ 55.141454] R13: 0000000000000001 R14: ff3ea01e505e5e80 R15: 0000000000000001 [ 55.141455] FS: 0000000000000000(0000) GS:ff3ea01fb7cc0000(0000) knlGS:0000000000000000 [ 55.141456] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 55.141458] CR2: 0000563543ad1546 CR3: 000000010ee82005 CR4: 0000000000771ee0 [ 55.141464] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 55.141465] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 55.141466] PKRU: 55555554 [ 55.141467] Call Trace: [ 55.141469] <TASK> [ 55.141472] ? pcie_wait_cmd+0xdf/0x220 [ 55.141478] ? mptcp_seq_show+0xe0/0x180 [ 55.141484] __cancel_work_timer+0x124/0x1b0 [ 55.141487] cancel_delayed_work_sync+0x17/0x20 [ 55.141490] drm_kms_helper_poll_disable+0x26/0x40 [drm_kms_helper] [ 55.141516] drm_mode_config_helper_suspend+0x25/0x90 [drm_kms_helper] [ 55.141531] ? __pm_runtime_resume+0x64/0x90 [ 55.141536] bochs_pm_suspend+0x16/0x20 [bochs] [ 55.141540] pci_pm_suspend+0x8b/0x1b0 [ 55.141545] ? __pfx_pci_pm_suspend+0x10/0x10 [ 55.141547] dpm_run_callback+0x4c/0x160 [ 55.141550] __device_suspend+0x14c/0x4c0 [ 55.141553] async_suspend+0x24/0xa0 [ 55.141555] async_run_entry_fn+0x34/0x120 [ 55.141557] process_one_work+0x21a/0x3f0 [ 55.141560] worker_thread+0x4e/0x3c0 [ 55.141563] ? __pfx_worker_thread+0x10/0x10 [ 55.141565] kthread+0xf2/0x120 [ 55.141568] ? __pfx_kthread+0x10/0x10 [ 55.141570] ret_from_fork+0x29/0x50 [ 55.141575] </TASK> [ 55.141575] ---[ end trace 0000000000000000 ]--- Fixes: a4e7717 ("drm/probe_helper: sort out poll_running vs poll_enabled") Signed-off-by: Zongmin Zhou<zhouzongmin@kylinos.cn>
WhatAmISupposedToPutHere
pushed a commit
to WhatAmISupposedToPutHere/linux
that referenced
this pull request
Apr 24, 2023
…r_poll_disable during suspend When drivers call drm_kms_helper_poll_disable from their device suspend implementation without enabled output polling before, following warning will be reported,due to work->func not be initialized: [ 55.141361] WARNING: CPU: 3 PID: 372 at kernel/workqueue.c:3066 __flush_work+0x22f/0x240 [ 55.141382] Modules linked in: nls_iso8859_1 snd_hda_codec_generic ledtrig_audio snd_hda_intel snd_intel_dspcfg snd_intel_sdw_acpi snd_hda_codec snd_hda_core snd_hwdep snd_pcm snd_seq_midi snd_seq_midi_event snd_rawmidi snd_seq intel_rapl_msr intel_rapl_common bochs drm_vram_helper drm_ttm_helper snd_seq_device nfit ttm crct10dif_pclmul snd_timer ghash_clmulni_intel binfmt_misc sha512_ssse3 aesni_intel drm_kms_helper joydev input_leds syscopyarea crypto_simd snd cryptd sysfillrect sysimgblt mac_hid serio_raw soundcore qemu_fw_cfg sch_fq_codel msr parport_pc ppdev lp parport drm ramoops reed_solomon pstore_blk pstore_zone efi_pstore virtio_rng ip_tables x_tables autofs4 hid_generic usbhid hid ahci virtio_net i2c_i801 crc32_pclmul psmouse virtio_scsi libahci i2c_smbus lpc_ich xhci_pci net_failover virtio_blk xhci_pci_renesas failover [ 55.141430] CPU: 3 PID: 372 Comm: kworker/u16:9 Not tainted 6.2.0-rc6+ AsahiLinux#16 [ 55.141433] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org 04/01/2014 [ 55.141435] Workqueue: events_unbound async_run_entry_fn [ 55.141441] RIP: 0010:__flush_work+0x22f/0x240 [ 55.141444] Code: 8b 43 28 48 8b 53 30 89 c1 e9 f9 fe ff ff 4c 89 f7 e8 b5 95 d9 00 e8 00 53 08 00 45 31 ff e9 11 ff ff ff 0f 0b e9 0a ff ff ff <0f> 0b 45 31 ff e9 00 ff ff ff e8 e2 54 d8 00 66 90 90 90 90 90 90 [ 55.141446] RSP: 0018:ff59221940833c18 EFLAGS: 00010246 [ 55.141449] RAX: 0000000000000000 RBX: 0000000000000000 RCX: ffffffff9b72bcbe [ 55.141450] RDX: 0000000000000001 RSI: 0000000000000001 RDI: ff3ea01e4265e330 [ 55.141451] RBP: ff59221940833c90 R08: 0000000000000000 R09: 8080808080808080 [ 55.141453] R10: ff3ea01e42b3caf4 R11: 000000000000000f R12: ff3ea01e4265e330 [ 55.141454] R13: 0000000000000001 R14: ff3ea01e505e5e80 R15: 0000000000000001 [ 55.141455] FS: 0000000000000000(0000) GS:ff3ea01fb7cc0000(0000) knlGS:0000000000000000 [ 55.141456] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 55.141458] CR2: 0000563543ad1546 CR3: 000000010ee82005 CR4: 0000000000771ee0 [ 55.141464] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 55.141465] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 55.141466] PKRU: 55555554 [ 55.141467] Call Trace: [ 55.141469] <TASK> [ 55.141472] ? pcie_wait_cmd+0xdf/0x220 [ 55.141478] ? mptcp_seq_show+0xe0/0x180 [ 55.141484] __cancel_work_timer+0x124/0x1b0 [ 55.141487] cancel_delayed_work_sync+0x17/0x20 [ 55.141490] drm_kms_helper_poll_disable+0x26/0x40 [drm_kms_helper] [ 55.141516] drm_mode_config_helper_suspend+0x25/0x90 [drm_kms_helper] [ 55.141531] ? __pm_runtime_resume+0x64/0x90 [ 55.141536] bochs_pm_suspend+0x16/0x20 [bochs] [ 55.141540] pci_pm_suspend+0x8b/0x1b0 [ 55.141545] ? __pfx_pci_pm_suspend+0x10/0x10 [ 55.141547] dpm_run_callback+0x4c/0x160 [ 55.141550] __device_suspend+0x14c/0x4c0 [ 55.141553] async_suspend+0x24/0xa0 [ 55.141555] async_run_entry_fn+0x34/0x120 [ 55.141557] process_one_work+0x21a/0x3f0 [ 55.141560] worker_thread+0x4e/0x3c0 [ 55.141563] ? __pfx_worker_thread+0x10/0x10 [ 55.141565] kthread+0xf2/0x120 [ 55.141568] ? __pfx_kthread+0x10/0x10 [ 55.141570] ret_from_fork+0x29/0x50 [ 55.141575] </TASK> [ 55.141575] ---[ end trace 0000000000000000 ]--- Fixes: a4e7717 ("drm/probe_helper: sort out poll_running vs poll_enabled") Signed-off-by: Zongmin Zhou<zhouzongmin@kylinos.cn>
marcan
pushed a commit
that referenced
this pull request
Jul 14, 2023
Currently, the per cpu upcall counters are allocated after the vport is created and inserted into the system. This could lead to the datapath accessing the counters before they are allocated resulting in a kernel Oops. Here is an example: PID: 59693 TASK: ffff0005f4f51500 CPU: 0 COMMAND: "ovs-vswitchd" #0 [ffff80000a39b5b0] __switch_to at ffffb70f0629f2f4 #1 [ffff80000a39b5d0] __schedule at ffffb70f0629f5cc #2 [ffff80000a39b650] preempt_schedule_common at ffffb70f0629fa60 #3 [ffff80000a39b670] dynamic_might_resched at ffffb70f0629fb58 #4 [ffff80000a39b680] mutex_lock_killable at ffffb70f062a1388 #5 [ffff80000a39b6a0] pcpu_alloc at ffffb70f0594460c #6 [ffff80000a39b750] __alloc_percpu_gfp at ffffb70f05944e68 #7 [ffff80000a39b760] ovs_vport_cmd_new at ffffb70ee6961b90 [openvswitch] ... PID: 58682 TASK: ffff0005b2f0bf00 CPU: 0 COMMAND: "kworker/0:3" #0 [ffff80000a5d2f40] machine_kexec at ffffb70f056a0758 #1 [ffff80000a5d2f70] __crash_kexec at ffffb70f057e2994 #2 [ffff80000a5d3100] crash_kexec at ffffb70f057e2ad8 #3 [ffff80000a5d3120] die at ffffb70f0628234c #4 [ffff80000a5d31e0] die_kernel_fault at ffffb70f062828a8 #5 [ffff80000a5d3210] __do_kernel_fault at ffffb70f056a31f4 #6 [ffff80000a5d3240] do_bad_area at ffffb70f056a32a4 #7 [ffff80000a5d3260] do_translation_fault at ffffb70f062a9710 #8 [ffff80000a5d3270] do_mem_abort at ffffb70f056a2f74 #9 [ffff80000a5d32a0] el1_abort at ffffb70f06297dac #10 [ffff80000a5d32d0] el1h_64_sync_handler at ffffb70f06299b24 #11 [ffff80000a5d3410] el1h_64_sync at ffffb70f056812dc #12 [ffff80000a5d3430] ovs_dp_upcall at ffffb70ee6963c84 [openvswitch] #13 [ffff80000a5d3470] ovs_dp_process_packet at ffffb70ee6963fdc [openvswitch] #14 [ffff80000a5d34f0] ovs_vport_receive at ffffb70ee6972c78 [openvswitch] #15 [ffff80000a5d36f0] netdev_port_receive at ffffb70ee6973948 [openvswitch] #16 [ffff80000a5d3720] netdev_frame_hook at ffffb70ee6973a28 [openvswitch] #17 [ffff80000a5d3730] __netif_receive_skb_core.constprop.0 at ffffb70f06079f90 We moved the per cpu upcall counter allocation to the existing vport alloc and free functions to solve this. Fixes: 95637d9 ("net: openvswitch: release vport resources on failure") Fixes: 1933ea3 ("net: openvswitch: Add support to count upcall packets") Signed-off-by: Eelco Chaudron <echaudro@redhat.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Acked-by: Aaron Conole <aconole@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
marcan
pushed a commit
that referenced
this pull request
Sep 20, 2023
Christoph Biedl reported early OOM on recent kernels: swapper: page allocation failure: order:0, mode:0x100(__GFP_ZERO), nodemask=(null) CPU: 0 PID: 0 Comm: swapper Not tainted 6.3.0-rc4+ #16 Hardware name: 9000/785/C3600 Backtrace: [<10408594>] show_stack+0x48/0x5c [<10e152d8>] dump_stack_lvl+0x48/0x64 [<10e15318>] dump_stack+0x24/0x34 [<105cf7f8>] warn_alloc+0x10c/0x1c8 [<105d068c>] __alloc_pages+0xbbc/0xcf8 [<105d0e4c>] __get_free_pages+0x28/0x78 [<105ad10c>] __pte_alloc_kernel+0x30/0x98 [<10406934>] set_fixmap+0xec/0xf4 [<10411ad4>] patch_map.constprop.0+0xa8/0xdc [<10411bb0>] __patch_text_multiple+0xa8/0x208 [<10411d78>] patch_text+0x30/0x48 [<1041246c>] arch_jump_label_transform+0x90/0xcc [<1056f734>] jump_label_update+0xd4/0x184 [<1056fc9c>] static_key_enable_cpuslocked+0xc0/0x110 [<1056fd08>] static_key_enable+0x1c/0x2c [<1011362c>] init_mem_debugging_and_hardening+0xdc/0xf8 [<1010141c>] start_kernel+0x5f0/0xa98 [<10105da8>] start_parisc+0xb8/0xe4 Mem-Info: active_anon:0 inactive_anon:0 isolated_anon:0 active_file:0 inactive_file:0 isolated_file:0 unevictable:0 dirty:0 writeback:0 slab_reclaimable:0 slab_unreclaimable:0 mapped:0 shmem:0 pagetables:0 sec_pagetables:0 bounce:0 kernel_misc_reclaimable:0 free:0 free_pcp:0 free_cma:0 Node 0 active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB mapped:0kB dirty:0kB writeback:0kB shmem:0kB +writeback_tmp:0kB kernel_stack:0kB pagetables:0kB sec_pagetables:0kB all_unreclaimable? no Normal free:0kB boost:0kB min:0kB low:0kB high:0kB reserved_highatomic:0KB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB writepending:0kB +present:1048576kB managed:1039360kB mlocked:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB lowmem_reserve[]: 0 0 Normal: 0*4kB 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 0kB 0 total pagecache pages 0 pages in swap cache Free swap = 0kB Total swap = 0kB 262144 pages RAM 0 pages HighMem/MovableOnly 2304 pages reserved Backtrace: [<10411d78>] patch_text+0x30/0x48 [<1041246c>] arch_jump_label_transform+0x90/0xcc [<1056f734>] jump_label_update+0xd4/0x184 [<1056fc9c>] static_key_enable_cpuslocked+0xc0/0x110 [<1056fd08>] static_key_enable+0x1c/0x2c [<1011362c>] init_mem_debugging_and_hardening+0xdc/0xf8 [<1010141c>] start_kernel+0x5f0/0xa98 [<10105da8>] start_parisc+0xb8/0xe4 Kernel Fault: Code=15 (Data TLB miss fault) at addr 0f7fe3c0 CPU: 0 PID: 0 Comm: swapper Not tainted 6.3.0-rc4+ #16 Hardware name: 9000/785/C3600 This happens because patching static key code temporarily maps it via fixmap and if it happens before page allocator is initialized set_fixmap() cannot allocate memory using pte_alloc_kernel(). Make sure that fixmap page tables are preallocated early so that pte_offset_kernel() in set_fixmap() never resorts to pte allocation. Signed-off-by: Mike Rapoport (IBM) <rppt@kernel.org> Acked-by: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Helge Deller <deller@gmx.de> Tested-by: Christoph Biedl <linux-kernel.bfrz@manchmal.in-ulm.de> Tested-by: John David Anglin <dave.anglin@bell.net> Cc: <stable@vger.kernel.org> # v6.4+
dberlin
pushed a commit
to dberlin/linux
that referenced
this pull request
Oct 27, 2023
…r_poll_disable during suspend When drivers call drm_kms_helper_poll_disable from their device suspend implementation without enabled output polling before, following warning will be reported,due to work->func not be initialized: [ 55.141361] WARNING: CPU: 3 PID: 372 at kernel/workqueue.c:3066 __flush_work+0x22f/0x240 [ 55.141382] Modules linked in: nls_iso8859_1 snd_hda_codec_generic ledtrig_audio snd_hda_intel snd_intel_dspcfg snd_intel_sdw_acpi snd_hda_codec snd_hda_core snd_hwdep snd_pcm snd_seq_midi snd_seq_midi_event snd_rawmidi snd_seq intel_rapl_msr intel_rapl_common bochs drm_vram_helper drm_ttm_helper snd_seq_device nfit ttm crct10dif_pclmul snd_timer ghash_clmulni_intel binfmt_misc sha512_ssse3 aesni_intel drm_kms_helper joydev input_leds syscopyarea crypto_simd snd cryptd sysfillrect sysimgblt mac_hid serio_raw soundcore qemu_fw_cfg sch_fq_codel msr parport_pc ppdev lp parport drm ramoops reed_solomon pstore_blk pstore_zone efi_pstore virtio_rng ip_tables x_tables autofs4 hid_generic usbhid hid ahci virtio_net i2c_i801 crc32_pclmul psmouse virtio_scsi libahci i2c_smbus lpc_ich xhci_pci net_failover virtio_blk xhci_pci_renesas failover [ 55.141430] CPU: 3 PID: 372 Comm: kworker/u16:9 Not tainted 6.2.0-rc6+ AsahiLinux#16 [ 55.141433] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org 04/01/2014 [ 55.141435] Workqueue: events_unbound async_run_entry_fn [ 55.141441] RIP: 0010:__flush_work+0x22f/0x240 [ 55.141444] Code: 8b 43 28 48 8b 53 30 89 c1 e9 f9 fe ff ff 4c 89 f7 e8 b5 95 d9 00 e8 00 53 08 00 45 31 ff e9 11 ff ff ff 0f 0b e9 0a ff ff ff <0f> 0b 45 31 ff e9 00 ff ff ff e8 e2 54 d8 00 66 90 90 90 90 90 90 [ 55.141446] RSP: 0018:ff59221940833c18 EFLAGS: 00010246 [ 55.141449] RAX: 0000000000000000 RBX: 0000000000000000 RCX: ffffffff9b72bcbe [ 55.141450] RDX: 0000000000000001 RSI: 0000000000000001 RDI: ff3ea01e4265e330 [ 55.141451] RBP: ff59221940833c90 R08: 0000000000000000 R09: 8080808080808080 [ 55.141453] R10: ff3ea01e42b3caf4 R11: 000000000000000f R12: ff3ea01e4265e330 [ 55.141454] R13: 0000000000000001 R14: ff3ea01e505e5e80 R15: 0000000000000001 [ 55.141455] FS: 0000000000000000(0000) GS:ff3ea01fb7cc0000(0000) knlGS:0000000000000000 [ 55.141456] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 55.141458] CR2: 0000563543ad1546 CR3: 000000010ee82005 CR4: 0000000000771ee0 [ 55.141464] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 55.141465] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 55.141466] PKRU: 55555554 [ 55.141467] Call Trace: [ 55.141469] <TASK> [ 55.141472] ? pcie_wait_cmd+0xdf/0x220 [ 55.141478] ? mptcp_seq_show+0xe0/0x180 [ 55.141484] __cancel_work_timer+0x124/0x1b0 [ 55.141487] cancel_delayed_work_sync+0x17/0x20 [ 55.141490] drm_kms_helper_poll_disable+0x26/0x40 [drm_kms_helper] [ 55.141516] drm_mode_config_helper_suspend+0x25/0x90 [drm_kms_helper] [ 55.141531] ? __pm_runtime_resume+0x64/0x90 [ 55.141536] bochs_pm_suspend+0x16/0x20 [bochs] [ 55.141540] pci_pm_suspend+0x8b/0x1b0 [ 55.141545] ? __pfx_pci_pm_suspend+0x10/0x10 [ 55.141547] dpm_run_callback+0x4c/0x160 [ 55.141550] __device_suspend+0x14c/0x4c0 [ 55.141553] async_suspend+0x24/0xa0 [ 55.141555] async_run_entry_fn+0x34/0x120 [ 55.141557] process_one_work+0x21a/0x3f0 [ 55.141560] worker_thread+0x4e/0x3c0 [ 55.141563] ? __pfx_worker_thread+0x10/0x10 [ 55.141565] kthread+0xf2/0x120 [ 55.141568] ? __pfx_kthread+0x10/0x10 [ 55.141570] ret_from_fork+0x29/0x50 [ 55.141575] </TASK> [ 55.141575] ---[ end trace 0000000000000000 ]--- Fixes: a4e7717 ("drm/probe_helper: sort out poll_running vs poll_enabled") Signed-off-by: Zongmin Zhou<zhouzongmin@kylinos.cn>
dberlin
pushed a commit
to dberlin/linux
that referenced
this pull request
Oct 29, 2023
…r_poll_disable during suspend When drivers call drm_kms_helper_poll_disable from their device suspend implementation without enabled output polling before, following warning will be reported,due to work->func not be initialized: [ 55.141361] WARNING: CPU: 3 PID: 372 at kernel/workqueue.c:3066 __flush_work+0x22f/0x240 [ 55.141382] Modules linked in: nls_iso8859_1 snd_hda_codec_generic ledtrig_audio snd_hda_intel snd_intel_dspcfg snd_intel_sdw_acpi snd_hda_codec snd_hda_core snd_hwdep snd_pcm snd_seq_midi snd_seq_midi_event snd_rawmidi snd_seq intel_rapl_msr intel_rapl_common bochs drm_vram_helper drm_ttm_helper snd_seq_device nfit ttm crct10dif_pclmul snd_timer ghash_clmulni_intel binfmt_misc sha512_ssse3 aesni_intel drm_kms_helper joydev input_leds syscopyarea crypto_simd snd cryptd sysfillrect sysimgblt mac_hid serio_raw soundcore qemu_fw_cfg sch_fq_codel msr parport_pc ppdev lp parport drm ramoops reed_solomon pstore_blk pstore_zone efi_pstore virtio_rng ip_tables x_tables autofs4 hid_generic usbhid hid ahci virtio_net i2c_i801 crc32_pclmul psmouse virtio_scsi libahci i2c_smbus lpc_ich xhci_pci net_failover virtio_blk xhci_pci_renesas failover [ 55.141430] CPU: 3 PID: 372 Comm: kworker/u16:9 Not tainted 6.2.0-rc6+ AsahiLinux#16 [ 55.141433] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org 04/01/2014 [ 55.141435] Workqueue: events_unbound async_run_entry_fn [ 55.141441] RIP: 0010:__flush_work+0x22f/0x240 [ 55.141444] Code: 8b 43 28 48 8b 53 30 89 c1 e9 f9 fe ff ff 4c 89 f7 e8 b5 95 d9 00 e8 00 53 08 00 45 31 ff e9 11 ff ff ff 0f 0b e9 0a ff ff ff <0f> 0b 45 31 ff e9 00 ff ff ff e8 e2 54 d8 00 66 90 90 90 90 90 90 [ 55.141446] RSP: 0018:ff59221940833c18 EFLAGS: 00010246 [ 55.141449] RAX: 0000000000000000 RBX: 0000000000000000 RCX: ffffffff9b72bcbe [ 55.141450] RDX: 0000000000000001 RSI: 0000000000000001 RDI: ff3ea01e4265e330 [ 55.141451] RBP: ff59221940833c90 R08: 0000000000000000 R09: 8080808080808080 [ 55.141453] R10: ff3ea01e42b3caf4 R11: 000000000000000f R12: ff3ea01e4265e330 [ 55.141454] R13: 0000000000000001 R14: ff3ea01e505e5e80 R15: 0000000000000001 [ 55.141455] FS: 0000000000000000(0000) GS:ff3ea01fb7cc0000(0000) knlGS:0000000000000000 [ 55.141456] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 55.141458] CR2: 0000563543ad1546 CR3: 000000010ee82005 CR4: 0000000000771ee0 [ 55.141464] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 55.141465] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 55.141466] PKRU: 55555554 [ 55.141467] Call Trace: [ 55.141469] <TASK> [ 55.141472] ? pcie_wait_cmd+0xdf/0x220 [ 55.141478] ? mptcp_seq_show+0xe0/0x180 [ 55.141484] __cancel_work_timer+0x124/0x1b0 [ 55.141487] cancel_delayed_work_sync+0x17/0x20 [ 55.141490] drm_kms_helper_poll_disable+0x26/0x40 [drm_kms_helper] [ 55.141516] drm_mode_config_helper_suspend+0x25/0x90 [drm_kms_helper] [ 55.141531] ? __pm_runtime_resume+0x64/0x90 [ 55.141536] bochs_pm_suspend+0x16/0x20 [bochs] [ 55.141540] pci_pm_suspend+0x8b/0x1b0 [ 55.141545] ? __pfx_pci_pm_suspend+0x10/0x10 [ 55.141547] dpm_run_callback+0x4c/0x160 [ 55.141550] __device_suspend+0x14c/0x4c0 [ 55.141553] async_suspend+0x24/0xa0 [ 55.141555] async_run_entry_fn+0x34/0x120 [ 55.141557] process_one_work+0x21a/0x3f0 [ 55.141560] worker_thread+0x4e/0x3c0 [ 55.141563] ? __pfx_worker_thread+0x10/0x10 [ 55.141565] kthread+0xf2/0x120 [ 55.141568] ? __pfx_kthread+0x10/0x10 [ 55.141570] ret_from_fork+0x29/0x50 [ 55.141575] </TASK> [ 55.141575] ---[ end trace 0000000000000000 ]--- Fixes: a4e7717 ("drm/probe_helper: sort out poll_running vs poll_enabled") Signed-off-by: Zongmin Zhou<zhouzongmin@kylinos.cn>
dberlin
pushed a commit
to dberlin/linux
that referenced
this pull request
Oct 30, 2023
…r_poll_disable during suspend When drivers call drm_kms_helper_poll_disable from their device suspend implementation without enabled output polling before, following warning will be reported,due to work->func not be initialized: [ 55.141361] WARNING: CPU: 3 PID: 372 at kernel/workqueue.c:3066 __flush_work+0x22f/0x240 [ 55.141382] Modules linked in: nls_iso8859_1 snd_hda_codec_generic ledtrig_audio snd_hda_intel snd_intel_dspcfg snd_intel_sdw_acpi snd_hda_codec snd_hda_core snd_hwdep snd_pcm snd_seq_midi snd_seq_midi_event snd_rawmidi snd_seq intel_rapl_msr intel_rapl_common bochs drm_vram_helper drm_ttm_helper snd_seq_device nfit ttm crct10dif_pclmul snd_timer ghash_clmulni_intel binfmt_misc sha512_ssse3 aesni_intel drm_kms_helper joydev input_leds syscopyarea crypto_simd snd cryptd sysfillrect sysimgblt mac_hid serio_raw soundcore qemu_fw_cfg sch_fq_codel msr parport_pc ppdev lp parport drm ramoops reed_solomon pstore_blk pstore_zone efi_pstore virtio_rng ip_tables x_tables autofs4 hid_generic usbhid hid ahci virtio_net i2c_i801 crc32_pclmul psmouse virtio_scsi libahci i2c_smbus lpc_ich xhci_pci net_failover virtio_blk xhci_pci_renesas failover [ 55.141430] CPU: 3 PID: 372 Comm: kworker/u16:9 Not tainted 6.2.0-rc6+ AsahiLinux#16 [ 55.141433] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org 04/01/2014 [ 55.141435] Workqueue: events_unbound async_run_entry_fn [ 55.141441] RIP: 0010:__flush_work+0x22f/0x240 [ 55.141444] Code: 8b 43 28 48 8b 53 30 89 c1 e9 f9 fe ff ff 4c 89 f7 e8 b5 95 d9 00 e8 00 53 08 00 45 31 ff e9 11 ff ff ff 0f 0b e9 0a ff ff ff <0f> 0b 45 31 ff e9 00 ff ff ff e8 e2 54 d8 00 66 90 90 90 90 90 90 [ 55.141446] RSP: 0018:ff59221940833c18 EFLAGS: 00010246 [ 55.141449] RAX: 0000000000000000 RBX: 0000000000000000 RCX: ffffffff9b72bcbe [ 55.141450] RDX: 0000000000000001 RSI: 0000000000000001 RDI: ff3ea01e4265e330 [ 55.141451] RBP: ff59221940833c90 R08: 0000000000000000 R09: 8080808080808080 [ 55.141453] R10: ff3ea01e42b3caf4 R11: 000000000000000f R12: ff3ea01e4265e330 [ 55.141454] R13: 0000000000000001 R14: ff3ea01e505e5e80 R15: 0000000000000001 [ 55.141455] FS: 0000000000000000(0000) GS:ff3ea01fb7cc0000(0000) knlGS:0000000000000000 [ 55.141456] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 55.141458] CR2: 0000563543ad1546 CR3: 000000010ee82005 CR4: 0000000000771ee0 [ 55.141464] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 55.141465] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 55.141466] PKRU: 55555554 [ 55.141467] Call Trace: [ 55.141469] <TASK> [ 55.141472] ? pcie_wait_cmd+0xdf/0x220 [ 55.141478] ? mptcp_seq_show+0xe0/0x180 [ 55.141484] __cancel_work_timer+0x124/0x1b0 [ 55.141487] cancel_delayed_work_sync+0x17/0x20 [ 55.141490] drm_kms_helper_poll_disable+0x26/0x40 [drm_kms_helper] [ 55.141516] drm_mode_config_helper_suspend+0x25/0x90 [drm_kms_helper] [ 55.141531] ? __pm_runtime_resume+0x64/0x90 [ 55.141536] bochs_pm_suspend+0x16/0x20 [bochs] [ 55.141540] pci_pm_suspend+0x8b/0x1b0 [ 55.141545] ? __pfx_pci_pm_suspend+0x10/0x10 [ 55.141547] dpm_run_callback+0x4c/0x160 [ 55.141550] __device_suspend+0x14c/0x4c0 [ 55.141553] async_suspend+0x24/0xa0 [ 55.141555] async_run_entry_fn+0x34/0x120 [ 55.141557] process_one_work+0x21a/0x3f0 [ 55.141560] worker_thread+0x4e/0x3c0 [ 55.141563] ? __pfx_worker_thread+0x10/0x10 [ 55.141565] kthread+0xf2/0x120 [ 55.141568] ? __pfx_kthread+0x10/0x10 [ 55.141570] ret_from_fork+0x29/0x50 [ 55.141575] </TASK> [ 55.141575] ---[ end trace 0000000000000000 ]--- Fixes: a4e7717 ("drm/probe_helper: sort out poll_running vs poll_enabled") Signed-off-by: Zongmin Zhou<zhouzongmin@kylinos.cn>
dberlin
pushed a commit
to dberlin/linux
that referenced
this pull request
Nov 8, 2023
…r_poll_disable during suspend When drivers call drm_kms_helper_poll_disable from their device suspend implementation without enabled output polling before, following warning will be reported,due to work->func not be initialized: [ 55.141361] WARNING: CPU: 3 PID: 372 at kernel/workqueue.c:3066 __flush_work+0x22f/0x240 [ 55.141382] Modules linked in: nls_iso8859_1 snd_hda_codec_generic ledtrig_audio snd_hda_intel snd_intel_dspcfg snd_intel_sdw_acpi snd_hda_codec snd_hda_core snd_hwdep snd_pcm snd_seq_midi snd_seq_midi_event snd_rawmidi snd_seq intel_rapl_msr intel_rapl_common bochs drm_vram_helper drm_ttm_helper snd_seq_device nfit ttm crct10dif_pclmul snd_timer ghash_clmulni_intel binfmt_misc sha512_ssse3 aesni_intel drm_kms_helper joydev input_leds syscopyarea crypto_simd snd cryptd sysfillrect sysimgblt mac_hid serio_raw soundcore qemu_fw_cfg sch_fq_codel msr parport_pc ppdev lp parport drm ramoops reed_solomon pstore_blk pstore_zone efi_pstore virtio_rng ip_tables x_tables autofs4 hid_generic usbhid hid ahci virtio_net i2c_i801 crc32_pclmul psmouse virtio_scsi libahci i2c_smbus lpc_ich xhci_pci net_failover virtio_blk xhci_pci_renesas failover [ 55.141430] CPU: 3 PID: 372 Comm: kworker/u16:9 Not tainted 6.2.0-rc6+ AsahiLinux#16 [ 55.141433] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org 04/01/2014 [ 55.141435] Workqueue: events_unbound async_run_entry_fn [ 55.141441] RIP: 0010:__flush_work+0x22f/0x240 [ 55.141444] Code: 8b 43 28 48 8b 53 30 89 c1 e9 f9 fe ff ff 4c 89 f7 e8 b5 95 d9 00 e8 00 53 08 00 45 31 ff e9 11 ff ff ff 0f 0b e9 0a ff ff ff <0f> 0b 45 31 ff e9 00 ff ff ff e8 e2 54 d8 00 66 90 90 90 90 90 90 [ 55.141446] RSP: 0018:ff59221940833c18 EFLAGS: 00010246 [ 55.141449] RAX: 0000000000000000 RBX: 0000000000000000 RCX: ffffffff9b72bcbe [ 55.141450] RDX: 0000000000000001 RSI: 0000000000000001 RDI: ff3ea01e4265e330 [ 55.141451] RBP: ff59221940833c90 R08: 0000000000000000 R09: 8080808080808080 [ 55.141453] R10: ff3ea01e42b3caf4 R11: 000000000000000f R12: ff3ea01e4265e330 [ 55.141454] R13: 0000000000000001 R14: ff3ea01e505e5e80 R15: 0000000000000001 [ 55.141455] FS: 0000000000000000(0000) GS:ff3ea01fb7cc0000(0000) knlGS:0000000000000000 [ 55.141456] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 55.141458] CR2: 0000563543ad1546 CR3: 000000010ee82005 CR4: 0000000000771ee0 [ 55.141464] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 55.141465] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 55.141466] PKRU: 55555554 [ 55.141467] Call Trace: [ 55.141469] <TASK> [ 55.141472] ? pcie_wait_cmd+0xdf/0x220 [ 55.141478] ? mptcp_seq_show+0xe0/0x180 [ 55.141484] __cancel_work_timer+0x124/0x1b0 [ 55.141487] cancel_delayed_work_sync+0x17/0x20 [ 55.141490] drm_kms_helper_poll_disable+0x26/0x40 [drm_kms_helper] [ 55.141516] drm_mode_config_helper_suspend+0x25/0x90 [drm_kms_helper] [ 55.141531] ? __pm_runtime_resume+0x64/0x90 [ 55.141536] bochs_pm_suspend+0x16/0x20 [bochs] [ 55.141540] pci_pm_suspend+0x8b/0x1b0 [ 55.141545] ? __pfx_pci_pm_suspend+0x10/0x10 [ 55.141547] dpm_run_callback+0x4c/0x160 [ 55.141550] __device_suspend+0x14c/0x4c0 [ 55.141553] async_suspend+0x24/0xa0 [ 55.141555] async_run_entry_fn+0x34/0x120 [ 55.141557] process_one_work+0x21a/0x3f0 [ 55.141560] worker_thread+0x4e/0x3c0 [ 55.141563] ? __pfx_worker_thread+0x10/0x10 [ 55.141565] kthread+0xf2/0x120 [ 55.141568] ? __pfx_kthread+0x10/0x10 [ 55.141570] ret_from_fork+0x29/0x50 [ 55.141575] </TASK> [ 55.141575] ---[ end trace 0000000000000000 ]--- Fixes: a4e7717 ("drm/probe_helper: sort out poll_running vs poll_enabled") Signed-off-by: Zongmin Zhou<zhouzongmin@kylinos.cn>
dberlin
pushed a commit
to dberlin/linux
that referenced
this pull request
Nov 12, 2023
…r_poll_disable during suspend When drivers call drm_kms_helper_poll_disable from their device suspend implementation without enabled output polling before, following warning will be reported,due to work->func not be initialized: [ 55.141361] WARNING: CPU: 3 PID: 372 at kernel/workqueue.c:3066 __flush_work+0x22f/0x240 [ 55.141382] Modules linked in: nls_iso8859_1 snd_hda_codec_generic ledtrig_audio snd_hda_intel snd_intel_dspcfg snd_intel_sdw_acpi snd_hda_codec snd_hda_core snd_hwdep snd_pcm snd_seq_midi snd_seq_midi_event snd_rawmidi snd_seq intel_rapl_msr intel_rapl_common bochs drm_vram_helper drm_ttm_helper snd_seq_device nfit ttm crct10dif_pclmul snd_timer ghash_clmulni_intel binfmt_misc sha512_ssse3 aesni_intel drm_kms_helper joydev input_leds syscopyarea crypto_simd snd cryptd sysfillrect sysimgblt mac_hid serio_raw soundcore qemu_fw_cfg sch_fq_codel msr parport_pc ppdev lp parport drm ramoops reed_solomon pstore_blk pstore_zone efi_pstore virtio_rng ip_tables x_tables autofs4 hid_generic usbhid hid ahci virtio_net i2c_i801 crc32_pclmul psmouse virtio_scsi libahci i2c_smbus lpc_ich xhci_pci net_failover virtio_blk xhci_pci_renesas failover [ 55.141430] CPU: 3 PID: 372 Comm: kworker/u16:9 Not tainted 6.2.0-rc6+ AsahiLinux#16 [ 55.141433] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org 04/01/2014 [ 55.141435] Workqueue: events_unbound async_run_entry_fn [ 55.141441] RIP: 0010:__flush_work+0x22f/0x240 [ 55.141444] Code: 8b 43 28 48 8b 53 30 89 c1 e9 f9 fe ff ff 4c 89 f7 e8 b5 95 d9 00 e8 00 53 08 00 45 31 ff e9 11 ff ff ff 0f 0b e9 0a ff ff ff <0f> 0b 45 31 ff e9 00 ff ff ff e8 e2 54 d8 00 66 90 90 90 90 90 90 [ 55.141446] RSP: 0018:ff59221940833c18 EFLAGS: 00010246 [ 55.141449] RAX: 0000000000000000 RBX: 0000000000000000 RCX: ffffffff9b72bcbe [ 55.141450] RDX: 0000000000000001 RSI: 0000000000000001 RDI: ff3ea01e4265e330 [ 55.141451] RBP: ff59221940833c90 R08: 0000000000000000 R09: 8080808080808080 [ 55.141453] R10: ff3ea01e42b3caf4 R11: 000000000000000f R12: ff3ea01e4265e330 [ 55.141454] R13: 0000000000000001 R14: ff3ea01e505e5e80 R15: 0000000000000001 [ 55.141455] FS: 0000000000000000(0000) GS:ff3ea01fb7cc0000(0000) knlGS:0000000000000000 [ 55.141456] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 55.141458] CR2: 0000563543ad1546 CR3: 000000010ee82005 CR4: 0000000000771ee0 [ 55.141464] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 55.141465] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 55.141466] PKRU: 55555554 [ 55.141467] Call Trace: [ 55.141469] <TASK> [ 55.141472] ? pcie_wait_cmd+0xdf/0x220 [ 55.141478] ? mptcp_seq_show+0xe0/0x180 [ 55.141484] __cancel_work_timer+0x124/0x1b0 [ 55.141487] cancel_delayed_work_sync+0x17/0x20 [ 55.141490] drm_kms_helper_poll_disable+0x26/0x40 [drm_kms_helper] [ 55.141516] drm_mode_config_helper_suspend+0x25/0x90 [drm_kms_helper] [ 55.141531] ? __pm_runtime_resume+0x64/0x90 [ 55.141536] bochs_pm_suspend+0x16/0x20 [bochs] [ 55.141540] pci_pm_suspend+0x8b/0x1b0 [ 55.141545] ? __pfx_pci_pm_suspend+0x10/0x10 [ 55.141547] dpm_run_callback+0x4c/0x160 [ 55.141550] __device_suspend+0x14c/0x4c0 [ 55.141553] async_suspend+0x24/0xa0 [ 55.141555] async_run_entry_fn+0x34/0x120 [ 55.141557] process_one_work+0x21a/0x3f0 [ 55.141560] worker_thread+0x4e/0x3c0 [ 55.141563] ? __pfx_worker_thread+0x10/0x10 [ 55.141565] kthread+0xf2/0x120 [ 55.141568] ? __pfx_kthread+0x10/0x10 [ 55.141570] ret_from_fork+0x29/0x50 [ 55.141575] </TASK> [ 55.141575] ---[ end trace 0000000000000000 ]--- Fixes: a4e7717 ("drm/probe_helper: sort out poll_running vs poll_enabled") Signed-off-by: Zongmin Zhou<zhouzongmin@kylinos.cn>
dberlin
pushed a commit
to dberlin/linux
that referenced
this pull request
Nov 12, 2023
…r_poll_disable during suspend When drivers call drm_kms_helper_poll_disable from their device suspend implementation without enabled output polling before, following warning will be reported,due to work->func not be initialized: [ 55.141361] WARNING: CPU: 3 PID: 372 at kernel/workqueue.c:3066 __flush_work+0x22f/0x240 [ 55.141382] Modules linked in: nls_iso8859_1 snd_hda_codec_generic ledtrig_audio snd_hda_intel snd_intel_dspcfg snd_intel_sdw_acpi snd_hda_codec snd_hda_core snd_hwdep snd_pcm snd_seq_midi snd_seq_midi_event snd_rawmidi snd_seq intel_rapl_msr intel_rapl_common bochs drm_vram_helper drm_ttm_helper snd_seq_device nfit ttm crct10dif_pclmul snd_timer ghash_clmulni_intel binfmt_misc sha512_ssse3 aesni_intel drm_kms_helper joydev input_leds syscopyarea crypto_simd snd cryptd sysfillrect sysimgblt mac_hid serio_raw soundcore qemu_fw_cfg sch_fq_codel msr parport_pc ppdev lp parport drm ramoops reed_solomon pstore_blk pstore_zone efi_pstore virtio_rng ip_tables x_tables autofs4 hid_generic usbhid hid ahci virtio_net i2c_i801 crc32_pclmul psmouse virtio_scsi libahci i2c_smbus lpc_ich xhci_pci net_failover virtio_blk xhci_pci_renesas failover [ 55.141430] CPU: 3 PID: 372 Comm: kworker/u16:9 Not tainted 6.2.0-rc6+ AsahiLinux#16 [ 55.141433] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org 04/01/2014 [ 55.141435] Workqueue: events_unbound async_run_entry_fn [ 55.141441] RIP: 0010:__flush_work+0x22f/0x240 [ 55.141444] Code: 8b 43 28 48 8b 53 30 89 c1 e9 f9 fe ff ff 4c 89 f7 e8 b5 95 d9 00 e8 00 53 08 00 45 31 ff e9 11 ff ff ff 0f 0b e9 0a ff ff ff <0f> 0b 45 31 ff e9 00 ff ff ff e8 e2 54 d8 00 66 90 90 90 90 90 90 [ 55.141446] RSP: 0018:ff59221940833c18 EFLAGS: 00010246 [ 55.141449] RAX: 0000000000000000 RBX: 0000000000000000 RCX: ffffffff9b72bcbe [ 55.141450] RDX: 0000000000000001 RSI: 0000000000000001 RDI: ff3ea01e4265e330 [ 55.141451] RBP: ff59221940833c90 R08: 0000000000000000 R09: 8080808080808080 [ 55.141453] R10: ff3ea01e42b3caf4 R11: 000000000000000f R12: ff3ea01e4265e330 [ 55.141454] R13: 0000000000000001 R14: ff3ea01e505e5e80 R15: 0000000000000001 [ 55.141455] FS: 0000000000000000(0000) GS:ff3ea01fb7cc0000(0000) knlGS:0000000000000000 [ 55.141456] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 55.141458] CR2: 0000563543ad1546 CR3: 000000010ee82005 CR4: 0000000000771ee0 [ 55.141464] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 55.141465] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 55.141466] PKRU: 55555554 [ 55.141467] Call Trace: [ 55.141469] <TASK> [ 55.141472] ? pcie_wait_cmd+0xdf/0x220 [ 55.141478] ? mptcp_seq_show+0xe0/0x180 [ 55.141484] __cancel_work_timer+0x124/0x1b0 [ 55.141487] cancel_delayed_work_sync+0x17/0x20 [ 55.141490] drm_kms_helper_poll_disable+0x26/0x40 [drm_kms_helper] [ 55.141516] drm_mode_config_helper_suspend+0x25/0x90 [drm_kms_helper] [ 55.141531] ? __pm_runtime_resume+0x64/0x90 [ 55.141536] bochs_pm_suspend+0x16/0x20 [bochs] [ 55.141540] pci_pm_suspend+0x8b/0x1b0 [ 55.141545] ? __pfx_pci_pm_suspend+0x10/0x10 [ 55.141547] dpm_run_callback+0x4c/0x160 [ 55.141550] __device_suspend+0x14c/0x4c0 [ 55.141553] async_suspend+0x24/0xa0 [ 55.141555] async_run_entry_fn+0x34/0x120 [ 55.141557] process_one_work+0x21a/0x3f0 [ 55.141560] worker_thread+0x4e/0x3c0 [ 55.141563] ? __pfx_worker_thread+0x10/0x10 [ 55.141565] kthread+0xf2/0x120 [ 55.141568] ? __pfx_kthread+0x10/0x10 [ 55.141570] ret_from_fork+0x29/0x50 [ 55.141575] </TASK> [ 55.141575] ---[ end trace 0000000000000000 ]--- Fixes: a4e7717 ("drm/probe_helper: sort out poll_running vs poll_enabled") Signed-off-by: Zongmin Zhou<zhouzongmin@kylinos.cn>
dberlin
pushed a commit
to dberlin/linux
that referenced
this pull request
Nov 18, 2023
…r_poll_disable during suspend When drivers call drm_kms_helper_poll_disable from their device suspend implementation without enabled output polling before, following warning will be reported,due to work->func not be initialized: [ 55.141361] WARNING: CPU: 3 PID: 372 at kernel/workqueue.c:3066 __flush_work+0x22f/0x240 [ 55.141382] Modules linked in: nls_iso8859_1 snd_hda_codec_generic ledtrig_audio snd_hda_intel snd_intel_dspcfg snd_intel_sdw_acpi snd_hda_codec snd_hda_core snd_hwdep snd_pcm snd_seq_midi snd_seq_midi_event snd_rawmidi snd_seq intel_rapl_msr intel_rapl_common bochs drm_vram_helper drm_ttm_helper snd_seq_device nfit ttm crct10dif_pclmul snd_timer ghash_clmulni_intel binfmt_misc sha512_ssse3 aesni_intel drm_kms_helper joydev input_leds syscopyarea crypto_simd snd cryptd sysfillrect sysimgblt mac_hid serio_raw soundcore qemu_fw_cfg sch_fq_codel msr parport_pc ppdev lp parport drm ramoops reed_solomon pstore_blk pstore_zone efi_pstore virtio_rng ip_tables x_tables autofs4 hid_generic usbhid hid ahci virtio_net i2c_i801 crc32_pclmul psmouse virtio_scsi libahci i2c_smbus lpc_ich xhci_pci net_failover virtio_blk xhci_pci_renesas failover [ 55.141430] CPU: 3 PID: 372 Comm: kworker/u16:9 Not tainted 6.2.0-rc6+ AsahiLinux#16 [ 55.141433] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org 04/01/2014 [ 55.141435] Workqueue: events_unbound async_run_entry_fn [ 55.141441] RIP: 0010:__flush_work+0x22f/0x240 [ 55.141444] Code: 8b 43 28 48 8b 53 30 89 c1 e9 f9 fe ff ff 4c 89 f7 e8 b5 95 d9 00 e8 00 53 08 00 45 31 ff e9 11 ff ff ff 0f 0b e9 0a ff ff ff <0f> 0b 45 31 ff e9 00 ff ff ff e8 e2 54 d8 00 66 90 90 90 90 90 90 [ 55.141446] RSP: 0018:ff59221940833c18 EFLAGS: 00010246 [ 55.141449] RAX: 0000000000000000 RBX: 0000000000000000 RCX: ffffffff9b72bcbe [ 55.141450] RDX: 0000000000000001 RSI: 0000000000000001 RDI: ff3ea01e4265e330 [ 55.141451] RBP: ff59221940833c90 R08: 0000000000000000 R09: 8080808080808080 [ 55.141453] R10: ff3ea01e42b3caf4 R11: 000000000000000f R12: ff3ea01e4265e330 [ 55.141454] R13: 0000000000000001 R14: ff3ea01e505e5e80 R15: 0000000000000001 [ 55.141455] FS: 0000000000000000(0000) GS:ff3ea01fb7cc0000(0000) knlGS:0000000000000000 [ 55.141456] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 55.141458] CR2: 0000563543ad1546 CR3: 000000010ee82005 CR4: 0000000000771ee0 [ 55.141464] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 55.141465] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 55.141466] PKRU: 55555554 [ 55.141467] Call Trace: [ 55.141469] <TASK> [ 55.141472] ? pcie_wait_cmd+0xdf/0x220 [ 55.141478] ? mptcp_seq_show+0xe0/0x180 [ 55.141484] __cancel_work_timer+0x124/0x1b0 [ 55.141487] cancel_delayed_work_sync+0x17/0x20 [ 55.141490] drm_kms_helper_poll_disable+0x26/0x40 [drm_kms_helper] [ 55.141516] drm_mode_config_helper_suspend+0x25/0x90 [drm_kms_helper] [ 55.141531] ? __pm_runtime_resume+0x64/0x90 [ 55.141536] bochs_pm_suspend+0x16/0x20 [bochs] [ 55.141540] pci_pm_suspend+0x8b/0x1b0 [ 55.141545] ? __pfx_pci_pm_suspend+0x10/0x10 [ 55.141547] dpm_run_callback+0x4c/0x160 [ 55.141550] __device_suspend+0x14c/0x4c0 [ 55.141553] async_suspend+0x24/0xa0 [ 55.141555] async_run_entry_fn+0x34/0x120 [ 55.141557] process_one_work+0x21a/0x3f0 [ 55.141560] worker_thread+0x4e/0x3c0 [ 55.141563] ? __pfx_worker_thread+0x10/0x10 [ 55.141565] kthread+0xf2/0x120 [ 55.141568] ? __pfx_kthread+0x10/0x10 [ 55.141570] ret_from_fork+0x29/0x50 [ 55.141575] </TASK> [ 55.141575] ---[ end trace 0000000000000000 ]--- Fixes: a4e7717 ("drm/probe_helper: sort out poll_running vs poll_enabled") Signed-off-by: Zongmin Zhou<zhouzongmin@kylinos.cn>
dberlin
pushed a commit
to dberlin/linux
that referenced
this pull request
Nov 20, 2023
…r_poll_disable during suspend When drivers call drm_kms_helper_poll_disable from their device suspend implementation without enabled output polling before, following warning will be reported,due to work->func not be initialized: [ 55.141361] WARNING: CPU: 3 PID: 372 at kernel/workqueue.c:3066 __flush_work+0x22f/0x240 [ 55.141382] Modules linked in: nls_iso8859_1 snd_hda_codec_generic ledtrig_audio snd_hda_intel snd_intel_dspcfg snd_intel_sdw_acpi snd_hda_codec snd_hda_core snd_hwdep snd_pcm snd_seq_midi snd_seq_midi_event snd_rawmidi snd_seq intel_rapl_msr intel_rapl_common bochs drm_vram_helper drm_ttm_helper snd_seq_device nfit ttm crct10dif_pclmul snd_timer ghash_clmulni_intel binfmt_misc sha512_ssse3 aesni_intel drm_kms_helper joydev input_leds syscopyarea crypto_simd snd cryptd sysfillrect sysimgblt mac_hid serio_raw soundcore qemu_fw_cfg sch_fq_codel msr parport_pc ppdev lp parport drm ramoops reed_solomon pstore_blk pstore_zone efi_pstore virtio_rng ip_tables x_tables autofs4 hid_generic usbhid hid ahci virtio_net i2c_i801 crc32_pclmul psmouse virtio_scsi libahci i2c_smbus lpc_ich xhci_pci net_failover virtio_blk xhci_pci_renesas failover [ 55.141430] CPU: 3 PID: 372 Comm: kworker/u16:9 Not tainted 6.2.0-rc6+ AsahiLinux#16 [ 55.141433] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org 04/01/2014 [ 55.141435] Workqueue: events_unbound async_run_entry_fn [ 55.141441] RIP: 0010:__flush_work+0x22f/0x240 [ 55.141444] Code: 8b 43 28 48 8b 53 30 89 c1 e9 f9 fe ff ff 4c 89 f7 e8 b5 95 d9 00 e8 00 53 08 00 45 31 ff e9 11 ff ff ff 0f 0b e9 0a ff ff ff <0f> 0b 45 31 ff e9 00 ff ff ff e8 e2 54 d8 00 66 90 90 90 90 90 90 [ 55.141446] RSP: 0018:ff59221940833c18 EFLAGS: 00010246 [ 55.141449] RAX: 0000000000000000 RBX: 0000000000000000 RCX: ffffffff9b72bcbe [ 55.141450] RDX: 0000000000000001 RSI: 0000000000000001 RDI: ff3ea01e4265e330 [ 55.141451] RBP: ff59221940833c90 R08: 0000000000000000 R09: 8080808080808080 [ 55.141453] R10: ff3ea01e42b3caf4 R11: 000000000000000f R12: ff3ea01e4265e330 [ 55.141454] R13: 0000000000000001 R14: ff3ea01e505e5e80 R15: 0000000000000001 [ 55.141455] FS: 0000000000000000(0000) GS:ff3ea01fb7cc0000(0000) knlGS:0000000000000000 [ 55.141456] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 55.141458] CR2: 0000563543ad1546 CR3: 000000010ee82005 CR4: 0000000000771ee0 [ 55.141464] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 55.141465] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 55.141466] PKRU: 55555554 [ 55.141467] Call Trace: [ 55.141469] <TASK> [ 55.141472] ? pcie_wait_cmd+0xdf/0x220 [ 55.141478] ? mptcp_seq_show+0xe0/0x180 [ 55.141484] __cancel_work_timer+0x124/0x1b0 [ 55.141487] cancel_delayed_work_sync+0x17/0x20 [ 55.141490] drm_kms_helper_poll_disable+0x26/0x40 [drm_kms_helper] [ 55.141516] drm_mode_config_helper_suspend+0x25/0x90 [drm_kms_helper] [ 55.141531] ? __pm_runtime_resume+0x64/0x90 [ 55.141536] bochs_pm_suspend+0x16/0x20 [bochs] [ 55.141540] pci_pm_suspend+0x8b/0x1b0 [ 55.141545] ? __pfx_pci_pm_suspend+0x10/0x10 [ 55.141547] dpm_run_callback+0x4c/0x160 [ 55.141550] __device_suspend+0x14c/0x4c0 [ 55.141553] async_suspend+0x24/0xa0 [ 55.141555] async_run_entry_fn+0x34/0x120 [ 55.141557] process_one_work+0x21a/0x3f0 [ 55.141560] worker_thread+0x4e/0x3c0 [ 55.141563] ? __pfx_worker_thread+0x10/0x10 [ 55.141565] kthread+0xf2/0x120 [ 55.141568] ? __pfx_kthread+0x10/0x10 [ 55.141570] ret_from_fork+0x29/0x50 [ 55.141575] </TASK> [ 55.141575] ---[ end trace 0000000000000000 ]--- Fixes: a4e7717 ("drm/probe_helper: sort out poll_running vs poll_enabled") Signed-off-by: Zongmin Zhou<zhouzongmin@kylinos.cn>
dberlin
pushed a commit
to dberlin/linux
that referenced
this pull request
Nov 23, 2023
…r_poll_disable during suspend When drivers call drm_kms_helper_poll_disable from their device suspend implementation without enabled output polling before, following warning will be reported,due to work->func not be initialized: [ 55.141361] WARNING: CPU: 3 PID: 372 at kernel/workqueue.c:3066 __flush_work+0x22f/0x240 [ 55.141382] Modules linked in: nls_iso8859_1 snd_hda_codec_generic ledtrig_audio snd_hda_intel snd_intel_dspcfg snd_intel_sdw_acpi snd_hda_codec snd_hda_core snd_hwdep snd_pcm snd_seq_midi snd_seq_midi_event snd_rawmidi snd_seq intel_rapl_msr intel_rapl_common bochs drm_vram_helper drm_ttm_helper snd_seq_device nfit ttm crct10dif_pclmul snd_timer ghash_clmulni_intel binfmt_misc sha512_ssse3 aesni_intel drm_kms_helper joydev input_leds syscopyarea crypto_simd snd cryptd sysfillrect sysimgblt mac_hid serio_raw soundcore qemu_fw_cfg sch_fq_codel msr parport_pc ppdev lp parport drm ramoops reed_solomon pstore_blk pstore_zone efi_pstore virtio_rng ip_tables x_tables autofs4 hid_generic usbhid hid ahci virtio_net i2c_i801 crc32_pclmul psmouse virtio_scsi libahci i2c_smbus lpc_ich xhci_pci net_failover virtio_blk xhci_pci_renesas failover [ 55.141430] CPU: 3 PID: 372 Comm: kworker/u16:9 Not tainted 6.2.0-rc6+ AsahiLinux#16 [ 55.141433] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org 04/01/2014 [ 55.141435] Workqueue: events_unbound async_run_entry_fn [ 55.141441] RIP: 0010:__flush_work+0x22f/0x240 [ 55.141444] Code: 8b 43 28 48 8b 53 30 89 c1 e9 f9 fe ff ff 4c 89 f7 e8 b5 95 d9 00 e8 00 53 08 00 45 31 ff e9 11 ff ff ff 0f 0b e9 0a ff ff ff <0f> 0b 45 31 ff e9 00 ff ff ff e8 e2 54 d8 00 66 90 90 90 90 90 90 [ 55.141446] RSP: 0018:ff59221940833c18 EFLAGS: 00010246 [ 55.141449] RAX: 0000000000000000 RBX: 0000000000000000 RCX: ffffffff9b72bcbe [ 55.141450] RDX: 0000000000000001 RSI: 0000000000000001 RDI: ff3ea01e4265e330 [ 55.141451] RBP: ff59221940833c90 R08: 0000000000000000 R09: 8080808080808080 [ 55.141453] R10: ff3ea01e42b3caf4 R11: 000000000000000f R12: ff3ea01e4265e330 [ 55.141454] R13: 0000000000000001 R14: ff3ea01e505e5e80 R15: 0000000000000001 [ 55.141455] FS: 0000000000000000(0000) GS:ff3ea01fb7cc0000(0000) knlGS:0000000000000000 [ 55.141456] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 55.141458] CR2: 0000563543ad1546 CR3: 000000010ee82005 CR4: 0000000000771ee0 [ 55.141464] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 55.141465] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 55.141466] PKRU: 55555554 [ 55.141467] Call Trace: [ 55.141469] <TASK> [ 55.141472] ? pcie_wait_cmd+0xdf/0x220 [ 55.141478] ? mptcp_seq_show+0xe0/0x180 [ 55.141484] __cancel_work_timer+0x124/0x1b0 [ 55.141487] cancel_delayed_work_sync+0x17/0x20 [ 55.141490] drm_kms_helper_poll_disable+0x26/0x40 [drm_kms_helper] [ 55.141516] drm_mode_config_helper_suspend+0x25/0x90 [drm_kms_helper] [ 55.141531] ? __pm_runtime_resume+0x64/0x90 [ 55.141536] bochs_pm_suspend+0x16/0x20 [bochs] [ 55.141540] pci_pm_suspend+0x8b/0x1b0 [ 55.141545] ? __pfx_pci_pm_suspend+0x10/0x10 [ 55.141547] dpm_run_callback+0x4c/0x160 [ 55.141550] __device_suspend+0x14c/0x4c0 [ 55.141553] async_suspend+0x24/0xa0 [ 55.141555] async_run_entry_fn+0x34/0x120 [ 55.141557] process_one_work+0x21a/0x3f0 [ 55.141560] worker_thread+0x4e/0x3c0 [ 55.141563] ? __pfx_worker_thread+0x10/0x10 [ 55.141565] kthread+0xf2/0x120 [ 55.141568] ? __pfx_kthread+0x10/0x10 [ 55.141570] ret_from_fork+0x29/0x50 [ 55.141575] </TASK> [ 55.141575] ---[ end trace 0000000000000000 ]--- Fixes: a4e7717 ("drm/probe_helper: sort out poll_running vs poll_enabled") Signed-off-by: Zongmin Zhou<zhouzongmin@kylinos.cn>
dberlin
pushed a commit
to dberlin/linux
that referenced
this pull request
Dec 1, 2023
…r_poll_disable during suspend When drivers call drm_kms_helper_poll_disable from their device suspend implementation without enabled output polling before, following warning will be reported,due to work->func not be initialized: [ 55.141361] WARNING: CPU: 3 PID: 372 at kernel/workqueue.c:3066 __flush_work+0x22f/0x240 [ 55.141382] Modules linked in: nls_iso8859_1 snd_hda_codec_generic ledtrig_audio snd_hda_intel snd_intel_dspcfg snd_intel_sdw_acpi snd_hda_codec snd_hda_core snd_hwdep snd_pcm snd_seq_midi snd_seq_midi_event snd_rawmidi snd_seq intel_rapl_msr intel_rapl_common bochs drm_vram_helper drm_ttm_helper snd_seq_device nfit ttm crct10dif_pclmul snd_timer ghash_clmulni_intel binfmt_misc sha512_ssse3 aesni_intel drm_kms_helper joydev input_leds syscopyarea crypto_simd snd cryptd sysfillrect sysimgblt mac_hid serio_raw soundcore qemu_fw_cfg sch_fq_codel msr parport_pc ppdev lp parport drm ramoops reed_solomon pstore_blk pstore_zone efi_pstore virtio_rng ip_tables x_tables autofs4 hid_generic usbhid hid ahci virtio_net i2c_i801 crc32_pclmul psmouse virtio_scsi libahci i2c_smbus lpc_ich xhci_pci net_failover virtio_blk xhci_pci_renesas failover [ 55.141430] CPU: 3 PID: 372 Comm: kworker/u16:9 Not tainted 6.2.0-rc6+ AsahiLinux#16 [ 55.141433] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org 04/01/2014 [ 55.141435] Workqueue: events_unbound async_run_entry_fn [ 55.141441] RIP: 0010:__flush_work+0x22f/0x240 [ 55.141444] Code: 8b 43 28 48 8b 53 30 89 c1 e9 f9 fe ff ff 4c 89 f7 e8 b5 95 d9 00 e8 00 53 08 00 45 31 ff e9 11 ff ff ff 0f 0b e9 0a ff ff ff <0f> 0b 45 31 ff e9 00 ff ff ff e8 e2 54 d8 00 66 90 90 90 90 90 90 [ 55.141446] RSP: 0018:ff59221940833c18 EFLAGS: 00010246 [ 55.141449] RAX: 0000000000000000 RBX: 0000000000000000 RCX: ffffffff9b72bcbe [ 55.141450] RDX: 0000000000000001 RSI: 0000000000000001 RDI: ff3ea01e4265e330 [ 55.141451] RBP: ff59221940833c90 R08: 0000000000000000 R09: 8080808080808080 [ 55.141453] R10: ff3ea01e42b3caf4 R11: 000000000000000f R12: ff3ea01e4265e330 [ 55.141454] R13: 0000000000000001 R14: ff3ea01e505e5e80 R15: 0000000000000001 [ 55.141455] FS: 0000000000000000(0000) GS:ff3ea01fb7cc0000(0000) knlGS:0000000000000000 [ 55.141456] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 55.141458] CR2: 0000563543ad1546 CR3: 000000010ee82005 CR4: 0000000000771ee0 [ 55.141464] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 55.141465] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 55.141466] PKRU: 55555554 [ 55.141467] Call Trace: [ 55.141469] <TASK> [ 55.141472] ? pcie_wait_cmd+0xdf/0x220 [ 55.141478] ? mptcp_seq_show+0xe0/0x180 [ 55.141484] __cancel_work_timer+0x124/0x1b0 [ 55.141487] cancel_delayed_work_sync+0x17/0x20 [ 55.141490] drm_kms_helper_poll_disable+0x26/0x40 [drm_kms_helper] [ 55.141516] drm_mode_config_helper_suspend+0x25/0x90 [drm_kms_helper] [ 55.141531] ? __pm_runtime_resume+0x64/0x90 [ 55.141536] bochs_pm_suspend+0x16/0x20 [bochs] [ 55.141540] pci_pm_suspend+0x8b/0x1b0 [ 55.141545] ? __pfx_pci_pm_suspend+0x10/0x10 [ 55.141547] dpm_run_callback+0x4c/0x160 [ 55.141550] __device_suspend+0x14c/0x4c0 [ 55.141553] async_suspend+0x24/0xa0 [ 55.141555] async_run_entry_fn+0x34/0x120 [ 55.141557] process_one_work+0x21a/0x3f0 [ 55.141560] worker_thread+0x4e/0x3c0 [ 55.141563] ? __pfx_worker_thread+0x10/0x10 [ 55.141565] kthread+0xf2/0x120 [ 55.141568] ? __pfx_kthread+0x10/0x10 [ 55.141570] ret_from_fork+0x29/0x50 [ 55.141575] </TASK> [ 55.141575] ---[ end trace 0000000000000000 ]--- Fixes: a4e7717 ("drm/probe_helper: sort out poll_running vs poll_enabled") Signed-off-by: Zongmin Zhou<zhouzongmin@kylinos.cn>
marcan
pushed a commit
that referenced
this pull request
Jan 19, 2024
When creating ceq_0 during probing irdma, cqp.sc_cqp will be sent as a cqp_request to cqp->sc_cqp.sq_ring. If the request is pending when removing the irdma driver or unplugging its aux device, cqp.sc_cqp will be dereferenced as wrong struct in irdma_free_pending_cqp_request(). PID: 3669 TASK: ffff88aef892c000 CPU: 28 COMMAND: "kworker/28:0" #0 [fffffe0000549e38] crash_nmi_callback at ffffffff810e3a34 #1 [fffffe0000549e40] nmi_handle at ffffffff810788b2 #2 [fffffe0000549ea0] default_do_nmi at ffffffff8107938f #3 [fffffe0000549eb8] do_nmi at ffffffff81079582 #4 [fffffe0000549ef0] end_repeat_nmi at ffffffff82e016b4 [exception RIP: native_queued_spin_lock_slowpath+1291] RIP: ffffffff8127e72b RSP: ffff88aa841ef778 RFLAGS: 00000046 RAX: 0000000000000000 RBX: ffff88b01f849700 RCX: ffffffff8127e47e RDX: 0000000000000000 RSI: 0000000000000004 RDI: ffffffff83857ec0 RBP: ffff88afe3e4efc8 R8: ffffed15fc7c9dfa R9: ffffed15fc7c9dfa R10: 0000000000000001 R11: ffffed15fc7c9df9 R12: 0000000000740000 R13: ffff88b01f849708 R14: 0000000000000003 R15: ffffed1603f092e1 ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0000 -- <NMI exception stack> -- #5 [ffff88aa841ef778] native_queued_spin_lock_slowpath at ffffffff8127e72b #6 [ffff88aa841ef7b0] _raw_spin_lock_irqsave at ffffffff82c22aa4 #7 [ffff88aa841ef7c8] __wake_up_common_lock at ffffffff81257363 #8 [ffff88aa841ef888] irdma_free_pending_cqp_request at ffffffffa0ba12cc [irdma] #9 [ffff88aa841ef958] irdma_cleanup_pending_cqp_op at ffffffffa0ba1469 [irdma] #10 [ffff88aa841ef9c0] irdma_ctrl_deinit_hw at ffffffffa0b2989f [irdma] #11 [ffff88aa841efa28] irdma_remove at ffffffffa0b252df [irdma] #12 [ffff88aa841efae8] auxiliary_bus_remove at ffffffff8219afdb #13 [ffff88aa841efb00] device_release_driver_internal at ffffffff821882e6 #14 [ffff88aa841efb38] bus_remove_device at ffffffff82184278 #15 [ffff88aa841efb88] device_del at ffffffff82179d23 #16 [ffff88aa841efc48] ice_unplug_aux_dev at ffffffffa0eb1c14 [ice] #17 [ffff88aa841efc68] ice_service_task at ffffffffa0d88201 [ice] #18 [ffff88aa841efde8] process_one_work at ffffffff811c589a #19 [ffff88aa841efe60] worker_thread at ffffffff811c71ff #20 [ffff88aa841eff10] kthread at ffffffff811d87a0 #21 [ffff88aa841eff50] ret_from_fork at ffffffff82e0022f Fixes: 44d9e52 ("RDMA/irdma: Implement device initialization definitions") Link: https://lore.kernel.org/r/20231130081415.891006-1-lishifeng@sangfor.com.cn Suggested-by: "Ismail, Mustafa" <mustafa.ismail@intel.com> Signed-off-by: Shifeng Li <lishifeng@sangfor.com.cn> Reviewed-by: Shiraz Saleem <shiraz.saleem@intel.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
jannau
pushed a commit
that referenced
this pull request
Apr 29, 2024
[ Upstream commit f8bbc07 ] vhost_worker will call tun call backs to receive packets. If too many illegal packets arrives, tun_do_read will keep dumping packet contents. When console is enabled, it will costs much more cpu time to dump packet and soft lockup will be detected. net_ratelimit mechanism can be used to limit the dumping rate. PID: 33036 TASK: ffff949da6f20000 CPU: 23 COMMAND: "vhost-32980" #0 [fffffe00003fce50] crash_nmi_callback at ffffffff89249253 #1 [fffffe00003fce58] nmi_handle at ffffffff89225fa3 #2 [fffffe00003fceb0] default_do_nmi at ffffffff8922642e #3 [fffffe00003fced0] do_nmi at ffffffff8922660d #4 [fffffe00003fcef0] end_repeat_nmi at ffffffff89c01663 [exception RIP: io_serial_in+20] RIP: ffffffff89792594 RSP: ffffa655314979e8 RFLAGS: 00000002 RAX: ffffffff89792500 RBX: ffffffff8af428a0 RCX: 0000000000000000 RDX: 00000000000003fd RSI: 0000000000000005 RDI: ffffffff8af428a0 RBP: 0000000000002710 R8: 0000000000000004 R9: 000000000000000f R10: 0000000000000000 R11: ffffffff8acbf64f R12: 0000000000000020 R13: ffffffff8acbf698 R14: 0000000000000058 R15: 0000000000000000 ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018 #5 [ffffa655314979e8] io_serial_in at ffffffff89792594 #6 [ffffa655314979e8] wait_for_xmitr at ffffffff89793470 #7 [ffffa65531497a08] serial8250_console_putchar at ffffffff897934f6 #8 [ffffa65531497a20] uart_console_write at ffffffff8978b605 #9 [ffffa65531497a48] serial8250_console_write at ffffffff89796558 #10 [ffffa65531497ac8] console_unlock at ffffffff89316124 #11 [ffffa65531497b10] vprintk_emit at ffffffff89317c07 #12 [ffffa65531497b68] printk at ffffffff89318306 #13 [ffffa65531497bc8] print_hex_dump at ffffffff89650765 #14 [ffffa65531497ca8] tun_do_read at ffffffffc0b06c27 [tun] #15 [ffffa65531497d38] tun_recvmsg at ffffffffc0b06e34 [tun] #16 [ffffa65531497d68] handle_rx at ffffffffc0c5d682 [vhost_net] #17 [ffffa65531497ed0] vhost_worker at ffffffffc0c644dc [vhost] #18 [ffffa65531497f10] kthread at ffffffff892d2e72 #19 [ffffa65531497f50] ret_from_fork at ffffffff89c0022f Fixes: ef3db4a ("tun: avoid BUG, dump packet on GSO errors") Signed-off-by: Lei Chen <lei.chen@smartx.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Acked-by: Jason Wang <jasowang@redhat.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Link: https://lore.kernel.org/r/20240415020247.2207781-1-lei.chen@smartx.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
jannau
pushed a commit
that referenced
this pull request
Jun 4, 2024
…r_poll_disable during suspend When drivers call drm_kms_helper_poll_disable from their device suspend implementation without enabled output polling before, following warning will be reported,due to work->func not be initialized: [ 55.141361] WARNING: CPU: 3 PID: 372 at kernel/workqueue.c:3066 __flush_work+0x22f/0x240 [ 55.141382] Modules linked in: nls_iso8859_1 snd_hda_codec_generic ledtrig_audio snd_hda_intel snd_intel_dspcfg snd_intel_sdw_acpi snd_hda_codec snd_hda_core snd_hwdep snd_pcm snd_seq_midi snd_seq_midi_event snd_rawmidi snd_seq intel_rapl_msr intel_rapl_common bochs drm_vram_helper drm_ttm_helper snd_seq_device nfit ttm crct10dif_pclmul snd_timer ghash_clmulni_intel binfmt_misc sha512_ssse3 aesni_intel drm_kms_helper joydev input_leds syscopyarea crypto_simd snd cryptd sysfillrect sysimgblt mac_hid serio_raw soundcore qemu_fw_cfg sch_fq_codel msr parport_pc ppdev lp parport drm ramoops reed_solomon pstore_blk pstore_zone efi_pstore virtio_rng ip_tables x_tables autofs4 hid_generic usbhid hid ahci virtio_net i2c_i801 crc32_pclmul psmouse virtio_scsi libahci i2c_smbus lpc_ich xhci_pci net_failover virtio_blk xhci_pci_renesas failover [ 55.141430] CPU: 3 PID: 372 Comm: kworker/u16:9 Not tainted 6.2.0-rc6+ #16 [ 55.141433] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org 04/01/2014 [ 55.141435] Workqueue: events_unbound async_run_entry_fn [ 55.141441] RIP: 0010:__flush_work+0x22f/0x240 [ 55.141444] Code: 8b 43 28 48 8b 53 30 89 c1 e9 f9 fe ff ff 4c 89 f7 e8 b5 95 d9 00 e8 00 53 08 00 45 31 ff e9 11 ff ff ff 0f 0b e9 0a ff ff ff <0f> 0b 45 31 ff e9 00 ff ff ff e8 e2 54 d8 00 66 90 90 90 90 90 90 [ 55.141446] RSP: 0018:ff59221940833c18 EFLAGS: 00010246 [ 55.141449] RAX: 0000000000000000 RBX: 0000000000000000 RCX: ffffffff9b72bcbe [ 55.141450] RDX: 0000000000000001 RSI: 0000000000000001 RDI: ff3ea01e4265e330 [ 55.141451] RBP: ff59221940833c90 R08: 0000000000000000 R09: 8080808080808080 [ 55.141453] R10: ff3ea01e42b3caf4 R11: 000000000000000f R12: ff3ea01e4265e330 [ 55.141454] R13: 0000000000000001 R14: ff3ea01e505e5e80 R15: 0000000000000001 [ 55.141455] FS: 0000000000000000(0000) GS:ff3ea01fb7cc0000(0000) knlGS:0000000000000000 [ 55.141456] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 55.141458] CR2: 0000563543ad1546 CR3: 000000010ee82005 CR4: 0000000000771ee0 [ 55.141464] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 55.141465] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 55.141466] PKRU: 55555554 [ 55.141467] Call Trace: [ 55.141469] <TASK> [ 55.141472] ? pcie_wait_cmd+0xdf/0x220 [ 55.141478] ? mptcp_seq_show+0xe0/0x180 [ 55.141484] __cancel_work_timer+0x124/0x1b0 [ 55.141487] cancel_delayed_work_sync+0x17/0x20 [ 55.141490] drm_kms_helper_poll_disable+0x26/0x40 [drm_kms_helper] [ 55.141516] drm_mode_config_helper_suspend+0x25/0x90 [drm_kms_helper] [ 55.141531] ? __pm_runtime_resume+0x64/0x90 [ 55.141536] bochs_pm_suspend+0x16/0x20 [bochs] [ 55.141540] pci_pm_suspend+0x8b/0x1b0 [ 55.141545] ? __pfx_pci_pm_suspend+0x10/0x10 [ 55.141547] dpm_run_callback+0x4c/0x160 [ 55.141550] __device_suspend+0x14c/0x4c0 [ 55.141553] async_suspend+0x24/0xa0 [ 55.141555] async_run_entry_fn+0x34/0x120 [ 55.141557] process_one_work+0x21a/0x3f0 [ 55.141560] worker_thread+0x4e/0x3c0 [ 55.141563] ? __pfx_worker_thread+0x10/0x10 [ 55.141565] kthread+0xf2/0x120 [ 55.141568] ? __pfx_kthread+0x10/0x10 [ 55.141570] ret_from_fork+0x29/0x50 [ 55.141575] </TASK> [ 55.141575] ---[ end trace 0000000000000000 ]--- Fixes: a4e7717 ("drm/probe_helper: sort out poll_running vs poll_enabled") Signed-off-by: Zongmin Zhou<zhouzongmin@kylinos.cn>
jannau
pushed a commit
that referenced
this pull request
Jun 15, 2024
[ Upstream commit 769e6a1 ] ui_browser__show() is capturing the input title that is stack allocated memory in hist_browser__run(). Avoid a use after return by strdup-ing the string. Committer notes: Further explanation from Ian Rogers: My command line using tui is: $ sudo bash -c 'rm /tmp/asan.log*; export ASAN_OPTIONS="log_path=/tmp/asan.log"; /tmp/perf/perf mem record -a sleep 1; /tmp/perf/perf mem report' I then go to the perf annotate view and quit. This triggers the asan error (from the log file): ``` ==1254591==ERROR: AddressSanitizer: stack-use-after-return on address 0x7f2813331920 at pc 0x7f28180 65991 bp 0x7fff0a21c750 sp 0x7fff0a21bf10 READ of size 80 at 0x7f2813331920 thread T0 #0 0x7f2818065990 in __interceptor_strlen ../../../../src/libsanitizer/sanitizer_common/sanitizer_common_interceptors.inc:461 #1 0x7f2817698251 in SLsmg_write_wrapped_string (/lib/x86_64-linux-gnu/libslang.so.2+0x98251) #2 0x7f28176984b9 in SLsmg_write_nstring (/lib/x86_64-linux-gnu/libslang.so.2+0x984b9) #3 0x55c94045b365 in ui_browser__write_nstring ui/browser.c:60 #4 0x55c94045c558 in __ui_browser__show_title ui/browser.c:266 #5 0x55c94045c776 in ui_browser__show ui/browser.c:288 #6 0x55c94045c06d in ui_browser__handle_resize ui/browser.c:206 #7 0x55c94047979b in do_annotate ui/browsers/hists.c:2458 #8 0x55c94047fb17 in evsel__hists_browse ui/browsers/hists.c:3412 #9 0x55c940480a0c in perf_evsel_menu__run ui/browsers/hists.c:3527 #10 0x55c940481108 in __evlist__tui_browse_hists ui/browsers/hists.c:3613 #11 0x55c9404813f7 in evlist__tui_browse_hists ui/browsers/hists.c:3661 #12 0x55c93ffa253f in report__browse_hists tools/perf/builtin-report.c:671 #13 0x55c93ffa58ca in __cmd_report tools/perf/builtin-report.c:1141 #14 0x55c93ffaf159 in cmd_report tools/perf/builtin-report.c:1805 #15 0x55c94000c05c in report_events tools/perf/builtin-mem.c:374 #16 0x55c94000d96d in cmd_mem tools/perf/builtin-mem.c:516 #17 0x55c9400e44ee in run_builtin tools/perf/perf.c:350 #18 0x55c9400e4a5a in handle_internal_command tools/perf/perf.c:403 #19 0x55c9400e4e22 in run_argv tools/perf/perf.c:447 #20 0x55c9400e53ad in main tools/perf/perf.c:561 #21 0x7f28170456c9 in __libc_start_call_main ../sysdeps/nptl/libc_start_call_main.h:58 #22 0x7f2817045784 in __libc_start_main_impl ../csu/libc-start.c:360 #23 0x55c93ff544c0 in _start (/tmp/perf/perf+0x19a4c0) (BuildId: 84899b0e8c7d3a3eaa67b2eb35e3d8b2f8cd4c93) Address 0x7f2813331920 is located in stack of thread T0 at offset 32 in frame #0 0x55c94046e85e in hist_browser__run ui/browsers/hists.c:746 This frame has 1 object(s): [32, 192) 'title' (line 747) <== Memory access at offset 32 is inside this variable HINT: this may be a false positive if your program uses some custom stack unwind mechanism, swapcontext or vfork ``` hist_browser__run isn't on the stack so the asan error looks legit. There's no clean init/exit on struct ui_browser so I may be trading a use-after-return for a memory leak, but that seems look a good trade anyway. Fixes: 05e8b08 ("perf ui browser: Stop using 'self'") Signed-off-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: Ben Gainey <ben.gainey@arm.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@arm.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: K Prateek Nayak <kprateek.nayak@amd.com> Cc: Li Dong <lidong@vivo.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Oliver Upton <oliver.upton@linux.dev> Cc: Paran Lee <p4ranlee@gmail.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ravi Bangoria <ravi.bangoria@amd.com> Cc: Sun Haiyong <sunhaiyong@loongson.cn> Cc: Tim Chen <tim.c.chen@linux.intel.com> Cc: Yanteng Si <siyanteng@loongson.cn> Cc: Yicong Yang <yangyicong@hisilicon.com> Link: https://lore.kernel.org/r/20240507183545.1236093-2-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
jannau
pushed a commit
that referenced
this pull request
Jun 18, 2024
commit 9d274c1 upstream. We have been seeing crashes on duplicate keys in btrfs_set_item_key_safe(): BTRFS critical (device vdb): slot 4 key (450 108 8192) new key (450 108 8192) ------------[ cut here ]------------ kernel BUG at fs/btrfs/ctree.c:2620! invalid opcode: 0000 [#1] PREEMPT SMP PTI CPU: 0 PID: 3139 Comm: xfs_io Kdump: loaded Not tainted 6.9.0 #6 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.3-2.fc40 04/01/2014 RIP: 0010:btrfs_set_item_key_safe+0x11f/0x290 [btrfs] With the following stack trace: #0 btrfs_set_item_key_safe (fs/btrfs/ctree.c:2620:4) #1 btrfs_drop_extents (fs/btrfs/file.c:411:4) #2 log_one_extent (fs/btrfs/tree-log.c:4732:9) #3 btrfs_log_changed_extents (fs/btrfs/tree-log.c:4955:9) #4 btrfs_log_inode (fs/btrfs/tree-log.c:6626:9) #5 btrfs_log_inode_parent (fs/btrfs/tree-log.c:7070:8) #6 btrfs_log_dentry_safe (fs/btrfs/tree-log.c:7171:8) #7 btrfs_sync_file (fs/btrfs/file.c:1933:8) #8 vfs_fsync_range (fs/sync.c:188:9) #9 vfs_fsync (fs/sync.c:202:9) #10 do_fsync (fs/sync.c:212:9) #11 __do_sys_fdatasync (fs/sync.c:225:9) #12 __se_sys_fdatasync (fs/sync.c:223:1) #13 __x64_sys_fdatasync (fs/sync.c:223:1) #14 do_syscall_x64 (arch/x86/entry/common.c:52:14) #15 do_syscall_64 (arch/x86/entry/common.c:83:7) #16 entry_SYSCALL_64+0xaf/0x14c (arch/x86/entry/entry_64.S:121) So we're logging a changed extent from fsync, which is splitting an extent in the log tree. But this split part already exists in the tree, triggering the BUG(). This is the state of the log tree at the time of the crash, dumped with drgn (https://github.com/osandov/drgn/blob/main/contrib/btrfs_tree.py) to get more details than btrfs_print_leaf() gives us: >>> print_extent_buffer(prog.crashed_thread().stack_trace()[0]["eb"]) leaf 33439744 level 0 items 72 generation 9 owner 18446744073709551610 leaf 33439744 flags 0x100000000000000 fs uuid e5bd3946-400c-4223-8923-190ef1f18677 chunk uuid d58cb17e-6d02-494a-829a-18b7d8a399da item 0 key (450 INODE_ITEM 0) itemoff 16123 itemsize 160 generation 7 transid 9 size 8192 nbytes 8473563889606862198 block group 0 mode 100600 links 1 uid 0 gid 0 rdev 0 sequence 204 flags 0x10(PREALLOC) atime 1716417703.220000000 (2024-05-22 15:41:43) ctime 1716417704.983333333 (2024-05-22 15:41:44) mtime 1716417704.983333333 (2024-05-22 15:41:44) otime 17592186044416.000000000 (559444-03-08 01:40:16) item 1 key (450 INODE_REF 256) itemoff 16110 itemsize 13 index 195 namelen 3 name: 193 item 2 key (450 XATTR_ITEM 1640047104) itemoff 16073 itemsize 37 location key (0 UNKNOWN.0 0) type XATTR transid 7 data_len 1 name_len 6 name: user.a data a item 3 key (450 EXTENT_DATA 0) itemoff 16020 itemsize 53 generation 9 type 1 (regular) extent data disk byte 303144960 nr 12288 extent data offset 0 nr 4096 ram 12288 extent compression 0 (none) item 4 key (450 EXTENT_DATA 4096) itemoff 15967 itemsize 53 generation 9 type 2 (prealloc) prealloc data disk byte 303144960 nr 12288 prealloc data offset 4096 nr 8192 item 5 key (450 EXTENT_DATA 8192) itemoff 15914 itemsize 53 generation 9 type 2 (prealloc) prealloc data disk byte 303144960 nr 12288 prealloc data offset 8192 nr 4096 ... So the real problem happened earlier: notice that items 4 (4k-12k) and 5 (8k-12k) overlap. Both are prealloc extents. Item 4 straddles i_size and item 5 starts at i_size. Here is the state of the filesystem tree at the time of the crash: >>> root = prog.crashed_thread().stack_trace()[2]["inode"].root >>> ret, nodes, slots = btrfs_search_slot(root, BtrfsKey(450, 0, 0)) >>> print_extent_buffer(nodes[0]) leaf 30425088 level 0 items 184 generation 9 owner 5 leaf 30425088 flags 0x100000000000000 fs uuid e5bd3946-400c-4223-8923-190ef1f18677 chunk uuid d58cb17e-6d02-494a-829a-18b7d8a399da ... item 179 key (450 INODE_ITEM 0) itemoff 4907 itemsize 160 generation 7 transid 7 size 4096 nbytes 12288 block group 0 mode 100600 links 1 uid 0 gid 0 rdev 0 sequence 6 flags 0x10(PREALLOC) atime 1716417703.220000000 (2024-05-22 15:41:43) ctime 1716417703.220000000 (2024-05-22 15:41:43) mtime 1716417703.220000000 (2024-05-22 15:41:43) otime 1716417703.220000000 (2024-05-22 15:41:43) item 180 key (450 INODE_REF 256) itemoff 4894 itemsize 13 index 195 namelen 3 name: 193 item 181 key (450 XATTR_ITEM 1640047104) itemoff 4857 itemsize 37 location key (0 UNKNOWN.0 0) type XATTR transid 7 data_len 1 name_len 6 name: user.a data a item 182 key (450 EXTENT_DATA 0) itemoff 4804 itemsize 53 generation 9 type 1 (regular) extent data disk byte 303144960 nr 12288 extent data offset 0 nr 8192 ram 12288 extent compression 0 (none) item 183 key (450 EXTENT_DATA 8192) itemoff 4751 itemsize 53 generation 9 type 2 (prealloc) prealloc data disk byte 303144960 nr 12288 prealloc data offset 8192 nr 4096 Item 5 in the log tree corresponds to item 183 in the filesystem tree, but nothing matches item 4. Furthermore, item 183 is the last item in the leaf. btrfs_log_prealloc_extents() is responsible for logging prealloc extents beyond i_size. It first truncates any previously logged prealloc extents that start beyond i_size. Then, it walks the filesystem tree and copies the prealloc extent items to the log tree. If it hits the end of a leaf, then it calls btrfs_next_leaf(), which unlocks the tree and does another search. However, while the filesystem tree is unlocked, an ordered extent completion may modify the tree. In particular, it may insert an extent item that overlaps with an extent item that was already copied to the log tree. This may manifest in several ways depending on the exact scenario, including an EEXIST error that is silently translated to a full sync, overlapping items in the log tree, or this crash. This particular crash is triggered by the following sequence of events: - Initially, the file has i_size=4k, a regular extent from 0-4k, and a prealloc extent beyond i_size from 4k-12k. The prealloc extent item is the last item in its B-tree leaf. - The file is fsync'd, which copies its inode item and both extent items to the log tree. - An xattr is set on the file, which sets the BTRFS_INODE_COPY_EVERYTHING flag. - The range 4k-8k in the file is written using direct I/O. i_size is extended to 8k, but the ordered extent is still in flight. - The file is fsync'd. Since BTRFS_INODE_COPY_EVERYTHING is set, this calls copy_inode_items_to_log(), which calls btrfs_log_prealloc_extents(). - btrfs_log_prealloc_extents() finds the 4k-12k prealloc extent in the filesystem tree. Since it starts before i_size, it skips it. Since it is the last item in its B-tree leaf, it calls btrfs_next_leaf(). - btrfs_next_leaf() unlocks the path. - The ordered extent completion runs, which converts the 4k-8k part of the prealloc extent to written and inserts the remaining prealloc part from 8k-12k. - btrfs_next_leaf() does a search and finds the new prealloc extent 8k-12k. - btrfs_log_prealloc_extents() copies the 8k-12k prealloc extent into the log tree. Note that it overlaps with the 4k-12k prealloc extent that was copied to the log tree by the first fsync. - fsync calls btrfs_log_changed_extents(), which tries to log the 4k-8k extent that was written. - This tries to drop the range 4k-8k in the log tree, which requires adjusting the start of the 4k-12k prealloc extent in the log tree to 8k. - btrfs_set_item_key_safe() sees that there is already an extent starting at 8k in the log tree and calls BUG(). Fix this by detecting when we're about to insert an overlapping file extent item in the log tree and truncating the part that would overlap. CC: stable@vger.kernel.org # 6.1+ Reviewed-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: Omar Sandoval <osandov@fb.com> Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
jannau
pushed a commit
that referenced
this pull request
Jun 22, 2024
[ Upstream commit 79f18a4 ] When queues are started, netif_napi_add() and napi_enable() are called. If there are 4 queues and only 3 queues are used for the current configuration, only 3 queues' napi should be registered and enabled. The ionic_qcq_enable() checks whether the .poll pointer is not NULL for enabling only the using queue' napi. Unused queues' napi will not be registered by netif_napi_add(), so the .poll pointer indicates NULL. But it couldn't distinguish whether the napi was unregistered or not because netif_napi_del() doesn't reset the .poll pointer to NULL. So, ionic_qcq_enable() calls napi_enable() for the queue, which was unregistered by netif_napi_del(). Reproducer: ethtool -L <interface name> rx 1 tx 1 combined 0 ethtool -L <interface name> rx 0 tx 0 combined 1 ethtool -L <interface name> rx 0 tx 0 combined 4 Splat looks like: kernel BUG at net/core/dev.c:6666! Oops: invalid opcode: 0000 [#1] PREEMPT SMP NOPTI CPU: 3 PID: 1057 Comm: kworker/3:3 Not tainted 6.10.0-rc2+ #16 Workqueue: events ionic_lif_deferred_work [ionic] RIP: 0010:napi_enable+0x3b/0x40 Code: 48 89 c2 48 83 e2 f6 80 b9 61 09 00 00 00 74 0d 48 83 bf 60 01 00 00 00 74 03 80 ce 01 f0 4f RSP: 0018:ffffb6ed83227d48 EFLAGS: 00010246 RAX: 0000000000000000 RBX: ffff97560cda0828 RCX: 0000000000000029 RDX: 0000000000000001 RSI: 0000000000000000 RDI: ffff97560cda0a28 RBP: ffffb6ed83227d50 R08: 0000000000000400 R09: 0000000000000001 R10: 0000000000000001 R11: 0000000000000001 R12: 0000000000000000 R13: ffff97560ce3c1a0 R14: 0000000000000000 R15: ffff975613ba0a20 FS: 0000000000000000(0000) GS:ffff975d5f780000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007f8f734ee200 CR3: 0000000103e50000 CR4: 00000000007506f0 PKRU: 55555554 Call Trace: <TASK> ? die+0x33/0x90 ? do_trap+0xd9/0x100 ? napi_enable+0x3b/0x40 ? do_error_trap+0x83/0xb0 ? napi_enable+0x3b/0x40 ? napi_enable+0x3b/0x40 ? exc_invalid_op+0x4e/0x70 ? napi_enable+0x3b/0x40 ? asm_exc_invalid_op+0x16/0x20 ? napi_enable+0x3b/0x40 ionic_qcq_enable+0xb7/0x180 [ionic 59bdfc8a035436e1c4224ff7d10789e3f14643f8] ionic_start_queues+0xc4/0x290 [ionic 59bdfc8a035436e1c4224ff7d10789e3f14643f8] ionic_link_status_check+0x11c/0x170 [ionic 59bdfc8a035436e1c4224ff7d10789e3f14643f8] ionic_lif_deferred_work+0x129/0x280 [ionic 59bdfc8a035436e1c4224ff7d10789e3f14643f8] process_one_work+0x145/0x360 worker_thread+0x2bb/0x3d0 ? __pfx_worker_thread+0x10/0x10 kthread+0xcc/0x100 ? __pfx_kthread+0x10/0x10 ret_from_fork+0x2d/0x50 ? __pfx_kthread+0x10/0x10 ret_from_fork_asm+0x1a/0x30 Fixes: 0f3154e ("ionic: Add Tx and Rx handling") Signed-off-by: Taehee Yoo <ap420073@gmail.com> Reviewed-by: Brett Creeley <brett.creeley@amd.com> Reviewed-by: Shannon Nelson <shannon.nelson@amd.com> Link: https://lore.kernel.org/r/20240612060446.1754392-1-ap420073@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
jannau
pushed a commit
that referenced
this pull request
Jul 6, 2024
commit be346c1 upstream. The code in ocfs2_dio_end_io_write() estimates number of necessary transaction credits using ocfs2_calc_extend_credits(). This however does not take into account that the IO could be arbitrarily large and can contain arbitrary number of extents. Extent tree manipulations do often extend the current transaction but not in all of the cases. For example if we have only single block extents in the tree, ocfs2_mark_extent_written() will end up calling ocfs2_replace_extent_rec() all the time and we will never extend the current transaction and eventually exhaust all the transaction credits if the IO contains many single block extents. Once that happens a WARN_ON(jbd2_handle_buffer_credits(handle) <= 0) is triggered in jbd2_journal_dirty_metadata() and subsequently OCFS2 aborts in response to this error. This was actually triggered by one of our customers on a heavily fragmented OCFS2 filesystem. To fix the issue make sure the transaction always has enough credits for one extent insert before each call of ocfs2_mark_extent_written(). Heming Zhao said: ------ PANIC: "Kernel panic - not syncing: OCFS2: (device dm-1): panic forced after error" PID: xxx TASK: xxxx CPU: 5 COMMAND: "SubmitThread-CA" #0 machine_kexec at ffffffff8c069932 #1 __crash_kexec at ffffffff8c1338fa #2 panic at ffffffff8c1d69b9 #3 ocfs2_handle_error at ffffffffc0c86c0c [ocfs2] #4 __ocfs2_abort at ffffffffc0c88387 [ocfs2] #5 ocfs2_journal_dirty at ffffffffc0c51e98 [ocfs2] #6 ocfs2_split_extent at ffffffffc0c27ea3 [ocfs2] #7 ocfs2_change_extent_flag at ffffffffc0c28053 [ocfs2] #8 ocfs2_mark_extent_written at ffffffffc0c28347 [ocfs2] #9 ocfs2_dio_end_io_write at ffffffffc0c2bef9 [ocfs2] #10 ocfs2_dio_end_io at ffffffffc0c2c0f5 [ocfs2] #11 dio_complete at ffffffff8c2b9fa7 #12 do_blockdev_direct_IO at ffffffff8c2bc09f #13 ocfs2_direct_IO at ffffffffc0c2b653 [ocfs2] #14 generic_file_direct_write at ffffffff8c1dcf14 #15 __generic_file_write_iter at ffffffff8c1dd07b #16 ocfs2_file_write_iter at ffffffffc0c49f1f [ocfs2] #17 aio_write at ffffffff8c2cc72e #18 kmem_cache_alloc at ffffffff8c248dde #19 do_io_submit at ffffffff8c2ccada #20 do_syscall_64 at ffffffff8c004984 #21 entry_SYSCALL_64_after_hwframe at ffffffff8c8000ba Link: https://lkml.kernel.org/r/20240617095543.6971-1-jack@suse.cz Link: https://lkml.kernel.org/r/20240614145243.8837-1-jack@suse.cz Fixes: c15471f ("ocfs2: fix sparse file & data ordering issue in direct io") Signed-off-by: Jan Kara <jack@suse.cz> Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com> Reviewed-by: Heming Zhao <heming.zhao@suse.com> Cc: Mark Fasheh <mark@fasheh.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Junxiao Bi <junxiao.bi@oracle.com> Cc: Changwei Ge <gechangwei@live.cn> Cc: Gang He <ghe@suse.com> Cc: Jun Piao <piaojun@huawei.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
jannau
pushed a commit
that referenced
this pull request
Jul 29, 2024
We have been seeing crashes on duplicate keys in btrfs_set_item_key_safe(): BTRFS critical (device vdb): slot 4 key (450 108 8192) new key (450 108 8192) ------------[ cut here ]------------ kernel BUG at fs/btrfs/ctree.c:2620! invalid opcode: 0000 [#1] PREEMPT SMP PTI CPU: 0 PID: 3139 Comm: xfs_io Kdump: loaded Not tainted 6.9.0 #6 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.3-2.fc40 04/01/2014 RIP: 0010:btrfs_set_item_key_safe+0x11f/0x290 [btrfs] With the following stack trace: #0 btrfs_set_item_key_safe (fs/btrfs/ctree.c:2620:4) #1 btrfs_drop_extents (fs/btrfs/file.c:411:4) #2 log_one_extent (fs/btrfs/tree-log.c:4732:9) #3 btrfs_log_changed_extents (fs/btrfs/tree-log.c:4955:9) #4 btrfs_log_inode (fs/btrfs/tree-log.c:6626:9) #5 btrfs_log_inode_parent (fs/btrfs/tree-log.c:7070:8) #6 btrfs_log_dentry_safe (fs/btrfs/tree-log.c:7171:8) #7 btrfs_sync_file (fs/btrfs/file.c:1933:8) #8 vfs_fsync_range (fs/sync.c:188:9) #9 vfs_fsync (fs/sync.c:202:9) #10 do_fsync (fs/sync.c:212:9) #11 __do_sys_fdatasync (fs/sync.c:225:9) #12 __se_sys_fdatasync (fs/sync.c:223:1) #13 __x64_sys_fdatasync (fs/sync.c:223:1) #14 do_syscall_x64 (arch/x86/entry/common.c:52:14) #15 do_syscall_64 (arch/x86/entry/common.c:83:7) #16 entry_SYSCALL_64+0xaf/0x14c (arch/x86/entry/entry_64.S:121) So we're logging a changed extent from fsync, which is splitting an extent in the log tree. But this split part already exists in the tree, triggering the BUG(). This is the state of the log tree at the time of the crash, dumped with drgn (https://github.com/osandov/drgn/blob/main/contrib/btrfs_tree.py) to get more details than btrfs_print_leaf() gives us: >>> print_extent_buffer(prog.crashed_thread().stack_trace()[0]["eb"]) leaf 33439744 level 0 items 72 generation 9 owner 18446744073709551610 leaf 33439744 flags 0x100000000000000 fs uuid e5bd3946-400c-4223-8923-190ef1f18677 chunk uuid d58cb17e-6d02-494a-829a-18b7d8a399da item 0 key (450 INODE_ITEM 0) itemoff 16123 itemsize 160 generation 7 transid 9 size 8192 nbytes 8473563889606862198 block group 0 mode 100600 links 1 uid 0 gid 0 rdev 0 sequence 204 flags 0x10(PREALLOC) atime 1716417703.220000000 (2024-05-22 15:41:43) ctime 1716417704.983333333 (2024-05-22 15:41:44) mtime 1716417704.983333333 (2024-05-22 15:41:44) otime 17592186044416.000000000 (559444-03-08 01:40:16) item 1 key (450 INODE_REF 256) itemoff 16110 itemsize 13 index 195 namelen 3 name: 193 item 2 key (450 XATTR_ITEM 1640047104) itemoff 16073 itemsize 37 location key (0 UNKNOWN.0 0) type XATTR transid 7 data_len 1 name_len 6 name: user.a data a item 3 key (450 EXTENT_DATA 0) itemoff 16020 itemsize 53 generation 9 type 1 (regular) extent data disk byte 303144960 nr 12288 extent data offset 0 nr 4096 ram 12288 extent compression 0 (none) item 4 key (450 EXTENT_DATA 4096) itemoff 15967 itemsize 53 generation 9 type 2 (prealloc) prealloc data disk byte 303144960 nr 12288 prealloc data offset 4096 nr 8192 item 5 key (450 EXTENT_DATA 8192) itemoff 15914 itemsize 53 generation 9 type 2 (prealloc) prealloc data disk byte 303144960 nr 12288 prealloc data offset 8192 nr 4096 ... So the real problem happened earlier: notice that items 4 (4k-12k) and 5 (8k-12k) overlap. Both are prealloc extents. Item 4 straddles i_size and item 5 starts at i_size. Here is the state of the filesystem tree at the time of the crash: >>> root = prog.crashed_thread().stack_trace()[2]["inode"].root >>> ret, nodes, slots = btrfs_search_slot(root, BtrfsKey(450, 0, 0)) >>> print_extent_buffer(nodes[0]) leaf 30425088 level 0 items 184 generation 9 owner 5 leaf 30425088 flags 0x100000000000000 fs uuid e5bd3946-400c-4223-8923-190ef1f18677 chunk uuid d58cb17e-6d02-494a-829a-18b7d8a399da ... item 179 key (450 INODE_ITEM 0) itemoff 4907 itemsize 160 generation 7 transid 7 size 4096 nbytes 12288 block group 0 mode 100600 links 1 uid 0 gid 0 rdev 0 sequence 6 flags 0x10(PREALLOC) atime 1716417703.220000000 (2024-05-22 15:41:43) ctime 1716417703.220000000 (2024-05-22 15:41:43) mtime 1716417703.220000000 (2024-05-22 15:41:43) otime 1716417703.220000000 (2024-05-22 15:41:43) item 180 key (450 INODE_REF 256) itemoff 4894 itemsize 13 index 195 namelen 3 name: 193 item 181 key (450 XATTR_ITEM 1640047104) itemoff 4857 itemsize 37 location key (0 UNKNOWN.0 0) type XATTR transid 7 data_len 1 name_len 6 name: user.a data a item 182 key (450 EXTENT_DATA 0) itemoff 4804 itemsize 53 generation 9 type 1 (regular) extent data disk byte 303144960 nr 12288 extent data offset 0 nr 8192 ram 12288 extent compression 0 (none) item 183 key (450 EXTENT_DATA 8192) itemoff 4751 itemsize 53 generation 9 type 2 (prealloc) prealloc data disk byte 303144960 nr 12288 prealloc data offset 8192 nr 4096 Item 5 in the log tree corresponds to item 183 in the filesystem tree, but nothing matches item 4. Furthermore, item 183 is the last item in the leaf. btrfs_log_prealloc_extents() is responsible for logging prealloc extents beyond i_size. It first truncates any previously logged prealloc extents that start beyond i_size. Then, it walks the filesystem tree and copies the prealloc extent items to the log tree. If it hits the end of a leaf, then it calls btrfs_next_leaf(), which unlocks the tree and does another search. However, while the filesystem tree is unlocked, an ordered extent completion may modify the tree. In particular, it may insert an extent item that overlaps with an extent item that was already copied to the log tree. This may manifest in several ways depending on the exact scenario, including an EEXIST error that is silently translated to a full sync, overlapping items in the log tree, or this crash. This particular crash is triggered by the following sequence of events: - Initially, the file has i_size=4k, a regular extent from 0-4k, and a prealloc extent beyond i_size from 4k-12k. The prealloc extent item is the last item in its B-tree leaf. - The file is fsync'd, which copies its inode item and both extent items to the log tree. - An xattr is set on the file, which sets the BTRFS_INODE_COPY_EVERYTHING flag. - The range 4k-8k in the file is written using direct I/O. i_size is extended to 8k, but the ordered extent is still in flight. - The file is fsync'd. Since BTRFS_INODE_COPY_EVERYTHING is set, this calls copy_inode_items_to_log(), which calls btrfs_log_prealloc_extents(). - btrfs_log_prealloc_extents() finds the 4k-12k prealloc extent in the filesystem tree. Since it starts before i_size, it skips it. Since it is the last item in its B-tree leaf, it calls btrfs_next_leaf(). - btrfs_next_leaf() unlocks the path. - The ordered extent completion runs, which converts the 4k-8k part of the prealloc extent to written and inserts the remaining prealloc part from 8k-12k. - btrfs_next_leaf() does a search and finds the new prealloc extent 8k-12k. - btrfs_log_prealloc_extents() copies the 8k-12k prealloc extent into the log tree. Note that it overlaps with the 4k-12k prealloc extent that was copied to the log tree by the first fsync. - fsync calls btrfs_log_changed_extents(), which tries to log the 4k-8k extent that was written. - This tries to drop the range 4k-8k in the log tree, which requires adjusting the start of the 4k-12k prealloc extent in the log tree to 8k. - btrfs_set_item_key_safe() sees that there is already an extent starting at 8k in the log tree and calls BUG(). Fix this by detecting when we're about to insert an overlapping file extent item in the log tree and truncating the part that would overlap. CC: stable@vger.kernel.org # 6.1+ Reviewed-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: Omar Sandoval <osandov@fb.com> Signed-off-by: David Sterba <dsterba@suse.com>
jannau
pushed a commit
that referenced
this pull request
Jul 29, 2024
When queues are started, netif_napi_add() and napi_enable() are called. If there are 4 queues and only 3 queues are used for the current configuration, only 3 queues' napi should be registered and enabled. The ionic_qcq_enable() checks whether the .poll pointer is not NULL for enabling only the using queue' napi. Unused queues' napi will not be registered by netif_napi_add(), so the .poll pointer indicates NULL. But it couldn't distinguish whether the napi was unregistered or not because netif_napi_del() doesn't reset the .poll pointer to NULL. So, ionic_qcq_enable() calls napi_enable() for the queue, which was unregistered by netif_napi_del(). Reproducer: ethtool -L <interface name> rx 1 tx 1 combined 0 ethtool -L <interface name> rx 0 tx 0 combined 1 ethtool -L <interface name> rx 0 tx 0 combined 4 Splat looks like: kernel BUG at net/core/dev.c:6666! Oops: invalid opcode: 0000 [#1] PREEMPT SMP NOPTI CPU: 3 PID: 1057 Comm: kworker/3:3 Not tainted 6.10.0-rc2+ #16 Workqueue: events ionic_lif_deferred_work [ionic] RIP: 0010:napi_enable+0x3b/0x40 Code: 48 89 c2 48 83 e2 f6 80 b9 61 09 00 00 00 74 0d 48 83 bf 60 01 00 00 00 74 03 80 ce 01 f0 4f RSP: 0018:ffffb6ed83227d48 EFLAGS: 00010246 RAX: 0000000000000000 RBX: ffff97560cda0828 RCX: 0000000000000029 RDX: 0000000000000001 RSI: 0000000000000000 RDI: ffff97560cda0a28 RBP: ffffb6ed83227d50 R08: 0000000000000400 R09: 0000000000000001 R10: 0000000000000001 R11: 0000000000000001 R12: 0000000000000000 R13: ffff97560ce3c1a0 R14: 0000000000000000 R15: ffff975613ba0a20 FS: 0000000000000000(0000) GS:ffff975d5f780000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007f8f734ee200 CR3: 0000000103e50000 CR4: 00000000007506f0 PKRU: 55555554 Call Trace: <TASK> ? die+0x33/0x90 ? do_trap+0xd9/0x100 ? napi_enable+0x3b/0x40 ? do_error_trap+0x83/0xb0 ? napi_enable+0x3b/0x40 ? napi_enable+0x3b/0x40 ? exc_invalid_op+0x4e/0x70 ? napi_enable+0x3b/0x40 ? asm_exc_invalid_op+0x16/0x20 ? napi_enable+0x3b/0x40 ionic_qcq_enable+0xb7/0x180 [ionic 59bdfc8a035436e1c4224ff7d10789e3f14643f8] ionic_start_queues+0xc4/0x290 [ionic 59bdfc8a035436e1c4224ff7d10789e3f14643f8] ionic_link_status_check+0x11c/0x170 [ionic 59bdfc8a035436e1c4224ff7d10789e3f14643f8] ionic_lif_deferred_work+0x129/0x280 [ionic 59bdfc8a035436e1c4224ff7d10789e3f14643f8] process_one_work+0x145/0x360 worker_thread+0x2bb/0x3d0 ? __pfx_worker_thread+0x10/0x10 kthread+0xcc/0x100 ? __pfx_kthread+0x10/0x10 ret_from_fork+0x2d/0x50 ? __pfx_kthread+0x10/0x10 ret_from_fork_asm+0x1a/0x30 Fixes: 0f3154e ("ionic: Add Tx and Rx handling") Signed-off-by: Taehee Yoo <ap420073@gmail.com> Reviewed-by: Brett Creeley <brett.creeley@amd.com> Reviewed-by: Shannon Nelson <shannon.nelson@amd.com> Link: https://lore.kernel.org/r/20240612060446.1754392-1-ap420073@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
jannau
pushed a commit
that referenced
this pull request
Jul 29, 2024
The code in ocfs2_dio_end_io_write() estimates number of necessary transaction credits using ocfs2_calc_extend_credits(). This however does not take into account that the IO could be arbitrarily large and can contain arbitrary number of extents. Extent tree manipulations do often extend the current transaction but not in all of the cases. For example if we have only single block extents in the tree, ocfs2_mark_extent_written() will end up calling ocfs2_replace_extent_rec() all the time and we will never extend the current transaction and eventually exhaust all the transaction credits if the IO contains many single block extents. Once that happens a WARN_ON(jbd2_handle_buffer_credits(handle) <= 0) is triggered in jbd2_journal_dirty_metadata() and subsequently OCFS2 aborts in response to this error. This was actually triggered by one of our customers on a heavily fragmented OCFS2 filesystem. To fix the issue make sure the transaction always has enough credits for one extent insert before each call of ocfs2_mark_extent_written(). Heming Zhao said: ------ PANIC: "Kernel panic - not syncing: OCFS2: (device dm-1): panic forced after error" PID: xxx TASK: xxxx CPU: 5 COMMAND: "SubmitThread-CA" #0 machine_kexec at ffffffff8c069932 #1 __crash_kexec at ffffffff8c1338fa #2 panic at ffffffff8c1d69b9 #3 ocfs2_handle_error at ffffffffc0c86c0c [ocfs2] #4 __ocfs2_abort at ffffffffc0c88387 [ocfs2] #5 ocfs2_journal_dirty at ffffffffc0c51e98 [ocfs2] #6 ocfs2_split_extent at ffffffffc0c27ea3 [ocfs2] #7 ocfs2_change_extent_flag at ffffffffc0c28053 [ocfs2] #8 ocfs2_mark_extent_written at ffffffffc0c28347 [ocfs2] #9 ocfs2_dio_end_io_write at ffffffffc0c2bef9 [ocfs2] #10 ocfs2_dio_end_io at ffffffffc0c2c0f5 [ocfs2] #11 dio_complete at ffffffff8c2b9fa7 #12 do_blockdev_direct_IO at ffffffff8c2bc09f #13 ocfs2_direct_IO at ffffffffc0c2b653 [ocfs2] #14 generic_file_direct_write at ffffffff8c1dcf14 #15 __generic_file_write_iter at ffffffff8c1dd07b #16 ocfs2_file_write_iter at ffffffffc0c49f1f [ocfs2] #17 aio_write at ffffffff8c2cc72e #18 kmem_cache_alloc at ffffffff8c248dde #19 do_io_submit at ffffffff8c2ccada #20 do_syscall_64 at ffffffff8c004984 #21 entry_SYSCALL_64_after_hwframe at ffffffff8c8000ba Link: https://lkml.kernel.org/r/20240617095543.6971-1-jack@suse.cz Link: https://lkml.kernel.org/r/20240614145243.8837-1-jack@suse.cz Fixes: c15471f ("ocfs2: fix sparse file & data ordering issue in direct io") Signed-off-by: Jan Kara <jack@suse.cz> Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com> Reviewed-by: Heming Zhao <heming.zhao@suse.com> Cc: Mark Fasheh <mark@fasheh.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Junxiao Bi <junxiao.bi@oracle.com> Cc: Changwei Ge <gechangwei@live.cn> Cc: Gang He <ghe@suse.com> Cc: Jun Piao <piaojun@huawei.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
jannau
pushed a commit
that referenced
this pull request
Sep 9, 2024
[ Upstream commit a699781 ] A sysfs reader can race with a device reset or removal, attempting to read device state when the device is not actually present. eg: [exception RIP: qed_get_current_link+17] #8 [ffffb9e4f2907c48] qede_get_link_ksettings at ffffffffc07a994a [qede] #9 [ffffb9e4f2907cd8] __rh_call_get_link_ksettings at ffffffff992b01a3 #10 [ffffb9e4f2907d38] __ethtool_get_link_ksettings at ffffffff992b04e4 #11 [ffffb9e4f2907d90] duplex_show at ffffffff99260300 #12 [ffffb9e4f2907e38] dev_attr_show at ffffffff9905a01c #13 [ffffb9e4f2907e50] sysfs_kf_seq_show at ffffffff98e0145b #14 [ffffb9e4f2907e68] seq_read at ffffffff98d902e3 #15 [ffffb9e4f2907ec8] vfs_read at ffffffff98d657d1 #16 [ffffb9e4f2907f00] ksys_read at ffffffff98d65c3f #17 [ffffb9e4f2907f38] do_syscall_64 at ffffffff98a052fb crash> struct net_device.state ffff9a9d21336000 state = 5, state 5 is __LINK_STATE_START (0b1) and __LINK_STATE_NOCARRIER (0b100). The device is not present, note lack of __LINK_STATE_PRESENT (0b10). This is the same sort of panic as observed in commit 4224cfd ("net-sysfs: add check for netdevice being present to speed_show"). There are many other callers of __ethtool_get_link_ksettings() which don't have a device presence check. Move this check into ethtool to protect all callers. Fixes: d519e17 ("net: export device speed and duplex via sysfs") Fixes: 4224cfd ("net-sysfs: add check for netdevice being present to speed_show") Signed-off-by: Jamie Bainbridge <jamie.bainbridge@gmail.com> Link: https://patch.msgid.link/8bae218864beaa44ed01628140475b9bf641c5b0.1724393671.git.jamie.bainbridge@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
jannau
pushed a commit
that referenced
this pull request
Sep 16, 2024
A sysfs reader can race with a device reset or removal, attempting to read device state when the device is not actually present. eg: [exception RIP: qed_get_current_link+17] #8 [ffffb9e4f2907c48] qede_get_link_ksettings at ffffffffc07a994a [qede] #9 [ffffb9e4f2907cd8] __rh_call_get_link_ksettings at ffffffff992b01a3 #10 [ffffb9e4f2907d38] __ethtool_get_link_ksettings at ffffffff992b04e4 #11 [ffffb9e4f2907d90] duplex_show at ffffffff99260300 #12 [ffffb9e4f2907e38] dev_attr_show at ffffffff9905a01c #13 [ffffb9e4f2907e50] sysfs_kf_seq_show at ffffffff98e0145b #14 [ffffb9e4f2907e68] seq_read at ffffffff98d902e3 #15 [ffffb9e4f2907ec8] vfs_read at ffffffff98d657d1 #16 [ffffb9e4f2907f00] ksys_read at ffffffff98d65c3f #17 [ffffb9e4f2907f38] do_syscall_64 at ffffffff98a052fb crash> struct net_device.state ffff9a9d21336000 state = 5, state 5 is __LINK_STATE_START (0b1) and __LINK_STATE_NOCARRIER (0b100). The device is not present, note lack of __LINK_STATE_PRESENT (0b10). This is the same sort of panic as observed in commit 4224cfd ("net-sysfs: add check for netdevice being present to speed_show"). There are many other callers of __ethtool_get_link_ksettings() which don't have a device presence check. Move this check into ethtool to protect all callers. Fixes: d519e17 ("net: export device speed and duplex via sysfs") Fixes: 4224cfd ("net-sysfs: add check for netdevice being present to speed_show") Signed-off-by: Jamie Bainbridge <jamie.bainbridge@gmail.com> Link: https://patch.msgid.link/8bae218864beaa44ed01628140475b9bf641c5b0.1724393671.git.jamie.bainbridge@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
jannau
pushed a commit
that referenced
this pull request
Oct 6, 2024
[ Upstream commit 89a906d ] Floating point instructions in userspace can crash some arm kernels built with clang/LLD 17.0.6: BUG: unsupported FP instruction in kernel mode FPEXC == 0xc0000780 Internal error: Oops - undefined instruction: 0 [#1] ARM CPU: 0 PID: 196 Comm: vfp-reproducer Not tainted 6.10.0 #1 Hardware name: BCM2835 PC is at vfp_support_entry+0xc8/0x2cc LR is at do_undefinstr+0xa8/0x250 pc : [<c0101d50>] lr : [<c010a80c>] psr: a0000013 sp : dc8d1f68 ip : 60000013 fp : bedea19c r10: ec532b17 r9 : 00000010 r8 : 0044766c r7 : c0000780 r6 : ec532b17 r5 : c1c13800 r4 : dc8d1fb0 r3 : c10072c4 r2 : c0101c88 r1 : ec532b17 r0 : 0044766c Flags: NzCv IRQs on FIQs on Mode SVC_32 ISA ARM Segment none Control: 00c5387d Table: 0251c008 DAC: 00000051 Register r0 information: non-paged memory Register r1 information: vmalloc memory Register r2 information: non-slab/vmalloc memory Register r3 information: non-slab/vmalloc memory Register r4 information: 2-page vmalloc region Register r5 information: slab kmalloc-cg-2k Register r6 information: vmalloc memory Register r7 information: non-slab/vmalloc memory Register r8 information: non-paged memory Register r9 information: zero-size pointer Register r10 information: vmalloc memory Register r11 information: non-paged memory Register r12 information: non-paged memory Process vfp-reproducer (pid: 196, stack limit = 0x61aaaf8b) Stack: (0xdc8d1f68 to 0xdc8d2000) 1f60: 0000081f b6f69300 0000000f c10073f4 c10072c4 dc8d1fb0 1f80: ec532b17 0c532b17 0044766c b6f9ccd8 00000000 c010a80c 00447670 60000010 1fa0: ffffffff c1c13800 00c5387d c0100f10 b6f68af8 00448fc0 00000000 bedea188 1fc0: bedea314 00000001 00448ebc b6f9d000 00447608 b6f9ccd8 00000000 bedea19c 1fe0: bede9198 bedea188 b6e1061c 0044766c 60000010 ffffffff 00000000 00000000 Call trace: [<c0101d50>] (vfp_support_entry) from [<c010a80c>] (do_undefinstr+0xa8/0x250) [<c010a80c>] (do_undefinstr) from [<c0100f10>] (__und_usr+0x70/0x80) Exception stack(0xdc8d1fb0 to 0xdc8d1ff8) 1fa0: b6f68af8 00448fc0 00000000 bedea188 1fc0: bedea314 00000001 00448ebc b6f9d000 00447608 b6f9ccd8 00000000 bedea19c 1fe0: bede9198 bedea188 b6e1061c 0044766c 60000010 ffffffff Code: 0a000061 e3877202 e594003c e3a0901 (eef16a10) ---[ end trace 0000000000000000 ]--- Kernel panic - not syncing: Fatal exception in interrupt ---[ end Kernel panic - not syncing: Fatal exception in interrupt ]--- This is a minimal userspace reproducer on a Raspberry Pi Zero W: #include <stdio.h> #include <math.h> int main(void) { double v = 1.0; printf("%fn", NAN + *(volatile double *)&v); return 0; } Another way to consistently trigger the oops is: calvin@raspberry-pi-zero-w ~$ python -c "import json" The bug reproduces only when the kernel is built with DYNAMIC_DEBUG=n, because the pr_debug() calls act as barriers even when not activated. This is the output from the same kernel source built with the same compiler and DYNAMIC_DEBUG=y, where the userspace reproducer works as expected: VFP: bounce: trigger ec532b17 fpexc c0000780 VFP: emulate: INST=0xee377b06 SCR=0x00000000 VFP: bounce: trigger eef1fa10 fpexc c0000780 VFP: emulate: INST=0xeeb40b40 SCR=0x00000000 VFP: raising exceptions 30000000 calvin@raspberry-pi-zero-w ~$ ./vfp-reproducer nan Crudely grepping for vmsr/vmrs instructions in the otherwise nearly idential text for vfp_support_entry() makes the problem obvious: vmlinux.llvm.good [0xc0101cb8] <+48>: vmrs r7, fpexc vmlinux.llvm.good [0xc0101cd8] <+80>: vmsr fpexc, r0 vmlinux.llvm.good [0xc0101d20] <+152>: vmsr fpexc, r7 vmlinux.llvm.good [0xc0101d38] <+176>: vmrs r4, fpexc vmlinux.llvm.good [0xc0101d6c] <+228>: vmrs r0, fpscr vmlinux.llvm.good [0xc0101dc4] <+316>: vmsr fpexc, r0 vmlinux.llvm.good [0xc0101dc8] <+320>: vmrs r0, fpsid vmlinux.llvm.good [0xc0101dcc] <+324>: vmrs r6, fpscr vmlinux.llvm.good [0xc0101e10] <+392>: vmrs r10, fpinst vmlinux.llvm.good [0xc0101eb8] <+560>: vmrs r10, fpinst2 vmlinux.llvm.bad [0xc0101cb8] <+48>: vmrs r7, fpexc vmlinux.llvm.bad [0xc0101cd8] <+80>: vmsr fpexc, r0 vmlinux.llvm.bad [0xc0101d20] <+152>: vmsr fpexc, r7 vmlinux.llvm.bad [0xc0101d30] <+168>: vmrs r0, fpscr vmlinux.llvm.bad [0xc0101d50] <+200>: vmrs r6, fpscr <== BOOM! vmlinux.llvm.bad [0xc0101d6c] <+228>: vmsr fpexc, r0 vmlinux.llvm.bad [0xc0101d70] <+232>: vmrs r0, fpsid vmlinux.llvm.bad [0xc0101da4] <+284>: vmrs r10, fpinst vmlinux.llvm.bad [0xc0101df8] <+368>: vmrs r4, fpexc vmlinux.llvm.bad [0xc0101e5c] <+468>: vmrs r10, fpinst2 I think LLVM's reordering is valid as the code is currently written: the compiler doesn't know the instructions have side effects in hardware. Fix by using "asm volatile" in fmxr() and fmrx(), so they cannot be reordered with respect to each other. The original compiler now produces working kernels on my hardware with DYNAMIC_DEBUG=n. This is the relevant piece of the diff of the vfp_support_entry() text, from the original oopsing kernel to a working kernel with this patch: vmrs r0, fpscr tst r0, #4096 bne 0xc0101d48 tst r0, #458752 beq 0xc0101ecc orr r7, r7, #536870912 ldr r0, [r4, #0x3c] mov r9, #16 -vmrs r6, fpscr orr r9, r9, #251658240 add r0, r0, #4 str r0, [r4, #0x3c] mvn r0, #159 sub r0, r0, #-1207959552 and r0, r7, r0 vmsr fpexc, r0 vmrs r0, fpsid +vmrs r6, fpscr and r0, r0, #983040 cmp r0, #65536 bne 0xc0101d88 Fixes: 4708fb0 ("ARM: vfp: Reimplement VFP exception entry in C code") Signed-off-by: Calvin Owens <calvin@wbinvd.org> Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Signed-off-by: Sasha Levin <sashal@kernel.org>
WhatAmISupposedToPutHere
pushed a commit
to WhatAmISupposedToPutHere/linux
that referenced
this pull request
Oct 30, 2024
commit 9af2efe upstream. The fields in the hist_entry are filled on-demand which means they only have meaningful values when relevant sort keys are used. So if neither of 'dso' nor 'sym' sort keys are used, the map/symbols in the hist entry can be garbage. So it shouldn't access it unconditionally. I got a segfault, when I wanted to see cgroup profiles. $ sudo perf record -a --all-cgroups --synth=cgroup true $ sudo perf report -s cgroup Program received signal SIGSEGV, Segmentation fault. 0x00005555557a8d90 in map__dso (map=0x0) at util/map.h:48 48 return RC_CHK_ACCESS(map)->dso; (gdb) bt #0 0x00005555557a8d90 in map__dso (map=0x0) at util/map.h:48 #1 0x00005555557aa39b in map__load (map=0x0) at util/map.c:344 #2 0x00005555557aa592 in map__find_symbol (map=0x0, addr=140736115941088) at util/map.c:385 #3 0x00005555557ef000 in hists__findnew_entry (hists=0x555556039d60, entry=0x7fffffffa4c0, al=0x7fffffffa8c0, sample_self=true) at util/hist.c:644 #4 0x00005555557ef61c in __hists__add_entry (hists=0x555556039d60, al=0x7fffffffa8c0, sym_parent=0x0, bi=0x0, mi=0x0, ki=0x0, block_info=0x0, sample=0x7fffffffaa90, sample_self=true, ops=0x0) at util/hist.c:761 AsahiLinux#5 0x00005555557ef71f in hists__add_entry (hists=0x555556039d60, al=0x7fffffffa8c0, sym_parent=0x0, bi=0x0, mi=0x0, ki=0x0, sample=0x7fffffffaa90, sample_self=true) at util/hist.c:779 AsahiLinux#6 0x00005555557f00fb in iter_add_single_normal_entry (iter=0x7fffffffa900, al=0x7fffffffa8c0) at util/hist.c:1015 AsahiLinux#7 0x00005555557f09a7 in hist_entry_iter__add (iter=0x7fffffffa900, al=0x7fffffffa8c0, max_stack_depth=127, arg=0x7fffffffbce0) at util/hist.c:1260 AsahiLinux#8 0x00005555555ba7ce in process_sample_event (tool=0x7fffffffbce0, event=0x7ffff7c14128, sample=0x7fffffffaa90, evsel=0x555556039ad0, machine=0x5555560388e8) at builtin-report.c:334 AsahiLinux#9 0x00005555557b30c8 in evlist__deliver_sample (evlist=0x555556039010, tool=0x7fffffffbce0, event=0x7ffff7c14128, sample=0x7fffffffaa90, evsel=0x555556039ad0, machine=0x5555560388e8) at util/session.c:1232 AsahiLinux#10 0x00005555557b32bc in machines__deliver_event (machines=0x5555560388e8, evlist=0x555556039010, event=0x7ffff7c14128, sample=0x7fffffffaa90, tool=0x7fffffffbce0, file_offset=110888, file_path=0x555556038ff0 "perf.data") at util/session.c:1271 AsahiLinux#11 0x00005555557b3848 in perf_session__deliver_event (session=0x5555560386d0, event=0x7ffff7c14128, tool=0x7fffffffbce0, file_offset=110888, file_path=0x555556038ff0 "perf.data") at util/session.c:1354 AsahiLinux#12 0x00005555557affaf in ordered_events__deliver_event (oe=0x555556038e60, event=0x555556135aa0) at util/session.c:132 AsahiLinux#13 0x00005555557bb605 in do_flush (oe=0x555556038e60, show_progress=false) at util/ordered-events.c:245 AsahiLinux#14 0x00005555557bb95c in __ordered_events__flush (oe=0x555556038e60, how=OE_FLUSH__ROUND, timestamp=0) at util/ordered-events.c:324 AsahiLinux#15 0x00005555557bba46 in ordered_events__flush (oe=0x555556038e60, how=OE_FLUSH__ROUND) at util/ordered-events.c:342 AsahiLinux#16 0x00005555557b1b3b in perf_event__process_finished_round (tool=0x7fffffffbce0, event=0x7ffff7c15bb8, oe=0x555556038e60) at util/session.c:780 AsahiLinux#17 0x00005555557b3b27 in perf_session__process_user_event (session=0x5555560386d0, event=0x7ffff7c15bb8, file_offset=117688, file_path=0x555556038ff0 "perf.data") at util/session.c:1406 As you can see the entry->ms.map was NULL even if he->ms.map has a value. This is because 'sym' sort key is not given, so it cannot assume whether he->ms.sym and entry->ms.sym is the same. I only checked the 'sym' sort key here as it implies 'dso' behavior (so maps are the same). Fixes: ac01c8c ("perf hist: Update hist symbol when updating maps") Signed-off-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Matt Fleming <matt@readmodwrite.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Link: https://lore.kernel.org/r/20240826221045.1202305-2-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
jannau
pushed a commit
that referenced
this pull request
Nov 2, 2024
commit d0e806b upstream. During the migration of Soundwire runtime stream allocation from the Qualcomm Soundwire controller to SoC's soundcard drivers the sdm845 soundcard was forgotten. At this point any playback attempt or audio daemon startup, for instance on sdm845-db845c (Qualcomm RB3 board), will result in stream pointer NULL dereference: Unable to handle kernel NULL pointer dereference at virtual address 0000000000000020 Mem abort info: ESR = 0x0000000096000004 EC = 0x25: DABT (current EL), IL = 32 bits SET = 0, FnV = 0 EA = 0, S1PTW = 0 FSC = 0x04: level 0 translation fault Data abort info: ISV = 0, ISS = 0x00000004, ISS2 = 0x00000000 CM = 0, WnR = 0, TnD = 0, TagAccess = 0 GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0 user pgtable: 4k pages, 48-bit VAs, pgdp=0000000101ecf000 [0000000000000020] pgd=0000000000000000, p4d=0000000000000000 Internal error: Oops: 0000000096000004 [#1] PREEMPT SMP Modules linked in: ... CPU: 5 UID: 0 PID: 1198 Comm: aplay Not tainted 6.12.0-rc2-qcomlt-arm64-00059-g9d78f315a362-dirty #18 Hardware name: Thundercomm Dragonboard 845c (DT) pstate: 60400005 (nZCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--) pc : sdw_stream_add_slave+0x44/0x380 [soundwire_bus] lr : sdw_stream_add_slave+0x44/0x380 [soundwire_bus] sp : ffff80008a2035c0 x29: ffff80008a2035c0 x28: ffff80008a203978 x27: 0000000000000000 x26: 00000000000000c0 x25: 0000000000000000 x24: ffff1676025f4800 x23: ffff167600ff1cb8 x22: ffff167600ff1c98 x21: 0000000000000003 x20: ffff167607316000 x19: ffff167604e64e80 x18: 0000000000000000 x17: 0000000000000000 x16: ffffcec265074160 x15: 0000000000000000 x14: 0000000000000000 x13: 0000000000000000 x12: 0000000000000000 x11: 0000000000000000 x10: 0000000000000000 x9 : 0000000000000000 x8 : 0000000000000000 x7 : 0000000000000000 x6 : ffff167600ff1cec x5 : ffffcec22cfa2010 x4 : 0000000000000000 x3 : 0000000000000003 x2 : ffff167613f836c0 x1 : 0000000000000000 x0 : ffff16761feb60b8 Call trace: sdw_stream_add_slave+0x44/0x380 [soundwire_bus] wsa881x_hw_params+0x68/0x80 [snd_soc_wsa881x] snd_soc_dai_hw_params+0x3c/0xa4 __soc_pcm_hw_params+0x230/0x660 dpcm_be_dai_hw_params+0x1d0/0x3f8 dpcm_fe_dai_hw_params+0x98/0x268 snd_pcm_hw_params+0x124/0x460 snd_pcm_common_ioctl+0x998/0x16e8 snd_pcm_ioctl+0x34/0x58 __arm64_sys_ioctl+0xac/0xf8 invoke_syscall+0x48/0x104 el0_svc_common.constprop.0+0x40/0xe0 do_el0_svc+0x1c/0x28 el0_svc+0x34/0xe0 el0t_64_sync_handler+0x120/0x12c el0t_64_sync+0x190/0x194 Code: aa0403fb f9418400 9100e000 9400102f (f8420f22) ---[ end trace 0000000000000000 ]--- 0000000000006108 <sdw_stream_add_slave>: 6108: d503233f paciasp 610c: a9b97bfd stp x29, x30, [sp, #-112]! 6110: 910003fd mov x29, sp 6114: a90153f3 stp x19, x20, [sp, #16] 6118: a9025bf5 stp x21, x22, [sp, #32] 611c: aa0103f6 mov x22, x1 6120: 2a0303f5 mov w21, w3 6124: a90363f7 stp x23, x24, [sp, #48] 6128: aa0003f8 mov x24, x0 612c: aa0203f7 mov x23, x2 6130: a9046bf9 stp x25, x26, [sp, #64] 6134: aa0403f9 mov x25, x4 <-- x4 copied to x25 6138: a90573fb stp x27, x28, [sp, #80] 613c: aa0403fb mov x27, x4 6140: f9418400 ldr x0, [x0, torvalds#776] 6144: 9100e000 add x0, x0, #0x38 6148: 94000000 bl 0 <mutex_lock> 614c: f8420f22 ldr x2, [x25, #32]! <-- offset 0x44 ^^^ This is 0x6108 + offset 0x44 from the beginning of sdw_stream_add_slave() where data abort happens. wsa881x_hw_params() is called with stream = NULL and passes it further in register x4 (5th argument) to sdw_stream_add_slave() without any checks. Value from x4 is copied to x25 and finally it aborts on trying to load a value from address in x25 plus offset 32 (in dec) which corresponds to master_list member in struct sdw_stream_runtime: struct sdw_stream_runtime { const char * name; /* 0 8 */ struct sdw_stream_params params; /* 8 12 */ enum sdw_stream_state state; /* 20 4 */ enum sdw_stream_type type; /* 24 4 */ /* XXX 4 bytes hole, try to pack */ here-> struct list_head master_list; /* 32 16 */ int m_rt_count; /* 48 4 */ /* size: 56, cachelines: 1, members: 6 */ /* sum members: 48, holes: 1, sum holes: 4 */ /* padding: 4 */ /* last cacheline: 56 bytes */ Fix this by adding required calls to qcom_snd_sdw_startup() and sdw_release_stream() to startup and shutdown routines which restores the previous correct behaviour when ->set_stream() method is called to set a valid stream runtime pointer on playback startup. Reproduced and then fix was tested on db845c RB3 board. Reported-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org> Cc: stable@vger.kernel.org Fixes: 15c7fab ("ASoC: qcom: Move Soundwire runtime stream alloc to soundcards") Cc: Srinivas Kandagatla <srinivas.kandagatla@linaro.org> Cc: Dmitry Baryshkov <dmitry.baryshkov@linaro.org> Cc: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Cc: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com> Signed-off-by: Alexey Klimov <alexey.klimov@linaro.org> Tested-by: Steev Klimaszewski <steev@kali.org> # Lenovo Yoga C630 Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Reviewed-by: Srinivas Kandagatla <srinivas.kandagatla@linaro.org> Link: https://patch.msgid.link/20241009213922.999355-1-alexey.klimov@linaro.org Signed-off-by: Mark Brown <broonie@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
jannau
pushed a commit
that referenced
this pull request
Mar 1, 2025
[ Upstream commit 6d00234 ] Function xen_pin_page calls xen_pte_lock, which in turn grab page table lock (ptlock). When locking, xen_pte_lock expect mm->page_table_lock to be held before grabbing ptlock, but this does not happen when pinning is caused by xen_mm_pin_all. This commit addresses lockdep warning below, which shows up when suspending a Xen VM. [ 3680.658422] Freezing user space processes [ 3680.660156] Freezing user space processes completed (elapsed 0.001 seconds) [ 3680.660182] OOM killer disabled. [ 3680.660192] Freezing remaining freezable tasks [ 3680.661485] Freezing remaining freezable tasks completed (elapsed 0.001 seconds) [ 3680.685254] [ 3680.685265] ================================== [ 3680.685269] WARNING: Nested lock was not taken [ 3680.685274] 6.12.0+ #16 Tainted: G W [ 3680.685279] ---------------------------------- [ 3680.685283] migration/0/19 is trying to lock: [ 3680.685288] ffff88800bac33c0 (ptlock_ptr(ptdesc)#2){+.+.}-{3:3}, at: xen_pin_page+0x175/0x1d0 [ 3680.685303] [ 3680.685303] but this task is not holding: [ 3680.685308] init_mm.page_table_lock [ 3680.685311] [ 3680.685311] stack backtrace: [ 3680.685316] CPU: 0 UID: 0 PID: 19 Comm: migration/0 Tainted: G W 6.12.0+ #16 [ 3680.685324] Tainted: [W]=WARN [ 3680.685328] Stopper: multi_cpu_stop+0x0/0x120 <- __stop_cpus.constprop.0+0x8c/0xd0 [ 3680.685339] Call Trace: [ 3680.685344] <TASK> [ 3680.685347] dump_stack_lvl+0x77/0xb0 [ 3680.685356] __lock_acquire+0x917/0x2310 [ 3680.685364] lock_acquire+0xce/0x2c0 [ 3680.685369] ? xen_pin_page+0x175/0x1d0 [ 3680.685373] _raw_spin_lock_nest_lock+0x2f/0x70 [ 3680.685381] ? xen_pin_page+0x175/0x1d0 [ 3680.685386] xen_pin_page+0x175/0x1d0 [ 3680.685390] ? __pfx_xen_pin_page+0x10/0x10 [ 3680.685394] __xen_pgd_walk+0x233/0x2c0 [ 3680.685401] ? stop_one_cpu+0x91/0x100 [ 3680.685405] __xen_pgd_pin+0x5d/0x250 [ 3680.685410] xen_mm_pin_all+0x70/0xa0 [ 3680.685415] xen_pv_pre_suspend+0xf/0x280 [ 3680.685420] xen_suspend+0x57/0x1a0 [ 3680.685428] multi_cpu_stop+0x6b/0x120 [ 3680.685432] ? update_cpumasks_hier+0x7c/0xa60 [ 3680.685439] ? __pfx_multi_cpu_stop+0x10/0x10 [ 3680.685443] cpu_stopper_thread+0x8c/0x140 [ 3680.685448] ? smpboot_thread_fn+0x20/0x1f0 [ 3680.685454] ? __pfx_smpboot_thread_fn+0x10/0x10 [ 3680.685458] smpboot_thread_fn+0xed/0x1f0 [ 3680.685462] kthread+0xde/0x110 [ 3680.685467] ? __pfx_kthread+0x10/0x10 [ 3680.685471] ret_from_fork+0x2f/0x50 [ 3680.685478] ? __pfx_kthread+0x10/0x10 [ 3680.685482] ret_from_fork_asm+0x1a/0x30 [ 3680.685489] </TASK> [ 3680.685491] [ 3680.685491] other info that might help us debug this: [ 3680.685497] 1 lock held by migration/0/19: [ 3680.685500] #0: ffffffff8284df38 (pgd_lock){+.+.}-{3:3}, at: xen_mm_pin_all+0x14/0xa0 [ 3680.685512] [ 3680.685512] stack backtrace: [ 3680.685518] CPU: 0 UID: 0 PID: 19 Comm: migration/0 Tainted: G W 6.12.0+ #16 [ 3680.685528] Tainted: [W]=WARN [ 3680.685531] Stopper: multi_cpu_stop+0x0/0x120 <- __stop_cpus.constprop.0+0x8c/0xd0 [ 3680.685538] Call Trace: [ 3680.685541] <TASK> [ 3680.685544] dump_stack_lvl+0x77/0xb0 [ 3680.685549] __lock_acquire+0x93c/0x2310 [ 3680.685554] lock_acquire+0xce/0x2c0 [ 3680.685558] ? xen_pin_page+0x175/0x1d0 [ 3680.685562] _raw_spin_lock_nest_lock+0x2f/0x70 [ 3680.685568] ? xen_pin_page+0x175/0x1d0 [ 3680.685572] xen_pin_page+0x175/0x1d0 [ 3680.685578] ? __pfx_xen_pin_page+0x10/0x10 [ 3680.685582] __xen_pgd_walk+0x233/0x2c0 [ 3680.685588] ? stop_one_cpu+0x91/0x100 [ 3680.685592] __xen_pgd_pin+0x5d/0x250 [ 3680.685596] xen_mm_pin_all+0x70/0xa0 [ 3680.685600] xen_pv_pre_suspend+0xf/0x280 [ 3680.685607] xen_suspend+0x57/0x1a0 [ 3680.685611] multi_cpu_stop+0x6b/0x120 [ 3680.685615] ? update_cpumasks_hier+0x7c/0xa60 [ 3680.685620] ? __pfx_multi_cpu_stop+0x10/0x10 [ 3680.685625] cpu_stopper_thread+0x8c/0x140 [ 3680.685629] ? smpboot_thread_fn+0x20/0x1f0 [ 3680.685634] ? __pfx_smpboot_thread_fn+0x10/0x10 [ 3680.685638] smpboot_thread_fn+0xed/0x1f0 [ 3680.685642] kthread+0xde/0x110 [ 3680.685645] ? __pfx_kthread+0x10/0x10 [ 3680.685649] ret_from_fork+0x2f/0x50 [ 3680.685654] ? __pfx_kthread+0x10/0x10 [ 3680.685657] ret_from_fork_asm+0x1a/0x30 [ 3680.685662] </TASK> [ 3680.685267] xen:grant_table: Grant tables using version 1 layout [ 3680.685921] OOM killer enabled. [ 3680.685934] Restarting tasks ... done. Signed-off-by: Maksym Planeta <maksym@exostellar.io> Reviewed-by: Juergen Gross <jgross@suse.com> Message-ID: <20241204103516.3309112-1-maksym@exostellar.io> Signed-off-by: Juergen Gross <jgross@suse.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
jannau
pushed a commit
that referenced
this pull request
Apr 21, 2025
commit ddc210c upstream. Fixed warning on PM resume as shown below caused due to uninitialized struct nand_operation that checks chip select field : WARN_ON(op->cs >= nanddev_ntargets(&chip->base) [ 14.588522] ------------[ cut here ]------------ [ 14.588529] WARNING: CPU: 0 PID: 1392 at drivers/mtd/nand/raw/internals.h:139 nand_reset_op+0x1e0/0x1f8 [ 14.588553] Modules linked in: bdc udc_core [ 14.588579] CPU: 0 UID: 0 PID: 1392 Comm: rtcwake Tainted: G W 6.14.0-rc4-g5394eea10651 #16 [ 14.588590] Tainted: [W]=WARN [ 14.588593] Hardware name: Broadcom STB (Flattened Device Tree) [ 14.588598] Call trace: [ 14.588604] dump_backtrace from show_stack+0x18/0x1c [ 14.588622] r7:00000009 r6:0000008b r5:60000153 r4:c0fa558c [ 14.588625] show_stack from dump_stack_lvl+0x70/0x7c [ 14.588639] dump_stack_lvl from dump_stack+0x18/0x1c [ 14.588653] r5:c08d40b0 r4:c1003cb0 [ 14.588656] dump_stack from __warn+0x84/0xe4 [ 14.588668] __warn from warn_slowpath_fmt+0x18c/0x194 [ 14.588678] r7:c08d40b0 r6:c1003cb0 r5:00000000 r4:00000000 [ 14.588681] warn_slowpath_fmt from nand_reset_op+0x1e0/0x1f8 [ 14.588695] r8:70c40dff r7:89705f41 r6:36b4a597 r5:c26c9444 r4:c26b0048 [ 14.588697] nand_reset_op from brcmnand_resume+0x13c/0x150 [ 14.588714] r9:00000000 r8:00000000 r7:c24f8010 r6:c228a3f8 r5:c26c94bc r4:c26b0040 [ 14.588717] brcmnand_resume from platform_pm_resume+0x34/0x54 [ 14.588735] r5:00000010 r4:c0840a50 [ 14.588738] platform_pm_resume from dpm_run_callback+0x5c/0x14c [ 14.588757] dpm_run_callback from device_resume+0xc0/0x324 [ 14.588776] r9:c24f8054 r8:c24f80a0 r7:00000000 r6:00000000 r5:00000010 r4:c24f8010 [ 14.588779] device_resume from dpm_resume+0x130/0x160 [ 14.588799] r9:c22539e4 r8:00000010 r7:c22bebb0 r6:c24f8010 r5:c22539dc r4:c22539b0 [ 14.588802] dpm_resume from dpm_resume_end+0x14/0x20 [ 14.588822] r10:c2204e40 r9:00000000 r8:c228a3fc r7:00000000 r6:00000003 r5:c228a414 [ 14.588826] r4:00000010 [ 14.588828] dpm_resume_end from suspend_devices_and_enter+0x274/0x6f8 [ 14.588848] r5:c228a414 r4:00000000 [ 14.588851] suspend_devices_and_enter from pm_suspend+0x228/0x2bc [ 14.588868] r10:c3502910 r9:c3501f40 r8:00000004 r7:c228a438 r6:c0f95e18 r5:00000000 [ 14.588871] r4:00000003 [ 14.588874] pm_suspend from state_store+0x74/0xd0 [ 14.588889] r7:c228a438 r6:c0f934c8 r5:00000003 r4:00000003 [ 14.588892] state_store from kobj_attr_store+0x1c/0x28 [ 14.588913] r9:00000000 r8:00000000 r7:f09f9f08 r6:00000004 r5:c3502900 r4:c0283250 [ 14.588916] kobj_attr_store from sysfs_kf_write+0x40/0x4c [ 14.588936] r5:c3502900 r4:c0d92a48 [ 14.588939] sysfs_kf_write from kernfs_fop_write_iter+0x104/0x1f0 [ 14.588956] r5:c3502900 r4:c3501f40 [ 14.588960] kernfs_fop_write_iter from vfs_write+0x250/0x420 [ 14.588980] r10:c0e14b48 r9:00000000 r8:c25f5780 r7:00443398 r6:f09f9f68 r5:c34f7f00 [ 14.588983] r4:c042a88c [ 14.588987] vfs_write from ksys_write+0x74/0xe4 [ 14.589005] r10:00000004 r9:c25f5780 r8:c02002fA0 r7:00000000 r6:00000000 r5:c34f7f00 [ 14.589008] r4:c34f7f00 [ 14.589011] ksys_write from sys_write+0x10/0x14 [ 14.589029] r7:00000004 r6:004421c0 r5:00443398 r4:00000004 [ 14.589032] sys_write from ret_fast_syscall+0x0/0x5c [ 14.589044] Exception stack(0xf09f9fa8 to 0xf09f9ff0) [ 14.589050] 9fa0: 00000004 00443398 00000004 00443398 00000004 00000001 [ 14.589056] 9fc0: 00000004 00443398 004421c0 00000004 b6ecbd58 00000008 bebfbc38 0043eb78 [ 14.589062] 9fe0: 00440eb0 bebfbaf8 b6de18a0 b6e579e8 [ 14.589065] ---[ end trace 0000000000000000 ]--- The fix uses the higher level nand_reset(chip, chipnr); where chipnr = 0, when doing PM resume operation in compliance with the controller support for single die nand chip. Switching from nand_reset_op() to nand_reset() implies more than just setting the cs field op->cs, it also reconfigures the data interface (ie. the timings). Tested and confirmed the NAND chip is in sync timing wise with host after the fix. Fixes: 97d90da ("mtd: nand: provide several helpers to do common NAND operations") Cc: stable@vger.kernel.org Signed-off-by: Kamal Dasu <kamal.dasu@broadcom.com> Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com> Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
asdfugil
pushed a commit
to HoolockLinux/linux
that referenced
this pull request
Apr 29, 2025
There is a potential deadlock if we do report zones in an IO context, detailed in below lockdep report. When one process do a report zones and another process freezes the block device, the report zones side cannot allocate a tag because the freeze is already started. This can thus result in new block group creation to hang forever, blocking the write path. Thankfully, a new block group should be created on empty zones. So, reporting the zones is not necessary and we can set the write pointer = 0 and load the zone capacity from the block layer using bdev_zone_capacity() helper. ====================================================== WARNING: possible circular locking dependency detected 6.14.0-rc1 AsahiLinux#252 Not tainted ------------------------------------------------------ modprobe/1110 is trying to acquire lock: ffff888100ac83e0 ((work_completion)(&(&wb->dwork)->work)){+.+.}-{0:0}, at: __flush_work+0x38f/0xb60 but task is already holding lock: ffff8881205b6f20 (&q->q_usage_counter(queue)AsahiLinux#16){++++}-{0:0}, at: sd_remove+0x85/0x130 which lock already depends on the new lock. the existing dependency chain (in reverse order) is: -> AsahiLinux#3 (&q->q_usage_counter(queue)AsahiLinux#16){++++}-{0:0}: blk_queue_enter+0x3d9/0x500 blk_mq_alloc_request+0x47d/0x8e0 scsi_execute_cmd+0x14f/0xb80 sd_zbc_do_report_zones+0x1c1/0x470 sd_zbc_report_zones+0x362/0xd60 blkdev_report_zones+0x1b1/0x2e0 btrfs_get_dev_zones+0x215/0x7e0 [btrfs] btrfs_load_block_group_zone_info+0x6d2/0x2c10 [btrfs] btrfs_make_block_group+0x36b/0x870 [btrfs] btrfs_create_chunk+0x147d/0x2320 [btrfs] btrfs_chunk_alloc+0x2ce/0xcf0 [btrfs] start_transaction+0xce6/0x1620 [btrfs] btrfs_uuid_scan_kthread+0x4ee/0x5b0 [btrfs] kthread+0x39d/0x750 ret_from_fork+0x30/0x70 ret_from_fork_asm+0x1a/0x30 -> AsahiLinux#2 (&fs_info->dev_replace.rwsem){++++}-{4:4}: down_read+0x9b/0x470 btrfs_map_block+0x2ce/0x2ce0 [btrfs] btrfs_submit_chunk+0x2d4/0x16c0 [btrfs] btrfs_submit_bbio+0x16/0x30 [btrfs] btree_write_cache_pages+0xb5a/0xf90 [btrfs] do_writepages+0x17f/0x7b0 __writeback_single_inode+0x114/0xb00 writeback_sb_inodes+0x52b/0xe00 wb_writeback+0x1a7/0x800 wb_workfn+0x12a/0xbd0 process_one_work+0x85a/0x1460 worker_thread+0x5e2/0xfc0 kthread+0x39d/0x750 ret_from_fork+0x30/0x70 ret_from_fork_asm+0x1a/0x30 -> AsahiLinux#1 (&fs_info->zoned_meta_io_lock){+.+.}-{4:4}: __mutex_lock+0x1aa/0x1360 btree_write_cache_pages+0x252/0xf90 [btrfs] do_writepages+0x17f/0x7b0 __writeback_single_inode+0x114/0xb00 writeback_sb_inodes+0x52b/0xe00 wb_writeback+0x1a7/0x800 wb_workfn+0x12a/0xbd0 process_one_work+0x85a/0x1460 worker_thread+0x5e2/0xfc0 kthread+0x39d/0x750 ret_from_fork+0x30/0x70 ret_from_fork_asm+0x1a/0x30 -> #0 ((work_completion)(&(&wb->dwork)->work)){+.+.}-{0:0}: __lock_acquire+0x2f52/0x5ea0 lock_acquire+0x1b1/0x540 __flush_work+0x3ac/0xb60 wb_shutdown+0x15b/0x1f0 bdi_unregister+0x172/0x5b0 del_gendisk+0x841/0xa20 sd_remove+0x85/0x130 device_release_driver_internal+0x368/0x520 bus_remove_device+0x1f1/0x3f0 device_del+0x3bd/0x9c0 __scsi_remove_device+0x272/0x340 scsi_forget_host+0xf7/0x170 scsi_remove_host+0xd2/0x2a0 sdebug_driver_remove+0x52/0x2f0 [scsi_debug] device_release_driver_internal+0x368/0x520 bus_remove_device+0x1f1/0x3f0 device_del+0x3bd/0x9c0 device_unregister+0x13/0xa0 sdebug_do_remove_host+0x1fb/0x290 [scsi_debug] scsi_debug_exit+0x17/0x70 [scsi_debug] __do_sys_delete_module.isra.0+0x321/0x520 do_syscall_64+0x93/0x180 entry_SYSCALL_64_after_hwframe+0x76/0x7e other info that might help us debug this: Chain exists of: (work_completion)(&(&wb->dwork)->work) --> &fs_info->dev_replace.rwsem --> &q->q_usage_counter(queue)AsahiLinux#16 Possible unsafe locking scenario: CPU0 CPU1 ---- ---- lock(&q->q_usage_counter(queue)AsahiLinux#16); lock(&fs_info->dev_replace.rwsem); lock(&q->q_usage_counter(queue)AsahiLinux#16); lock((work_completion)(&(&wb->dwork)->work)); *** DEADLOCK *** 5 locks held by modprobe/1110: #0: ffff88811f7bc108 (&dev->mutex){....}-{4:4}, at: device_release_driver_internal+0x8f/0x520 AsahiLinux#1: ffff8881022ee0e0 (&shost->scan_mutex){+.+.}-{4:4}, at: scsi_remove_host+0x20/0x2a0 AsahiLinux#2: ffff88811b4c4378 (&dev->mutex){....}-{4:4}, at: device_release_driver_internal+0x8f/0x520 AsahiLinux#3: ffff8881205b6f20 (&q->q_usage_counter(queue)AsahiLinux#16){++++}-{0:0}, at: sd_remove+0x85/0x130 AsahiLinux#4: ffffffffa3284360 (rcu_read_lock){....}-{1:3}, at: __flush_work+0xda/0xb60 stack backtrace: CPU: 0 UID: 0 PID: 1110 Comm: modprobe Not tainted 6.14.0-rc1 AsahiLinux#252 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.3-3.fc41 04/01/2014 Call Trace: <TASK> dump_stack_lvl+0x6a/0x90 print_circular_bug.cold+0x1e0/0x274 check_noncircular+0x306/0x3f0 ? __pfx_check_noncircular+0x10/0x10 ? mark_lock+0xf5/0x1650 ? __pfx_check_irq_usage+0x10/0x10 ? lockdep_lock+0xca/0x1c0 ? __pfx_lockdep_lock+0x10/0x10 __lock_acquire+0x2f52/0x5ea0 ? __pfx___lock_acquire+0x10/0x10 ? __pfx_mark_lock+0x10/0x10 lock_acquire+0x1b1/0x540 ? __flush_work+0x38f/0xb60 ? __pfx_lock_acquire+0x10/0x10 ? __pfx_lock_release+0x10/0x10 ? mark_held_locks+0x94/0xe0 ? __flush_work+0x38f/0xb60 __flush_work+0x3ac/0xb60 ? __flush_work+0x38f/0xb60 ? __pfx_mark_lock+0x10/0x10 ? __pfx___flush_work+0x10/0x10 ? __pfx_wq_barrier_func+0x10/0x10 ? __pfx___might_resched+0x10/0x10 ? mark_held_locks+0x94/0xe0 wb_shutdown+0x15b/0x1f0 bdi_unregister+0x172/0x5b0 ? __pfx_bdi_unregister+0x10/0x10 ? up_write+0x1ba/0x510 del_gendisk+0x841/0xa20 ? __pfx_del_gendisk+0x10/0x10 ? _raw_spin_unlock_irqrestore+0x35/0x60 ? __pm_runtime_resume+0x79/0x110 sd_remove+0x85/0x130 device_release_driver_internal+0x368/0x520 ? kobject_put+0x5d/0x4a0 bus_remove_device+0x1f1/0x3f0 device_del+0x3bd/0x9c0 ? __pfx_device_del+0x10/0x10 __scsi_remove_device+0x272/0x340 scsi_forget_host+0xf7/0x170 scsi_remove_host+0xd2/0x2a0 sdebug_driver_remove+0x52/0x2f0 [scsi_debug] ? kernfs_remove_by_name_ns+0xc0/0xf0 device_release_driver_internal+0x368/0x520 ? kobject_put+0x5d/0x4a0 bus_remove_device+0x1f1/0x3f0 device_del+0x3bd/0x9c0 ? __pfx_device_del+0x10/0x10 ? __pfx___mutex_unlock_slowpath+0x10/0x10 device_unregister+0x13/0xa0 sdebug_do_remove_host+0x1fb/0x290 [scsi_debug] scsi_debug_exit+0x17/0x70 [scsi_debug] __do_sys_delete_module.isra.0+0x321/0x520 ? __pfx___do_sys_delete_module.isra.0+0x10/0x10 ? __pfx_slab_free_after_rcu_debug+0x10/0x10 ? kasan_save_stack+0x2c/0x50 ? kasan_record_aux_stack+0xa3/0xb0 ? __call_rcu_common.constprop.0+0xc4/0xfb0 ? kmem_cache_free+0x3a0/0x590 ? __x64_sys_close+0x78/0xd0 do_syscall_64+0x93/0x180 ? lock_is_held_type+0xd5/0x130 ? __call_rcu_common.constprop.0+0x3c0/0xfb0 ? lockdep_hardirqs_on+0x78/0x100 ? __call_rcu_common.constprop.0+0x3c0/0xfb0 ? __pfx___call_rcu_common.constprop.0+0x10/0x10 ? kmem_cache_free+0x3a0/0x590 ? lockdep_hardirqs_on_prepare+0x16d/0x400 ? do_syscall_64+0x9f/0x180 ? lockdep_hardirqs_on+0x78/0x100 ? do_syscall_64+0x9f/0x180 ? __pfx___x64_sys_openat+0x10/0x10 ? lockdep_hardirqs_on_prepare+0x16d/0x400 ? do_syscall_64+0x9f/0x180 ? lockdep_hardirqs_on+0x78/0x100 ? do_syscall_64+0x9f/0x180 entry_SYSCALL_64_after_hwframe+0x76/0x7e RIP: 0033:0x7f436712b68b RSP: 002b:00007ffe9f1a8658 EFLAGS: 00000206 ORIG_RAX: 00000000000000b0 RAX: ffffffffffffffda RBX: 00005559b367fd80 RCX: 00007f436712b68b RDX: 0000000000000000 RSI: 0000000000000800 RDI: 00005559b367fde8 RBP: 00007ffe9f1a8680 R08: 1999999999999999 R09: 0000000000000000 R10: 00007f43671a5fe0 R11: 0000000000000206 R12: 0000000000000000 R13: 00007ffe9f1a86b0 R14: 0000000000000000 R15: 0000000000000000 </TASK> Reported-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com> CC: <stable@vger.kernel.org> # 6.13+ Tested-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com> Reviewed-by: Damien Le Moal <dlemoal@kernel.org> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com> Signed-off-by: David Sterba <dsterba@suse.com>
jannau
pushed a commit
that referenced
this pull request
May 9, 2025
[ Upstream commit 866bafa ] There is a potential deadlock if we do report zones in an IO context, detailed in below lockdep report. When one process do a report zones and another process freezes the block device, the report zones side cannot allocate a tag because the freeze is already started. This can thus result in new block group creation to hang forever, blocking the write path. Thankfully, a new block group should be created on empty zones. So, reporting the zones is not necessary and we can set the write pointer = 0 and load the zone capacity from the block layer using bdev_zone_capacity() helper. ====================================================== WARNING: possible circular locking dependency detected 6.14.0-rc1 #252 Not tainted ------------------------------------------------------ modprobe/1110 is trying to acquire lock: ffff888100ac83e0 ((work_completion)(&(&wb->dwork)->work)){+.+.}-{0:0}, at: __flush_work+0x38f/0xb60 but task is already holding lock: ffff8881205b6f20 (&q->q_usage_counter(queue)#16){++++}-{0:0}, at: sd_remove+0x85/0x130 which lock already depends on the new lock. the existing dependency chain (in reverse order) is: -> #3 (&q->q_usage_counter(queue)#16){++++}-{0:0}: blk_queue_enter+0x3d9/0x500 blk_mq_alloc_request+0x47d/0x8e0 scsi_execute_cmd+0x14f/0xb80 sd_zbc_do_report_zones+0x1c1/0x470 sd_zbc_report_zones+0x362/0xd60 blkdev_report_zones+0x1b1/0x2e0 btrfs_get_dev_zones+0x215/0x7e0 [btrfs] btrfs_load_block_group_zone_info+0x6d2/0x2c10 [btrfs] btrfs_make_block_group+0x36b/0x870 [btrfs] btrfs_create_chunk+0x147d/0x2320 [btrfs] btrfs_chunk_alloc+0x2ce/0xcf0 [btrfs] start_transaction+0xce6/0x1620 [btrfs] btrfs_uuid_scan_kthread+0x4ee/0x5b0 [btrfs] kthread+0x39d/0x750 ret_from_fork+0x30/0x70 ret_from_fork_asm+0x1a/0x30 -> #2 (&fs_info->dev_replace.rwsem){++++}-{4:4}: down_read+0x9b/0x470 btrfs_map_block+0x2ce/0x2ce0 [btrfs] btrfs_submit_chunk+0x2d4/0x16c0 [btrfs] btrfs_submit_bbio+0x16/0x30 [btrfs] btree_write_cache_pages+0xb5a/0xf90 [btrfs] do_writepages+0x17f/0x7b0 __writeback_single_inode+0x114/0xb00 writeback_sb_inodes+0x52b/0xe00 wb_writeback+0x1a7/0x800 wb_workfn+0x12a/0xbd0 process_one_work+0x85a/0x1460 worker_thread+0x5e2/0xfc0 kthread+0x39d/0x750 ret_from_fork+0x30/0x70 ret_from_fork_asm+0x1a/0x30 -> #1 (&fs_info->zoned_meta_io_lock){+.+.}-{4:4}: __mutex_lock+0x1aa/0x1360 btree_write_cache_pages+0x252/0xf90 [btrfs] do_writepages+0x17f/0x7b0 __writeback_single_inode+0x114/0xb00 writeback_sb_inodes+0x52b/0xe00 wb_writeback+0x1a7/0x800 wb_workfn+0x12a/0xbd0 process_one_work+0x85a/0x1460 worker_thread+0x5e2/0xfc0 kthread+0x39d/0x750 ret_from_fork+0x30/0x70 ret_from_fork_asm+0x1a/0x30 -> #0 ((work_completion)(&(&wb->dwork)->work)){+.+.}-{0:0}: __lock_acquire+0x2f52/0x5ea0 lock_acquire+0x1b1/0x540 __flush_work+0x3ac/0xb60 wb_shutdown+0x15b/0x1f0 bdi_unregister+0x172/0x5b0 del_gendisk+0x841/0xa20 sd_remove+0x85/0x130 device_release_driver_internal+0x368/0x520 bus_remove_device+0x1f1/0x3f0 device_del+0x3bd/0x9c0 __scsi_remove_device+0x272/0x340 scsi_forget_host+0xf7/0x170 scsi_remove_host+0xd2/0x2a0 sdebug_driver_remove+0x52/0x2f0 [scsi_debug] device_release_driver_internal+0x368/0x520 bus_remove_device+0x1f1/0x3f0 device_del+0x3bd/0x9c0 device_unregister+0x13/0xa0 sdebug_do_remove_host+0x1fb/0x290 [scsi_debug] scsi_debug_exit+0x17/0x70 [scsi_debug] __do_sys_delete_module.isra.0+0x321/0x520 do_syscall_64+0x93/0x180 entry_SYSCALL_64_after_hwframe+0x76/0x7e other info that might help us debug this: Chain exists of: (work_completion)(&(&wb->dwork)->work) --> &fs_info->dev_replace.rwsem --> &q->q_usage_counter(queue)#16 Possible unsafe locking scenario: CPU0 CPU1 ---- ---- lock(&q->q_usage_counter(queue)#16); lock(&fs_info->dev_replace.rwsem); lock(&q->q_usage_counter(queue)#16); lock((work_completion)(&(&wb->dwork)->work)); *** DEADLOCK *** 5 locks held by modprobe/1110: #0: ffff88811f7bc108 (&dev->mutex){....}-{4:4}, at: device_release_driver_internal+0x8f/0x520 #1: ffff8881022ee0e0 (&shost->scan_mutex){+.+.}-{4:4}, at: scsi_remove_host+0x20/0x2a0 #2: ffff88811b4c4378 (&dev->mutex){....}-{4:4}, at: device_release_driver_internal+0x8f/0x520 #3: ffff8881205b6f20 (&q->q_usage_counter(queue)#16){++++}-{0:0}, at: sd_remove+0x85/0x130 #4: ffffffffa3284360 (rcu_read_lock){....}-{1:3}, at: __flush_work+0xda/0xb60 stack backtrace: CPU: 0 UID: 0 PID: 1110 Comm: modprobe Not tainted 6.14.0-rc1 #252 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.3-3.fc41 04/01/2014 Call Trace: <TASK> dump_stack_lvl+0x6a/0x90 print_circular_bug.cold+0x1e0/0x274 check_noncircular+0x306/0x3f0 ? __pfx_check_noncircular+0x10/0x10 ? mark_lock+0xf5/0x1650 ? __pfx_check_irq_usage+0x10/0x10 ? lockdep_lock+0xca/0x1c0 ? __pfx_lockdep_lock+0x10/0x10 __lock_acquire+0x2f52/0x5ea0 ? __pfx___lock_acquire+0x10/0x10 ? __pfx_mark_lock+0x10/0x10 lock_acquire+0x1b1/0x540 ? __flush_work+0x38f/0xb60 ? __pfx_lock_acquire+0x10/0x10 ? __pfx_lock_release+0x10/0x10 ? mark_held_locks+0x94/0xe0 ? __flush_work+0x38f/0xb60 __flush_work+0x3ac/0xb60 ? __flush_work+0x38f/0xb60 ? __pfx_mark_lock+0x10/0x10 ? __pfx___flush_work+0x10/0x10 ? __pfx_wq_barrier_func+0x10/0x10 ? __pfx___might_resched+0x10/0x10 ? mark_held_locks+0x94/0xe0 wb_shutdown+0x15b/0x1f0 bdi_unregister+0x172/0x5b0 ? __pfx_bdi_unregister+0x10/0x10 ? up_write+0x1ba/0x510 del_gendisk+0x841/0xa20 ? __pfx_del_gendisk+0x10/0x10 ? _raw_spin_unlock_irqrestore+0x35/0x60 ? __pm_runtime_resume+0x79/0x110 sd_remove+0x85/0x130 device_release_driver_internal+0x368/0x520 ? kobject_put+0x5d/0x4a0 bus_remove_device+0x1f1/0x3f0 device_del+0x3bd/0x9c0 ? __pfx_device_del+0x10/0x10 __scsi_remove_device+0x272/0x340 scsi_forget_host+0xf7/0x170 scsi_remove_host+0xd2/0x2a0 sdebug_driver_remove+0x52/0x2f0 [scsi_debug] ? kernfs_remove_by_name_ns+0xc0/0xf0 device_release_driver_internal+0x368/0x520 ? kobject_put+0x5d/0x4a0 bus_remove_device+0x1f1/0x3f0 device_del+0x3bd/0x9c0 ? __pfx_device_del+0x10/0x10 ? __pfx___mutex_unlock_slowpath+0x10/0x10 device_unregister+0x13/0xa0 sdebug_do_remove_host+0x1fb/0x290 [scsi_debug] scsi_debug_exit+0x17/0x70 [scsi_debug] __do_sys_delete_module.isra.0+0x321/0x520 ? __pfx___do_sys_delete_module.isra.0+0x10/0x10 ? __pfx_slab_free_after_rcu_debug+0x10/0x10 ? kasan_save_stack+0x2c/0x50 ? kasan_record_aux_stack+0xa3/0xb0 ? __call_rcu_common.constprop.0+0xc4/0xfb0 ? kmem_cache_free+0x3a0/0x590 ? __x64_sys_close+0x78/0xd0 do_syscall_64+0x93/0x180 ? lock_is_held_type+0xd5/0x130 ? __call_rcu_common.constprop.0+0x3c0/0xfb0 ? lockdep_hardirqs_on+0x78/0x100 ? __call_rcu_common.constprop.0+0x3c0/0xfb0 ? __pfx___call_rcu_common.constprop.0+0x10/0x10 ? kmem_cache_free+0x3a0/0x590 ? lockdep_hardirqs_on_prepare+0x16d/0x400 ? do_syscall_64+0x9f/0x180 ? lockdep_hardirqs_on+0x78/0x100 ? do_syscall_64+0x9f/0x180 ? __pfx___x64_sys_openat+0x10/0x10 ? lockdep_hardirqs_on_prepare+0x16d/0x400 ? do_syscall_64+0x9f/0x180 ? lockdep_hardirqs_on+0x78/0x100 ? do_syscall_64+0x9f/0x180 entry_SYSCALL_64_after_hwframe+0x76/0x7e RIP: 0033:0x7f436712b68b RSP: 002b:00007ffe9f1a8658 EFLAGS: 00000206 ORIG_RAX: 00000000000000b0 RAX: ffffffffffffffda RBX: 00005559b367fd80 RCX: 00007f436712b68b RDX: 0000000000000000 RSI: 0000000000000800 RDI: 00005559b367fde8 RBP: 00007ffe9f1a8680 R08: 1999999999999999 R09: 0000000000000000 R10: 00007f43671a5fe0 R11: 0000000000000206 R12: 0000000000000000 R13: 00007ffe9f1a86b0 R14: 0000000000000000 R15: 0000000000000000 </TASK> Reported-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com> CC: <stable@vger.kernel.org> # 6.13+ Tested-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com> Reviewed-by: Damien Le Moal <dlemoal@kernel.org> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com> Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
asdfugil
pushed a commit
to HoolockLinux/linux
that referenced
this pull request
Jun 9, 2025
syzkaller has found another ugly race in the VGIC, this time dealing with VGIC creation. Since kvm_vgic_create() doesn't sufficiently protect against in-flight vCPU creations, it is possible to get a vCPU into the kernel w/ an in-kernel VGIC but no allocation of private IRQs: Unable to handle kernel NULL pointer dereference at virtual address 0000000000000d20 Mem abort info: ESR = 0x0000000096000046 EC = 0x25: DABT (current EL), IL = 32 bits SET = 0, FnV = 0 EA = 0, S1PTW = 0 FSC = 0x06: level 2 translation fault Data abort info: ISV = 0, ISS = 0x00000046, ISS2 = 0x00000000 CM = 0, WnR = 1, TnD = 0, TagAccess = 0 GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0 user pgtable: 4k pages, 48-bit VAs, pgdp=0000000103e4f000 [0000000000000d20] pgd=0800000102e1c403, p4d=0800000102e1c403, pud=0800000101146403, pmd=0000000000000000 Internal error: Oops: 0000000096000046 [AsahiLinux#1] PREEMPT SMP CPU: 9 UID: 0 PID: 246 Comm: test Not tainted 6.14.0-rc6-00097-g0c90821f5db8 AsahiLinux#16 Hardware name: linux,dummy-virt (DT) pstate: 814020c5 (Nzcv daIF +PAN -UAO -TCO +DIT -SSBS BTYPE=--) pc : _raw_spin_lock_irqsave+0x34/0x8c lr : kvm_vgic_set_owner+0x54/0xa4 sp : ffff80008086ba20 x29: ffff80008086ba20 x28: ffff0000c19b5640 x27: 0000000000000000 x26: 0000000000000000 x25: ffff0000c4879bd0 x24: 000000000000001e x23: 0000000000000000 x22: 0000000000000000 x21: ffff0000c487af80 x20: ffff0000c487af18 x19: 0000000000000000 x18: 0000001afadd5a8b x17: 0000000000000000 x16: 0000000000000000 x15: 0000000000000001 x14: ffff0000c19b56c0 x13: 0030c9adf9d9889e x12: ffffc263710e1908 x11: 0000001afb0d74f2 x10: e0966b840b373664 x9 : ec806bf7d6a57cd5 x8 : ffff80008086b980 x7 : 0000000000000001 x6 : 0000000000000001 x5 : 0000000080800054 x4 : 4ec4ec4ec4ec4ec5 x3 : 0000000000000000 x2 : 0000000000000001 x1 : 0000000000000000 x0 : 0000000000000d20 Call trace: _raw_spin_lock_irqsave+0x34/0x8c (P) kvm_vgic_set_owner+0x54/0xa4 kvm_timer_enable+0xf4/0x274 kvm_arch_vcpu_run_pid_change+0xe0/0x380 kvm_vcpu_ioctl+0x93c/0x9e0 __arm64_sys_ioctl+0xb4/0xec invoke_syscall+0x48/0x110 el0_svc_common.constprop.0+0x40/0xe0 do_el0_svc+0x1c/0x28 el0_svc+0x30/0xd0 el0t_64_sync_handler+0x10c/0x138 el0t_64_sync+0x198/0x19c Code: b9000841 d503201f 52800001 52800022 (88e17c02) ---[ end trace 0000000000000000 ]--- Plug the race by explicitly checking for an in-progress vCPU creation and failing kvm_vgic_create() when that's the case. Add some comments to document all the things kvm_vgic_create() is trying to guard against too. Reported-by: Alexander Potapenko <glider@google.com> Signed-off-by: Oliver Upton <oliver.upton@linux.dev> Tested-by: Alexander Potapenko <glider@google.com> Link: https://lore.kernel.org/r/20250523194722.4066715-6-oliver.upton@linux.dev Signed-off-by: Marc Zyngier <maz@kernel.org>
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Replace line 45 'secify' with 'specify' in both files.
Signed-off-by: steven1lung 1030steven@gmail.com