forked from torvalds/linux
-
Notifications
You must be signed in to change notification settings - Fork 93
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
drivers/perf: apple_m1: Add mapping for branch counters #307
Closed
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
[ Upstream commit b98d042 ] This helper belongs in the region driver as it is only useful with CONFIG_CXL_REGION. Add a stub in core.h for when the region driver is not built. Signed-off-by: Alison Schofield <alison.schofield@intel.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Reviewed-by: Ira Weiny <ira.weiny@intel.com> Link: https://lore.kernel.org/r/05e30f788d62b3dd398aff2d2ea50a6aaa7c3313.1714496730.git.alison.schofield@intel.com Signed-off-by: Dave Jiang <dave.jiang@intel.com> Stable-dep-of: 285f2a0 ("cxl/region: Avoid null pointer dereference in region lookup") Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 285f2a0 ] cxl_dpa_to_region() looks up a region based on a memdev and DPA. It wrongly assumes an endpoint found mapping the DPA is also of a fully assembled region. When not true it leads to a null pointer dereference looking up the region name. This appears during testing of region lookup after a failure to assemble a BIOS defined region or if the lookup raced with the assembly of the BIOS defined region. Failure to clean up BIOS defined regions that fail assembly is an issue in itself and a fix to that problem will alleviate some of the impact. It will not alleviate the race condition so let's harden this path. The behavior change is that the kernel oops due to a null pointer dereference is replaced with a dev_dbg() message noting that an endpoint was mapped. Additional comments are added so that future users of this function can more clearly understand what it provides. Fixes: 0a105ab ("cxl/memdev: Warn of poison inject or clear to a mapped region") Signed-off-by: Alison Schofield <alison.schofield@intel.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Link: https://patch.msgid.link/20240604003609.202682-1-alison.schofield@intel.com Signed-off-by: Dave Jiang <dave.jiang@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 84328c5 ] Since interleave capability is not verified, if the interleave capability of a target does not match the region need, committing decoder should have failed at the device end. In order to checkout this error as quickly as possible, driver needs to check the interleave capability of target during attaching it to region. Per CXL specification r3.1(8.2.4.20.1 CXL HDM Decoder Capability Register), bits 11 and 12 indicate the capability to establish interleaving in 3, 6, 12 and 16 ways. If these bits are not set, the target cannot be attached to a region utilizing such interleave ways. Additionally, bits 8 and 9 represent the capability of the bits used for interleaving in the address, Linux tracks this in the cxl_port interleave_mask. Per CXL specification r3.1(8.2.4.20.13 Decoder Protection): eIW means encoded Interleave Ways. eIG means encoded Interleave Granularity. in HPA: if eIW is 0 or 8 (interleave ways: 1, 3), all the bits of HPA are used, the interleave bits are none, the following check is ignored. if eIW is less than 8 (interleave ways: 2, 4, 8, 16), the interleave bits start at bit position eIG + 8 and end at eIG + eIW + 8 - 1. if eIW is greater than 8 (interleave ways: 6, 12), the interleave bits start at bit position eIG + 8 and end at eIG + eIW - 1. if the interleave mask is insufficient to cover the required interleave bits, the target cannot be attached to the region. Fixes: 384e624 ("cxl/region: Attach endpoint decoders") Signed-off-by: Yao Xingtao <yaoxt.fnst@fujitsu.com> Reviewed-by: Dan Williams <dan.j.williams@intel.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Link: https://patch.msgid.link/20240614084755.59503-2-yaoxt.fnst@fujitsu.com Signed-off-by: Dave Jiang <dave.jiang@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit a81c98b ] Fix netfs_page_mkwrite() to check that folio->mapping is valid once it has taken the folio lock (as filemap_page_mkwrite() does). Without this, generic/247 occasionally oopses with something like the following: BUG: kernel NULL pointer dereference, address: 0000000000000000 #PF: supervisor read access in kernel mode #PF: error_code(0x0000) - not-present page RIP: 0010:trace_event_raw_event_netfs_folio+0x61/0xc0 ... Call Trace: <TASK> ? __die_body+0x1a/0x60 ? page_fault_oops+0x6e/0xa0 ? exc_page_fault+0xc2/0xe0 ? asm_exc_page_fault+0x22/0x30 ? trace_event_raw_event_netfs_folio+0x61/0xc0 trace_netfs_folio+0x39/0x40 netfs_page_mkwrite+0x14c/0x1d0 do_page_mkwrite+0x50/0x90 do_pte_missing+0x184/0x200 __handle_mm_fault+0x42d/0x500 handle_mm_fault+0x121/0x1f0 do_user_addr_fault+0x23e/0x3c0 exc_page_fault+0xc2/0xe0 asm_exc_page_fault+0x22/0x30 This is due to the invalidate_inode_pages2_range() issued at the end of the DIO write interfering with the mmap'd writes. Fixes: 102a7e2 ("netfs: Allow buffered shared-writeable mmap through netfs_page_mkwrite()") Signed-off-by: David Howells <dhowells@redhat.com> Link: https://lore.kernel.org/r/780211.1719318546@warthog.procyon.org.uk Reviewed-by: Jeff Layton <jlayton@kernel.org> cc: Matthew Wilcox <willy@infradead.org> cc: Jeff Layton <jlayton@kernel.org> cc: netfs@lists.linux.dev cc: v9fs@lists.linux.dev cc: linux-afs@lists.infradead.org cc: linux-cifs@vger.kernel.org cc: linux-mm@kvack.org cc: linux-fsdevel@vger.kernel.org Signed-off-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 9d66154 ] Fix netfs_page_mkwrite() to use filemap_fdatawrite_range(), not filemap_fdatawait_range() to flush conflicting data. Fixes: 102a7e2 ("netfs: Allow buffered shared-writeable mmap through netfs_page_mkwrite()") Signed-off-by: David Howells <dhowells@redhat.com> Link: https://lore.kernel.org/r/614300.1719228243@warthog.procyon.org.uk cc: Matthew Wilcox <willy@infradead.org> cc: Jeff Layton <jlayton@kernel.org> cc: netfs@lists.linux.dev cc: v9fs@lists.linux.dev cc: linux-afs@lists.infradead.org cc: linux-cifs@vger.kernel.org cc: linux-mm@kvack.org cc: linux-fsdevel@vger.kernel.org Signed-off-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
commit 9706fc8 upstream. With commit a81dbd0 ("serial: imx: set receiver level before starting uart") we set the receiver level to its default value. This caused a regression when using SDMA, where the receiver level is 9 instead of 8 (default). This change will first check if the receiver level is zero and only then set it to the default. This still avoids the interrupt storm when the receiver level is zero. Fixes: a81dbd0 ("serial: imx: set receiver level before starting uart") Cc: stable <stable@kernel.org> Signed-off-by: Stefan Eichenberger <stefan.eichenberger@toradex.com> Link: https://lore.kernel.org/r/20240703112543.148304-1-eichest@gmail.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit c128a1b upstream. Errata i2310[0] says, Erroneous timeout can be triggered, if this Erroneous interrupt is not cleared then it may leads to storm of interrupts. Commit 9d141c1 ("serial: 8250_omap: Implementation of Errata i2310") which added the workaround but missed ensuring RX FIFO is really empty before applying the errata workaround as recommended in the errata text. Fix this by adding back check for UART_OMAP_RX_LVL to be 0 for workaround to take effect. [0] https://www.ti.com/lit/pdf/sprz536 page 23 Fixes: 9d141c1 ("serial: 8250_omap: Implementation of Errata i2310") Cc: stable@vger.kernel.org Reported-by: Vignesh Raghavendra <vigneshr@ti.com> Closes: https://lore.kernel.org/all/e96d0c55-0b12-4cbf-9d23-48963543de49@ti.com/ Signed-off-by: Udit Kumar <u-kumar1@ti.com> Link: https://lore.kernel.org/r/20240625160725.2102194-1-u-kumar1@ti.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit bab4923 upstream. In the TRACE_EVENT(qdisc_reset) NULL dereference occurred from qdisc->dev_queue->dev <NULL> ->name This situation simulated from bunch of veths and Bluetooth disconnection and reconnection. During qdisc initialization, qdisc was being set to noop_queue. In veth_init_queue, the initial tx_num was reduced back to one, causing the qdisc reset to be called with noop, which led to the kernel panic. I've attached the GitHub gist link that C converted syz-execprogram source code and 3 log of reproduced vmcore-dmesg. https://gist.github.com/yskelg/cc64562873ce249cdd0d5a358b77d740 Yeoreum and I use two fuzzing tool simultaneously. One process with syz-executor : https://github.com/google/syzkaller $ ./syz-execprog -executor=./syz-executor -repeat=1 -sandbox=setuid \ -enable=none -collide=false log1 The other process with perf fuzzer: https://github.com/deater/perf_event_tests/tree/master/fuzzer $ perf_event_tests/fuzzer/perf_fuzzer I think this will happen on the kernel version. Linux kernel version +v6.7.10, +v6.8, +v6.9 and it could happen in v6.10. This occurred from 51270d5. I think this patch is absolutely necessary. Previously, It was showing not intended string value of name. I've reproduced 3 time from my fedora 40 Debug Kernel with any other module or patched. version: 6.10.0-0.rc2.20240608gitdc772f8237f9.29.fc41.aarch64+debug [ 5287.164555] veth0_vlan: left promiscuous mode [ 5287.164929] veth1_macvtap: left promiscuous mode [ 5287.164950] veth0_macvtap: left promiscuous mode [ 5287.164983] veth1_vlan: left promiscuous mode [ 5287.165008] veth0_vlan: left promiscuous mode [ 5287.165450] veth1_macvtap: left promiscuous mode [ 5287.165472] veth0_macvtap: left promiscuous mode [ 5287.165502] veth1_vlan: left promiscuous mode … [ 5297.598240] bridge0: port 2(bridge_slave_1) entered blocking state [ 5297.598262] bridge0: port 2(bridge_slave_1) entered forwarding state [ 5297.598296] bridge0: port 1(bridge_slave_0) entered blocking state [ 5297.598313] bridge0: port 1(bridge_slave_0) entered forwarding state [ 5297.616090] 8021q: adding VLAN 0 to HW filter on device bond0 [ 5297.620405] bridge0: port 1(bridge_slave_0) entered disabled state [ 5297.620730] bridge0: port 2(bridge_slave_1) entered disabled state [ 5297.627247] 8021q: adding VLAN 0 to HW filter on device team0 [ 5297.629636] bridge0: port 1(bridge_slave_0) entered blocking state … [ 5298.002798] bridge_slave_0: left promiscuous mode [ 5298.002869] bridge0: port 1(bridge_slave_0) entered disabled state [ 5298.309444] bond0 (unregistering): (slave bond_slave_0): Releasing backup interface [ 5298.315206] bond0 (unregistering): (slave bond_slave_1): Releasing backup interface [ 5298.320207] bond0 (unregistering): Released all slaves [ 5298.354296] hsr_slave_0: left promiscuous mode [ 5298.360750] hsr_slave_1: left promiscuous mode [ 5298.374889] veth1_macvtap: left promiscuous mode [ 5298.374931] veth0_macvtap: left promiscuous mode [ 5298.374988] veth1_vlan: left promiscuous mode [ 5298.375024] veth0_vlan: left promiscuous mode [ 5299.109741] team0 (unregistering): Port device team_slave_1 removed [ 5299.185870] team0 (unregistering): Port device team_slave_0 removed … [ 5300.155443] Bluetooth: hci3: unexpected cc 0x0c03 length: 249 > 1 [ 5300.155724] Bluetooth: hci3: unexpected cc 0x1003 length: 249 > 9 [ 5300.155988] Bluetooth: hci3: unexpected cc 0x1001 length: 249 > 9 …. [ 5301.075531] team0: Port device team_slave_1 added [ 5301.085515] bridge0: port 1(bridge_slave_0) entered blocking state [ 5301.085531] bridge0: port 1(bridge_slave_0) entered disabled state [ 5301.085588] bridge_slave_0: entered allmulticast mode [ 5301.085800] bridge_slave_0: entered promiscuous mode [ 5301.095617] bridge0: port 1(bridge_slave_0) entered blocking state [ 5301.095633] bridge0: port 1(bridge_slave_0) entered disabled state … [ 5301.149734] bond0: (slave bond_slave_0): Enslaving as an active interface with an up link [ 5301.173234] bond0: (slave bond_slave_0): Enslaving as an active interface with an up link [ 5301.180517] bond0: (slave bond_slave_1): Enslaving as an active interface with an up link [ 5301.193481] hsr_slave_0: entered promiscuous mode [ 5301.204425] hsr_slave_1: entered promiscuous mode [ 5301.210172] debugfs: Directory 'hsr0' with parent 'hsr' already present! [ 5301.210185] Cannot create hsr debugfs directory [ 5301.224061] bond0: (slave bond_slave_1): Enslaving as an active interface with an up link [ 5301.246901] bond0: (slave bond_slave_0): Enslaving as an active interface with an up link [ 5301.255934] team0: Port device team_slave_0 added [ 5301.256480] team0: Port device team_slave_1 added [ 5301.256948] team0: Port device team_slave_0 added … [ 5301.435928] hsr_slave_0: entered promiscuous mode [ 5301.446029] hsr_slave_1: entered promiscuous mode [ 5301.455872] debugfs: Directory 'hsr0' with parent 'hsr' already present! [ 5301.455884] Cannot create hsr debugfs directory [ 5301.502664] hsr_slave_0: entered promiscuous mode [ 5301.513675] hsr_slave_1: entered promiscuous mode [ 5301.526155] debugfs: Directory 'hsr0' with parent 'hsr' already present! [ 5301.526164] Cannot create hsr debugfs directory [ 5301.563662] hsr_slave_0: entered promiscuous mode [ 5301.576129] hsr_slave_1: entered promiscuous mode [ 5301.580259] debugfs: Directory 'hsr0' with parent 'hsr' already present! [ 5301.580270] Cannot create hsr debugfs directory [ 5301.590269] 8021q: adding VLAN 0 to HW filter on device bond0 [ 5301.595872] KASAN: null-ptr-deref in range [0x0000000000000130-0x0000000000000137] [ 5301.595877] Mem abort info: [ 5301.595881] ESR = 0x0000000096000006 [ 5301.595885] EC = 0x25: DABT (current EL), IL = 32 bits [ 5301.595889] SET = 0, FnV = 0 [ 5301.595893] EA = 0, S1PTW = 0 [ 5301.595896] FSC = 0x06: level 2 translation fault [ 5301.595900] Data abort info: [ 5301.595903] ISV = 0, ISS = 0x00000006, ISS2 = 0x00000000 [ 5301.595907] CM = 0, WnR = 0, TnD = 0, TagAccess = 0 [ 5301.595911] GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0 [ 5301.595915] [dfff800000000026] address between user and kernel address ranges [ 5301.595971] Internal error: Oops: 0000000096000006 [#1] SMP … [ 5301.596076] CPU: 2 PID: 102769 Comm: syz-executor.3 Kdump: loaded Tainted: G W ------- --- 6.10.0-0.rc2.20240608gitdc772f8237f9.29.fc41.aarch64+debug #1 [ 5301.596080] Hardware name: VMware, Inc. VMware20,1/VBSA, BIOS VMW201.00V.21805430.BA64.2305221830 05/22/2023 [ 5301.596082] pstate: 01400005 (nzcv daif +PAN -UAO -TCO +DIT -SSBS BTYPE=--) [ 5301.596085] pc : strnlen+0x40/0x88 [ 5301.596114] lr : trace_event_get_offsets_qdisc_reset+0x6c/0x2b0 [ 5301.596124] sp : ffff8000beef6b40 [ 5301.596126] x29: ffff8000beef6b40 x28: dfff800000000000 x27: 0000000000000001 [ 5301.596131] x26: 6de1800082c62bd0 x25: 1ffff000110aa9e0 x24: ffff800088554f00 [ 5301.596136] x23: ffff800088554ec0 x22: 0000000000000130 x21: 0000000000000140 [ 5301.596140] x20: dfff800000000000 x19: ffff8000beef6c60 x18: ffff7000115106d8 [ 5301.596143] x17: ffff800121bad000 x16: ffff800080020000 x15: 0000000000000006 [ 5301.596147] x14: 0000000000000002 x13: ffff0001f3ed8d14 x12: ffff700017ddeda5 [ 5301.596151] x11: 1ffff00017ddeda4 x10: ffff700017ddeda4 x9 : ffff800082cc5eec [ 5301.596155] x8 : 0000000000000004 x7 : 00000000f1f1f1f1 x6 : 00000000f2f2f200 [ 5301.596158] x5 : 00000000f3f3f3f3 x4 : ffff700017dded80 x3 : 00000000f204f1f1 [ 5301.596162] x2 : 0000000000000026 x1 : 0000000000000000 x0 : 0000000000000130 [ 5301.596166] Call trace: [ 5301.596175] strnlen+0x40/0x88 [ 5301.596179] trace_event_get_offsets_qdisc_reset+0x6c/0x2b0 [ 5301.596182] perf_trace_qdisc_reset+0xb0/0x538 [ 5301.596184] __traceiter_qdisc_reset+0x68/0xc0 [ 5301.596188] qdisc_reset+0x43c/0x5e8 [ 5301.596190] netif_set_real_num_tx_queues+0x288/0x770 [ 5301.596194] veth_init_queues+0xfc/0x130 [veth] [ 5301.596198] veth_newlink+0x45c/0x850 [veth] [ 5301.596202] rtnl_newlink_create+0x2c8/0x798 [ 5301.596205] __rtnl_newlink+0x92c/0xb60 [ 5301.596208] rtnl_newlink+0xd8/0x130 [ 5301.596211] rtnetlink_rcv_msg+0x2e0/0x890 [ 5301.596214] netlink_rcv_skb+0x1c4/0x380 [ 5301.596225] rtnetlink_rcv+0x20/0x38 [ 5301.596227] netlink_unicast+0x3c8/0x640 [ 5301.596231] netlink_sendmsg+0x658/0xa60 [ 5301.596234] __sock_sendmsg+0xd0/0x180 [ 5301.596243] __sys_sendto+0x1c0/0x280 [ 5301.596246] __arm64_sys_sendto+0xc8/0x150 [ 5301.596249] invoke_syscall+0xdc/0x268 [ 5301.596256] el0_svc_common.constprop.0+0x16c/0x240 [ 5301.596259] do_el0_svc+0x48/0x68 [ 5301.596261] el0_svc+0x50/0x188 [ 5301.596265] el0t_64_sync_handler+0x120/0x130 [ 5301.596268] el0t_64_sync+0x194/0x198 [ 5301.596272] Code: eb15001f 54000120 d343fc02 12000801 (38f46842) [ 5301.596285] SMP: stopping secondary CPUs [ 5301.597053] Starting crashdump kernel... [ 5301.597057] Bye! After applying our patch, I didn't find any kernel panic errors. We've found a simple reproducer # echo 1 > /sys/kernel/debug/tracing/events/qdisc/qdisc_reset/enable # ip link add veth0 type veth peer name veth1 Error: Unknown device type. However, without our patch applied, I tested upstream 6.10.0-rc3 kernel using the qdisc_reset event and the ip command on my qemu virtual machine. This 2 commands makes always kernel panic. Linux version: 6.10.0-rc3 [ 0.000000] Linux version 6.10.0-rc3-00164-g44ef20baed8e-dirty (paran@fedora) (gcc (GCC) 14.1.1 20240522 (Red Hat 14.1.1-4), GNU ld version 2.41-34.fc40) torvalds#20 SMP PREEMPT Sat Jun 15 16:51:25 KST 2024 Kernel panic message: [ 615.236484] Internal error: Oops: 0000000096000005 [#1] PREEMPT SMP [ 615.237250] Dumping ftrace buffer: [ 615.237679] (ftrace buffer empty) [ 615.238097] Modules linked in: veth crct10dif_ce virtio_gpu virtio_dma_buf drm_shmem_helper drm_kms_helper zynqmp_fpga xilinx_can xilinx_spi xilinx_selectmap xilinx_core xilinx_pr_decoupler versal_fpga uvcvideo uvc videobuf2_vmalloc videobuf2_memops videobuf2_v4l2 videodev videobuf2_common mc usbnet deflate zstd ubifs ubi rcar_canfd rcar_can omap_mailbox ntb_msi_test ntb_hw_epf lattice_sysconfig_spi lattice_sysconfig ice40_spi gpio_xilinx dwmac_altr_socfpga mdio_regmap stmmac_platform stmmac pcs_xpcs dfl_fme_region dfl_fme_mgr dfl_fme_br dfl_afu dfl fpga_region fpga_bridge can can_dev br_netfilter bridge stp llc atl1c ath11k_pci mhi ath11k_ahb ath11k qmi_helpers ath10k_sdio ath10k_pci ath10k_core ath mac80211 libarc4 cfg80211 drm fuse backlight ipv6 Jun 22 02:36:5[3 6k152.62-4sm98k4-0k]v kCePUr:n e1l :P IUDn:a b4le6 8t oC ohmma: nidpl eN oketr nteali nptaedg i6n.g1 0re.0q-urecs3t- 0at0 1v6i4r-tgu4a4le fa2d0dbraeeds0se-dir tyd f#f2f08 615.252376] Hardware name: linux,dummy-virt (DT) [ 615.253220] pstate: 80400005 (Nzcv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--) [ 615.254433] pc : strnlen+0x6c/0xe0 [ 615.255096] lr : trace_event_get_offsets_qdisc_reset+0x94/0x3d0 [ 615.256088] sp : ffff800080b269a0 [ 615.256615] x29: ffff800080b269a0 x28: ffffc070f3f98500 x27: 0000000000000001 [ 615.257831] x26: 0000000000000010 x25: ffffc070f3f98540 x24: ffffc070f619cf60 [ 615.259020] x23: 0000000000000128 x22: 0000000000000138 x21: dfff800000000000 [ 615.260241] x20: ffffc070f631ad00 x19: 0000000000000128 x18: ffffc070f448b800 [ 615.261454] x17: 0000000000000000 x16: 0000000000000001 x15: ffffc070f4ba2a90 [ 615.262635] x14: ffff700010164d73 x13: 1ffff80e1e8d5eb3 x12: 1ffff00010164d72 [ 615.263877] x11: ffff700010164d72 x10: dfff800000000000 x9 : ffffc070e85d6184 [ 615.265047] x8 : ffffc070e4402070 x7 : 000000000000f1f1 x6 : 000000001504a6d3 [ 615.266336] x5 : ffff28ca21122140 x4 : ffffc070f5043ea8 x3 : 0000000000000000 [ 615.267528] x2 : 0000000000000025 x1 : 0000000000000000 x0 : 0000000000000000 [ 615.268747] Call trace: [ 615.269180] strnlen+0x6c/0xe0 [ 615.269767] trace_event_get_offsets_qdisc_reset+0x94/0x3d0 [ 615.270716] trace_event_raw_event_qdisc_reset+0xe8/0x4e8 [ 615.271667] __traceiter_qdisc_reset+0xa0/0x140 [ 615.272499] qdisc_reset+0x554/0x848 [ 615.273134] netif_set_real_num_tx_queues+0x360/0x9a8 [ 615.274050] veth_init_queues+0x110/0x220 [veth] [ 615.275110] veth_newlink+0x538/0xa50 [veth] [ 615.276172] __rtnl_newlink+0x11e4/0x1bc8 [ 615.276944] rtnl_newlink+0xac/0x120 [ 615.277657] rtnetlink_rcv_msg+0x4e4/0x1370 [ 615.278409] netlink_rcv_skb+0x25c/0x4f0 [ 615.279122] rtnetlink_rcv+0x48/0x70 [ 615.279769] netlink_unicast+0x5a8/0x7b8 [ 615.280462] netlink_sendmsg+0xa70/0x1190 Yeoreum and I don't know if the patch we wrote will fix the underlying cause, but we think that priority is to prevent kernel panic happening. So, we're sending this patch. Fixes: 51270d5 ("tracing/net_sched: Fix tracepoints that save qdisc_dev() as a string") Link: https://lore.kernel.org/lkml/20240229143432.273b4871@gandalf.local.home/t/ Cc: netdev@vger.kernel.org Tested-by: Yunseong Kim <yskelg@gmail.com> Signed-off-by: Yunseong Kim <yskelg@gmail.com> Signed-off-by: Yeoreum Yun <yeoreum.yun@arm.com> Link: https://lore.kernel.org/r/20240624173320.24945-4-yskelg@gmail.com Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Link: https://lore.kernel.org/r/20240702170243.963426416@linuxfoundation.org Tested-by: Ronald Warsow <rwarsow@gmx.de> Tested-by: SeongJae Park <sj@kernel.org> Tested-by: Mark Brown <broonie@kernel.org> Tested-by: Christian Heusel <christian@heusel.eu> Tested-by: Shuah Khan <skhan@linuxfoundation.org> Tested-by: Bagas Sanjaya <bagasdotme@gmail.com> Tested-by: Jon Hunter <jonathanh@nvidia.com> Tested-by: Salvatore Bonaccorso <carnil@debian.org> Tested-by: Justin M. Forbes <jforbes@fedoraproject.org> Tested-by: Pavel Machek (CIP) <pavel@denx.de> Tested-by: Peter Schneider <pschneider1968@googlemail.com> Tested-by: Kelsey Steele <kelseysteele@linux.microsoft.com> Tested-by: Ron Economos <re@w6rz.net> Tested-by: Linux Kernel Functional Testing <lkft@linaro.org> Tested-by: Florian Fainelli <florian.fainelli@broadcom.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
"ioft" is a 48.16 fixed point type. Signed-off-by: Janne Grunau <j@jannau.net>
Apple Silicon devices, particularly Macs, continue Apple's tradition of integrating a ridiculous number of sensors on their products. Most of these report to the System Management Controller, and their data is exposed by the SMC firmware as simple key-value pairs. This commit adds an hwmon drive for retrieving the data reported by the temperature, voltage, current and power sensors, as well as fan data. Devices each expose their own unique set of sensors and fans, even with the SMC being baked into the SoC. This driver walks a devicetree node in order to discover the sensors present on the current device at runtime. Co-developed-by: Janne Grunau <j@jannau.net> Signed-off-by: Janne Grunau <j@jannau.net> Signed-off-by: James Calligeros <jcalligeros99@gmail.com>
hwmon: macsmc: Avoid warning on 32-bit compile test builds Signed-off-by: Janne Grunau <j@jannau.net>
Should be squashed into "hwmon: add macsmc-hwmon driver" Signed-off-by: Janne Grunau <j@jannau.net>
jiegec
force-pushed
the
apple-m1-counter-asahi
branch
from
July 11, 2024 03:46
691807f
to
e969bd9
Compare
[ Upstream commit 4823696 ] The non-contiguous CBM test fails on AMD with: Starting L3_NONCONT_CAT test ... Mounting resctrl to "/sys/fs/resctrl" CPUID output doesn't match 'sparse_masks' file content! not ok 5 L3_NONCONT_CAT: test AMD always supports non-contiguous CBM but does not report it via CPUID. Fix the non-contiguous CBM test to use CPUID to discover non-contiguous CBM support only on Intel. Fixes: ae63855 ("selftests/resctrl: Add non-contiguous CBMs CAT test") Signed-off-by: Babu Moger <babu.moger@amd.com> Reviewed-by: Reinette Chatre <reinette.chatre@intel.com> Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Signed-off-by: Shuah Khan <skhan@linuxfoundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 4cd4722 ] Using of devm API leads to a certain order of releasing resources. So all dependent resources which are not devm-wrapped should be deleted with respect to devm-release order. Mutex is one of such objects that often is bound to other resources and has no own devm wrapping. Since mutex_destroy() actually does nothing in non-debug builds frequently calling mutex_destroy() is just ignored which is safe for now but wrong formally and can lead to a problem if mutex_destroy() will be extended so introduce devm_mutex_init(). Suggested-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: George Stark <gnstark@salutedevices.com> Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu> Reviewed-by: Andy Shevchenko <andy.shevchenko@gmail.com> Reviewed-by: Marek Behún <kabel@kernel.org> Acked-by: Waiman Long <longman@redhat.com> Link: https://lore.kernel.org/r/20240411161032.609544-2-gnstark@salutedevices.com Signed-off-by: Lee Jones <lee@kernel.org> Stable-dep-of: efc347b ("leds: mlxreg: Use devm_mutex_init() for mutex initialization") Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit efc347b ] In this driver LEDs are registered using devm_led_classdev_register() so they are automatically unregistered after module's remove() is done. led_classdev_unregister() calls module's led_set_brightness() to turn off the LEDs and that callback uses mutex which was destroyed already in module's remove() so use devm API instead. Signed-off-by: George Stark <gnstark@salutedevices.com> Reviewed-by: Andy Shevchenko <andy.shevchenko@gmail.com> Link: https://lore.kernel.org/r/20240411161032.609544-8-gnstark@salutedevices.com Signed-off-by: Lee Jones <lee@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit c382e2e ] In this driver LEDs are registered using devm_led_classdev_register() so they are automatically unregistered after module's remove() is done. led_classdev_unregister() calls module's led_set_brightness() to turn off the LEDs and that callback uses mutex which was destroyed already in module's remove() so use devm API instead. Signed-off-by: George Stark <gnstark@salutedevices.com> Reviewed-by: Andy Shevchenko <andy.shevchenko@gmail.com> Link: https://lore.kernel.org/r/20240411161032.609544-9-gnstark@salutedevices.com Signed-off-by: Lee Jones <lee@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 8be0913 ] During the zip probe process, the debugfs failure does not stop the probe. When debugfs initialization fails, jumping to the error branch will also release regs, in addition to its own rollback operation. As a result, it may be released repeatedly during the regs uninit process. Therefore, the null check needs to be added to the regs uninit process. Signed-off-by: Chenghai Huang <huangchenghai2@huawei.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit a6683c6 ] lima uses a shared interrupt, so the interrupt handlers must be prepared to be called at any time. At driver removal time, the clocks are disabled early and the interrupts stay registered until the very end of the remove process due to the devm usage. This is potentially a bug as the interrupts access device registers which assumes clocks are enabled. A crash can be triggered by removing the driver in a kernel with CONFIG_DEBUG_SHIRQ enabled. This patch frees the interrupts at each lima device finishing callback so that the handlers are already unregistered by the time we fully disable clocks. Signed-off-by: Erico Nunes <nunes.erico@gmail.com> Signed-off-by: Qiang Yu <yuq825@gmail.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240401224329.1228468-2-nunes.erico@gmail.com Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 0db880f ] nmi_enter()/nmi_exit() touches per cpu variables which can lead to kernel crash when invoked during real mode interrupt handling (e.g. early HMI/MCE interrupt handler) if percpu allocation comes from vmalloc area. Early HMI/MCE handlers are called through DEFINE_INTERRUPT_HANDLER_NMI() wrapper which invokes nmi_enter/nmi_exit calls. We don't see any issue when percpu allocation is from the embedded first chunk. However with CONFIG_NEED_PER_CPU_PAGE_FIRST_CHUNK enabled there are chances where percpu allocation can come from the vmalloc area. With kernel command line "percpu_alloc=page" we can force percpu allocation to come from vmalloc area and can see kernel crash in machine_check_early: [ 1.215714] NIP [c000000000e49eb4] rcu_nmi_enter+0x24/0x110 [ 1.215717] LR [c0000000000461a0] machine_check_early+0xf0/0x2c0 [ 1.215719] --- interrupt: 200 [ 1.215720] [c000000fffd73180] [0000000000000000] 0x0 (unreliable) [ 1.215722] [c000000fffd731b0] [0000000000000000] 0x0 [ 1.215724] [c000000fffd73210] [c000000000008364] machine_check_early_common+0x134/0x1f8 Fix this by avoiding use of nmi_enter()/nmi_exit() in real mode if percpu first chunk is not embedded. Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu> Tested-by: Shirisha Ganta <shirisha@linux.ibm.com> Signed-off-by: Mahesh Salgaonkar <mahesh@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/20240410043006.81577-1-mahesh@linux.ibm.com Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 309422d ] This structure is embedded in multiple other structures that are packed, which conflicts with it being aligned. drivers/media/usb/as102/as10x_cmd.h:379:30: warning: field reg_addr within 'struct as10x_dump_memory::(unnamed at drivers/media/usb/as102/as10x_cmd.h:373:2)' is less aligned than 'struct as10x_register_addr' and is usually due to 'struct as10x_dump_memory::(unnamed at drivers/media/usb/as102/as10x_cmd.h:373:2)' being packed, which can lead to unaligned accesses [-Wunaligned-access] Mark it as being packed. Marking the inner struct as 'packed' does not change the layout, since the whole struct is already packed, it just silences the clang warning. See also this llvm discussion: llvm/llvm-project#55520 Signed-off-by: Ricardo Ribalda <ribalda@chromium.org> Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl> Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 4b267c2 ] Add missing release_firmware on the error paths. drivers/media/usb/dvb-usb/dib0700_devices.c:2415 stk9090m_frontend_attach() warn: 'state->frontend_firmware' from request_firmware() not released on lines: 2415. drivers/media/usb/dvb-usb/dib0700_devices.c:2497 nim9090md_frontend_attach() warn: 'state->frontend_firmware' from request_firmware() not released on lines: 2489,2497. Signed-off-by: Ricardo Ribalda <ribalda@chromium.org> Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl> Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 4854b46 ] If the dql_queued() function receives an invalid argument, WARN about it and continue, instead of crashing the kernel. This was raised by checkpatch, when I am refactoring this code (see following patch/commit) WARNING: Do not crash the kernel unless it is absolutely unavoidable--use WARN_ON_ONCE() plus recovery code (if feasible) instead of BUG() or variants Signed-off-by: Breno Leitao <leitao@debian.org> Link: https://lore.kernel.org/r/20240411192241.2498631-2-leitao@debian.org Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
…sband [ Upstream commit bb38626 ] We have some policy via BIOS to block uses of 6 GHz. In this case, 6 GHz sband will be NULL even if it is WiFi 7 chip. So, add NULL handling here to avoid crash. Signed-off-by: Zong-Zhe Yang <kevin_yang@realtek.com> Signed-off-by: Ping-Ke Shih <pkshih@realtek.com> Link: https://msgid.link/20240412115729.8316-3-pkshih@realtek.com Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit f9116f6 ] Any kunit doing any memory access should get their own runtime_pm outer references since they don't use the standard driver API entries. In special this dma_buf from the same driver. Found by pre-merge CI on adding WARN calls for unprotected inner callers: <6> [318.639739] # xe_dma_buf_kunit: running xe_test_dmabuf_import_same_driver <4> [318.639957] ------------[ cut here ]------------ <4> [318.639967] xe 0000:4d:00.0: Missing outer runtime PM protection <4> [318.640049] WARNING: CPU: 117 PID: 3832 at drivers/gpu/drm/xe/xe_pm.c:533 xe_pm_runtime_get_noresume+0x48/0x60 [xe] Cc: Matthew Auld <matthew.auld@intel.com> Cc: Francois Dugast <francois.dugast@intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240417203952.25503-10-rodrigo.vivi@intel.com Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit ca0b44e ] The existing behavior of ib_umad, which maintains received MAD packets in an unbounded list, poses a risk of uncontrolled growth. As user-space applications extract packets from this list, the rate of extraction may not match the rate of incoming packets, leading to potential list overflow. To address this, we introduce a limit to the size of the list. After considering typical scenarios, such as OpenSM processing, which can handle approximately 100k packets per second, and the 1-second retry timeout for most packets, we set the list size limit to 200k. Packets received beyond this limit are dropped, assuming they are likely timed out by the time they are handled by user-space. Notably, packets queued on the receive list due to reasons like timed-out sends are preserved even when the list is full. Signed-off-by: Michael Guralnik <michaelgur@nvidia.com> Reviewed-by: Mark Zhang <markzhang@nvidia.com> Link: https://lore.kernel.org/r/7197cb58a7d9e78399008f25036205ceab07fbd5.1713268818.git.leon@kernel.org Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 0d8b637 ] Stop calling smp_processor_id() from preemptible code in qedf_execute_tmf90. This results in BUG_ON() when running an RT kernel. [ 659.343280] BUG: using smp_processor_id() in preemptible [00000000] code: sg_reset/3646 [ 659.343282] caller is qedf_execute_tmf+0x8b/0x360 [qedf] Tested-by: Guangwu Zhang <guazhang@redhat.com> Cc: Saurav Kashyap <skashyap@marvell.com> Cc: Nilesh Javali <njavali@marvell.com> Signed-off-by: John Meneghini <jmeneghi@redhat.com> Link: https://lore.kernel.org/r/20240403150155.412954-1-jmeneghi@redhat.com Acked-by: Saurav Kashyap <skashyap@marvell.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 1479eaf ] Test case dummy_st_ops/dummy_init_ret_value passes NULL as the first parameter of the test_1() function. Mark this parameter as nullable to make verifier aware of such possibility. Otherwise, NULL check in the test_1() code: SEC("struct_ops/test_1") int BPF_PROG(test_1, struct bpf_dummy_ops_state *state) { if (!state) return ...; ... access state ... } Might be removed by verifier, thus triggering NULL pointer dereference under certain conditions. Reported-by: Jose E. Marchesi <jemarch@gnu.org> Signed-off-by: Eduard Zingerman <eddyz87@gmail.com> Link: https://lore.kernel.org/r/20240424012821.595216-2-eddyz87@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 3b3b84a ] As reported by Jose E. Marchesi in off-list discussion, GCC and LLVM generate slightly different code for dummy_st_ops_success/test_1(): SEC("struct_ops/test_1") int BPF_PROG(test_1, struct bpf_dummy_ops_state *state) { int ret; if (!state) return 0xf2f3f4f5; ret = state->val; state->val = 0x5a; return ret; } GCC-generated LLVM-generated ---------------------------- --------------------------- 0: r1 = *(u64 *)(r1 + 0x0) 0: w0 = -0xd0c0b0b 1: if r1 == 0x0 goto 5f 1: r1 = *(u64 *)(r1 + 0x0) 2: r0 = *(s32 *)(r1 + 0x0) 2: if r1 == 0x0 goto 6f 3: *(u32 *)(r1 + 0x0) = 0x5a 3: r0 = *(u32 *)(r1 + 0x0) 4: exit 4: w2 = 0x5a 5: r0 = -0xd0c0b0b 5: *(u32 *)(r1 + 0x0) = r2 6: exit 6: exit If the 'state' argument is not marked as nullable in net/bpf/bpf_dummy_struct_ops.c, the verifier would assume that 'r1 == 0x0' is never true: - for the GCC version, this means that instructions #5-6 would be marked as dead and removed; - for the LLVM version, all instructions would be marked as live. The test dummy_st_ops/dummy_init_ret_value actually sets the 'state' parameter to NULL. Therefore, when the 'state' argument is not marked as nullable, the GCC-generated version of the code would trigger a NULL pointer dereference at instruction #3. This patch updates the test_1() test case to always follow a shape similar to the GCC-generated version above, in order to verify whether the 'state' nullability is marked correctly. Reported-by: Jose E. Marchesi <jemarch@gnu.org> Signed-off-by: Eduard Zingerman <eddyz87@gmail.com> Link: https://lore.kernel.org/r/20240424012821.595216-3-eddyz87@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
Each SoC's SMC exposes a different set of hardware sensor keys, however there are a number that are shared between all currently supported SoCs. Describe these in a .dtsi so that we don't need to duplicate them across every SoC. Likewise, the fans on every machine are exposed as the same set of keys on each. Add .dtsis for these too. Co-developed-by: Janne Grunau <j@jannau.net> Signed-off-by: Janne Grunau <j@jannau.net> Signed-off-by: James Calligeros <jcalligeros99@gmail.com>
Signed-off-by: Janne Grunau <j@jannau.net>
Signed-off-by: Janne Grunau <j@jannau.net>
Signed-off-by: Janne Grunau <j@jannau.net>
Signed-off-by: Janne Grunau <j@jannau.net>
Signed-off-by: Janne Grunau <j@jannau.net>
Signed-off-by: Janne Grunau <j@jannau.net>
Signed-off-by: Janne Grunau <j@jannau.net>
Work around mutability issues when entity.new_job() takes a mutable reference to the entity by moving all the fields used by the submit_render() and submit_compute() functions to an inner struct, eliminating the double-mutable-borrow. Signed-off-by: Asahi Lina <lina@asahilina.net>
fix typo Signed-off-by: Janne Grunau <j@jannau.net>
Fixes failure to unmap buffers in dcpep_cb_unmap_piodma() due to the unaligned size. Further along this causes kernel log splat when DCP tries to map the buffers again since thye IOVA is still in use. This causes no apparent issue although map_piodma callback signals an errror and returns 0 (unmapped as DVA). It's not clear why this presents only randomly. Possibly some build or uninitialized memory triggers this unmap/free and immediate allocate/map cycle in the DCP firmware. I never notices this with a clang-built kernel on j314c. It showed with gcc build with the Fedora config at least on 6.8.8 based kernels. This did not reproduce on j375d. Signed-off-by: Janne Grunau <j@jannau.net>
Apple released their performance counters definition in /usr/share/kpep/a14.plist file on macOS. The file lists the branch counters (number of branch instructions and branch mispredictions): "INST_BRANCH" => { "counters_mask" => 224 "description" => "Retired branch instructions including calls and returns" "number" => 141 } and "BRANCH_MISPRED_NONSPEC" => { "counters_mask" => 224 "description" => "Retired branch instructions including calls and returns that mispredicted" "number" => 203 } In this commit, the performance counters numbered 0x8d(141) and 0xcb(203) are renamed to their actual names and the mappings from generalized performence event types in `enum perf_hw_id` to hardware performance counters are added respectively. Signed-off-by: Jiajie Chen <c@jia.je>
jiegec
force-pushed
the
apple-m1-counter-asahi
branch
from
July 21, 2024 13:47
e969bd9
to
50e9947
Compare
jannau
force-pushed
the
asahi
branch
2 times, most recently
from
July 29, 2024 11:01
539cbbc
to
83a2784
Compare
cherry-picked on top of Yangyu Chen work to name all registers, to be released as asahi-6.10.7-1 |
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Apple released their performance counters definition in /usr/share/kpep/a14.plist file on macOS. The file lists the branch counters (number of branch instructions and branch mispredictions):
"INST_BRANCH" => {
"counters_mask" => 224
"description" => "Retired branch instructions including calls and
returns"
"number" => 141
}
and
"BRANCH_MISPRED_NONSPEC" => {
"counters_mask" => 224
"description" => "Retired branch instructions including calls and
returns that mispredicted"
"number" => 203
}
In this commit, the performance counters numbered 0x8d(141) and 0xcb(203) are renamed to their actual names and the mappings from generalized performence event types in
enum perf_hw_id
to hardware performance counters are added respectively.