Attempt to avoid random kernel panic reboots #1

ClassfulNick · 2021-11-01T16:36:31Z

I'm experiencing random kernel panic reboots on my alioth. Googling strings appeared in /sys/fs/pstore/console-ramoops-0 like "qmi_rmnet_set_powersave_mode" etc led me here and that user then said "No longer getting crashes/reboots with the August 2nd build but instead get a bunch of network resets"

I think at least network resets seem to be better than the whole system crashes randomly. Therefore I decided to patch the kernel & see what's coming. I've managed to compile the kernel (with the help of tons of googling...) & it seems to work, except that a "There is an internal problem in your device. Please contact your manufacturer" popup shows every time the device boots into lockscreen which doesn't seem to be an issue.

I built the kernel without doing a full repo sync etc. I just cloned this "android_kernel_xiaomi_sm8250" repository.

The latest android-ndk-r23b didn't seem to work, so that I switched to older android-ndk-r22b & it worked.

I modified .config (originally pulled from /proc/config.gz on alioth) to build some modules I'm personally interested in, like nfs, tcp_bbr etc. I then found that the /vendor has insufficient inode count to contain additional kernel driver files so that I reformatted that partition to enlarge inode count.

export PATH=${NDK_PATH}/toolchains/llvm/prebuilt/linux-x86_64/bin:${NDK_PATH}/toolchains/aarch64-linux-android-4.9/prebuilt/linux-x86_64/bin:${NDK_PATH}/toolchains/arm-linux-androideabi-4.9/prebuilt/linux-x86_64/bin:$PATH

make -j$(nproc --all) O=output_dir INSTALL_MOD_STRIP=1 INSTALL_MOD_PATH=modules_install_dir ARCH=arm64 SUBARCH=arm64 CC=clang CLANG_TRIPLE=aarch64-linux-gnu- CROSS_COMPILE=aarch64-linux-android- CROSS_COMPILE_ARM32=arm-linux-androideabi-

make modules_install

KERNEL_VERSION=$(awk '{print $3}' output_dir/include/generated/utsrelease.h | tr -d "\"")
mkdir -p output_dir/modules/lib/modules/${KERNEL_VERSION}/
cp output_dir/modules.builtin output_dir/modules/lib/modules/${KERNEL_VERSION}/
cp output_dir/modules.order output_dir/modules/lib/modules/${KERNEL_VERSION}/
find output_dir/modules_install_dir | grep ko$ | while read FILE; do
    cp ${FILE} output_dir/modules/lib/modules/${KERNEL_VERSION}/
done
depmod --all --basedir=output_dir/modules --filesyms=output_dir/System.map --symvers=output_dir/Module.symvers ${KERNEL_VERSION}

There were some compilation errors but I managed to get through them. Also, I found that make clean (or make mrproper etc) deleted some driver firmware files with .i file extension, but then I found that this issue doesn't arise if O=output_dir is given.

Finally I used libmagiskboot.so inside Magisk.apk to replace the kernel.

* Something has been significantly wrong with this driver. Under mobile data, I see it keeps setting up alarms which triggers every seconds, causing device unable go into suspend, showing logs like "device alarmtimer failed to suspend". * IDK why this is happening, maybe something has been wrong in userspace or other parts of kernel. * Note: This is also happening on stock kernel. * Revert it for a dirty fix. This reverts commit 9f3ef66. Signed-off-by: LibXZR <xzr467706992@163.com>

* After commit "Revert "dfc: Use alarm timer to trigger powersave work", this workqueue is now is use for qmi_rmnet_check_stats. Make it unbound and freezable to save power. Signed-off-by: LibXZR <xzr467706992@163.com>

ClassfulNick · 2021-11-02T19:28:28Z

Still panic

This fixes a warning and kernel panic when connecting to a hotspot hosted by the device: [ 568.838882] arm-smmu 15000000.apps-smmu: FAR = 0x00000000a0b51228 [ 568.838908] arm-smmu 15000000.apps-smmu: PAR = 0x0000000000000000 [ 568.838917] arm-smmu 15000000.apps-smmu: FSR = 0x40000402 [TF R SS ] [ 568.838924] arm-smmu 15000000.apps-smmu: TTBR0 = 0x0000000000000000 [ 568.838930] arm-smmu 15000000.apps-smmu: TTBR1 = 0x0000000000000000 [ 568.838938] arm-smmu 15000000.apps-smmu: SCTLR = 0x00c000e7 ACTLR = 0x00000003 [ 568.838973] arm-smmu 15000000.apps-smmu: CBAR = 0x0001f300 [ 568.838980] arm-smmu 15000000.apps-smmu: MAIR0 = 0xf404ff44 MAIR1 = 0x00000000 [ 568.838999] arm-smmu 15000000.apps-smmu: Unhandled context fault: iova=0xa0b51228, cb=46, fsr=0x40000402, fsynr0=0x4f0003, fsynr1=0x0 [ 568.839009] arm-smmu 15000000.apps-smmu: soft iova-to-phys=0x0000000000000000 [ 568.839018] arm-smmu 15000000.apps-smmu: SOFTWARE TABLE WALK FAILED! Looks like 15000000.apps-smmu accessed an unmapped address! [ 568.839025] arm-smmu 15000000.apps-smmu: hard iova-to-phys (ATOS) failed [ 568.839032] arm-smmu 15000000.apps-smmu: SID=0x521 [ 568.839038] arm-smmu 15000000.apps-smmu: Unhandled arm-smmu context fault! [ 568.839068] ------------[ cut here ]------------ [ 568.839074] kernel BUG at ../drivers/iommu/arm-smmu.c:1583! [ 568.839084] Internal error: Oops - BUG: 0 [#1] PREEMPT SMP [ 568.839095] Process irq/405-arm-smm (pid: 557, stack limit = 0x0000000071ffe420) [ 568.839107] CPU: 2 PID: 557 Comm: irq/405-arm-smm Tainted: G S 4.14.138-Proton-g26234daa #38 [ 568.839114] Hardware name: Qualcomm Technologies, Inc. SM8150 V2 PM8150 KIRIN MP (DT) [ 568.839120] task: 00000000d8caf9cc task.stack: 0000000071ffe420 [ 568.839140] pc : arm_smmu_context_fault+0x580/0x600 [ 568.839146] lr : arm_smmu_context_fault+0x580/0x600 [ 568.839151] sp : ffffff8014efbce0 pstate : 60c00145 [ 568.839157] x29: ffffffcff335ac80 x28: 0000000000028000 [ 568.839165] x27: ffffff800c4ae058 x26: 0000000000000000 [ 568.839172] x25: ffffff800c4ae000 x24: ffffffcfe6578818 [ 568.839178] x23: 0000000000000521 x22: ffffffcff335ac80 [ 568.839185] x21: 00000000ffffffda x20: ffffffcfe6578920 [ 568.839192] x19: 0000000040000402 x18: ffffff9f0521e000 [ 568.839198] x17: 0000000000000039 x16: ffffff9f045af620 [ 568.839205] x15: ffffff9f04ae26f9 x14: 0000000000003338 [ 568.839211] x13: 000a7d8b33fb1800 x12: 0000000000000000 [ 568.839218] x11: 0000000000000001 x10: 0000000000000007 [ 568.839224] x9 : 77d23af7cb54d900 x8 : 77d23af7cb54d900 [ 568.839231] x7 : 0000000000000000 x6 : ffffff9f0522142a [ 568.839237] x5 : 0000000000000036 x4 : 0000000000000008 [ 568.839244] x3 : 0000000000000000 x2 : 0000000000000000 [ 568.839251] x1 : 00000000000001c0 x0 : 000000000000003e [ 568.839259] [ 568.839259] PC: 0xffffff9f03a8af60: [ 568.839266] af60 910203e2 94045ed1 14000004 900065c1 911b2821 94045ecd f94003a0 b0006201 [ 568.839284] af80 910e4821 2a1703e2 94045ec8 376800fc f94003a0 f0006a21 91264021 94045ec3 [ 568.839297] afa0 d4210000 14000000 2a1f03fa 310042bf 54000100 b9000373 d5033e9f b9405fe8 [ 568.839310] afc0 34000088 91002328 52800029 b9000109 f940a3b3 aa1303e0 97fff232 aa1303e0 [ 568.839326] [ 568.839326] LR: 0xffffff9f03a8af60: [ 568.839331] af60 910203e2 94045ed1 14000004 900065c1 911b2821 94045ecd f94003a0 b0006201 [ 568.839344] af80 910e4821 2a1703e2 94045ec8 376800fc f94003a0 f0006a21 91264021 94045ec3 [ 568.839357] afa0 d4210000 14000000 2a1f03fa 310042bf 54000100 b9000373 d5033e9f b9405fe8 [ 568.839370] afc0 34000088 91002328 52800029 b9000109 f940a3b3 aa1303e0 97fff232 aa1303e0 [ 568.839385] [ 568.839385] SP: 0xffffff8014efbca0: [ 568.839391] bca0 03a8afa0 ffffff9f 60c00145 00000000 ffffffd0 00000000 047d1990 ffffff9f [ 568.839405] bcc0 ffffffff ffffffff cb54d900 77d23af7 f335ac80 ffffffcf 03a8afa0 ffffff9f [ 568.839418] bce0 047da794 ffffff9f 047da794 ffffff9f 047da794 ffffff9f 046be95a ffffff9f [ 568.839432] bd00 047da794 ffffff9f 14efbd10 ffffff80 00010000 00028000 0c400000 ffffff80 [ 568.839447] [ 568.839452] Call trace: [ 568.839461] arm_smmu_context_fault+0x580/0x600 [ 568.839473] Code: f94003a0 f0006a21 91264021 94045ec3 (d4210000) [ 568.839482] ---[ end trace b8f0de5036a861a3 ]--- [ 568.847582] icnss: Received force error fatal request from FW [ 568.847773] icnss: Received early crash indication from FW [ 568.848772] PD_ERR: wlan_process : EF:wlan_process:0x1:WLAN RT:0x1079:cmnos_thread.c:3968:Asserted in whal_interrupt_wifi_ip02.c:whalGetPendingInterrupts:512 Signed-off-by: Danny Lin <danny@kdrag0n.dev> Change-Id: Idd5183cce197016cb64df1faf990d6f606616ef1

The new optimized memcmp doesn't work well on device memory, and when subsys tries to load any fw, we are met with: [ 11.111213] ueventd: firmware: loading 'cdsp.mdt' for '/devices/platform/soc/8300000.qcom,turing/firmware/cdsp.mdt' [ 11.113128] ueventd: loading /devices/platform/soc/8300000.qcom,turing/firmware/cdsp.mdt took 2ms [ 11.113170] subsys-pil-tz 8300000.qcom,turing: cdsp: loading from 0x0000000099100000 to 0x000000009a500000 [ 11.117481] ueventd: firmware: loading 'adsp.mdt' for '/devices/platform/soc/17300000.qcom,lpass/firmware/adsp.mdt' [ 11.117518] Unable to handle kernel paging request at virtual address ffffff801f7c6d5c [ 11.117522] Mem abort info: [ 11.117525] Exception class = DABT (current EL), IL = 32 bits [ 11.117527] SET = 0, FnV = 0 [ 11.117529] EA = 0, S1PTW = 0 [ 11.117530] FSC = 33 [ 11.117532] Data abort info: [ 11.117534] ISV = 0, ISS = 0x00000061 [ 11.117536] CM = 0, WnR = 1 [ 11.117539] swapper pgtable: 4k pages, 39-bit VAs, pgd = 000000003e4fd651 [ 11.117541] [ffffff801f7c6d5c] *pgd=00000001f8883003, *pud=00000001f8883003, *pmd=00000001f30fe003, *pte=00680000fd475703 [ 11.117547] Internal error: Oops: 96000061 [xiaomi-sm8250-devs#1] PREEMPT SMP [ 11.117551] Modules linked in: [ 11.117554] Process init (pid: 568, stack limit = 0x000000005be89f40) [ 11.117558] CPU: 4 PID: 568 Comm: init Tainted: G S W 4.14.239-MOCHI xiaomi-sm8250-devs#1 [ 11.117560] Hardware name: Qualcomm Technologies, Inc. SM8150 V2 PM8150 VAYU (DT) [ 11.117562] task: 00000000e65c9d8d task.stack: 000000005be89f40 [ 11.117570] pc : memcpy+0x188/0x2a0 [ 11.117577] lr : pil_init_image_trusted+0x130/0x234 [ 11.117579] sp : ffffff80229937c0 pstate : 80000145 [ 11.117581] x29: ffffff80229937c0 x28: ffffffeaebf93a00 [ 11.117584] x27: ffffff8599334000 x26: 0000000000000040 [ 11.117586] x25: ffffff859a0aeaa0 x24: ffffff859a63a000 [ 11.117589] x23: ffffffeaf557d880 x22: ffffff801f7c5000 [ 11.117592] x21: 0000000000001d9c x20: ffffff801f65d000 [ 11.117594] x19: ffffff859933cb08 x18: 0000007c2b138000 [ 11.117597] x17: 0000007ebf1ef1e4 x16: ffffff8597c02274 [ 11.117599] x15: ffffffffffffffff x14: ffffffffffffffff [ 11.117602] x13: ffffffffffffffff x12: ffffffffffffffff [ 11.117604] x11: ffffffffffffffff x10: ffffffffffffffff [ 11.117607] x9 : ffffffffffffffff x8 : ffffffffffffffff [ 11.117609] x7 : ffffffffffffffff x6 : ffffffffffffffff [ 11.117612] x5 : ffffff801f7c6d9c x4 : ffffff801f65ed9c [ 11.117614] x3 : ffffff801f7c6d40 x2 : ffffffffffffffcc [ 11.117617] x1 : ffffff801f65ed80 x0 : ffffff801f7c5000 [ 11.117620] [ 11.117620] PC: 0xffffff8598c00e28: [ 11.117622] 0e28 a9022468 a9422428 a9032c6a a9432c2a a984346c a9c4342c f1010042 54fffee8 [ 11.117628] 0e48 a97c3c8e a9011c66 a97d1c86 a9022468 a97e2488 a9032c6a a97f2c8a a904346c [ 11.117634] 0e68 a93c3cae a93d1ca6 a93e24a8 a93f2caa d65f03c0 d503201f a97f348c 92400cae [ 11.117639] 0e88 cb0e0084 cb0e0042 a97f1c86 a93f34ac a97e2488 a97d2c8a a9fc348c cb0e00a5 [ 11.117645] [ 11.117645] LR: 0xffffff8597ef067c: [ 11.117647] 067c 97eeeccc f9400265 b4fffe65 52801803 910143e2 aa1503e1 910183e0 d2804004 [ 11.117652] 069c 72a02803 d63f00a0 aa0003f6 17ffffe9 aa1403e1 aa1503e2 aa1603e0 9434418a [ 11.117658] 06bc 97ff9590 72001c1f b9404ae1 f9402be0 54000241 910133e4 910163e2 d2800085 [ 11.117663] 06dc d2800103 290b03e1 52800021 52800040 97ff9657 2a0003f3 f9413bf4 f9402bf7 [ 11.117669] [ 11.117669] SP: 0xffffff8022993780: [ 11.117671] 3780 98c00e68 ffffff85 80000145 00000000 9a0aeaa0 ffffff85 00000040 00000000 [ 11.117677] 37a0 ffffffff 0000007f 97ef0644 ffffff85 229937c0 ffffff80 98c00e68 ffffff85 [ 11.117682] 37c0 22993ba0 ffffff80 97eeec4c ffffff85 f557d918 ffffffea 00000000 00000000 [ 11.117688] 37e0 f6264460 ffffffea f6264400 ffffffea f55b9480 ffffffea 97b704d8 ffffff85 [ 11.117693] [ 11.117695] Call trace: [ 11.117698] memcpy+0x188/0x2a0 [ 11.117701] pil_boot+0x358/0x730 [ 11.117704] subsys_powerup+0x28/0x30 [ 11.117709] subsys_start+0x38/0x134 [ 11.117711] __subsystem_get+0xb0/0x11c [ 11.117713] subsystem_get+0x10/0x18 [ 11.117716] cdsp_loader_do.isra.0+0xe4/0x1a8 [ 11.117718] cdsp_boot_store+0x8c/0x168 [ 11.117722] kobj_attr_store+0x14/0x24 [ 11.117726] sysfs_kf_write+0x34/0x44 [ 11.117729] kernfs_fop_write+0x118/0x184 [ 11.117733] __vfs_write+0x2c/0xd8 [ 11.117735] vfs_write+0x80/0xec [ 11.117738] SyS_write+0x54/0xac [ 11.117741] el0_svc_naked+0x34/0x38 [ 11.117744] Code: a97e2488 a9032c6a a97f2c8a a904346c (a93c3cae) [ 11.117747] ---[ end trace fc45fc8b1fa34513 ]--- by looking at the link register we track it back to the function and use the alternative to fix it. test: device boots with the new optimized string routines

…spend() Add irq request check condition before enabling swrm interrupt This will make sure to only enabled interrupt request when it is in disabled state. This Fixes: W : ------------[ cut here ]------------ W : Unbalanced enable for IRQ 502 W : WARNING: CPU: 4 PID: 81 at kernel/irq/manage.c:621 enable_irq+0x98/0xf0 I : Modules linked in: I : CPU: 4 PID: 81 Comm: kworker/4:1 Tainted: G S 4.19.197-IMMENSITY-g09a924c384cb xiaomi-sm8250-devs#1 I Hardware name: Qualcomm Technologies, Inc. xiaomi alioth (DT) I Workqueue: pm pm_runtime_work I pstate : 60c00085 (nZCv daIf +PAN +UAO) I pc : enable_irq+0x98/0xf0 I lr : enable_irq+0x98/0xf0 I sp : ffffff800885bbb0 I : x29: ffffff800885bbc0 x28: ffffffa6fea0db38 I : x27: 0000000000000002 x26: 0000000000000000 I : x25: ffffffd1991ef37d x24: 0000000000000000 I : x23: ffffffd1991edca1 x22: ffffffd199555410 I : x21: ffffffd1991ef248 x20: 00000000000001f6 I : x19: ffffffd18cc7b400 x18: ffffffd1b4e9f048 I : x17: 0000000000000000 x16: 0000000000000000 I : x15: 0000000000000086 x14: 0000000000000030 I : x13: 0000000000049754 x12: 0000000000000000 I : x11: 0000000000000000 x10: 0000000000000007 I : x9 : 060ca0f25e42ae00 x8 : 060ca0f25e42ae00 I : x7 : 0000000000000000 x6 : ffffffa6fed3f8e5 I : x5 : 00000000001b68dc x4 : 000000000000000e I : x3 : 0000000000000032 x2 : 0000000000000007 I : x1 : 0000000000000007 x0 : 000000000000001d I Call trace: I : enable_irq+0x98/0xf0 I : swrm_runtime_suspend+0x390/0x47c I : pm_generic_runtime_suspend+0x28/0x3c I : __rpm_callback+0x12c/0x218 I : rpm_suspend+0x420/0x7cc I : pm_runtime_work+0x98/0xa8 I : process_one_work+0x228/0x3f4 I : worker_thread+0x264/0x4b0 I : kthread+0x13c/0x158 I : ret_from_fork+0x10/0x18 W : ---[ end trace 56c9cc0df5ea202b ]--- Change-Id: Ic539bfc8d595faf530361d32e0be4ce9009fec08 Signed-off-by: UtsavBalar1231 <utsavbalar1231@gmail.com> Signed-off-by: Adam W. Willis <return.of.octobot@gmail.com> Signed-off-by: LibXZR <i@xzr.moe> Signed-off-by: Carlos Ayrton Lopez Arroyo <15030201@itcelaya.edu.mx>

There were many order-3 fail allocation report while VM had lots of *reclaimable* memory. 17353.434071] kworker/u16:4 invoked oom-killer: gfp_mask=0x6160c0(GFP_KERNEL|__GFP_COMP|__GFP_NOMEMALLOC|__GFP_MEMALLOC), nodemask=(null), order=3, oom_score_adj=0 [17353.434079] kworker/u16:4 cpuset=/ mems_allowed=0 [17353.434086] CPU: 6 PID: 30045 Comm: kworker/u16:4 Tainted: G S WC O 4.19.95-g8137b6ce669e-ab6554412 xiaomi-sm8250-devs#1 [17353.434089] Hardware name: Google Inc. MSM sm7250 v2 Bramble DVT (DT) [17353.434194] Workqueue: iparepwq95 __typeid__ZTSFiP44ipa_disable_force_clear_datapath_req_msg_v01E_global_addr [ipa3] [17353.434197] Call trace: [17353.434206] __typeid__ZTSFjP11task_structPK11user_regsetE_global_addr+0x14/0x18 [17353.434210] dump_stack+0xbc/0xf8 [17353.434217] dump_header+0xc8/0x250 [17353.434220] oom_kill_process+0x130/0x538 [17353.434222] out_of_memory+0x320/0x444 [17353.434226] __alloc_pages_nodemask+0x1124/0x13b4 [17353.434314] ipa3_alloc_rx_pkt_page+0x64/0x1a8 [ipa3] [17353.434403] ipa3_wq_page_repl+0x78/0x1a4 [ipa3] [17353.434407] process_one_work+0x3a8/0x6e4 [17353.434410] worker_thread+0x394/0x820 [17353.434413] kthread+0x19c/0x1ac [17353.434417] ret_from_fork+0x10/0x18 [17353.434419] Mem-Info: [17353.434424] active_anon:357378 inactive_anon:119141 isolated_anon:13\x0a active_file:97495 inactive_file:122151 isolated_file:22\x0a unevictable:49750 dirty:3553 writeback:0 unstable:0\x0a slab_reclaimable:30018 slab_unreclaimable:73884\x0a mapped:259586 shmem:27580 pagetables:39581 bounce:0\x0a free:17710 free_pcp:301 free_cma:0 [17353.434433] Node 0 active_anon:1429512kB inactive_anon:476564kB active_file:389980kB inactive_file:488604kB unevictable:199000kB isolated(anon):52kB isolated(file):88kB mapped:1038344kB dirty:14212kB writeback:0kB shmem:110320kB writeback_tmp:0kB unstable:0kB all_unreclaimable? no [17353.434439] Normal free:70840kB min:9172kB low:43900kB high:49484kB active_anon:1429284kB inactive_anon:476336kB active_file:389980kB inactive_file:488604kB unevictable:199000kB writepending:14212kB present:5764280kB managed:5584928kB mlocked:199000kB kernel_stack:92656kB shadow_call_stack:5792kB pagetables:158324kB bounce:0kB free_pcp:1204kB local_pcp:108kB free_cma:0kB [17353.434441] lowmem_reserve[]: 0 0 [17353.434444] Normal: 8956*4kB (UMEH) 2726*8kB (UH) 751*16kB (UH) 33*32kB (H) 7*64kB (H) 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 71152kB [17353.434451] 300317 total pagecache pages [17353.434454] 4228 pages in swap cache [17353.434456] Swap cache stats: add 20710158, delete 20707317, find 1014864/9891370 [17353.434459] Free swap = 103732kB [17353.434460] Total swap = 2097148kB [17353.434462] 1441070 pages RAM [17353.434465] 0 pages HighMem/MovableOnly [17353.434466] 44838 pages reserved [17353.434469] 73728 pages cma reserved When we saw the trace, compaction finished with COMPACT_COMPLETE(iow, it already did full scanning a zone but failed to create order-3 allocation) so should_compact_retry returns "false". <...>-30045 [006] .... 17353.433704: reclaim_retry_zone: node=0 zone=Normal order=3 reclaimable=696132 available=713920 min_wmark=2293 no_progress_loops=0 wmark_check=0 <...>-30045 [006] .... 17353.433706: compact_retry: order=3 priority=COMPACT_PRIO_SYNC_FULL compaction_result=failed retries=0 max_retries=16 should_retry=0 If we see previous trace, we could see compaction is hard to find free pages in the zone so free scanner of compaction moves fast toward migration scanner and finally, they(migration scanner and free page scanner) crossed over. <...>-30045 [006] .... 17353.427026: mm_compaction_isolate_freepages: range=(0x144c00 ~ 0x145000) nr_scanned=784 nr_taken=0 <...>-30045 [006] .... 17353.427037: mm_compaction_isolate_freepages: range=(0x144800 ~ 0x144c00) nr_scanned=1019 nr_taken=0 <...>-30045 [006] .... 17353.427049: mm_compaction_isolate_freepages: range=(0x144400 ~ 0x144800) nr_scanned=880 nr_taken=1 <...>-30045 [006] .... 17353.427061: mm_compaction_isolate_freepages: range=(0x144000 ~ 0x144400) nr_scanned=869 nr_taken=0 <...>-30045 [006] .... 17353.427212: mm_compaction_isolate_freepages: range=(0x140c00 ~ 0x141000) nr_scanned=1016 nr_taken=0 .. .. <...>-30045 [006] .... 17353.433696: mm_compaction_finished: node=0 zone=Normal order=3 ret=complete <...>-30045 [006] .... 17353.433698: mm_compaction_end: zone_start=0x80600 migrate_pfn=0xc9400 free_pfn=0xc9500 zone_end=0x200000, mode=sync status=complete If we see previous trace to see reclaim activities, we could see it was not hard to reclaim memory. <...>-30045 [006] .... 17353.413941: mm_vmscan_direct_reclaim_begin: order=3 may_writepage=1 gfp_flags=GFP_KERNEL|__GFP_COMP|__GFP_NOMEMALLOC|__GFP_MEMALLOC classzone_idx=0 <...>-30045 [006] d..1 17353.413946: mm_vmscan_lru_isolate: isolate_mode=0 classzone=0 order=3 nr_requested=8 nr_scanned=8 nr_skipped=0 nr_taken=8 lru=inactive_anon <...>-30045 [006] .... 17353.413958: mm_vmscan_lru_shrink_inactive: nid=0 nr_scanned=8 nr_reclaimed=0 nr_dirty=0 nr_writeback=0 nr_congested=0 nr_immediate=0 nr_activate=8 nr_ref_keep=0 nr_unmap_fail=0 priority=12 flags=RECLAIM_WB_ANON|RECLAIM_WB_ASYNC <...>-30045 [006] .... 17353.413960: mm_vmscan_inactive_list_is_low: nid=0 reclaim_idx=0 total_inactive=119119 inactive=119119 total_active=357352 active=357352 ratio=3 flags=RECLAIM_WB_ANON <...>-30045 [006] d..1 17353.413965: mm_vmscan_lru_isolate: isolate_mode=0 classzone=0 order=3 nr_requested=22 nr_scanned=22 nr_skipped=0 nr_taken=22 lru=inactive_file <...>-30045 [006] .... 17353.413979: mm_vmscan_lru_shrink_inactive: nid=0 nr_scanned=22 nr_reclaimed=22 nr_dirty=0 nr_writeback=0 nr_congested=0 nr_immediate=0 nr_activate=0 nr_ref_keep=0 nr_unmap_fail=0 priority=12 flags=RECLAIM_WB_FILE|RECLAIM_WB_ASYNC <...>-30045 [006] .... 17353.413979: mm_vmscan_inactive_list_is_low: nid=0 reclaim_idx=0 total_inactive=122195 inactive=122195 total_active=97508 active=97508 ratio=1 flags=RECLAIM_WB_FILE <...>-30045 [006] .... 17353.413980: mm_vmscan_inactive_list_is_low: nid=0 reclaim_idx=0 total_inactive=119119 inactive=119119 total_active=357352 active=357352 ratio=3 flags=RECLAIM_WB_ANON <...>-30045 [006] .... 17353.414134: mm_vmscan_inactive_list_is_low: nid=0 reclaim_idx=0 total_inactive=0 inactive=0 total_active=0 active=0 ratio=1 flags=RECLAIM_WB_ANON <...>-30045 [006] .... 17353.414135: mm_vmscan_inactive_list_is_low: nid=0 reclaim_idx=0 total_inactive=0 inactive=0 total_active=0 active=0 ratio=1 flags=RECLAIM_WB_ANON <...>-30045 [006] d..1 17353.414141: mm_vmscan_lru_isolate: isolate_mode=0 classzone=0 order=3 nr_requested=29 nr_scanned=29 nr_skipped=0 nr_taken=29 lru=inactive_anon <...>-30045 [006] .... 17353.414170: mm_vmscan_lru_shrink_inactive: nid=0 nr_scanned=29 nr_reclaimed=0 nr_dirty=0 nr_writeback=0 nr_congested=0 nr_immediate=0 nr_activate=29 nr_ref_keep=0 nr_unmap_fail=0 priority=10 flags=RECLAIM_WB_ANON|RECLAIM_WB_ASYNC <...>-30045 [006] .... 17353.414170: mm_vmscan_inactive_list_is_low: nid=0 reclaim_idx=0 total_inactive=119107 inactive=119107 total_active=357385 active=357385 ratio=3 flags=RECLAIM_WB_ANON <...>-30045 [006] d..1 17353.414176: mm_vmscan_lru_isolate: isolate_mode=0 classzone=0 order=3 nr_requested=32 nr_scanned=32 nr_skipped=0 nr_taken=32 lru=active_anon <...>-30045 [006] .... 17353.414206: mm_vmscan_lru_shrink_active: nid=0 nr_taken=32 nr_active=0 nr_deactivated=32 nr_referenced=32 priority=10 flags=RECLAIM_WB_ANON|RECLAIM_WB_ASYNC <...>-30045 [006] d..1 17353.414212: mm_vmscan_lru_isolate: isolate_mode=0 classzone=0 order=3 nr_requested=32 nr_scanned=32 nr_skipped=0 nr_taken=32 lru=inactive_file <...>-30045 [006] .... 17353.414225: mm_vmscan_lru_shrink_inactive: nid=0 nr_scanned=32 nr_reclaimed=32 nr_dirty=0 nr_writeback=0 nr_congested=0 nr_immediate=0 nr_activate=0 nr_ref_keep=0 nr_unmap_fail=0 priority=10 flags=RECLAIM_WB_FILE|RECLAIM_WB_ASYNC <...>-30045 [006] .... 17353.414225: mm_vmscan_inactive_list_is_low: nid=0 reclaim_idx=0 total_inactive=122131 inactive=122131 total_active=97508 active=97508 ratio=1 flags=RECLAIM_WB_FILE <...>-30045 [006] d..1 17353.414228: mm_vmscan_lru_isolate: isolate_mode=0 classzone=0 order=3 nr_requested=16 nr_scanned=16 nr_skipped=0 nr_taken=16 lru=inactive_file <...>-30045 [006] .... 17353.414235: mm_vmscan_lru_shrink_inactive: nid=0 nr_scanned=16 nr_reclaimed=16 nr_dirty=0 nr_writeback=0 nr_congested=0 nr_immediate=0 nr_activate=0 nr_ref_keep=0 nr_unmap_fail=0 priority=10 flags=RECLAIM_WB_FILE|RECLAIM_WB_ASYNC <...>-30045 [006] .... 17353.414235: mm_vmscan_inactive_list_is_low: nid=0 reclaim_idx=0 total_inactive=122115 inactive=122115 total_active=97508 active=97508 ratio=1 flags=RECLAIM_WB_FILE <...>-30045 [006] .... 17353.414236: mm_vmscan_inactive_list_is_low: nid=0 reclaim_idx=0 total_inactive=119139 inactive=119139 total_active=357353 active=357353 ratio=3 flags=RECLAIM_WB_ANON <...>-30045 [006] .... 17353.414320: mm_vmscan_inactive_list_is_low: nid=0 reclaim_idx=0 total_inactive=0 inactive=0 total_active=0 active=0 ratio=1 flags=RECLAIM_WB_ANON <...>-30045 [006] .... 17353.414321: mm_vmscan_inactive_list_is_low: nid=0 reclaim_idx=0 total_inactive=0 inactive=0 total_active=0 active=0 ratio=1 flags=RECLAIM_WB_ANON <...>-30045 [006] .... 17353.414339: mm_vmscan_direct_reclaim_end: nr_reclaimed=70 Based on that, we could assume that if reclaimer has reclaimed more pages, compaction could find free pages easily so free scanner of compaction were not moved fast like that. That means it wouldn't fail for non-costly high-order allocation. What this patch does is if the order is non-costly high order allocation, it will keep trying migration with reclaiming if system has enough reclaimable memory. Bug: 159909686 Bug: 156785617 Bug: 158449887

@0ctobot

Subsequent to commit 4d7f8d7 ("Revert "sched: Prevent raising SCHED_SOFTIRQ when CPU is !active"") A CPU can be paused while it is idle with it's tick stopped. nohz_balance_exit_idle() should be called from the local CPU, so it can't be called during pause which can happen remotely. This results in paused CPU participating in the nohz idle balance, which should be avoided. This can be done by calling Fix this issue by calling nohz_balance_exit_idle() from the paused CPU when it exits and enters idle again. This lazy approach avoids waking the CPU from idle during pause. [@0ctobot: This is an attempt to silence the following dmesg WARN_ON: [ 6.081902] ------------[ cut here ]------------ [ 6.081909] (flags & NOHZ_KICK_MASK) == NOHZ_BALANCE_KICK [ 6.081922] WARNING: CPU: 6 PID: 0 at _nohz_idle_balance+0x408/0x410 [ 6.081926] CPU: 6 PID: 0 Comm: swapper/6 Tainted: G S 4.19.202-NeutrinoKernel-lothal-EXP xiaomi-sm8250-devs#1 [ 6.081928] Hardware name: Qualcomm Technologies, Inc. kona MTP 19805 20809 (DT) [ 6.081931] pstate: 60400005 (nZCv daif +PAN -UAO) [ 6.081933] pc : _nohz_idle_balance+0x408/0x410 [ 6.081935] lr : _nohz_idle_balance+0x400/0x410 [ 6.081937] sp : ffffff8010033df0 [ 6.081938] x29: ffffff8010033e30 x28: ffffff9d37e75040 [ 6.081940] x27: ffffff9d37e572c0 x26: ffffff9d35c96050 [ 6.081942] x25: 0000000000000007 x24: 0000000000000001 [ 6.081944] x23: ffffff9d37e76000 x22: fffffffcb39e2a00 [ 6.081946] x21: 0000000000000006 x20: 00000000ffff8d30 [ 6.081947] x19: 0000000000000000 x18: ffffff8010035038 [ 6.081949] x17: 0000000000000008 x16: 0000000000000000 [ 6.081951] x15: 0000000000000008 x14: ffffff9d37f6dc18 [ 6.081953] x13: 0000000000000004 x12: ffffff9d37e76000 [ 6.081954] x11: 0000000000000000 x10: 0000000000000000 [ 6.081956] x9 : a2d84b4c1c6fa900 x8 : a2d84b4c1c6fa900 [ 6.081957] x7 : 000000000000002d x6 : ffffff9d38115e65 [ 6.081959] x5 : ffffff9d38113500 x4 : 0000000000000000 [ 6.081961] x3 : 000000000000004b x2 : 0000000000000000 [ 6.081962] x1 : ffffff9d3811352d x0 : 000000000000002d [ 6.081964] Call trace: [ 6.081966] _nohz_idle_balance+0x408/0x410 [ 6.081968] run_rebalance_domains+0x78/0xa8 [ 6.081971] __do_softirq+0x144/0x2f4 [ 6.081975] irq_exit+0x170/0x184 [ 6.081978] scheduler_ipi+0x160/0x1d4 [ 6.081980] handle_IPI+0x94/0x3e0 [ 6.081982] gic_handle_irq.17896+0x100/0x180 [ 6.081984] el1_irq+0xc4/0x140 [ 6.081988] lpm_cpuidle_enter+0x61c/0x6d0 [ 6.081991] cpuidle_enter_state+0x1dc/0x368 [ 6.081993] do_idle+0x67c/0x74c [ 6.081995] cpu_startup_entry+0x58/0x5c [ 6.081997] secondary_start_kernel+0x13c/0x140 [ 6.081999] ---[ end trace 6de171506e562a81 ]---] Bug: 180530906 Change-Id: Ia2dfd9c9cac9b0f37c55a9256b9d5f3141ca0421 Signed-off-by: Pavankumar Kondeti <quic_pkondeti@quicinc.com> Signed-off-by: Adam W. Willis <return.of.octobot@gmail.com>

Turns out hotplugging CPUs that are in exclusive cpusets can lead to the cpuset code feeding empty cpumasks to the sched domain rebuild machinery. This leads to the following splat: Internal error: Oops: 96000004 [xiaomi-sm8250-devs#1] PREEMPT SMP Modules linked in: CPU: 0 PID: 235 Comm: kworker/5:2 Not tainted 5.4.0-rc1-00005-g8d495477d62e #23 Hardware name: ARM Juno development board (r0) (DT) Workqueue: events cpuset_hotplug_workfn pstate: 60000005 (nZCv daif -PAN -UAO) pc : build_sched_domains (./include/linux/arch_topology.h:23 kernel/sched/topology.c:1898 kernel/sched/topology.c:1969) lr : build_sched_domains (kernel/sched/topology.c:1966) Call trace: build_sched_domains (./include/linux/arch_topology.h:23 kernel/sched/topology.c:1898 kernel/sched/topology.c:1969) partition_sched_domains_locked (kernel/sched/topology.c:2250) rebuild_sched_domains_locked (./include/linux/bitmap.h:370 ./include/linux/cpumask.h:538 kernel/cgroup/cpuset.c:955 kernel/cgroup/cpuset.c:978 kernel/cgroup/cpuset.c:1019) rebuild_sched_domains (kernel/cgroup/cpuset.c:1032) cpuset_hotplug_workfn (kernel/cgroup/cpuset.c:3205 (discriminator 2)) process_one_work (./arch/arm64/include/asm/jump_label.h:21 ./include/linux/jump_label.h:200 ./include/trace/events/workqueue.h:114 kernel/workqueue.c:2274) worker_thread (./include/linux/compiler.h:199 ./include/linux/list.h:268 kernel/workqueue.c:2416) kthread (kernel/kthread.c:255) ret_from_fork (arch/arm64/kernel/entry.S:1167) Code: f860dae2 912802d6 aa1603e1 12800000 (f8616853) The faulty line in question is: cap = arch_scale_cpu_capacity(cpumask_first(cpu_map)); and we're not checking the return value against nr_cpu_ids (we shouldn't have to!), which leads to the above. Prevent generate_sched_domains() from returning empty cpumasks, and add some assertion in build_sched_domains() to scream bloody murder if it happens again. The above splat was obtained on my Juno r0 with the following reproducer: $ cgcreate -g cpuset:asym $ cgset -r cpuset.cpus=0-3 asym $ cgset -r cpuset.mems=0 asym $ cgset -r cpuset.cpu_exclusive=1 asym $ cgcreate -g cpuset:smp $ cgset -r cpuset.cpus=4-5 smp $ cgset -r cpuset.mems=0 smp $ cgset -r cpuset.cpu_exclusive=1 smp $ cgset -r cpuset.sched_load_balance=0 . $ echo 0 > /sys/devices/system/cpu/cpu4/online $ echo 0 > /sys/devices/system/cpu/cpu5/online Signed-off-by: Valentin Schneider <valentin.schneider@arm.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Dietmar.Eggemann@arm.com Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: hannes@cmpxchg.org Cc: lizefan@huawei.com Cc: morten.rasmussen@arm.com Cc: qperret@google.com Cc: tj@kernel.org Cc: vincent.guittot@linaro.org Fixes: 05484e098448 ("sched/topology: Add SD_ASYM_CPUCAPACITY flag detection") Link: https://lkml.kernel.org/r/20191023153745.19515-2-valentin.schneider@arm.com Change-Id: I984799e501e21f682e2f785fe0beef8757056fd1

The new optimized memcmp doesn't work well on device memory, and when subsys tries to load any fw, we are met with: [ 11.111213] ueventd: firmware: loading 'cdsp.mdt' for '/devices/platform/soc/8300000.qcom,turing/firmware/cdsp.mdt' [ 11.113128] ueventd: loading /devices/platform/soc/8300000.qcom,turing/firmware/cdsp.mdt took 2ms [ 11.113170] subsys-pil-tz 8300000.qcom,turing: cdsp: loading from 0x0000000099100000 to 0x000000009a500000 [ 11.117481] ueventd: firmware: loading 'adsp.mdt' for '/devices/platform/soc/17300000.qcom,lpass/firmware/adsp.mdt' [ 11.117518] Unable to handle kernel paging request at virtual address ffffff801f7c6d5c [ 11.117522] Mem abort info: [ 11.117525] Exception class = DABT (current EL), IL = 32 bits [ 11.117527] SET = 0, FnV = 0 [ 11.117529] EA = 0, S1PTW = 0 [ 11.117530] FSC = 33 [ 11.117532] Data abort info: [ 11.117534] ISV = 0, ISS = 0x00000061 [ 11.117536] CM = 0, WnR = 1 [ 11.117539] swapper pgtable: 4k pages, 39-bit VAs, pgd = 000000003e4fd651 [ 11.117541] [ffffff801f7c6d5c] *pgd=00000001f8883003, *pud=00000001f8883003, *pmd=00000001f30fe003, *pte=00680000fd475703 [ 11.117547] Internal error: Oops: 96000061 [xiaomi-sm8250-devs#1] PREEMPT SMP [ 11.117551] Modules linked in: [ 11.117554] Process init (pid: 568, stack limit = 0x000000005be89f40) [ 11.117558] CPU: 4 PID: 568 Comm: init Tainted: G S W 4.14.239-MOCHI xiaomi-sm8250-devs#1 [ 11.117560] Hardware name: Qualcomm Technologies, Inc. SM8150 V2 PM8150 VAYU (DT) [ 11.117562] task: 00000000e65c9d8d task.stack: 000000005be89f40 [ 11.117570] pc : memcpy+0x188/0x2a0 [ 11.117577] lr : pil_init_image_trusted+0x130/0x234 [ 11.117579] sp : ffffff80229937c0 pstate : 80000145 [ 11.117581] x29: ffffff80229937c0 x28: ffffffeaebf93a00 [ 11.117584] x27: ffffff8599334000 x26: 0000000000000040 [ 11.117586] x25: ffffff859a0aeaa0 x24: ffffff859a63a000 [ 11.117589] x23: ffffffeaf557d880 x22: ffffff801f7c5000 [ 11.117592] x21: 0000000000001d9c x20: ffffff801f65d000 [ 11.117594] x19: ffffff859933cb08 x18: 0000007c2b138000 [ 11.117597] x17: 0000007ebf1ef1e4 x16: ffffff8597c02274 [ 11.117599] x15: ffffffffffffffff x14: ffffffffffffffff [ 11.117602] x13: ffffffffffffffff x12: ffffffffffffffff [ 11.117604] x11: ffffffffffffffff x10: ffffffffffffffff [ 11.117607] x9 : ffffffffffffffff x8 : ffffffffffffffff [ 11.117609] x7 : ffffffffffffffff x6 : ffffffffffffffff [ 11.117612] x5 : ffffff801f7c6d9c x4 : ffffff801f65ed9c [ 11.117614] x3 : ffffff801f7c6d40 x2 : ffffffffffffffcc [ 11.117617] x1 : ffffff801f65ed80 x0 : ffffff801f7c5000 [ 11.117620] [ 11.117620] PC: 0xffffff8598c00e28: [ 11.117622] 0e28 a9022468 a9422428 a9032c6a a9432c2a a984346c a9c4342c f1010042 54fffee8 [ 11.117628] 0e48 a97c3c8e a9011c66 a97d1c86 a9022468 a97e2488 a9032c6a a97f2c8a a904346c [ 11.117634] 0e68 a93c3cae a93d1ca6 a93e24a8 a93f2caa d65f03c0 d503201f a97f348c 92400cae [ 11.117639] 0e88 cb0e0084 cb0e0042 a97f1c86 a93f34ac a97e2488 a97d2c8a a9fc348c cb0e00a5 [ 11.117645] [ 11.117645] LR: 0xffffff8597ef067c: [ 11.117647] 067c 97eeeccc f9400265 b4fffe65 52801803 910143e2 aa1503e1 910183e0 d2804004 [ 11.117652] 069c 72a02803 d63f00a0 aa0003f6 17ffffe9 aa1403e1 aa1503e2 aa1603e0 9434418a [ 11.117658] 06bc 97ff9590 72001c1f b9404ae1 f9402be0 54000241 910133e4 910163e2 d2800085 [ 11.117663] 06dc d2800103 290b03e1 52800021 52800040 97ff9657 2a0003f3 f9413bf4 f9402bf7 [ 11.117669] [ 11.117669] SP: 0xffffff8022993780: [ 11.117671] 3780 98c00e68 ffffff85 80000145 00000000 9a0aeaa0 ffffff85 00000040 00000000 [ 11.117677] 37a0 ffffffff 0000007f 97ef0644 ffffff85 229937c0 ffffff80 98c00e68 ffffff85 [ 11.117682] 37c0 22993ba0 ffffff80 97eeec4c ffffff85 f557d918 ffffffea 00000000 00000000 [ 11.117688] 37e0 f6264460 ffffffea f6264400 ffffffea f55b9480 ffffffea 97b704d8 ffffff85 [ 11.117693] [ 11.117695] Call trace: [ 11.117698] memcpy+0x188/0x2a0 [ 11.117701] pil_boot+0x358/0x730 [ 11.117704] subsys_powerup+0x28/0x30 [ 11.117709] subsys_start+0x38/0x134 [ 11.117711] __subsystem_get+0xb0/0x11c [ 11.117713] subsystem_get+0x10/0x18 [ 11.117716] cdsp_loader_do.isra.0+0xe4/0x1a8 [ 11.117718] cdsp_boot_store+0x8c/0x168 [ 11.117722] kobj_attr_store+0x14/0x24 [ 11.117726] sysfs_kf_write+0x34/0x44 [ 11.117729] kernfs_fop_write+0x118/0x184 [ 11.117733] __vfs_write+0x2c/0xd8 [ 11.117735] vfs_write+0x80/0xec [ 11.117738] SyS_write+0x54/0xac [ 11.117741] el0_svc_naked+0x34/0x38 [ 11.117744] Code: a97e2488 a9032c6a a97f2c8a a904346c (a93c3cae) [ 11.117747] ---[ end trace fc45fc8b1fa34513 ]--- by looking at the link register we track it back to the function and use the alternative to fix it. test: device boots with the new optimized string routines

…spend() Add irq request check condition before enabling swrm interrupt This will make sure to only enabled interrupt request when it is in disabled state. This Fixes: W : ------------[ cut here ]------------ W : Unbalanced enable for IRQ 502 W : WARNING: CPU: 4 PID: 81 at kernel/irq/manage.c:621 enable_irq+0x98/0xf0 I : Modules linked in: I : CPU: 4 PID: 81 Comm: kworker/4:1 Tainted: G S 4.19.197-IMMENSITY-g09a924c384cb xiaomi-sm8250-devs#1 I Hardware name: Qualcomm Technologies, Inc. xiaomi alioth (DT) I Workqueue: pm pm_runtime_work I pstate : 60c00085 (nZCv daIf +PAN +UAO) I pc : enable_irq+0x98/0xf0 I lr : enable_irq+0x98/0xf0 I sp : ffffff800885bbb0 I : x29: ffffff800885bbc0 x28: ffffffa6fea0db38 I : x27: 0000000000000002 x26: 0000000000000000 I : x25: ffffffd1991ef37d x24: 0000000000000000 I : x23: ffffffd1991edca1 x22: ffffffd199555410 I : x21: ffffffd1991ef248 x20: 00000000000001f6 I : x19: ffffffd18cc7b400 x18: ffffffd1b4e9f048 I : x17: 0000000000000000 x16: 0000000000000000 I : x15: 0000000000000086 x14: 0000000000000030 I : x13: 0000000000049754 x12: 0000000000000000 I : x11: 0000000000000000 x10: 0000000000000007 I : x9 : 060ca0f25e42ae00 x8 : 060ca0f25e42ae00 I : x7 : 0000000000000000 x6 : ffffffa6fed3f8e5 I : x5 : 00000000001b68dc x4 : 000000000000000e I : x3 : 0000000000000032 x2 : 0000000000000007 I : x1 : 0000000000000007 x0 : 000000000000001d I Call trace: I : enable_irq+0x98/0xf0 I : swrm_runtime_suspend+0x390/0x47c I : pm_generic_runtime_suspend+0x28/0x3c I : __rpm_callback+0x12c/0x218 I : rpm_suspend+0x420/0x7cc I : pm_runtime_work+0x98/0xa8 I : process_one_work+0x228/0x3f4 I : worker_thread+0x264/0x4b0 I : kthread+0x13c/0x158 I : ret_from_fork+0x10/0x18 W : ---[ end trace 56c9cc0df5ea202b ]--- Change-Id: Ic539bfc8d595faf530361d32e0be4ce9009fec08 Signed-off-by: UtsavBalar1231 <utsavbalar1231@gmail.com> Signed-off-by: Adam W. Willis <return.of.octobot@gmail.com> Signed-off-by: LibXZR <i@xzr.moe> Signed-off-by: Carlos Ayrton Lopez Arroyo <15030201@itcelaya.edu.mx>

There were many order-3 fail allocation report while VM had lots of *reclaimable* memory. 17353.434071] kworker/u16:4 invoked oom-killer: gfp_mask=0x6160c0(GFP_KERNEL|__GFP_COMP|__GFP_NOMEMALLOC|__GFP_MEMALLOC), nodemask=(null), order=3, oom_score_adj=0 [17353.434079] kworker/u16:4 cpuset=/ mems_allowed=0 [17353.434086] CPU: 6 PID: 30045 Comm: kworker/u16:4 Tainted: G S WC O 4.19.95-g8137b6ce669e-ab6554412 xiaomi-sm8250-devs#1 [17353.434089] Hardware name: Google Inc. MSM sm7250 v2 Bramble DVT (DT) [17353.434194] Workqueue: iparepwq95 __typeid__ZTSFiP44ipa_disable_force_clear_datapath_req_msg_v01E_global_addr [ipa3] [17353.434197] Call trace: [17353.434206] __typeid__ZTSFjP11task_structPK11user_regsetE_global_addr+0x14/0x18 [17353.434210] dump_stack+0xbc/0xf8 [17353.434217] dump_header+0xc8/0x250 [17353.434220] oom_kill_process+0x130/0x538 [17353.434222] out_of_memory+0x320/0x444 [17353.434226] __alloc_pages_nodemask+0x1124/0x13b4 [17353.434314] ipa3_alloc_rx_pkt_page+0x64/0x1a8 [ipa3] [17353.434403] ipa3_wq_page_repl+0x78/0x1a4 [ipa3] [17353.434407] process_one_work+0x3a8/0x6e4 [17353.434410] worker_thread+0x394/0x820 [17353.434413] kthread+0x19c/0x1ac [17353.434417] ret_from_fork+0x10/0x18 [17353.434419] Mem-Info: [17353.434424] active_anon:357378 inactive_anon:119141 isolated_anon:13\x0a active_file:97495 inactive_file:122151 isolated_file:22\x0a unevictable:49750 dirty:3553 writeback:0 unstable:0\x0a slab_reclaimable:30018 slab_unreclaimable:73884\x0a mapped:259586 shmem:27580 pagetables:39581 bounce:0\x0a free:17710 free_pcp:301 free_cma:0 [17353.434433] Node 0 active_anon:1429512kB inactive_anon:476564kB active_file:389980kB inactive_file:488604kB unevictable:199000kB isolated(anon):52kB isolated(file):88kB mapped:1038344kB dirty:14212kB writeback:0kB shmem:110320kB writeback_tmp:0kB unstable:0kB all_unreclaimable? no [17353.434439] Normal free:70840kB min:9172kB low:43900kB high:49484kB active_anon:1429284kB inactive_anon:476336kB active_file:389980kB inactive_file:488604kB unevictable:199000kB writepending:14212kB present:5764280kB managed:5584928kB mlocked:199000kB kernel_stack:92656kB shadow_call_stack:5792kB pagetables:158324kB bounce:0kB free_pcp:1204kB local_pcp:108kB free_cma:0kB [17353.434441] lowmem_reserve[]: 0 0 [17353.434444] Normal: 8956*4kB (UMEH) 2726*8kB (UH) 751*16kB (UH) 33*32kB (H) 7*64kB (H) 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 71152kB [17353.434451] 300317 total pagecache pages [17353.434454] 4228 pages in swap cache [17353.434456] Swap cache stats: add 20710158, delete 20707317, find 1014864/9891370 [17353.434459] Free swap = 103732kB [17353.434460] Total swap = 2097148kB [17353.434462] 1441070 pages RAM [17353.434465] 0 pages HighMem/MovableOnly [17353.434466] 44838 pages reserved [17353.434469] 73728 pages cma reserved When we saw the trace, compaction finished with COMPACT_COMPLETE(iow, it already did full scanning a zone but failed to create order-3 allocation) so should_compact_retry returns "false". <...>-30045 [006] .... 17353.433704: reclaim_retry_zone: node=0 zone=Normal order=3 reclaimable=696132 available=713920 min_wmark=2293 no_progress_loops=0 wmark_check=0 <...>-30045 [006] .... 17353.433706: compact_retry: order=3 priority=COMPACT_PRIO_SYNC_FULL compaction_result=failed retries=0 max_retries=16 should_retry=0 If we see previous trace, we could see compaction is hard to find free pages in the zone so free scanner of compaction moves fast toward migration scanner and finally, they(migration scanner and free page scanner) crossed over. <...>-30045 [006] .... 17353.427026: mm_compaction_isolate_freepages: range=(0x144c00 ~ 0x145000) nr_scanned=784 nr_taken=0 <...>-30045 [006] .... 17353.427037: mm_compaction_isolate_freepages: range=(0x144800 ~ 0x144c00) nr_scanned=1019 nr_taken=0 <...>-30045 [006] .... 17353.427049: mm_compaction_isolate_freepages: range=(0x144400 ~ 0x144800) nr_scanned=880 nr_taken=1 <...>-30045 [006] .... 17353.427061: mm_compaction_isolate_freepages: range=(0x144000 ~ 0x144400) nr_scanned=869 nr_taken=0 <...>-30045 [006] .... 17353.427212: mm_compaction_isolate_freepages: range=(0x140c00 ~ 0x141000) nr_scanned=1016 nr_taken=0 .. .. <...>-30045 [006] .... 17353.433696: mm_compaction_finished: node=0 zone=Normal order=3 ret=complete <...>-30045 [006] .... 17353.433698: mm_compaction_end: zone_start=0x80600 migrate_pfn=0xc9400 free_pfn=0xc9500 zone_end=0x200000, mode=sync status=complete If we see previous trace to see reclaim activities, we could see it was not hard to reclaim memory. <...>-30045 [006] .... 17353.413941: mm_vmscan_direct_reclaim_begin: order=3 may_writepage=1 gfp_flags=GFP_KERNEL|__GFP_COMP|__GFP_NOMEMALLOC|__GFP_MEMALLOC classzone_idx=0 <...>-30045 [006] d..1 17353.413946: mm_vmscan_lru_isolate: isolate_mode=0 classzone=0 order=3 nr_requested=8 nr_scanned=8 nr_skipped=0 nr_taken=8 lru=inactive_anon <...>-30045 [006] .... 17353.413958: mm_vmscan_lru_shrink_inactive: nid=0 nr_scanned=8 nr_reclaimed=0 nr_dirty=0 nr_writeback=0 nr_congested=0 nr_immediate=0 nr_activate=8 nr_ref_keep=0 nr_unmap_fail=0 priority=12 flags=RECLAIM_WB_ANON|RECLAIM_WB_ASYNC <...>-30045 [006] .... 17353.413960: mm_vmscan_inactive_list_is_low: nid=0 reclaim_idx=0 total_inactive=119119 inactive=119119 total_active=357352 active=357352 ratio=3 flags=RECLAIM_WB_ANON <...>-30045 [006] d..1 17353.413965: mm_vmscan_lru_isolate: isolate_mode=0 classzone=0 order=3 nr_requested=22 nr_scanned=22 nr_skipped=0 nr_taken=22 lru=inactive_file <...>-30045 [006] .... 17353.413979: mm_vmscan_lru_shrink_inactive: nid=0 nr_scanned=22 nr_reclaimed=22 nr_dirty=0 nr_writeback=0 nr_congested=0 nr_immediate=0 nr_activate=0 nr_ref_keep=0 nr_unmap_fail=0 priority=12 flags=RECLAIM_WB_FILE|RECLAIM_WB_ASYNC <...>-30045 [006] .... 17353.413979: mm_vmscan_inactive_list_is_low: nid=0 reclaim_idx=0 total_inactive=122195 inactive=122195 total_active=97508 active=97508 ratio=1 flags=RECLAIM_WB_FILE <...>-30045 [006] .... 17353.413980: mm_vmscan_inactive_list_is_low: nid=0 reclaim_idx=0 total_inactive=119119 inactive=119119 total_active=357352 active=357352 ratio=3 flags=RECLAIM_WB_ANON <...>-30045 [006] .... 17353.414134: mm_vmscan_inactive_list_is_low: nid=0 reclaim_idx=0 total_inactive=0 inactive=0 total_active=0 active=0 ratio=1 flags=RECLAIM_WB_ANON <...>-30045 [006] .... 17353.414135: mm_vmscan_inactive_list_is_low: nid=0 reclaim_idx=0 total_inactive=0 inactive=0 total_active=0 active=0 ratio=1 flags=RECLAIM_WB_ANON <...>-30045 [006] d..1 17353.414141: mm_vmscan_lru_isolate: isolate_mode=0 classzone=0 order=3 nr_requested=29 nr_scanned=29 nr_skipped=0 nr_taken=29 lru=inactive_anon <...>-30045 [006] .... 17353.414170: mm_vmscan_lru_shrink_inactive: nid=0 nr_scanned=29 nr_reclaimed=0 nr_dirty=0 nr_writeback=0 nr_congested=0 nr_immediate=0 nr_activate=29 nr_ref_keep=0 nr_unmap_fail=0 priority=10 flags=RECLAIM_WB_ANON|RECLAIM_WB_ASYNC <...>-30045 [006] .... 17353.414170: mm_vmscan_inactive_list_is_low: nid=0 reclaim_idx=0 total_inactive=119107 inactive=119107 total_active=357385 active=357385 ratio=3 flags=RECLAIM_WB_ANON <...>-30045 [006] d..1 17353.414176: mm_vmscan_lru_isolate: isolate_mode=0 classzone=0 order=3 nr_requested=32 nr_scanned=32 nr_skipped=0 nr_taken=32 lru=active_anon <...>-30045 [006] .... 17353.414206: mm_vmscan_lru_shrink_active: nid=0 nr_taken=32 nr_active=0 nr_deactivated=32 nr_referenced=32 priority=10 flags=RECLAIM_WB_ANON|RECLAIM_WB_ASYNC <...>-30045 [006] d..1 17353.414212: mm_vmscan_lru_isolate: isolate_mode=0 classzone=0 order=3 nr_requested=32 nr_scanned=32 nr_skipped=0 nr_taken=32 lru=inactive_file <...>-30045 [006] .... 17353.414225: mm_vmscan_lru_shrink_inactive: nid=0 nr_scanned=32 nr_reclaimed=32 nr_dirty=0 nr_writeback=0 nr_congested=0 nr_immediate=0 nr_activate=0 nr_ref_keep=0 nr_unmap_fail=0 priority=10 flags=RECLAIM_WB_FILE|RECLAIM_WB_ASYNC <...>-30045 [006] .... 17353.414225: mm_vmscan_inactive_list_is_low: nid=0 reclaim_idx=0 total_inactive=122131 inactive=122131 total_active=97508 active=97508 ratio=1 flags=RECLAIM_WB_FILE <...>-30045 [006] d..1 17353.414228: mm_vmscan_lru_isolate: isolate_mode=0 classzone=0 order=3 nr_requested=16 nr_scanned=16 nr_skipped=0 nr_taken=16 lru=inactive_file <...>-30045 [006] .... 17353.414235: mm_vmscan_lru_shrink_inactive: nid=0 nr_scanned=16 nr_reclaimed=16 nr_dirty=0 nr_writeback=0 nr_congested=0 nr_immediate=0 nr_activate=0 nr_ref_keep=0 nr_unmap_fail=0 priority=10 flags=RECLAIM_WB_FILE|RECLAIM_WB_ASYNC <...>-30045 [006] .... 17353.414235: mm_vmscan_inactive_list_is_low: nid=0 reclaim_idx=0 total_inactive=122115 inactive=122115 total_active=97508 active=97508 ratio=1 flags=RECLAIM_WB_FILE <...>-30045 [006] .... 17353.414236: mm_vmscan_inactive_list_is_low: nid=0 reclaim_idx=0 total_inactive=119139 inactive=119139 total_active=357353 active=357353 ratio=3 flags=RECLAIM_WB_ANON <...>-30045 [006] .... 17353.414320: mm_vmscan_inactive_list_is_low: nid=0 reclaim_idx=0 total_inactive=0 inactive=0 total_active=0 active=0 ratio=1 flags=RECLAIM_WB_ANON <...>-30045 [006] .... 17353.414321: mm_vmscan_inactive_list_is_low: nid=0 reclaim_idx=0 total_inactive=0 inactive=0 total_active=0 active=0 ratio=1 flags=RECLAIM_WB_ANON <...>-30045 [006] .... 17353.414339: mm_vmscan_direct_reclaim_end: nr_reclaimed=70 Based on that, we could assume that if reclaimer has reclaimed more pages, compaction could find free pages easily so free scanner of compaction were not moved fast like that. That means it wouldn't fail for non-costly high-order allocation. What this patch does is if the order is non-costly high order allocation, it will keep trying migration with reclaiming if system has enough reclaimable memory. Bug: 159909686 Bug: 156785617 Bug: 158449887

@0ctobot

Subsequent to commit 4d7f8d7 ("Revert "sched: Prevent raising SCHED_SOFTIRQ when CPU is !active"") A CPU can be paused while it is idle with it's tick stopped. nohz_balance_exit_idle() should be called from the local CPU, so it can't be called during pause which can happen remotely. This results in paused CPU participating in the nohz idle balance, which should be avoided. This can be done by calling Fix this issue by calling nohz_balance_exit_idle() from the paused CPU when it exits and enters idle again. This lazy approach avoids waking the CPU from idle during pause. [@0ctobot: This is an attempt to silence the following dmesg WARN_ON: [ 6.081902] ------------[ cut here ]------------ [ 6.081909] (flags & NOHZ_KICK_MASK) == NOHZ_BALANCE_KICK [ 6.081922] WARNING: CPU: 6 PID: 0 at _nohz_idle_balance+0x408/0x410 [ 6.081926] CPU: 6 PID: 0 Comm: swapper/6 Tainted: G S 4.19.202-NeutrinoKernel-lothal-EXP xiaomi-sm8250-devs#1 [ 6.081928] Hardware name: Qualcomm Technologies, Inc. kona MTP 19805 20809 (DT) [ 6.081931] pstate: 60400005 (nZCv daif +PAN -UAO) [ 6.081933] pc : _nohz_idle_balance+0x408/0x410 [ 6.081935] lr : _nohz_idle_balance+0x400/0x410 [ 6.081937] sp : ffffff8010033df0 [ 6.081938] x29: ffffff8010033e30 x28: ffffff9d37e75040 [ 6.081940] x27: ffffff9d37e572c0 x26: ffffff9d35c96050 [ 6.081942] x25: 0000000000000007 x24: 0000000000000001 [ 6.081944] x23: ffffff9d37e76000 x22: fffffffcb39e2a00 [ 6.081946] x21: 0000000000000006 x20: 00000000ffff8d30 [ 6.081947] x19: 0000000000000000 x18: ffffff8010035038 [ 6.081949] x17: 0000000000000008 x16: 0000000000000000 [ 6.081951] x15: 0000000000000008 x14: ffffff9d37f6dc18 [ 6.081953] x13: 0000000000000004 x12: ffffff9d37e76000 [ 6.081954] x11: 0000000000000000 x10: 0000000000000000 [ 6.081956] x9 : a2d84b4c1c6fa900 x8 : a2d84b4c1c6fa900 [ 6.081957] x7 : 000000000000002d x6 : ffffff9d38115e65 [ 6.081959] x5 : ffffff9d38113500 x4 : 0000000000000000 [ 6.081961] x3 : 000000000000004b x2 : 0000000000000000 [ 6.081962] x1 : ffffff9d3811352d x0 : 000000000000002d [ 6.081964] Call trace: [ 6.081966] _nohz_idle_balance+0x408/0x410 [ 6.081968] run_rebalance_domains+0x78/0xa8 [ 6.081971] __do_softirq+0x144/0x2f4 [ 6.081975] irq_exit+0x170/0x184 [ 6.081978] scheduler_ipi+0x160/0x1d4 [ 6.081980] handle_IPI+0x94/0x3e0 [ 6.081982] gic_handle_irq.17896+0x100/0x180 [ 6.081984] el1_irq+0xc4/0x140 [ 6.081988] lpm_cpuidle_enter+0x61c/0x6d0 [ 6.081991] cpuidle_enter_state+0x1dc/0x368 [ 6.081993] do_idle+0x67c/0x74c [ 6.081995] cpu_startup_entry+0x58/0x5c [ 6.081997] secondary_start_kernel+0x13c/0x140 [ 6.081999] ---[ end trace 6de171506e562a81 ]---] Bug: 180530906 Change-Id: Ia2dfd9c9cac9b0f37c55a9256b9d5f3141ca0421 Signed-off-by: Pavankumar Kondeti <quic_pkondeti@quicinc.com> Signed-off-by: Adam W. Willis <return.of.octobot@gmail.com>

There were many order-3 fail allocation report while VM had lots of *reclaimable* memory. 17353.434071] kworker/u16:4 invoked oom-killer: gfp_mask=0x6160c0(GFP_KERNEL|__GFP_COMP|__GFP_NOMEMALLOC|__GFP_MEMALLOC), nodemask=(null), order=3, oom_score_adj=0 [17353.434079] kworker/u16:4 cpuset=/ mems_allowed=0 [17353.434086] CPU: 6 PID: 30045 Comm: kworker/u16:4 Tainted: G S WC O 4.19.95-g8137b6ce669e-ab6554412 xiaomi-sm8250-devs#1 [17353.434089] Hardware name: Google Inc. MSM sm7250 v2 Bramble DVT (DT) [17353.434194] Workqueue: iparepwq95 __typeid__ZTSFiP44ipa_disable_force_clear_datapath_req_msg_v01E_global_addr [ipa3] [17353.434197] Call trace: [17353.434206] __typeid__ZTSFjP11task_structPK11user_regsetE_global_addr+0x14/0x18 [17353.434210] dump_stack+0xbc/0xf8 [17353.434217] dump_header+0xc8/0x250 [17353.434220] oom_kill_process+0x130/0x538 [17353.434222] out_of_memory+0x320/0x444 [17353.434226] __alloc_pages_nodemask+0x1124/0x13b4 [17353.434314] ipa3_alloc_rx_pkt_page+0x64/0x1a8 [ipa3] [17353.434403] ipa3_wq_page_repl+0x78/0x1a4 [ipa3] [17353.434407] process_one_work+0x3a8/0x6e4 [17353.434410] worker_thread+0x394/0x820 [17353.434413] kthread+0x19c/0x1ac [17353.434417] ret_from_fork+0x10/0x18 [17353.434419] Mem-Info: [17353.434424] active_anon:357378 inactive_anon:119141 isolated_anon:13\x0a active_file:97495 inactive_file:122151 isolated_file:22\x0a unevictable:49750 dirty:3553 writeback:0 unstable:0\x0a slab_reclaimable:30018 slab_unreclaimable:73884\x0a mapped:259586 shmem:27580 pagetables:39581 bounce:0\x0a free:17710 free_pcp:301 free_cma:0 [17353.434433] Node 0 active_anon:1429512kB inactive_anon:476564kB active_file:389980kB inactive_file:488604kB unevictable:199000kB isolated(anon):52kB isolated(file):88kB mapped:1038344kB dirty:14212kB writeback:0kB shmem:110320kB writeback_tmp:0kB unstable:0kB all_unreclaimable? no [17353.434439] Normal free:70840kB min:9172kB low:43900kB high:49484kB active_anon:1429284kB inactive_anon:476336kB active_file:389980kB inactive_file:488604kB unevictable:199000kB writepending:14212kB present:5764280kB managed:5584928kB mlocked:199000kB kernel_stack:92656kB shadow_call_stack:5792kB pagetables:158324kB bounce:0kB free_pcp:1204kB local_pcp:108kB free_cma:0kB [17353.434441] lowmem_reserve[]: 0 0 [17353.434444] Normal: 8956*4kB (UMEH) 2726*8kB (UH) 751*16kB (UH) 33*32kB (H) 7*64kB (H) 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 71152kB [17353.434451] 300317 total pagecache pages [17353.434454] 4228 pages in swap cache [17353.434456] Swap cache stats: add 20710158, delete 20707317, find 1014864/9891370 [17353.434459] Free swap = 103732kB [17353.434460] Total swap = 2097148kB [17353.434462] 1441070 pages RAM [17353.434465] 0 pages HighMem/MovableOnly [17353.434466] 44838 pages reserved [17353.434469] 73728 pages cma reserved When we saw the trace, compaction finished with COMPACT_COMPLETE(iow, it already did full scanning a zone but failed to create order-3 allocation) so should_compact_retry returns "false". <...>-30045 [006] .... 17353.433704: reclaim_retry_zone: node=0 zone=Normal order=3 reclaimable=696132 available=713920 min_wmark=2293 no_progress_loops=0 wmark_check=0 <...>-30045 [006] .... 17353.433706: compact_retry: order=3 priority=COMPACT_PRIO_SYNC_FULL compaction_result=failed retries=0 max_retries=16 should_retry=0 If we see previous trace, we could see compaction is hard to find free pages in the zone so free scanner of compaction moves fast toward migration scanner and finally, they(migration scanner and free page scanner) crossed over. <...>-30045 [006] .... 17353.427026: mm_compaction_isolate_freepages: range=(0x144c00 ~ 0x145000) nr_scanned=784 nr_taken=0 <...>-30045 [006] .... 17353.427037: mm_compaction_isolate_freepages: range=(0x144800 ~ 0x144c00) nr_scanned=1019 nr_taken=0 <...>-30045 [006] .... 17353.427049: mm_compaction_isolate_freepages: range=(0x144400 ~ 0x144800) nr_scanned=880 nr_taken=1 <...>-30045 [006] .... 17353.427061: mm_compaction_isolate_freepages: range=(0x144000 ~ 0x144400) nr_scanned=869 nr_taken=0 <...>-30045 [006] .... 17353.427212: mm_compaction_isolate_freepages: range=(0x140c00 ~ 0x141000) nr_scanned=1016 nr_taken=0 .. .. <...>-30045 [006] .... 17353.433696: mm_compaction_finished: node=0 zone=Normal order=3 ret=complete <...>-30045 [006] .... 17353.433698: mm_compaction_end: zone_start=0x80600 migrate_pfn=0xc9400 free_pfn=0xc9500 zone_end=0x200000, mode=sync status=complete If we see previous trace to see reclaim activities, we could see it was not hard to reclaim memory. <...>-30045 [006] .... 17353.413941: mm_vmscan_direct_reclaim_begin: order=3 may_writepage=1 gfp_flags=GFP_KERNEL|__GFP_COMP|__GFP_NOMEMALLOC|__GFP_MEMALLOC classzone_idx=0 <...>-30045 [006] d..1 17353.413946: mm_vmscan_lru_isolate: isolate_mode=0 classzone=0 order=3 nr_requested=8 nr_scanned=8 nr_skipped=0 nr_taken=8 lru=inactive_anon <...>-30045 [006] .... 17353.413958: mm_vmscan_lru_shrink_inactive: nid=0 nr_scanned=8 nr_reclaimed=0 nr_dirty=0 nr_writeback=0 nr_congested=0 nr_immediate=0 nr_activate=8 nr_ref_keep=0 nr_unmap_fail=0 priority=12 flags=RECLAIM_WB_ANON|RECLAIM_WB_ASYNC <...>-30045 [006] .... 17353.413960: mm_vmscan_inactive_list_is_low: nid=0 reclaim_idx=0 total_inactive=119119 inactive=119119 total_active=357352 active=357352 ratio=3 flags=RECLAIM_WB_ANON <...>-30045 [006] d..1 17353.413965: mm_vmscan_lru_isolate: isolate_mode=0 classzone=0 order=3 nr_requested=22 nr_scanned=22 nr_skipped=0 nr_taken=22 lru=inactive_file <...>-30045 [006] .... 17353.413979: mm_vmscan_lru_shrink_inactive: nid=0 nr_scanned=22 nr_reclaimed=22 nr_dirty=0 nr_writeback=0 nr_congested=0 nr_immediate=0 nr_activate=0 nr_ref_keep=0 nr_unmap_fail=0 priority=12 flags=RECLAIM_WB_FILE|RECLAIM_WB_ASYNC <...>-30045 [006] .... 17353.413979: mm_vmscan_inactive_list_is_low: nid=0 reclaim_idx=0 total_inactive=122195 inactive=122195 total_active=97508 active=97508 ratio=1 flags=RECLAIM_WB_FILE <...>-30045 [006] .... 17353.413980: mm_vmscan_inactive_list_is_low: nid=0 reclaim_idx=0 total_inactive=119119 inactive=119119 total_active=357352 active=357352 ratio=3 flags=RECLAIM_WB_ANON <...>-30045 [006] .... 17353.414134: mm_vmscan_inactive_list_is_low: nid=0 reclaim_idx=0 total_inactive=0 inactive=0 total_active=0 active=0 ratio=1 flags=RECLAIM_WB_ANON <...>-30045 [006] .... 17353.414135: mm_vmscan_inactive_list_is_low: nid=0 reclaim_idx=0 total_inactive=0 inactive=0 total_active=0 active=0 ratio=1 flags=RECLAIM_WB_ANON <...>-30045 [006] d..1 17353.414141: mm_vmscan_lru_isolate: isolate_mode=0 classzone=0 order=3 nr_requested=29 nr_scanned=29 nr_skipped=0 nr_taken=29 lru=inactive_anon <...>-30045 [006] .... 17353.414170: mm_vmscan_lru_shrink_inactive: nid=0 nr_scanned=29 nr_reclaimed=0 nr_dirty=0 nr_writeback=0 nr_congested=0 nr_immediate=0 nr_activate=29 nr_ref_keep=0 nr_unmap_fail=0 priority=10 flags=RECLAIM_WB_ANON|RECLAIM_WB_ASYNC <...>-30045 [006] .... 17353.414170: mm_vmscan_inactive_list_is_low: nid=0 reclaim_idx=0 total_inactive=119107 inactive=119107 total_active=357385 active=357385 ratio=3 flags=RECLAIM_WB_ANON <...>-30045 [006] d..1 17353.414176: mm_vmscan_lru_isolate: isolate_mode=0 classzone=0 order=3 nr_requested=32 nr_scanned=32 nr_skipped=0 nr_taken=32 lru=active_anon <...>-30045 [006] .... 17353.414206: mm_vmscan_lru_shrink_active: nid=0 nr_taken=32 nr_active=0 nr_deactivated=32 nr_referenced=32 priority=10 flags=RECLAIM_WB_ANON|RECLAIM_WB_ASYNC <...>-30045 [006] d..1 17353.414212: mm_vmscan_lru_isolate: isolate_mode=0 classzone=0 order=3 nr_requested=32 nr_scanned=32 nr_skipped=0 nr_taken=32 lru=inactive_file <...>-30045 [006] .... 17353.414225: mm_vmscan_lru_shrink_inactive: nid=0 nr_scanned=32 nr_reclaimed=32 nr_dirty=0 nr_writeback=0 nr_congested=0 nr_immediate=0 nr_activate=0 nr_ref_keep=0 nr_unmap_fail=0 priority=10 flags=RECLAIM_WB_FILE|RECLAIM_WB_ASYNC <...>-30045 [006] .... 17353.414225: mm_vmscan_inactive_list_is_low: nid=0 reclaim_idx=0 total_inactive=122131 inactive=122131 total_active=97508 active=97508 ratio=1 flags=RECLAIM_WB_FILE <...>-30045 [006] d..1 17353.414228: mm_vmscan_lru_isolate: isolate_mode=0 classzone=0 order=3 nr_requested=16 nr_scanned=16 nr_skipped=0 nr_taken=16 lru=inactive_file <...>-30045 [006] .... 17353.414235: mm_vmscan_lru_shrink_inactive: nid=0 nr_scanned=16 nr_reclaimed=16 nr_dirty=0 nr_writeback=0 nr_congested=0 nr_immediate=0 nr_activate=0 nr_ref_keep=0 nr_unmap_fail=0 priority=10 flags=RECLAIM_WB_FILE|RECLAIM_WB_ASYNC <...>-30045 [006] .... 17353.414235: mm_vmscan_inactive_list_is_low: nid=0 reclaim_idx=0 total_inactive=122115 inactive=122115 total_active=97508 active=97508 ratio=1 flags=RECLAIM_WB_FILE <...>-30045 [006] .... 17353.414236: mm_vmscan_inactive_list_is_low: nid=0 reclaim_idx=0 total_inactive=119139 inactive=119139 total_active=357353 active=357353 ratio=3 flags=RECLAIM_WB_ANON <...>-30045 [006] .... 17353.414320: mm_vmscan_inactive_list_is_low: nid=0 reclaim_idx=0 total_inactive=0 inactive=0 total_active=0 active=0 ratio=1 flags=RECLAIM_WB_ANON <...>-30045 [006] .... 17353.414321: mm_vmscan_inactive_list_is_low: nid=0 reclaim_idx=0 total_inactive=0 inactive=0 total_active=0 active=0 ratio=1 flags=RECLAIM_WB_ANON <...>-30045 [006] .... 17353.414339: mm_vmscan_direct_reclaim_end: nr_reclaimed=70 Based on that, we could assume that if reclaimer has reclaimed more pages, compaction could find free pages easily so free scanner of compaction were not moved fast like that. That means it wouldn't fail for non-costly high-order allocation. What this patch does is if the order is non-costly high order allocation, it will keep trying migration with reclaiming if system has enough reclaimable memory. Bug: 159909686 Bug: 156785617 Bug: 158449887

@0ctobot

Subsequent to commit 4d7f8d7 ("Revert "sched: Prevent raising SCHED_SOFTIRQ when CPU is !active"") A CPU can be paused while it is idle with it's tick stopped. nohz_balance_exit_idle() should be called from the local CPU, so it can't be called during pause which can happen remotely. This results in paused CPU participating in the nohz idle balance, which should be avoided. This can be done by calling Fix this issue by calling nohz_balance_exit_idle() from the paused CPU when it exits and enters idle again. This lazy approach avoids waking the CPU from idle during pause. [@0ctobot: This is an attempt to silence the following dmesg WARN_ON: [ 6.081902] ------------[ cut here ]------------ [ 6.081909] (flags & NOHZ_KICK_MASK) == NOHZ_BALANCE_KICK [ 6.081922] WARNING: CPU: 6 PID: 0 at _nohz_idle_balance+0x408/0x410 [ 6.081926] CPU: 6 PID: 0 Comm: swapper/6 Tainted: G S 4.19.202-NeutrinoKernel-lothal-EXP xiaomi-sm8250-devs#1 [ 6.081928] Hardware name: Qualcomm Technologies, Inc. kona MTP 19805 20809 (DT) [ 6.081931] pstate: 60400005 (nZCv daif +PAN -UAO) [ 6.081933] pc : _nohz_idle_balance+0x408/0x410 [ 6.081935] lr : _nohz_idle_balance+0x400/0x410 [ 6.081937] sp : ffffff8010033df0 [ 6.081938] x29: ffffff8010033e30 x28: ffffff9d37e75040 [ 6.081940] x27: ffffff9d37e572c0 x26: ffffff9d35c96050 [ 6.081942] x25: 0000000000000007 x24: 0000000000000001 [ 6.081944] x23: ffffff9d37e76000 x22: fffffffcb39e2a00 [ 6.081946] x21: 0000000000000006 x20: 00000000ffff8d30 [ 6.081947] x19: 0000000000000000 x18: ffffff8010035038 [ 6.081949] x17: 0000000000000008 x16: 0000000000000000 [ 6.081951] x15: 0000000000000008 x14: ffffff9d37f6dc18 [ 6.081953] x13: 0000000000000004 x12: ffffff9d37e76000 [ 6.081954] x11: 0000000000000000 x10: 0000000000000000 [ 6.081956] x9 : a2d84b4c1c6fa900 x8 : a2d84b4c1c6fa900 [ 6.081957] x7 : 000000000000002d x6 : ffffff9d38115e65 [ 6.081959] x5 : ffffff9d38113500 x4 : 0000000000000000 [ 6.081961] x3 : 000000000000004b x2 : 0000000000000000 [ 6.081962] x1 : ffffff9d3811352d x0 : 000000000000002d [ 6.081964] Call trace: [ 6.081966] _nohz_idle_balance+0x408/0x410 [ 6.081968] run_rebalance_domains+0x78/0xa8 [ 6.081971] __do_softirq+0x144/0x2f4 [ 6.081975] irq_exit+0x170/0x184 [ 6.081978] scheduler_ipi+0x160/0x1d4 [ 6.081980] handle_IPI+0x94/0x3e0 [ 6.081982] gic_handle_irq.17896+0x100/0x180 [ 6.081984] el1_irq+0xc4/0x140 [ 6.081988] lpm_cpuidle_enter+0x61c/0x6d0 [ 6.081991] cpuidle_enter_state+0x1dc/0x368 [ 6.081993] do_idle+0x67c/0x74c [ 6.081995] cpu_startup_entry+0x58/0x5c [ 6.081997] secondary_start_kernel+0x13c/0x140 [ 6.081999] ---[ end trace 6de171506e562a81 ]---] Bug: 180530906 Change-Id: Ia2dfd9c9cac9b0f37c55a9256b9d5f3141ca0421 Signed-off-by: Pavankumar Kondeti <quic_pkondeti@quicinc.com> Signed-off-by: Adam W. Willis <return.of.octobot@gmail.com>

We use inline_dentry which requires to allocate dentry page when adding a link. If we allow to reclaim memory from filesystem, we do down_read(&sbi->cp_rwsem) twice by f2fs_lock_op(). I think this should be okay, but how about stopping the lockdep complaint [1]? f2fs_create() - f2fs_lock_op() - f2fs_do_add_link() - __f2fs_find_entry - f2fs_get_read_data_page() -> kswapd - shrink_node - f2fs_evict_inode - f2fs_lock_op() [1] fs_reclaim ){+.+.}-{0:0} : kswapd0: lock_acquire+0x114/0x394 kswapd0: __fs_reclaim_acquire+0x40/0x50 kswapd0: prepare_alloc_pages+0x94/0x1ec kswapd0: __alloc_pages_nodemask+0x78/0x1b0 kswapd0: pagecache_get_page+0x2e0/0x57c kswapd0: f2fs_get_read_data_page+0xc0/0x394 kswapd0: f2fs_find_data_page+0xa4/0x23c kswapd0: find_in_level+0x1a8/0x36c kswapd0: __f2fs_find_entry+0x70/0x100 kswapd0: f2fs_do_add_link+0x84/0x1ec kswapd0: f2fs_mkdir+0xe4/0x1e4 kswapd0: vfs_mkdir+0x110/0x1c0 kswapd0: do_mkdirat+0xa4/0x160 kswapd0: __arm64_sys_mkdirat+0x24/0x34 kswapd0: el0_svc_common.llvm.17258447499513131576+0xc4/0x1e8 kswapd0: do_el0_svc+0x28/0xa0 kswapd0: el0_svc+0x24/0x38 kswapd0: el0_sync_handler+0x88/0xec kswapd0: el0_sync+0x1c0/0x200 kswapd0: -> #1 ( &sbi->cp_rwsem ){++++}-{3:3} : kswapd0: lock_acquire+0x114/0x394 kswapd0: down_read+0x7c/0x98 kswapd0: f2fs_do_truncate_blocks+0x78/0x3dc kswapd0: f2fs_truncate+0xc8/0x128 kswapd0: f2fs_evict_inode+0x2b8/0x8b8 kswapd0: evict+0xd4/0x2f8 kswapd0: iput+0x1c0/0x258 kswapd0: do_unlinkat+0x170/0x2a0 kswapd0: __arm64_sys_unlinkat+0x4c/0x68 kswapd0: el0_svc_common.llvm.17258447499513131576+0xc4/0x1e8 kswapd0: do_el0_svc+0x28/0xa0 kswapd0: el0_svc+0x24/0x38 kswapd0: el0_sync_handler+0x88/0xec kswapd0: el0_sync+0x1c0/0x200 Cc: stable@vger.kernel.org Fixes: bdbc90f ("f2fs: don't put dentry page in pagecache into highmem") Reviewed-by: Chao Yu <chao@kernel.org> Reviewed-by: Stanley Chu <stanley.chu@mediatek.com> Reviewed-by: Light Hsieh <light.hsieh@mediatek.com> Tested-by: Light Hsieh <light.hsieh@mediatek.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

[ Upstream commit dbb4cfea6efe979ed153bd59a6a527a90d3d0ab3 ] The interrupt handling should be related to the firmware version. If the driver matches an old firmware, then the driver should not handle interrupt such as i2c or dma, otherwise it will cause some errors. This log reveals it: [ 27.708641] INFO: trying to register non-static key. [ 27.710851] The code is fine but needs lockdep annotation, or maybe [ 27.712010] you didn't initialize this object before use? [ 27.712396] turning off the locking correctness validator. [ 27.712787] CPU: 2 PID: 0 Comm: swapper/2 Not tainted 5.12.4-g70e7f0549188-dirty #169 [ 27.713349] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.12.0-59-gc9ba5276e321-prebuilt.qemu.org 04/01/2014 [ 27.714149] Call Trace: [ 27.714329] <IRQ> [ 27.714480] dump_stack+0xba/0xf5 [ 27.714737] register_lock_class+0x873/0x8f0 [ 27.715052] ? __lock_acquire+0x323/0x1930 [ 27.715353] __lock_acquire+0x75/0x1930 [ 27.715636] lock_acquire+0x1dd/0x3e0 [ 27.715905] ? netup_i2c_interrupt+0x19/0x310 [ 27.716226] _raw_spin_lock_irqsave+0x4b/0x60 [ 27.716544] ? netup_i2c_interrupt+0x19/0x310 [ 27.716863] netup_i2c_interrupt+0x19/0x310 [ 27.717178] netup_unidvb_isr+0xd3/0x160 [ 27.717467] __handle_irq_event_percpu+0x53/0x3e0 [ 27.717808] handle_irq_event_percpu+0x35/0x90 [ 27.718129] handle_irq_event+0x39/0x60 [ 27.718409] handle_fasteoi_irq+0xc2/0x1d0 [ 27.718707] __common_interrupt+0x7f/0x150 [ 27.719008] common_interrupt+0xb4/0xd0 [ 27.719289] </IRQ> [ 27.719446] asm_common_interrupt+0x1e/0x40 [ 27.719747] RIP: 0010:native_safe_halt+0x17/0x20 [ 27.720084] Code: 07 0f 00 2d 8b ee 4c 00 f4 5d c3 0f 1f 84 00 00 00 00 00 8b 05 72 95 17 02 55 48 89 e5 85 c0 7e 07 0f 00 2d 6b ee 4c 00 fb f4 <5d> c3 cc cc cc cc cc cc cc 55 48 89 e5 e8 67 53 ff ff 8b 0d 29 f6 [ 27.721386] RSP: 0018:ffffc9000008fe90 EFLAGS: 00000246 [ 27.721758] RAX: 0000000000000000 RBX: 0000000000000002 RCX: 0000000000000000 [ 27.722262] RDX: 0000000000000000 RSI: ffffffff85f7c054 RDI: ffffffff85ded4e6 [ 27.722770] RBP: ffffc9000008fe90 R08: 0000000000000001 R09: 0000000000000001 [ 27.723277] R10: 0000000000000000 R11: 0000000000000001 R12: ffffffff86a75408 [ 27.723781] R13: 0000000000000000 R14: 0000000000000000 R15: ffff888100260000 [ 27.724289] default_idle+0x9/0x10 [ 27.724537] arch_cpu_idle+0xa/0x10 [ 27.724791] default_idle_call+0x6e/0x250 [ 27.725082] do_idle+0x1f0/0x2d0 [ 27.725326] cpu_startup_entry+0x18/0x20 [ 27.725613] start_secondary+0x11f/0x160 [ 27.725902] secondary_startup_64_no_verify+0xb0/0xbb [ 27.726272] BUG: kernel NULL pointer dereference, address: 0000000000000002 [ 27.726768] #PF: supervisor read access in kernel mode [ 27.727138] #PF: error_code(0x0000) - not-present page [ 27.727507] PGD 8000000118688067 P4D 8000000118688067 PUD 10feab067 PMD 0 [ 27.727999] Oops: 0000 [#1] PREEMPT SMP PTI [ 27.728302] CPU: 2 PID: 0 Comm: swapper/2 Not tainted 5.12.4-g70e7f0549188-dirty #169 [ 27.728861] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.12.0-59-gc9ba5276e321-prebuilt.qemu.org 04/01/2014 [ 27.729660] RIP: 0010:netup_i2c_interrupt+0x23/0x310 [ 27.730019] Code: 0f 1f 80 00 00 00 00 55 48 89 e5 41 55 41 54 53 48 89 fb e8 af 6e 95 fd 48 89 df e8 e7 9f 1c 01 49 89 c5 48 8b 83 48 08 00 00 <66> 44 8b 60 02 44 89 e0 48 8b 93 48 08 00 00 83 e0 f8 66 89 42 02 [ 27.731339] RSP: 0018:ffffc90000118e90 EFLAGS: 00010046 [ 27.731716] RAX: 0000000000000000 RBX: ffff88810803c4d8 RCX: 0000000000000000 [ 27.732223] RDX: 0000000000000001 RSI: ffffffff85d37b94 RDI: ffff88810803c4d8 [ 27.732727] RBP: ffffc90000118ea8 R08: 0000000000000000 R09: 0000000000000001 [ 27.733239] R10: ffff88810803c4f0 R11: 61646e6f63657320 R12: 0000000000000000 [ 27.733745] R13: 0000000000000046 R14: ffff888101041000 R15: ffff8881081b2400 [ 27.734251] FS: 0000000000000000(0000) GS:ffff88817bc80000(0000) knlGS:0000000000000000 [ 27.734821] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 27.735228] CR2: 0000000000000002 CR3: 0000000108194000 CR4: 00000000000006e0 [ 27.735735] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 27.736241] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 27.736744] Call Trace: [ 27.736924] <IRQ> [ 27.737074] netup_unidvb_isr+0xd3/0x160 [ 27.737363] __handle_irq_event_percpu+0x53/0x3e0 [ 27.737706] handle_irq_event_percpu+0x35/0x90 [ 27.738028] handle_irq_event+0x39/0x60 [ 27.738306] handle_fasteoi_irq+0xc2/0x1d0 [ 27.738602] __common_interrupt+0x7f/0x150 [ 27.738899] common_interrupt+0xb4/0xd0 [ 27.739176] </IRQ> [ 27.739331] asm_common_interrupt+0x1e/0x40 [ 27.739633] RIP: 0010:native_safe_halt+0x17/0x20 [ 27.739967] Code: 07 0f 00 2d 8b ee 4c 00 f4 5d c3 0f 1f 84 00 00 00 00 00 8b 05 72 95 17 02 55 48 89 e5 85 c0 7e 07 0f 00 2d 6b ee 4c 00 fb f4 <5d> c3 cc cc cc cc cc cc cc 55 48 89 e5 e8 67 53 ff ff 8b 0d 29 f6 [ 27.741275] RSP: 0018:ffffc9000008fe90 EFLAGS: 00000246 [ 27.741647] RAX: 0000000000000000 RBX: 0000000000000002 RCX: 0000000000000000 [ 27.742148] RDX: 0000000000000000 RSI: ffffffff85f7c054 RDI: ffffffff85ded4e6 [ 27.742652] RBP: ffffc9000008fe90 R08: 0000000000000001 R09: 0000000000000001 [ 27.743154] R10: 0000000000000000 R11: 0000000000000001 R12: ffffffff86a75408 [ 27.743652] R13: 0000000000000000 R14: 0000000000000000 R15: ffff888100260000 [ 27.744157] default_idle+0x9/0x10 [ 27.744405] arch_cpu_idle+0xa/0x10 [ 27.744658] default_idle_call+0x6e/0x250 [ 27.744948] do_idle+0x1f0/0x2d0 [ 27.745190] cpu_startup_entry+0x18/0x20 [ 27.745475] start_secondary+0x11f/0x160 [ 27.745761] secondary_startup_64_no_verify+0xb0/0xbb [ 27.746123] Modules linked in: [ 27.746348] Dumping ftrace buffer: [ 27.746596] (ftrace buffer empty) [ 27.746852] CR2: 0000000000000002 [ 27.747094] ---[ end trace ebafd46f83ab946d ]--- [ 27.747424] RIP: 0010:netup_i2c_interrupt+0x23/0x310 [ 27.747778] Code: 0f 1f 80 00 00 00 00 55 48 89 e5 41 55 41 54 53 48 89 fb e8 af 6e 95 fd 48 89 df e8 e7 9f 1c 01 49 89 c5 48 8b 83 48 08 00 00 <66> 44 8b 60 02 44 89 e0 48 8b 93 48 08 00 00 83 e0 f8 66 89 42 02 [ 27.749082] RSP: 0018:ffffc90000118e90 EFLAGS: 00010046 [ 27.749461] RAX: 0000000000000000 RBX: ffff88810803c4d8 RCX: 0000000000000000 [ 27.749966] RDX: 0000000000000001 RSI: ffffffff85d37b94 RDI: ffff88810803c4d8 [ 27.750471] RBP: ffffc90000118ea8 R08: 0000000000000000 R09: 0000000000000001 [ 27.750976] R10: ffff88810803c4f0 R11: 61646e6f63657320 R12: 0000000000000000 [ 27.751480] R13: 0000000000000046 R14: ffff888101041000 R15: ffff8881081b2400 [ 27.751986] FS: 0000000000000000(0000) GS:ffff88817bc80000(0000) knlGS:0000000000000000 [ 27.752560] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 27.752970] CR2: 0000000000000002 CR3: 0000000108194000 CR4: 00000000000006e0 [ 27.753481] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 27.753984] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 27.754487] Kernel panic - not syncing: Fatal exception in interrupt [ 27.755033] Dumping ftrace buffer: [ 27.755279] (ftrace buffer empty) [ 27.755534] Kernel Offset: disabled [ 27.755785] Rebooting in 1 seconds.. Signed-off-by: Zheyu Ma <zheyuma97@gmail.com> Signed-off-by: Sean Young <sean@mess.org> Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>

[ Upstream commit 39fbef4b0f77f9c89c8f014749ca533643a37c9f ] The following kernel crash can be triggered: [ 89.266592] ------------[ cut here ]------------ [ 89.267427] kernel BUG at fs/buffer.c:3020! [ 89.268264] invalid opcode: 0000 [#1] SMP KASAN PTI [ 89.269116] CPU: 7 PID: 1750 Comm: kmmpd-loop0 Not tainted 5.10.0-862.14.0.6.x86_64-08610-gc932cda3cef4-dirty #20 [ 89.273169] RIP: 0010:submit_bh_wbc.isra.0+0x538/0x6d0 [ 89.277157] RSP: 0018:ffff888105ddfd08 EFLAGS: 00010246 [ 89.278093] RAX: 0000000000000005 RBX: ffff888124231498 RCX: ffffffffb2772612 [ 89.279332] RDX: 1ffff11024846293 RSI: 0000000000000008 RDI: ffff888124231498 [ 89.280591] RBP: ffff8881248cc000 R08: 0000000000000001 R09: ffffed1024846294 [ 89.281851] R10: ffff88812423149f R11: ffffed1024846293 R12: 0000000000003800 [ 89.283095] R13: 0000000000000001 R14: 0000000000000000 R15: ffff8881161f7000 [ 89.284342] FS: 0000000000000000(0000) GS:ffff88839b5c0000(0000) knlGS:0000000000000000 [ 89.285711] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 89.286701] CR2: 00007f166ebc01a0 CR3: 0000000435c0e000 CR4: 00000000000006e0 [ 89.287919] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 89.289138] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 89.290368] Call Trace: [ 89.290842] write_mmp_block+0x2ca/0x510 [ 89.292218] kmmpd+0x433/0x9a0 [ 89.294902] kthread+0x2dd/0x3e0 [ 89.296268] ret_from_fork+0x22/0x30 [ 89.296906] Modules linked in: by running the following commands: 1. mkfs.ext4 -O mmp /dev/sda -b 1024 2. mount /dev/sda /home/test 3. echo "/dev/sda" > /sys/power/resume That happens because swsusp_check() calls set_blocksize() on the target partition which confuses the file system: Thread1 Thread2 mount /dev/sda /home/test get s_mmp_bh --> has mapped flag start kmmpd thread echo "/dev/sda" > /sys/power/resume resume_store software_resume swsusp_check set_blocksize truncate_inode_pages_range truncate_cleanup_page block_invalidatepage discard_buffer --> clean mapped flag write_mmp_block submit_bh submit_bh_wbc BUG_ON(!buffer_mapped(bh)) To address this issue, modify swsusp_check() to open the target block device with exclusive access. Signed-off-by: Ye Bin <yebin10@huawei.com> [ rjw: Subject and changelog edits ] Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>

[ Upstream commit c98c5daaa24b583cba1369b7d167f93c6ae7299c ] Current code does list element deletion and addition in and out of lock protection. This patch moves deletion behind lock. list_add double add: new=ffff9130b5eb89f8, prev=ffff9130b5eb89f8, next=ffff9130c6a715f0. ------------[ cut here ]------------ kernel BUG at lib/list_debug.c:31! invalid opcode: 0000 [#1] SMP PTI CPU: 1 PID: 182395 Comm: kworker/1:37 Kdump: loaded Tainted: G W OE --------- - - 4.18.0-193.el8.x86_64 #1 Hardware name: HP ProLiant DL160 Gen8, BIOS J03 02/10/2014 Workqueue: qla2xxx_wq qla2x00_iocb_work_fn [qla2xxx] RIP: 0010:__list_add_valid+0x41/0x50 Code: 85 94 00 00 00 48 39 c7 74 0b 48 39 d7 74 06 b8 01 00 00 00 c3 48 89 f2 4c 89 c1 48 89 fe 48 c7 c7 60 83 ad 97 e8 4d bd ce ff <0f> 0b 0f 1f 00 66 2e 0f 1f 84 00 00 00 00 00 48 8b 07 48 8b 57 08 RSP: 0018:ffffaba306f47d68 EFLAGS: 00010046 RAX: 0000000000000058 RBX: ffff9130b5eb8800 RCX: 0000000000000006 RDX: 0000000000000000 RSI: 0000000000000096 RDI: ffff9130b7456a00 RBP: ffff9130c6a70a58 R08: 000000000008d7be R09: 0000000000000001 R10: 0000000000000000 R11: 0000000000000001 R12: ffff9130c6a715f0 R13: ffff9130b5eb8824 R14: ffff9130b5eb89f8 R15: ffff9130b5eb89f8 FS: 0000000000000000(0000) GS:ffff9130b7440000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007efcaaef11a0 CR3: 000000005200a002 CR4: 00000000000606e0 Call Trace: qla24xx_async_gnl+0x113/0x3c0 [qla2xxx] ? qla2x00_iocb_work_fn+0x53/0x80 [qla2xxx] ? process_one_work+0x1a7/0x3b0 ? worker_thread+0x30/0x390 ? create_worker+0x1a0/0x1a0 ? kthread+0x112/0x130 Link: https://lore.kernel.org/r/20211026115412.27691-3-njavali@marvell.com Fixes: 726b854 ("qla2xxx: Add framework for async fabric discovery") Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com> Signed-off-by: Quinn Tran <qutran@marvell.com> Signed-off-by: Nilesh Javali <njavali@marvell.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>

commit 92d602bc7177325e7453189a22e0c8764ed3453e upstream. We use inline_dentry which requires to allocate dentry page when adding a link. If we allow to reclaim memory from filesystem, we do down_read(&sbi->cp_rwsem) twice by f2fs_lock_op(). I think this should be okay, but how about stopping the lockdep complaint [1]? f2fs_create() - f2fs_lock_op() - f2fs_do_add_link() - __f2fs_find_entry - f2fs_get_read_data_page() -> kswapd - shrink_node - f2fs_evict_inode - f2fs_lock_op() [1] fs_reclaim ){+.+.}-{0:0} : kswapd0: lock_acquire+0x114/0x394 kswapd0: __fs_reclaim_acquire+0x40/0x50 kswapd0: prepare_alloc_pages+0x94/0x1ec kswapd0: __alloc_pages_nodemask+0x78/0x1b0 kswapd0: pagecache_get_page+0x2e0/0x57c kswapd0: f2fs_get_read_data_page+0xc0/0x394 kswapd0: f2fs_find_data_page+0xa4/0x23c kswapd0: find_in_level+0x1a8/0x36c kswapd0: __f2fs_find_entry+0x70/0x100 kswapd0: f2fs_do_add_link+0x84/0x1ec kswapd0: f2fs_mkdir+0xe4/0x1e4 kswapd0: vfs_mkdir+0x110/0x1c0 kswapd0: do_mkdirat+0xa4/0x160 kswapd0: __arm64_sys_mkdirat+0x24/0x34 kswapd0: el0_svc_common.llvm.17258447499513131576+0xc4/0x1e8 kswapd0: do_el0_svc+0x28/0xa0 kswapd0: el0_svc+0x24/0x38 kswapd0: el0_sync_handler+0x88/0xec kswapd0: el0_sync+0x1c0/0x200 kswapd0: -> #1 ( &sbi->cp_rwsem ){++++}-{3:3} : kswapd0: lock_acquire+0x114/0x394 kswapd0: down_read+0x7c/0x98 kswapd0: f2fs_do_truncate_blocks+0x78/0x3dc kswapd0: f2fs_truncate+0xc8/0x128 kswapd0: f2fs_evict_inode+0x2b8/0x8b8 kswapd0: evict+0xd4/0x2f8 kswapd0: iput+0x1c0/0x258 kswapd0: do_unlinkat+0x170/0x2a0 kswapd0: __arm64_sys_unlinkat+0x4c/0x68 kswapd0: el0_svc_common.llvm.17258447499513131576+0xc4/0x1e8 kswapd0: do_el0_svc+0x28/0xa0 kswapd0: el0_svc+0x24/0x38 kswapd0: el0_sync_handler+0x88/0xec kswapd0: el0_sync+0x1c0/0x200 Cc: stable@vger.kernel.org Fixes: bdbc90f ("f2fs: don't put dentry page in pagecache into highmem") Reviewed-by: Chao Yu <chao@kernel.org> Reviewed-by: Stanley Chu <stanley.chu@mediatek.com> Reviewed-by: Light Hsieh <light.hsieh@mediatek.com> Tested-by: Light Hsieh <light.hsieh@mediatek.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

…fails commit daf972118c517b91f74ff1731417feb4270625a4 upstream. Check for a valid hv_vp_index array prior to derefencing hv_vp_index when setting Hyper-V's TSC change callback. If Hyper-V setup failed in hyperv_init(), the kernel will still report that it's running under Hyper-V, but will have silently disabled nearly all functionality. BUG: kernel NULL pointer dereference, address: 0000000000000010 #PF: supervisor read access in kernel mode #PF: error_code(0x0000) - not-present page PGD 0 P4D 0 Oops: 0000 [#1] SMP CPU: 4 PID: 1 Comm: swapper/0 Not tainted 5.15.0-rc2+ #75 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02/06/2015 RIP: 0010:set_hv_tscchange_cb+0x15/0xa0 Code: <8b> 04 82 8b 15 12 17 85 01 48 c1 e0 20 48 0d ee 00 01 00 f6 c6 08 ... Call Trace: kvm_arch_init+0x17c/0x280 kvm_init+0x31/0x330 vmx_init+0xba/0x13a do_one_initcall+0x41/0x1c0 kernel_init_freeable+0x1f2/0x23b kernel_init+0x16/0x120 ret_from_fork+0x22/0x30 Fixes: 9328626 ("x86/hyperv: Reenlightenment notifications support") Cc: stable@vger.kernel.org Cc: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com> Link: https://lore.kernel.org/r/20211104182239.1302956-2-seanjc@google.com Signed-off-by: Wei Liu <wei.liu@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

[ Upstream commit f8bbc07ac535593139c875ffa19af924b1084540 ] vhost_worker will call tun call backs to receive packets. If too many illegal packets arrives, tun_do_read will keep dumping packet contents. When console is enabled, it will costs much more cpu time to dump packet and soft lockup will be detected. net_ratelimit mechanism can be used to limit the dumping rate. PID: 33036 TASK: ffff949da6f20000 CPU: 23 COMMAND: "vhost-32980" #0 [fffffe00003fce50] crash_nmi_callback at ffffffff89249253 #1 [fffffe00003fce58] nmi_handle at ffffffff89225fa3 #2 [fffffe00003fceb0] default_do_nmi at ffffffff8922642e #3 [fffffe00003fced0] do_nmi at ffffffff8922660d #4 [fffffe00003fcef0] end_repeat_nmi at ffffffff89c01663 [exception RIP: io_serial_in+20] RIP: ffffffff89792594 RSP: ffffa655314979e8 RFLAGS: 00000002 RAX: ffffffff89792500 RBX: ffffffff8af428a0 RCX: 0000000000000000 RDX: 00000000000003fd RSI: 0000000000000005 RDI: ffffffff8af428a0 RBP: 0000000000002710 R8: 0000000000000004 R9: 000000000000000f R10: 0000000000000000 R11: ffffffff8acbf64f R12: 0000000000000020 R13: ffffffff8acbf698 R14: 0000000000000058 R15: 0000000000000000 ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018 #5 [ffffa655314979e8] io_serial_in at ffffffff89792594 #6 [ffffa655314979e8] wait_for_xmitr at ffffffff89793470 #7 [ffffa65531497a08] serial8250_console_putchar at ffffffff897934f6 #8 [ffffa65531497a20] uart_console_write at ffffffff8978b605 #9 [ffffa65531497a48] serial8250_console_write at ffffffff89796558 #10 [ffffa65531497ac8] console_unlock at ffffffff89316124 #11 [ffffa65531497b10] vprintk_emit at ffffffff89317c07 #12 [ffffa65531497b68] printk at ffffffff89318306 #13 [ffffa65531497bc8] print_hex_dump at ffffffff89650765 #14 [ffffa65531497ca8] tun_do_read at ffffffffc0b06c27 [tun] #15 [ffffa65531497d38] tun_recvmsg at ffffffffc0b06e34 [tun] #16 [ffffa65531497d68] handle_rx at ffffffffc0c5d682 [vhost_net] #17 [ffffa65531497ed0] vhost_worker at ffffffffc0c644dc [vhost] #18 [ffffa65531497f10] kthread at ffffffff892d2e72 #19 [ffffa65531497f50] ret_from_fork at ffffffff89c0022f Fixes: ef3db4a ("tun: avoid BUG, dump packet on GSO errors") Signed-off-by: Lei Chen <lei.chen@smartx.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Acked-by: Jason Wang <jasowang@redhat.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Link: https://lore.kernel.org/r/20240415020247.2207781-1-lei.chen@smartx.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>

commit fff1386cc889d8fb4089d285f883f8cba62d82ce upstream. Running a lot of VK CTS in parallel against nouveau, once every few hours you might see something like this crash. BUG: kernel NULL pointer dereference, address: 0000000000000008 PGD 8000000114e6e067 P4D 8000000114e6e067 PUD 109046067 PMD 0 Oops: 0000 [#1] PREEMPT SMP PTI CPU: 7 PID: 53891 Comm: deqp-vk Not tainted 6.8.0-rc6+ #27 Hardware name: Gigabyte Technology Co., Ltd. Z390 I AORUS PRO WIFI/Z390 I AORUS PRO WIFI-CF, BIOS F8 11/05/2021 RIP: 0010:gp100_vmm_pgt_mem+0xe3/0x180 [nouveau] Code: c7 48 01 c8 49 89 45 58 85 d2 0f 84 95 00 00 00 41 0f b7 46 12 49 8b 7e 08 89 da 42 8d 2c f8 48 8b 47 08 41 83 c7 01 48 89 ee <48> 8b 40 08 ff d0 0f 1f 00 49 8b 7e 08 48 89 d9 48 8d 75 04 48 c1 RSP: 0000:ffffac20c5857838 EFLAGS: 00010202 RAX: 0000000000000000 RBX: 00000000004d8001 RCX: 0000000000000001 RDX: 00000000004d8001 RSI: 00000000000006d8 RDI: ffffa07afe332180 RBP: 00000000000006d8 R08: ffffac20c5857ad0 R09: 0000000000ffff10 R10: 0000000000000001 R11: ffffa07af27e2de0 R12: 000000000000001c R13: ffffac20c5857ad0 R14: ffffa07a96fe9040 R15: 000000000000001c FS: 00007fe395eed7c0(0000) GS:ffffa07e2c980000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000008 CR3: 000000011febe001 CR4: 00000000003706f0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: ... ? gp100_vmm_pgt_mem+0xe3/0x180 [nouveau] ? gp100_vmm_pgt_mem+0x37/0x180 [nouveau] nvkm_vmm_iter+0x351/0xa20 [nouveau] ? __pfx_nvkm_vmm_ref_ptes+0x10/0x10 [nouveau] ? __pfx_gp100_vmm_pgt_mem+0x10/0x10 [nouveau] ? __pfx_gp100_vmm_pgt_mem+0x10/0x10 [nouveau] ? __lock_acquire+0x3ed/0x2170 ? __pfx_gp100_vmm_pgt_mem+0x10/0x10 [nouveau] nvkm_vmm_ptes_get_map+0xc2/0x100 [nouveau] ? __pfx_nvkm_vmm_ref_ptes+0x10/0x10 [nouveau] ? __pfx_gp100_vmm_pgt_mem+0x10/0x10 [nouveau] nvkm_vmm_map_locked+0x224/0x3a0 [nouveau] Adding any sort of useful debug usually makes it go away, so I hand wrote the function in a line, and debugged the asm. Every so often pt->memory->ptrs is NULL. This ptrs ptr is set in the nv50_instobj_acquire called from nvkm_kmap. If Thread A and Thread B both get to nv50_instobj_acquire around the same time, and Thread A hits the refcount_set line, and in lockstep thread B succeeds at refcount_inc_not_zero, there is a chance the ptrs value won't have been stored since refcount_set is unordered. Force a memory barrier here, I picked smp_mb, since we want it on all CPUs and it's write followed by a read. v2: use paired smp_rmb/smp_wmb. Cc: <stable@vger.kernel.org> Fixes: be55287 ("drm/nouveau/imem/nv50: embed nvkm_instobj directly into nv04_instobj") Signed-off-by: Dave Airlie <airlied@redhat.com> Signed-off-by: Danilo Krummrich <dakr@redhat.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240411011510.2546857-1-airlied@gmail.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

[ Upstream commit 2cc7d150550cc981aceedf008f5459193282425c ] Issue reported by customer during SRIOV testing, call trace: When both i40e and the i40iw driver are loaded, a warning in check_flush_dependency is being triggered. This seems to be because of the i40e driver workqueue is allocated with the WQ_MEM_RECLAIM flag, and the i40iw one is not. Similar error was encountered on ice too and it was fixed by removing the flag. Do the same for i40e too. [Feb 9 09:08] ------------[ cut here ]------------ [ +0.000004] workqueue: WQ_MEM_RECLAIM i40e:i40e_service_task [i40e] is flushing !WQ_MEM_RECLAIM infiniband:0x0 [ +0.000060] WARNING: CPU: 0 PID: 937 at kernel/workqueue.c:2966 check_flush_dependency+0x10b/0x120 [ +0.000007] Modules linked in: snd_seq_dummy snd_hrtimer snd_seq snd_timer snd_seq_device snd soundcore nls_utf8 cifs cifs_arc4 nls_ucs2_utils rdma_cm iw_cm ib_cm cifs_md4 dns_resolver netfs qrtr rfkill sunrpc vfat fat intel_rapl_msr intel_rapl_common irdma intel_uncore_frequency intel_uncore_frequency_common ice ipmi_ssif isst_if_common skx_edac nfit libnvdimm x86_pkg_temp_thermal intel_powerclamp gnss coretemp ib_uverbs rapl intel_cstate ib_core iTCO_wdt iTCO_vendor_support acpi_ipmi mei_me ipmi_si intel_uncore ioatdma i2c_i801 joydev pcspkr mei ipmi_devintf lpc_ich intel_pch_thermal i2c_smbus ipmi_msghandler acpi_power_meter acpi_pad xfs libcrc32c ast sd_mod drm_shmem_helper t10_pi drm_kms_helper sg ixgbe drm i40e ahci crct10dif_pclmul libahci crc32_pclmul igb crc32c_intel libata ghash_clmulni_intel i2c_algo_bit mdio dca wmi dm_mirror dm_region_hash dm_log dm_mod fuse [ +0.000050] CPU: 0 PID: 937 Comm: kworker/0:3 Kdump: loaded Not tainted 6.8.0-rc2-Feb-net_dev-Qiueue-00279-gbd43c5687e05 #1 [ +0.000003] Hardware name: Intel Corporation S2600BPB/S2600BPB, BIOS SE5C620.86B.02.01.0013.121520200651 12/15/2020 [ +0.000001] Workqueue: i40e i40e_service_task [i40e] [ +0.000024] RIP: 0010:check_flush_dependency+0x10b/0x120 [ +0.000003] Code: ff 49 8b 54 24 18 48 8d 8b b0 00 00 00 49 89 e8 48 81 c6 b0 00 00 00 48 c7 c7 b0 97 fa 9f c6 05 8a cc 1f 02 01 e8 35 b3 fd ff <0f> 0b e9 10 ff ff ff 80 3d 78 cc 1f 02 00 75 94 e9 46 ff ff ff 90 [ +0.000002] RSP: 0018:ffffbd294976bcf8 EFLAGS: 00010282 [ +0.000002] RAX: 0000000000000000 RBX: ffff94d4c483c000 RCX: 0000000000000027 [ +0.000001] RDX: ffff94d47f620bc8 RSI: 0000000000000001 RDI: ffff94d47f620bc0 [ +0.000001] RBP: 0000000000000000 R08: 0000000000000000 R09: 00000000ffff7fff [ +0.000001] R10: ffffbd294976bb98 R11: ffffffffa0be65e8 R12: ffff94c5451ea180 [ +0.000001] R13: ffff94c5ab5e8000 R14: ffff94c5c20b6e05 R15: ffff94c5f1330ab0 [ +0.000001] FS: 0000000000000000(0000) GS:ffff94d47f600000(0000) knlGS:0000000000000000 [ +0.000002] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ +0.000001] CR2: 00007f9e6f1fca70 CR3: 0000000038e20004 CR4: 00000000007706f0 [ +0.000000] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ +0.000001] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ +0.000001] PKRU: 55555554 [ +0.000001] Call Trace: [ +0.000001] <TASK> [ +0.000002] ? __warn+0x80/0x130 [ +0.000003] ? check_flush_dependency+0x10b/0x120 [ +0.000002] ? report_bug+0x195/0x1a0 [ +0.000005] ? handle_bug+0x3c/0x70 [ +0.000003] ? exc_invalid_op+0x14/0x70 [ +0.000002] ? asm_exc_invalid_op+0x16/0x20 [ +0.000006] ? check_flush_dependency+0x10b/0x120 [ +0.000002] ? check_flush_dependency+0x10b/0x120 [ +0.000002] __flush_workqueue+0x126/0x3f0 [ +0.000015] ib_cache_cleanup_one+0x1c/0xe0 [ib_core] [ +0.000056] __ib_unregister_device+0x6a/0xb0 [ib_core] [ +0.000023] ib_unregister_device_and_put+0x34/0x50 [ib_core] [ +0.000020] i40iw_close+0x4b/0x90 [irdma] [ +0.000022] i40e_notify_client_of_netdev_close+0x54/0xc0 [i40e] [ +0.000035] i40e_service_task+0x126/0x190 [i40e] [ +0.000024] process_one_work+0x174/0x340 [ +0.000003] worker_thread+0x27e/0x390 [ +0.000001] ? __pfx_worker_thread+0x10/0x10 [ +0.000002] kthread+0xdf/0x110 [ +0.000002] ? __pfx_kthread+0x10/0x10 [ +0.000002] ret_from_fork+0x2d/0x50 [ +0.000003] ? __pfx_kthread+0x10/0x10 [ +0.000001] ret_from_fork_asm+0x1b/0x30 [ +0.000004] </TASK> [ +0.000001] ---[ end trace 0000000000000000 ]--- Fixes: 4d5957c ("i40e: remove WQ_UNBOUND and the task limit of our workqueue") Signed-off-by: Sindhu Devale <sindhu.devale@intel.com> Reviewed-by: Arkadiusz Kubalewski <arkadiusz.kubalewski@intel.com> Reviewed-by: Mateusz Polchlopek <mateusz.polchlopek@intel.com> Signed-off-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com> Tested-by: Robert Ganzynkowicz <robert.ganzynkowicz@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Link: https://lore.kernel.org/r/20240423182723.740401-2-anthony.l.nguyen@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>

[ Upstream commit 54c4ec5f8c471b7c1137a1f769648549c423c026 ] The uart_handle_cts_change() function in serial_core expects the caller to hold uport->lock. For example, I have seen the below kernel splat, when the Bluetooth driver is loaded on an i.MX28 board. [ 85.119255] ------------[ cut here ]------------ [ 85.124413] WARNING: CPU: 0 PID: 27 at /drivers/tty/serial/serial_core.c:3453 uart_handle_cts_change+0xb4/0xec [ 85.134694] Modules linked in: hci_uart bluetooth ecdh_generic ecc wlcore_sdio configfs [ 85.143314] CPU: 0 PID: 27 Comm: kworker/u3:0 Not tainted 6.6.3-00021-gd62a2f068f92 #1 [ 85.151396] Hardware name: Freescale MXS (Device Tree) [ 85.156679] Workqueue: hci0 hci_power_on [bluetooth] (...) [ 85.191765] uart_handle_cts_change from mxs_auart_irq_handle+0x380/0x3f4 [ 85.198787] mxs_auart_irq_handle from __handle_irq_event_percpu+0x88/0x210 (...) Cc: stable@vger.kernel.org Fixes: 4d90bb1 ("serial: core: Document and assert lock requirements for irq helpers") Reviewed-by: Frank Li <Frank.Li@nxp.com> Signed-off-by: Emil Kronborg <emil.kronborg@protonmail.com> Link: https://lore.kernel.org/r/20240320121530.11348-1-emil.kronborg@protonmail.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org>

commit 6fe60465e1d53ea321ee909be26d97529e8f746c upstream. If stack_depot_save_flags() allocates memory it always drops __GFP_NOLOCKDEP flag. So when KASAN tries to track __GFP_NOLOCKDEP allocation we may end up with lockdep splat like bellow: ====================================================== WARNING: possible circular locking dependency detected 6.9.0-rc3+ #49 Not tainted ------------------------------------------------------ kswapd0/149 is trying to acquire lock: ffff88811346a920 (&xfs_nondir_ilock_class){++++}-{4:4}, at: xfs_reclaim_inode+0x3ac/0x590 [xfs] but task is already holding lock: ffffffff8bb33100 (fs_reclaim){+.+.}-{0:0}, at: balance_pgdat+0x5d9/0xad0 which lock already depends on the new lock. the existing dependency chain (in reverse order) is: -> #1 (fs_reclaim){+.+.}-{0:0}: __lock_acquire+0x7da/0x1030 lock_acquire+0x15d/0x400 fs_reclaim_acquire+0xb5/0x100 prepare_alloc_pages.constprop.0+0xc5/0x230 __alloc_pages+0x12a/0x3f0 alloc_pages_mpol+0x175/0x340 stack_depot_save_flags+0x4c5/0x510 kasan_save_stack+0x30/0x40 kasan_save_track+0x10/0x30 __kasan_slab_alloc+0x83/0x90 kmem_cache_alloc+0x15e/0x4a0 __alloc_object+0x35/0x370 __create_object+0x22/0x90 __kmalloc_node_track_caller+0x477/0x5b0 krealloc+0x5f/0x110 xfs_iext_insert_raw+0x4b2/0x6e0 [xfs] xfs_iext_insert+0x2e/0x130 [xfs] xfs_iread_bmbt_block+0x1a9/0x4d0 [xfs] xfs_btree_visit_block+0xfb/0x290 [xfs] xfs_btree_visit_blocks+0x215/0x2c0 [xfs] xfs_iread_extents+0x1a2/0x2e0 [xfs] xfs_buffered_write_iomap_begin+0x376/0x10a0 [xfs] iomap_iter+0x1d1/0x2d0 iomap_file_buffered_write+0x120/0x1a0 xfs_file_buffered_write+0x128/0x4b0 [xfs] vfs_write+0x675/0x890 ksys_write+0xc3/0x160 do_syscall_64+0x94/0x170 entry_SYSCALL_64_after_hwframe+0x71/0x79 Always preserve __GFP_NOLOCKDEP to fix this. Link: https://lkml.kernel.org/r/20240418141133.22950-1-ryabinin.a.a@gmail.com Fixes: cd11016 ("mm, kasan: stackdepot implementation. Enable stackdepot for SLAB") Signed-off-by: Andrey Ryabinin <ryabinin.a.a@gmail.com> Reported-by: Xiubo Li <xiubli@redhat.com> Closes: https://lore.kernel.org/all/a0caa289-ca02-48eb-9bf2-d86fd47b71f4@redhat.com/ Reported-by: Damien Le Moal <damien.lemoal@opensource.wdc.com> Closes: https://lore.kernel.org/all/f9ff999a-e170-b66b-7caf-293f2b147ac2@opensource.wdc.com/ Suggested-by: Dave Chinner <david@fromorbit.com> Tested-by: Xiubo Li <xiubli@redhat.com> Cc: Christoph Hellwig <hch@infradead.org> Cc: Alexander Potapenko <glider@google.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

[ Upstream commit c214ed2a4dda35b308b0b28eed804d7ae66401f9 ] The session resources are used by FW and driver when session is offloaded, once session is uploaded these resources are not used. The lock is not required as these fields won't be used any longer. The offload and upload calls are sequential, hence lock is not required. This will suppress following BUG_ON(): [ 449.843143] ------------[ cut here ]------------ [ 449.848302] kernel BUG at mm/vmalloc.c:2727! [ 449.853072] invalid opcode: 0000 [#1] PREEMPT SMP PTI [ 449.858712] CPU: 5 PID: 1996 Comm: kworker/u24:2 Not tainted 5.14.0-118.el9.x86_64 #1 Rebooting. [ 449.867454] Hardware name: Dell Inc. PowerEdge R730/0WCJNT, BIOS 2.3.4 11/08/2016 [ 449.876966] Workqueue: fc_rport_eq fc_rport_work [libfc] [ 449.882910] RIP: 0010:vunmap+0x2e/0x30 [ 449.887098] Code: 00 65 8b 05 14 a2 f0 4a a9 00 ff ff 00 75 1b 55 48 89 fd e8 34 36 79 00 48 85 ed 74 0b 48 89 ef 31 f6 5d e9 14 fc ff ff 5d c3 <0f> 0b 0f 1f 44 00 00 41 57 41 56 49 89 ce 41 55 49 89 fd 41 54 41 [ 449.908054] RSP: 0018:ffffb83d878b3d68 EFLAGS: 00010206 [ 449.913887] RAX: 0000000080000201 RBX: ffff8f4355133550 RCX: 000000000d400005 [ 449.921843] RDX: 0000000000000001 RSI: 0000000000001000 RDI: ffffb83da53f5000 [ 449.929808] RBP: ffff8f4ac6675800 R08: ffffb83d878b3d30 R09: 00000000000efbdf [ 449.937774] R10: 0000000000000003 R11: ffff8f434573e000 R12: 0000000000001000 [ 449.945736] R13: 0000000000001000 R14: ffffb83da53f5000 R15: ffff8f43d4ea3ae0 [ 449.953701] FS: 0000000000000000(0000) GS:ffff8f529fc80000(0000) knlGS:0000000000000000 [ 449.962732] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 449.969138] CR2: 00007f8cf993e150 CR3: 0000000efbe10003 CR4: 00000000003706e0 [ 449.977102] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 449.985065] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 449.993028] Call Trace: [ 449.995756] __iommu_dma_free+0x96/0x100 [ 450.000139] bnx2fc_free_session_resc+0x67/0x240 [bnx2fc] [ 450.006171] bnx2fc_upload_session+0xce/0x100 [bnx2fc] [ 450.011910] bnx2fc_rport_event_handler+0x9f/0x240 [bnx2fc] [ 450.018136] fc_rport_work+0x103/0x5b0 [libfc] [ 450.023103] process_one_work+0x1e8/0x3c0 [ 450.027581] worker_thread+0x50/0x3b0 [ 450.031669] ? rescuer_thread+0x370/0x370 [ 450.036143] kthread+0x149/0x170 [ 450.039744] ? set_kthread_struct+0x40/0x40 [ 450.044411] ret_from_fork+0x22/0x30 [ 450.048404] Modules linked in: vfat msdos fat xfs nfs_layout_nfsv41_files rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver dm_service_time qedf qed crc8 bnx2fc libfcoe libfc scsi_transport_fc intel_rapl_msr intel_rapl_common x86_pkg_temp_thermal intel_powerclamp dcdbas rapl intel_cstate intel_uncore mei_me pcspkr mei ipmi_ssif lpc_ich ipmi_si fuse zram ext4 mbcache jbd2 loop nfsv3 nfs_acl nfs lockd grace fscache netfs irdma ice sd_mod t10_pi sg ib_uverbs ib_core 8021q garp mrp stp llc mgag200 i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt mxm_wmi fb_sys_fops cec crct10dif_pclmul ahci crc32_pclmul bnx2x drm ghash_clmulni_intel libahci rfkill i40e libata megaraid_sas mdio wmi sunrpc lrw dm_crypt dm_round_robin dm_multipath dm_snapshot dm_bufio dm_mirror dm_region_hash dm_log dm_zero dm_mod linear raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx raid6_pq libcrc32c crc32c_intel raid1 raid0 iscsi_ibft squashfs be2iscsi bnx2i cnic uio cxgb4i cxgb4 tls [ 450.048497] libcxgbi libcxgb qla4xxx iscsi_boot_sysfs iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi edd ipmi_devintf ipmi_msghandler [ 450.159753] ---[ end trace 712de2c57c64abc8 ]--- Reported-by: Guangwu Zhang <guazhang@redhat.com> Signed-off-by: Saurav Kashyap <skashyap@marvell.com> Signed-off-by: Nilesh Javali <njavali@marvell.com> Link: https://lore.kernel.org/r/20240315071427.31842-1-skashyap@marvell.com Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>

[ Upstream commit e5f4e68eed85fa8495d78cd966eecc2b27bb9e53 ] When using --Summary mode, added MSRs in raw mode always print zeros. Print the actual register contents. Example, with patch: note the added column: --add msr0x64f,u32,package,raw,REASON Where: 0x64F is MSR_CORE_PERF_LIMIT_REASONS Busy% Bzy_MHz PkgTmp PkgWatt CorWatt REASON 0.00 4800 35 1.42 0.76 0x00000000 0.00 4801 34 1.42 0.76 0x00000000 80.08 4531 66 108.17 107.52 0x08000000 98.69 4530 66 133.21 132.54 0x08000000 99.28 4505 66 128.26 127.60 0x0c000400 99.65 4486 68 124.91 124.25 0x0c000400 99.63 4483 68 124.90 124.25 0x0c000400 79.34 4481 41 99.80 99.13 0x0c000000 0.00 4801 41 1.40 0.73 0x0c000000 Where, for the test processor (i5-10600K): PKG Limit #1: 125.000 Watts, 8.000000 sec MSR bit 26 = log; bit 10 = status PKG Limit #2: 136.000 Watts, 0.002441 sec MSR bit 27 = log; bit 11 = status Example, without patch: Busy% Bzy_MHz PkgTmp PkgWatt CorWatt REASON 0.01 4800 35 1.43 0.77 0x00000000 0.00 4801 35 1.39 0.73 0x00000000 83.49 4531 66 112.71 112.06 0x00000000 98.69 4530 68 133.35 132.69 0x00000000 99.31 4500 67 127.96 127.30 0x00000000 99.63 4483 69 124.91 124.25 0x00000000 99.61 4481 69 124.90 124.25 0x00000000 99.61 4481 71 124.92 124.25 0x00000000 59.35 4479 42 75.03 74.37 0x00000000 0.00 4800 42 1.39 0.73 0x00000000 0.00 4801 42 1.42 0.76 0x00000000 c000000 [lenb: simplified patch to apply only to package scope] Signed-off-by: Doug Smythies <dsmythies@telus.net> Signed-off-by: Len Brown <len.brown@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>

[ Upstream commit 94062790aedb505bdda209b10bea47b294d6394f ] TCP_SYN_RECV state is really special, it is only used by cross-syn connections, mostly used by fuzzers. In the following crash [1], syzbot managed to trigger a divide by zero in tcp_rcv_space_adjust() A socket makes the following state transitions, without ever calling tcp_init_transfer(), meaning tcp_init_buffer_space() is also not called. TCP_CLOSE connect() TCP_SYN_SENT TCP_SYN_RECV shutdown() -> tcp_shutdown(sk, SEND_SHUTDOWN) TCP_FIN_WAIT1 To fix this issue, change tcp_shutdown() to not perform a TCP_SYN_RECV -> TCP_FIN_WAIT1 transition, which makes no sense anyway. When tcp_rcv_state_process() later changes socket state from TCP_SYN_RECV to TCP_ESTABLISH, then look at sk->sk_shutdown to finally enter TCP_FIN_WAIT1 state, and send a FIN packet from a sane socket state. This means tcp_send_fin() can now be called from BH context, and must use GFP_ATOMIC allocations. [1] divide error: 0000 [#1] PREEMPT SMP KASAN NOPTI CPU: 1 PID: 5084 Comm: syz-executor358 Not tainted 6.9.0-rc6-syzkaller-00022-g98369dccd2f8 #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 03/27/2024 RIP: 0010:tcp_rcv_space_adjust+0x2df/0x890 net/ipv4/tcp_input.c:767 Code: e3 04 4c 01 eb 48 8b 44 24 38 0f b6 04 10 84 c0 49 89 d5 0f 85 a5 03 00 00 41 8b 8e c8 09 00 00 89 e8 29 c8 48 0f af c3 31 d2 <48> f7 f1 48 8d 1c 43 49 8d 96 76 08 00 00 48 89 d0 48 c1 e8 03 48 RSP: 0018:ffffc900031ef3f0 EFLAGS: 00010246 RAX: 0c677a10441f8f42 RBX: 000000004fb95e7e RCX: 0000000000000000 RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000 RBP: 0000000027d4b11f R08: ffffffff89e535a4 R09: 1ffffffff25e6ab7 R10: dffffc0000000000 R11: ffffffff8135e920 R12: ffff88802a9f8d30 R13: dffffc0000000000 R14: ffff88802a9f8d00 R15: 1ffff1100553f2da FS: 00005555775c0380(0000) GS:ffff8880b9500000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007f1155bf2304 CR3: 000000002b9f2000 CR4: 0000000000350ef0 Call Trace: <TASK> tcp_recvmsg_locked+0x106d/0x25a0 net/ipv4/tcp.c:2513 tcp_recvmsg+0x25d/0x920 net/ipv4/tcp.c:2578 inet6_recvmsg+0x16a/0x730 net/ipv6/af_inet6.c:680 sock_recvmsg_nosec net/socket.c:1046 [inline] sock_recvmsg+0x109/0x280 net/socket.c:1068 ____sys_recvmsg+0x1db/0x470 net/socket.c:2803 ___sys_recvmsg net/socket.c:2845 [inline] do_recvmmsg+0x474/0xae0 net/socket.c:2939 __sys_recvmmsg net/socket.c:3018 [inline] __do_sys_recvmmsg net/socket.c:3041 [inline] __se_sys_recvmmsg net/socket.c:3034 [inline] __x64_sys_recvmmsg+0x199/0x250 net/socket.c:3034 do_syscall_x64 arch/x86/entry/common.c:52 [inline] do_syscall_64+0xf5/0x240 arch/x86/entry/common.c:83 entry_SYSCALL_64_after_hwframe+0x77/0x7f RIP: 0033:0x7faeb6363db9 Code: 28 00 00 00 75 05 48 83 c4 28 c3 e8 c1 17 00 00 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 b8 ff ff ff f7 d8 64 89 01 48 RSP: 002b:00007ffcc1997168 EFLAGS: 00000246 ORIG_RAX: 000000000000012b RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007faeb6363db9 RDX: 0000000000000001 RSI: 0000000020000bc0 RDI: 0000000000000005 RBP: 0000000000000000 R08: 0000000000000000 R09: 000000000000001c R10: 0000000000000122 R11: 0000000000000246 R12: 0000000000000000 R13: 0000000000000000 R14: 0000000000000001 R15: 0000000000000001 Fixes: 1da177e ("Linux-2.6.12-rc2") Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Neal Cardwell <ncardwell@google.com> Link: https://lore.kernel.org/r/20240501125448.896529-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>

[ Upstream commit f2db7230f73a80dbb179deab78f88a7947f0ab7e ] Anderson Nascimento reported a use-after-free splat in tcp_twsk_unique() with nice analysis. Since commit ec94c26 ("tcp/dccp: avoid one atomic operation for timewait hashdance"), inet_twsk_hashdance() sets TIME-WAIT socket's sk_refcnt after putting it into ehash and releasing the bucket lock. Thus, there is a small race window where other threads could try to reuse the port during connect() and call sock_hold() in tcp_twsk_unique() for the TIME-WAIT socket with zero refcnt. If that happens, the refcnt taken by tcp_twsk_unique() is overwritten and sock_put() will cause underflow, triggering a real use-after-free somewhere else. To avoid the use-after-free, we need to use refcount_inc_not_zero() in tcp_twsk_unique() and give up on reusing the port if it returns false. [0]: refcount_t: addition on 0; use-after-free. WARNING: CPU: 0 PID: 1039313 at lib/refcount.c:25 refcount_warn_saturate+0xe5/0x110 CPU: 0 PID: 1039313 Comm: trigger Not tainted 6.8.6-200.fc39.x86_64 #1 Hardware name: VMware, Inc. VMware20,1/440BX Desktop Reference Platform, BIOS VMW201.00V.21805430.B64.2305221830 05/22/2023 RIP: 0010:refcount_warn_saturate+0xe5/0x110 Code: 42 8e ff 0f 0b c3 cc cc cc cc 80 3d aa 13 ea 01 00 0f 85 5e ff ff ff 48 c7 c7 f8 8e b7 82 c6 05 96 13 ea 01 01 e8 7b 42 8e ff <0f> 0b c3 cc cc cc cc 48 c7 c7 50 8f b7 82 c6 05 7a 13 ea 01 01 e8 RSP: 0018:ffffc90006b43b60 EFLAGS: 00010282 RAX: 0000000000000000 RBX: ffff888009bb3ef0 RCX: 0000000000000027 RDX: ffff88807be218c8 RSI: 0000000000000001 RDI: ffff88807be218c0 RBP: 0000000000069d70 R08: 0000000000000000 R09: ffffc90006b439f0 R10: ffffc90006b439e8 R11: 0000000000000003 R12: ffff8880029ede84 R13: 0000000000004e20 R14: ffffffff84356dc0 R15: ffff888009bb3ef0 FS: 00007f62c10926c0(0000) GS:ffff88807be00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000020ccb000 CR3: 000000004628c005 CR4: 0000000000f70ef0 PKRU: 55555554 Call Trace: <TASK> ? refcount_warn_saturate+0xe5/0x110 ? __warn+0x81/0x130 ? refcount_warn_saturate+0xe5/0x110 ? report_bug+0x171/0x1a0 ? refcount_warn_saturate+0xe5/0x110 ? handle_bug+0x3c/0x80 ? exc_invalid_op+0x17/0x70 ? asm_exc_invalid_op+0x1a/0x20 ? refcount_warn_saturate+0xe5/0x110 tcp_twsk_unique+0x186/0x190 __inet_check_established+0x176/0x2d0 __inet_hash_connect+0x74/0x7d0 ? __pfx___inet_check_established+0x10/0x10 tcp_v4_connect+0x278/0x530 __inet_stream_connect+0x10f/0x3d0 inet_stream_connect+0x3a/0x60 __sys_connect+0xa8/0xd0 __x64_sys_connect+0x18/0x20 do_syscall_64+0x83/0x170 entry_SYSCALL_64_after_hwframe+0x78/0x80 RIP: 0033:0x7f62c11a885d Code: ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d a3 45 0c 00 f7 d8 64 89 01 48 RSP: 002b:00007f62c1091e58 EFLAGS: 00000296 ORIG_RAX: 000000000000002a RAX: ffffffffffffffda RBX: 0000000020ccb004 RCX: 00007f62c11a885d RDX: 0000000000000010 RSI: 0000000020ccb000 RDI: 0000000000000003 RBP: 00007f62c1091e90 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000296 R12: 00007f62c10926c0 R13: ffffffffffffff88 R14: 0000000000000000 R15: 00007ffe237885b0 </TASK> Fixes: ec94c26 ("tcp/dccp: avoid one atomic operation for timewait hashdance") Reported-by: Anderson Nascimento <anderson@allelesecurity.com> Closes: https://lore.kernel.org/netdev/37a477a6-d39e-486b-9577-3463f655a6b7@allelesecurity.com/ Suggested-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://lore.kernel.org/r/20240501213145.62261-1-kuniyu@amazon.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>

[ Upstream commit adf0398cee86643b8eacde95f17d073d022f782c ] There is a race condition between l2cap_chan_timeout() and l2cap_chan_del(). When we use l2cap_chan_del() to delete the channel, the chan->conn will be set to null. But the conn could be dereferenced again in the mutex_lock() of l2cap_chan_timeout(). As a result the null pointer dereference bug will happen. The KASAN report triggered by POC is shown below: [ 472.074580] ================================================================== [ 472.075284] BUG: KASAN: null-ptr-deref in mutex_lock+0x68/0xc0 [ 472.075308] Write of size 8 at addr 0000000000000158 by task kworker/0:0/7 [ 472.075308] [ 472.075308] CPU: 0 PID: 7 Comm: kworker/0:0 Not tainted 6.9.0-rc5-00356-g78c0094a146b #36 [ 472.075308] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu4 [ 472.075308] Workqueue: events l2cap_chan_timeout [ 472.075308] Call Trace: [ 472.075308] <TASK> [ 472.075308] dump_stack_lvl+0x137/0x1a0 [ 472.075308] print_report+0x101/0x250 [ 472.075308] ? __virt_addr_valid+0x77/0x160 [ 472.075308] ? mutex_lock+0x68/0xc0 [ 472.075308] kasan_report+0x139/0x170 [ 472.075308] ? mutex_lock+0x68/0xc0 [ 472.075308] kasan_check_range+0x2c3/0x2e0 [ 472.075308] mutex_lock+0x68/0xc0 [ 472.075308] l2cap_chan_timeout+0x181/0x300 [ 472.075308] process_one_work+0x5d2/0xe00 [ 472.075308] worker_thread+0xe1d/0x1660 [ 472.075308] ? pr_cont_work+0x5e0/0x5e0 [ 472.075308] kthread+0x2b7/0x350 [ 472.075308] ? pr_cont_work+0x5e0/0x5e0 [ 472.075308] ? kthread_blkcg+0xd0/0xd0 [ 472.075308] ret_from_fork+0x4d/0x80 [ 472.075308] ? kthread_blkcg+0xd0/0xd0 [ 472.075308] ret_from_fork_asm+0x11/0x20 [ 472.075308] </TASK> [ 472.075308] ================================================================== [ 472.094860] Disabling lock debugging due to kernel taint [ 472.096136] BUG: kernel NULL pointer dereference, address: 0000000000000158 [ 472.096136] #PF: supervisor write access in kernel mode [ 472.096136] #PF: error_code(0x0002) - not-present page [ 472.096136] PGD 0 P4D 0 [ 472.096136] Oops: 0002 [#1] PREEMPT SMP KASAN NOPTI [ 472.096136] CPU: 0 PID: 7 Comm: kworker/0:0 Tainted: G B 6.9.0-rc5-00356-g78c0094a146b #36 [ 472.096136] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu4 [ 472.096136] Workqueue: events l2cap_chan_timeout [ 472.096136] RIP: 0010:mutex_lock+0x88/0xc0 [ 472.096136] Code: be 08 00 00 00 e8 f8 23 1f fd 4c 89 f7 be 08 00 00 00 e8 eb 23 1f fd 42 80 3c 23 00 74 08 48 88 [ 472.096136] RSP: 0018:ffff88800744fc78 EFLAGS: 00000246 [ 472.096136] RAX: 0000000000000000 RBX: 1ffff11000e89f8f RCX: ffffffff8457c865 [ 472.096136] RDX: 0000000000000001 RSI: 0000000000000008 RDI: ffff88800744fc78 [ 472.096136] RBP: 0000000000000158 R08: ffff88800744fc7f R09: 1ffff11000e89f8f [ 472.096136] R10: dffffc0000000000 R11: ffffed1000e89f90 R12: dffffc0000000000 [ 472.096136] R13: 0000000000000158 R14: ffff88800744fc78 R15: ffff888007405a00 [ 472.096136] FS: 0000000000000000(0000) GS:ffff88806d200000(0000) knlGS:0000000000000000 [ 472.096136] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 472.096136] CR2: 0000000000000158 CR3: 000000000da32000 CR4: 00000000000006f0 [ 472.096136] Call Trace: [ 472.096136] <TASK> [ 472.096136] ? __die_body+0x8d/0xe0 [ 472.096136] ? page_fault_oops+0x6b8/0x9a0 [ 472.096136] ? kernelmode_fixup_or_oops+0x20c/0x2a0 [ 472.096136] ? do_user_addr_fault+0x1027/0x1340 [ 472.096136] ? _printk+0x7a/0xa0 [ 472.096136] ? mutex_lock+0x68/0xc0 [ 472.096136] ? add_taint+0x42/0xd0 [ 472.096136] ? exc_page_fault+0x6a/0x1b0 [ 472.096136] ? asm_exc_page_fault+0x26/0x30 [ 472.096136] ? mutex_lock+0x75/0xc0 [ 472.096136] ? mutex_lock+0x88/0xc0 [ 472.096136] ? mutex_lock+0x75/0xc0 [ 472.096136] l2cap_chan_timeout+0x181/0x300 [ 472.096136] process_one_work+0x5d2/0xe00 [ 472.096136] worker_thread+0xe1d/0x1660 [ 472.096136] ? pr_cont_work+0x5e0/0x5e0 [ 472.096136] kthread+0x2b7/0x350 [ 472.096136] ? pr_cont_work+0x5e0/0x5e0 [ 472.096136] ? kthread_blkcg+0xd0/0xd0 [ 472.096136] ret_from_fork+0x4d/0x80 [ 472.096136] ? kthread_blkcg+0xd0/0xd0 [ 472.096136] ret_from_fork_asm+0x11/0x20 [ 472.096136] </TASK> [ 472.096136] Modules linked in: [ 472.096136] CR2: 0000000000000158 [ 472.096136] ---[ end trace 0000000000000000 ]--- [ 472.096136] RIP: 0010:mutex_lock+0x88/0xc0 [ 472.096136] Code: be 08 00 00 00 e8 f8 23 1f fd 4c 89 f7 be 08 00 00 00 e8 eb 23 1f fd 42 80 3c 23 00 74 08 48 88 [ 472.096136] RSP: 0018:ffff88800744fc78 EFLAGS: 00000246 [ 472.096136] RAX: 0000000000000000 RBX: 1ffff11000e89f8f RCX: ffffffff8457c865 [ 472.096136] RDX: 0000000000000001 RSI: 0000000000000008 RDI: ffff88800744fc78 [ 472.096136] RBP: 0000000000000158 R08: ffff88800744fc7f R09: 1ffff11000e89f8f [ 472.132932] R10: dffffc0000000000 R11: ffffed1000e89f90 R12: dffffc0000000000 [ 472.132932] R13: 0000000000000158 R14: ffff88800744fc78 R15: ffff888007405a00 [ 472.132932] FS: 0000000000000000(0000) GS:ffff88806d200000(0000) knlGS:0000000000000000 [ 472.132932] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 472.132932] CR2: 0000000000000158 CR3: 000000000da32000 CR4: 00000000000006f0 [ 472.132932] Kernel panic - not syncing: Fatal exception [ 472.132932] Kernel Offset: disabled [ 472.132932] ---[ end Kernel panic - not syncing: Fatal exception ]--- Add a check to judge whether the conn is null in l2cap_chan_timeout() in order to mitigate the bug. Fixes: 3df91ea ("Bluetooth: Revert to mutexes from RCU list") Signed-off-by: Duoming Zhou <duoming@zju.edu.cn> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>

[ Upstream commit d101291b2681e5ab938554e3e323f7a7ee33e3aa ] syzbot is able to trigger the following crash [1], caused by unsafe ip6_dst_idev() use. Indeed ip6_dst_idev() can return NULL, and must always be checked. [1] Oops: general protection fault, probably for non-canonical address 0xdffffc0000000000: 0000 [#1] PREEMPT SMP KASAN PTI KASAN: null-ptr-deref in range [0x0000000000000000-0x0000000000000007] CPU: 0 PID: 31648 Comm: syz-executor.0 Not tainted 6.9.0-rc4-next-20240417-syzkaller #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 03/27/2024 RIP: 0010:__fib6_rule_action net/ipv6/fib6_rules.c:237 [inline] RIP: 0010:fib6_rule_action+0x241/0x7b0 net/ipv6/fib6_rules.c:267 Code: 02 00 00 49 8d 9f d8 00 00 00 48 89 d8 48 c1 e8 03 42 80 3c 20 00 74 08 48 89 df e8 f9 32 bf f7 48 8b 1b 48 89 d8 48 c1 e8 03 <42> 80 3c 20 00 74 08 48 89 df e8 e0 32 bf f7 4c 8b 03 48 89 ef 4c RSP: 0018:ffffc9000fc1f2f0 EFLAGS: 00010246 RAX: 0000000000000000 RBX: 0000000000000000 RCX: 1a772f98c8186700 RDX: 0000000000000003 RSI: ffffffff8bcac4e0 RDI: ffffffff8c1f9760 RBP: ffff8880673fb980 R08: ffffffff8fac15ef R09: 1ffffffff1f582bd R10: dffffc0000000000 R11: fffffbfff1f582be R12: dffffc0000000000 R13: 0000000000000080 R14: ffff888076509000 R15: ffff88807a029a00 FS: 00007f55e82ca6c0(0000) GS:ffff8880b9400000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000001b31d23000 CR3: 0000000022b66000 CR4: 00000000003506f0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: <TASK> fib_rules_lookup+0x62c/0xdb0 net/core/fib_rules.c:317 fib6_rule_lookup+0x1fd/0x790 net/ipv6/fib6_rules.c:108 ip6_route_output_flags_noref net/ipv6/route.c:2637 [inline] ip6_route_output_flags+0x38e/0x610 net/ipv6/route.c:2649 ip6_route_output include/net/ip6_route.h:93 [inline] ip6_dst_lookup_tail+0x189/0x11a0 net/ipv6/ip6_output.c:1120 ip6_dst_lookup_flow+0xb9/0x180 net/ipv6/ip6_output.c:1250 sctp_v6_get_dst+0x792/0x1e20 net/sctp/ipv6.c:326 sctp_transport_route+0x12c/0x2e0 net/sctp/transport.c:455 sctp_assoc_add_peer+0x614/0x15c0 net/sctp/associola.c:662 sctp_connect_new_asoc+0x31d/0x6c0 net/sctp/socket.c:1099 __sctp_connect+0x66d/0xe30 net/sctp/socket.c:1197 sctp_connect net/sctp/socket.c:4819 [inline] sctp_inet_connect+0x149/0x1f0 net/sctp/socket.c:4834 __sys_connect_file net/socket.c:2048 [inline] __sys_connect+0x2df/0x310 net/socket.c:2065 __do_sys_connect net/socket.c:2075 [inline] __se_sys_connect net/socket.c:2072 [inline] __x64_sys_connect+0x7a/0x90 net/socket.c:2072 do_syscall_x64 arch/x86/entry/common.c:52 [inline] do_syscall_64+0xf5/0x240 arch/x86/entry/common.c:83 entry_SYSCALL_64_after_hwframe+0x77/0x7f Fixes: 5e5f3f0 ("[IPV6] ADDRCONF: Convert ipv6_get_saddr() to ipv6_dev_get_saddr().") Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Simon Horman <horms@kernel.org> Reviewed-by: David Ahern <dsahern@kernel.org> Link: https://lore.kernel.org/r/20240507163145.835254-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>

commit 080cbb890286cd794f1ee788bbc5463e2deb7c2b upstream. Sam Page (sam4k) working with Trend Micro Zero Day Initiative reported a UAF in the tipc_buf_append() error path: BUG: KASAN: slab-use-after-free in kfree_skb_list_reason+0x47e/0x4c0 linux/net/core/skbuff.c:1183 Read of size 8 at addr ffff88804d2a7c80 by task poc/8034 CPU: 1 PID: 8034 Comm: poc Not tainted 6.8.2 #1 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.0-debian-1.16.0-5 04/01/2014 Call Trace: <IRQ> __dump_stack linux/lib/dump_stack.c:88 dump_stack_lvl+0xd9/0x1b0 linux/lib/dump_stack.c:106 print_address_description linux/mm/kasan/report.c:377 print_report+0xc4/0x620 linux/mm/kasan/report.c:488 kasan_report+0xda/0x110 linux/mm/kasan/report.c:601 kfree_skb_list_reason+0x47e/0x4c0 linux/net/core/skbuff.c:1183 skb_release_data+0x5af/0x880 linux/net/core/skbuff.c:1026 skb_release_all linux/net/core/skbuff.c:1094 __kfree_skb linux/net/core/skbuff.c:1108 kfree_skb_reason+0x12d/0x210 linux/net/core/skbuff.c:1144 kfree_skb linux/./include/linux/skbuff.h:1244 tipc_buf_append+0x425/0xb50 linux/net/tipc/msg.c:186 tipc_link_input+0x224/0x7c0 linux/net/tipc/link.c:1324 tipc_link_rcv+0x76e/0x2d70 linux/net/tipc/link.c:1824 tipc_rcv+0x45f/0x10f0 linux/net/tipc/node.c:2159 tipc_udp_recv+0x73b/0x8f0 linux/net/tipc/udp_media.c:390 udp_queue_rcv_one_skb+0xad2/0x1850 linux/net/ipv4/udp.c:2108 udp_queue_rcv_skb+0x131/0xb00 linux/net/ipv4/udp.c:2186 udp_unicast_rcv_skb+0x165/0x3b0 linux/net/ipv4/udp.c:2346 __udp4_lib_rcv+0x2594/0x3400 linux/net/ipv4/udp.c:2422 ip_protocol_deliver_rcu+0x30c/0x4e0 linux/net/ipv4/ip_input.c:205 ip_local_deliver_finish+0x2e4/0x520 linux/net/ipv4/ip_input.c:233 NF_HOOK linux/./include/linux/netfilter.h:314 NF_HOOK linux/./include/linux/netfilter.h:308 ip_local_deliver+0x18e/0x1f0 linux/net/ipv4/ip_input.c:254 dst_input linux/./include/net/dst.h:461 ip_rcv_finish linux/net/ipv4/ip_input.c:449 NF_HOOK linux/./include/linux/netfilter.h:314 NF_HOOK linux/./include/linux/netfilter.h:308 ip_rcv+0x2c5/0x5d0 linux/net/ipv4/ip_input.c:569 __netif_receive_skb_one_core+0x199/0x1e0 linux/net/core/dev.c:5534 __netif_receive_skb+0x1f/0x1c0 linux/net/core/dev.c:5648 process_backlog+0x101/0x6b0 linux/net/core/dev.c:5976 __napi_poll.constprop.0+0xba/0x550 linux/net/core/dev.c:6576 napi_poll linux/net/core/dev.c:6645 net_rx_action+0x95a/0xe90 linux/net/core/dev.c:6781 __do_softirq+0x21f/0x8e7 linux/kernel/softirq.c:553 do_softirq linux/kernel/softirq.c:454 do_softirq+0xb2/0xf0 linux/kernel/softirq.c:441 </IRQ> <TASK> __local_bh_enable_ip+0x100/0x120 linux/kernel/softirq.c:381 local_bh_enable linux/./include/linux/bottom_half.h:33 rcu_read_unlock_bh linux/./include/linux/rcupdate.h:851 __dev_queue_xmit+0x871/0x3ee0 linux/net/core/dev.c:4378 dev_queue_xmit linux/./include/linux/netdevice.h:3169 neigh_hh_output linux/./include/net/neighbour.h:526 neigh_output linux/./include/net/neighbour.h:540 ip_finish_output2+0x169f/0x2550 linux/net/ipv4/ip_output.c:235 __ip_finish_output linux/net/ipv4/ip_output.c:313 __ip_finish_output+0x49e/0x950 linux/net/ipv4/ip_output.c:295 ip_finish_output+0x31/0x310 linux/net/ipv4/ip_output.c:323 NF_HOOK_COND linux/./include/linux/netfilter.h:303 ip_output+0x13b/0x2a0 linux/net/ipv4/ip_output.c:433 dst_output linux/./include/net/dst.h:451 ip_local_out linux/net/ipv4/ip_output.c:129 ip_send_skb+0x3e5/0x560 linux/net/ipv4/ip_output.c:1492 udp_send_skb+0x73f/0x1530 linux/net/ipv4/udp.c:963 udp_sendmsg+0x1a36/0x2b40 linux/net/ipv4/udp.c:1250 inet_sendmsg+0x105/0x140 linux/net/ipv4/af_inet.c:850 sock_sendmsg_nosec linux/net/socket.c:730 __sock_sendmsg linux/net/socket.c:745 __sys_sendto+0x42c/0x4e0 linux/net/socket.c:2191 __do_sys_sendto linux/net/socket.c:2203 __se_sys_sendto linux/net/socket.c:2199 __x64_sys_sendto+0xe0/0x1c0 linux/net/socket.c:2199 do_syscall_x64 linux/arch/x86/entry/common.c:52 do_syscall_64+0xd8/0x270 linux/arch/x86/entry/common.c:83 entry_SYSCALL_64_after_hwframe+0x6f/0x77 linux/arch/x86/entry/entry_64.S:120 RIP: 0033:0x7f3434974f29 Code: 00 c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 37 8f 0d 00 f7 d8 64 89 01 48 RSP: 002b:00007fff9154f2b8 EFLAGS: 00000212 ORIG_RAX: 000000000000002c RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f3434974f29 RDX: 00000000000032c8 RSI: 00007fff9154f300 RDI: 0000000000000003 RBP: 00007fff915532e0 R08: 00007fff91553360 R09: 0000000000000010 R10: 0000000000000000 R11: 0000000000000212 R12: 000055ed86d261d0 R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000 </TASK> In the critical scenario, either the relevant skb is freed or its ownership is transferred into a frag_lists. In both cases, the cleanup code must not free it again: we need to clear the skb reference earlier. Fixes: 1149557 ("tipc: eliminate unnecessary linearization of incoming buffers") Cc: stable@vger.kernel.org Reported-by: zdi-disclosures@trendmicro.com # ZDI-CAN-23852 Acked-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://lore.kernel.org/r/752f1ccf762223d109845365d07f55414058e5a3.1714484273.git.pabeni@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

…nix_gc(). commit 1971d13ffa84a551d29a81fdf5b5ec5be166ac83 upstream. syzbot reported a lockdep splat regarding unix_gc_lock and unix_state_lock(). One is called from recvmsg() for a connected socket, and another is called from GC for TCP_LISTEN socket. So, the splat is false-positive. Let's add a dedicated lock class for the latter to suppress the splat. Note that this change is not necessary for net-next.git as the issue is only applied to the old GC impl. [0]: WARNING: possible circular locking dependency detected 6.9.0-rc5-syzkaller-00007-g4d2008430ce8 #0 Not tainted ----------------------------------------------------- kworker/u8:1/11 is trying to acquire lock: ffff88807cea4e70 (&u->lock){+.+.}-{2:2}, at: spin_lock include/linux/spinlock.h:351 [inline] ffff88807cea4e70 (&u->lock){+.+.}-{2:2}, at: __unix_gc+0x40e/0xf70 net/unix/garbage.c:302 but task is already holding lock: ffffffff8f6ab638 (unix_gc_lock){+.+.}-{2:2}, at: spin_lock include/linux/spinlock.h:351 [inline] ffffffff8f6ab638 (unix_gc_lock){+.+.}-{2:2}, at: __unix_gc+0x117/0xf70 net/unix/garbage.c:261 which lock already depends on the new lock. the existing dependency chain (in reverse order) is: -> #1 (unix_gc_lock){+.+.}-{2:2}: lock_acquire+0x1ed/0x550 kernel/locking/lockdep.c:5754 __raw_spin_lock include/linux/spinlock_api_smp.h:133 [inline] _raw_spin_lock+0x2e/0x40 kernel/locking/spinlock.c:154 spin_lock include/linux/spinlock.h:351 [inline] unix_notinflight+0x13d/0x390 net/unix/garbage.c:140 unix_detach_fds net/unix/af_unix.c:1819 [inline] unix_destruct_scm+0x221/0x350 net/unix/af_unix.c:1876 skb_release_head_state+0x100/0x250 net/core/skbuff.c:1188 skb_release_all net/core/skbuff.c:1200 [inline] __kfree_skb net/core/skbuff.c:1216 [inline] kfree_skb_reason+0x16d/0x3b0 net/core/skbuff.c:1252 kfree_skb include/linux/skbuff.h:1262 [inline] manage_oob net/unix/af_unix.c:2672 [inline] unix_stream_read_generic+0x1125/0x2700 net/unix/af_unix.c:2749 unix_stream_splice_read+0x239/0x320 net/unix/af_unix.c:2981 do_splice_read fs/splice.c:985 [inline] splice_file_to_pipe+0x299/0x500 fs/splice.c:1295 do_splice+0xf2d/0x1880 fs/splice.c:1379 __do_splice fs/splice.c:1436 [inline] __do_sys_splice fs/splice.c:1652 [inline] __se_sys_splice+0x331/0x4a0 fs/splice.c:1634 do_syscall_x64 arch/x86/entry/common.c:52 [inline] do_syscall_64+0xf5/0x240 arch/x86/entry/common.c:83 entry_SYSCALL_64_after_hwframe+0x77/0x7f -> #0 (&u->lock){+.+.}-{2:2}: check_prev_add kernel/locking/lockdep.c:3134 [inline] check_prevs_add kernel/locking/lockdep.c:3253 [inline] validate_chain+0x18cb/0x58e0 kernel/locking/lockdep.c:3869 __lock_acquire+0x1346/0x1fd0 kernel/locking/lockdep.c:5137 lock_acquire+0x1ed/0x550 kernel/locking/lockdep.c:5754 __raw_spin_lock include/linux/spinlock_api_smp.h:133 [inline] _raw_spin_lock+0x2e/0x40 kernel/locking/spinlock.c:154 spin_lock include/linux/spinlock.h:351 [inline] __unix_gc+0x40e/0xf70 net/unix/garbage.c:302 process_one_work kernel/workqueue.c:3254 [inline] process_scheduled_works+0xa10/0x17c0 kernel/workqueue.c:3335 worker_thread+0x86d/0xd70 kernel/workqueue.c:3416 kthread+0x2f0/0x390 kernel/kthread.c:388 ret_from_fork+0x4b/0x80 arch/x86/kernel/process.c:147 ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:244 other info that might help us debug this: Possible unsafe locking scenario: CPU0 CPU1 ---- ---- lock(unix_gc_lock); lock(&u->lock); lock(unix_gc_lock); lock(&u->lock); *** DEADLOCK *** 3 locks held by kworker/u8:1/11: #0: ffff888015089148 ((wq_completion)events_unbound){+.+.}-{0:0}, at: process_one_work kernel/workqueue.c:3229 [inline] #0: ffff888015089148 ((wq_completion)events_unbound){+.+.}-{0:0}, at: process_scheduled_works+0x8e0/0x17c0 kernel/workqueue.c:3335 #1: ffffc90000107d00 (unix_gc_work){+.+.}-{0:0}, at: process_one_work kernel/workqueue.c:3230 [inline] #1: ffffc90000107d00 (unix_gc_work){+.+.}-{0:0}, at: process_scheduled_works+0x91b/0x17c0 kernel/workqueue.c:3335 #2: ffffffff8f6ab638 (unix_gc_lock){+.+.}-{2:2}, at: spin_lock include/linux/spinlock.h:351 [inline] #2: ffffffff8f6ab638 (unix_gc_lock){+.+.}-{2:2}, at: __unix_gc+0x117/0xf70 net/unix/garbage.c:261 stack backtrace: CPU: 0 PID: 11 Comm: kworker/u8:1 Not tainted 6.9.0-rc5-syzkaller-00007-g4d2008430ce8 #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 03/27/2024 Workqueue: events_unbound __unix_gc Call Trace: <TASK> __dump_stack lib/dump_stack.c:88 [inline] dump_stack_lvl+0x241/0x360 lib/dump_stack.c:114 check_noncircular+0x36a/0x4a0 kernel/locking/lockdep.c:2187 check_prev_add kernel/locking/lockdep.c:3134 [inline] check_prevs_add kernel/locking/lockdep.c:3253 [inline] validate_chain+0x18cb/0x58e0 kernel/locking/lockdep.c:3869 __lock_acquire+0x1346/0x1fd0 kernel/locking/lockdep.c:5137 lock_acquire+0x1ed/0x550 kernel/locking/lockdep.c:5754 __raw_spin_lock include/linux/spinlock_api_smp.h:133 [inline] _raw_spin_lock+0x2e/0x40 kernel/locking/spinlock.c:154 spin_lock include/linux/spinlock.h:351 [inline] __unix_gc+0x40e/0xf70 net/unix/garbage.c:302 process_one_work kernel/workqueue.c:3254 [inline] process_scheduled_works+0xa10/0x17c0 kernel/workqueue.c:3335 worker_thread+0x86d/0xd70 kernel/workqueue.c:3416 kthread+0x2f0/0x390 kernel/kthread.c:388 ret_from_fork+0x4b/0x80 arch/x86/kernel/process.c:147 ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:244 </TASK> Fixes: 47d8ac011fe1 ("af_unix: Fix garbage collector racing against connect()") Reported-and-tested-by: syzbot+fa379358c28cc87cc307@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=fa379358c28cc87cc307 Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Link: https://lore.kernel.org/r/20240424170443.9832-1-kuniyu@amazon.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

[ Upstream commit 85a6a1aff08ec9f5b929d345d066e2830e8818e5 ] The 'TAG 66 Packet Format' description is missing the cipher code and checksum fields that are packed into the message packet. As a result, the buffer allocated for the packet is 3 bytes too small and write_tag_66_packet() will write up to 3 bytes past the end of the buffer. Fix this by increasing the size of the allocation so the whole packet will always fit in the buffer. This fixes the below kasan slab-out-of-bounds bug: BUG: KASAN: slab-out-of-bounds in ecryptfs_generate_key_packet_set+0x7d6/0xde0 Write of size 1 at addr ffff88800afbb2a5 by task touch/181 CPU: 0 PID: 181 Comm: touch Not tainted 6.6.13-gnu #1 4c9534092be820851bb687b82d1f92a426598dc6 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.2/GNU Guix 04/01/2014 Call Trace: <TASK> dump_stack_lvl+0x4c/0x70 print_report+0xc5/0x610 ? ecryptfs_generate_key_packet_set+0x7d6/0xde0 ? kasan_complete_mode_report_info+0x44/0x210 ? ecryptfs_generate_key_packet_set+0x7d6/0xde0 kasan_report+0xc2/0x110 ? ecryptfs_generate_key_packet_set+0x7d6/0xde0 __asan_store1+0x62/0x80 ecryptfs_generate_key_packet_set+0x7d6/0xde0 ? __pfx_ecryptfs_generate_key_packet_set+0x10/0x10 ? __alloc_pages+0x2e2/0x540 ? __pfx_ovl_open+0x10/0x10 [overlay 30837f11141636a8e1793533a02e6e2e885dad1d] ? dentry_open+0x8f/0xd0 ecryptfs_write_metadata+0x30a/0x550 ? __pfx_ecryptfs_write_metadata+0x10/0x10 ? ecryptfs_get_lower_file+0x6b/0x190 ecryptfs_initialize_file+0x77/0x150 ecryptfs_create+0x1c2/0x2f0 path_openat+0x17cf/0x1ba0 ? __pfx_path_openat+0x10/0x10 do_filp_open+0x15e/0x290 ? __pfx_do_filp_open+0x10/0x10 ? __kasan_check_write+0x18/0x30 ? _raw_spin_lock+0x86/0xf0 ? __pfx__raw_spin_lock+0x10/0x10 ? __kasan_check_write+0x18/0x30 ? alloc_fd+0xf4/0x330 do_sys_openat2+0x122/0x160 ? __pfx_do_sys_openat2+0x10/0x10 __x64_sys_openat+0xef/0x170 ? __pfx___x64_sys_openat+0x10/0x10 do_syscall_64+0x60/0xd0 entry_SYSCALL_64_after_hwframe+0x6e/0xd8 RIP: 0033:0x7f00a703fd67 Code: 25 00 00 41 00 3d 00 00 41 00 74 37 64 8b 04 25 18 00 00 00 85 c0 75 5b 44 89 e2 48 89 ee bf 9c ff ff ff b8 01 01 00 00 0f 05 <48> 3d 00 f0 ff ff 0f 87 85 00 00 00 48 83 c4 68 5d 41 5c c3 0f 1f RSP: 002b:00007ffc088e30b0 EFLAGS: 00000246 ORIG_RAX: 0000000000000101 RAX: ffffffffffffffda RBX: 00007ffc088e3368 RCX: 00007f00a703fd67 RDX: 0000000000000941 RSI: 00007ffc088e48d7 RDI: 00000000ffffff9c RBP: 00007ffc088e48d7 R08: 0000000000000001 R09: 0000000000000000 R10: 00000000000001b6 R11: 0000000000000246 R12: 0000000000000941 R13: 0000000000000000 R14: 00007ffc088e48d7 R15: 00007f00a7180040 </TASK> Allocated by task 181: kasan_save_stack+0x2f/0x60 kasan_set_track+0x29/0x40 kasan_save_alloc_info+0x25/0x40 __kasan_kmalloc+0xc5/0xd0 __kmalloc+0x66/0x160 ecryptfs_generate_key_packet_set+0x6d2/0xde0 ecryptfs_write_metadata+0x30a/0x550 ecryptfs_initialize_file+0x77/0x150 ecryptfs_create+0x1c2/0x2f0 path_openat+0x17cf/0x1ba0 do_filp_open+0x15e/0x290 do_sys_openat2+0x122/0x160 __x64_sys_openat+0xef/0x170 do_syscall_64+0x60/0xd0 entry_SYSCALL_64_after_hwframe+0x6e/0xd8 Fixes: dddfa46 ("[PATCH] eCryptfs: Public key; packet management") Signed-off-by: Brian Kubisiak <brian@kubisiak.com> Link: https://lore.kernel.org/r/5j2q56p6qkhezva6b2yuqfrsurmvrrqtxxzrnp3wqu7xrz22i7@hoecdztoplbl Signed-off-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>

[ Upstream commit c6854e5a267c28300ff045480b5a7ee7f6f1d913 ] Add a check to make sure that the requested xattr node size is no larger than the eraseblock minus the cleanmarker. Unlike the usual inode nodes, the xattr nodes aren't split into parts and spread across multiple eraseblocks, which means that a xattr node must not occupy more than one eraseblock. If the requested xattr value is too large, the xattr node can spill onto the next eraseblock, overwriting the nodes and causing errors such as: jffs2: argh. node added in wrong place at 0x0000b050(2) jffs2: nextblock 0x0000a000, expected at 0000b00c jffs2: error: (823) do_verify_xattr_datum: node CRC failed at 0x01e050, read=0xfc892c93, calc=0x000000 jffs2: notice: (823) jffs2_get_inode_nodes: Node header CRC failed at 0x01e00c. {848f,2fc4,0fef511f,59a3d171} jffs2: Node at 0x0000000c with length 0x00001044 would run over the end of the erase block jffs2: Perhaps the file system was created with the wrong erase size? jffs2: jffs2_scan_eraseblock(): Magic bitmask 0x1985 not found at 0x00000010: 0x1044 instead This breaks the filesystem and can lead to KASAN crashes such as: BUG: KASAN: slab-out-of-bounds in jffs2_sum_add_kvec+0x125e/0x15d0 Read of size 4 at addr ffff88802c31e914 by task repro/830 CPU: 0 PID: 830 Comm: repro Not tainted 6.9.0-rc3+ #1 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Arch Linux 1.16.3-1-1 04/01/2014 Call Trace: <TASK> dump_stack_lvl+0xc6/0x120 print_report+0xc4/0x620 ? __virt_addr_valid+0x308/0x5b0 kasan_report+0xc1/0xf0 ? jffs2_sum_add_kvec+0x125e/0x15d0 ? jffs2_sum_add_kvec+0x125e/0x15d0 jffs2_sum_add_kvec+0x125e/0x15d0 jffs2_flash_direct_writev+0xa8/0xd0 jffs2_flash_writev+0x9c9/0xef0 ? __x64_sys_setxattr+0xc4/0x160 ? do_syscall_64+0x69/0x140 ? entry_SYSCALL_64_after_hwframe+0x76/0x7e [...] Found by Linux Verification Center (linuxtesting.org) with Syzkaller. Fixes: aa98d7c ("[JFFS2][XATTR] XATTR support on JFFS2 (version. 5)") Signed-off-by: Ilya Denisyev <dev@elkcl.ru> Link: https://lore.kernel.org/r/20240412155357.237803-1-dev@elkcl.ru Signed-off-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>

[ Upstream commit f0e729af2eb6bee9eb58c4df1087f14ebaefe26b ] Is is reported that for dm-raid10, lvextend + lvchange --syncaction will trigger following softlockup: kernel:watchdog: BUG: soft lockup - CPU#3 stuck for 26s! [mdX_resync:6976] CPU: 7 PID: 3588 Comm: mdX_resync Kdump: loaded Not tainted 6.9.0-rc4-next-20240419 #1 RIP: 0010:_raw_spin_unlock_irq+0x13/0x30 Call Trace: <TASK> md_bitmap_start_sync+0x6b/0xf0 raid10_sync_request+0x25c/0x1b40 [raid10] md_do_sync+0x64b/0x1020 md_thread+0xa7/0x170 kthread+0xcf/0x100 ret_from_fork+0x30/0x50 ret_from_fork_asm+0x1a/0x30 And the detailed process is as follows: md_do_sync j = mddev->resync_min while (j < max_sectors) sectors = raid10_sync_request(mddev, j, &skipped) if (!md_bitmap_start_sync(..., &sync_blocks)) // md_bitmap_start_sync set sync_blocks to 0 return sync_blocks + sectors_skippe; // sectors = 0; j += sectors; // j never change Root cause is that commit 301867b1c168 ("md/raid10: check slab-out-of-bounds in md_bitmap_get_counter") return early from md_bitmap_get_counter(), without setting returned blocks. Fix this problem by always set returned blocks from md_bitmap_get_counter"(), as it used to be. Noted that this patch just fix the softlockup problem in kernel, the case that bitmap size doesn't match array size still need to be fixed. Fixes: 301867b1c168 ("md/raid10: check slab-out-of-bounds in md_bitmap_get_counter") Reported-and-tested-by: Nigel Croxon <ncroxon@redhat.com> Closes: https://lore.kernel.org/all/71ba5272-ab07-43ba-8232-d2da642acb4e@redhat.com/ Signed-off-by: Yu Kuai <yukuai3@huawei.com> Link: https://lore.kernel.org/r/20240422065824.2516-1-yukuai1@huaweicloud.com Signed-off-by: Song Liu <song@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>

[ Upstream commit e03e7f20ebf7e1611d40d1fdc1bde900fd3335f6 ] syzbot loves netrom, and found a possible deadlock in nr_rt_ioctl [1] Make sure we always acquire nr_node_list_lock before nr_node_lock(nr_node) [1] WARNING: possible circular locking dependency detected 6.9.0-rc7-syzkaller-02147-g654de42f3fc6 #0 Not tainted ------------------------------------------------------ syz-executor350/5129 is trying to acquire lock: ffff8880186e2070 (&nr_node->node_lock){+...}-{2:2}, at: spin_lock_bh include/linux/spinlock.h:356 [inline] ffff8880186e2070 (&nr_node->node_lock){+...}-{2:2}, at: nr_node_lock include/net/netrom.h:152 [inline] ffff8880186e2070 (&nr_node->node_lock){+...}-{2:2}, at: nr_dec_obs net/netrom/nr_route.c:464 [inline] ffff8880186e2070 (&nr_node->node_lock){+...}-{2:2}, at: nr_rt_ioctl+0x1bb/0x1090 net/netrom/nr_route.c:697 but task is already holding lock: ffffffff8f7053b8 (nr_node_list_lock){+...}-{2:2}, at: spin_lock_bh include/linux/spinlock.h:356 [inline] ffffffff8f7053b8 (nr_node_list_lock){+...}-{2:2}, at: nr_dec_obs net/netrom/nr_route.c:462 [inline] ffffffff8f7053b8 (nr_node_list_lock){+...}-{2:2}, at: nr_rt_ioctl+0x10a/0x1090 net/netrom/nr_route.c:697 which lock already depends on the new lock. the existing dependency chain (in reverse order) is: -> #1 (nr_node_list_lock){+...}-{2:2}: lock_acquire+0x1ed/0x550 kernel/locking/lockdep.c:5754 __raw_spin_lock_bh include/linux/spinlock_api_smp.h:126 [inline] _raw_spin_lock_bh+0x35/0x50 kernel/locking/spinlock.c:178 spin_lock_bh include/linux/spinlock.h:356 [inline] nr_remove_node net/netrom/nr_route.c:299 [inline] nr_del_node+0x4b4/0x820 net/netrom/nr_route.c:355 nr_rt_ioctl+0xa95/0x1090 net/netrom/nr_route.c:683 sock_do_ioctl+0x158/0x460 net/socket.c:1222 sock_ioctl+0x629/0x8e0 net/socket.c:1341 vfs_ioctl fs/ioctl.c:51 [inline] __do_sys_ioctl fs/ioctl.c:904 [inline] __se_sys_ioctl+0xfc/0x170 fs/ioctl.c:890 do_syscall_x64 arch/x86/entry/common.c:52 [inline] do_syscall_64+0xf5/0x240 arch/x86/entry/common.c:83 entry_SYSCALL_64_after_hwframe+0x77/0x7f -> #0 (&nr_node->node_lock){+...}-{2:2}: check_prev_add kernel/locking/lockdep.c:3134 [inline] check_prevs_add kernel/locking/lockdep.c:3253 [inline] validate_chain+0x18cb/0x58e0 kernel/locking/lockdep.c:3869 __lock_acquire+0x1346/0x1fd0 kernel/locking/lockdep.c:5137 lock_acquire+0x1ed/0x550 kernel/locking/lockdep.c:5754 __raw_spin_lock_bh include/linux/spinlock_api_smp.h:126 [inline] _raw_spin_lock_bh+0x35/0x50 kernel/locking/spinlock.c:178 spin_lock_bh include/linux/spinlock.h:356 [inline] nr_node_lock include/net/netrom.h:152 [inline] nr_dec_obs net/netrom/nr_route.c:464 [inline] nr_rt_ioctl+0x1bb/0x1090 net/netrom/nr_route.c:697 sock_do_ioctl+0x158/0x460 net/socket.c:1222 sock_ioctl+0x629/0x8e0 net/socket.c:1341 vfs_ioctl fs/ioctl.c:51 [inline] __do_sys_ioctl fs/ioctl.c:904 [inline] __se_sys_ioctl+0xfc/0x170 fs/ioctl.c:890 do_syscall_x64 arch/x86/entry/common.c:52 [inline] do_syscall_64+0xf5/0x240 arch/x86/entry/common.c:83 entry_SYSCALL_64_after_hwframe+0x77/0x7f other info that might help us debug this: Possible unsafe locking scenario: CPU0 CPU1 ---- ---- lock(nr_node_list_lock); lock(&nr_node->node_lock); lock(nr_node_list_lock); lock(&nr_node->node_lock); *** DEADLOCK *** 1 lock held by syz-executor350/5129: #0: ffffffff8f7053b8 (nr_node_list_lock){+...}-{2:2}, at: spin_lock_bh include/linux/spinlock.h:356 [inline] #0: ffffffff8f7053b8 (nr_node_list_lock){+...}-{2:2}, at: nr_dec_obs net/netrom/nr_route.c:462 [inline] #0: ffffffff8f7053b8 (nr_node_list_lock){+...}-{2:2}, at: nr_rt_ioctl+0x10a/0x1090 net/netrom/nr_route.c:697 stack backtrace: CPU: 0 PID: 5129 Comm: syz-executor350 Not tainted 6.9.0-rc7-syzkaller-02147-g654de42f3fc6 #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 04/02/2024 Call Trace: <TASK> __dump_stack lib/dump_stack.c:88 [inline] dump_stack_lvl+0x241/0x360 lib/dump_stack.c:114 check_noncircular+0x36a/0x4a0 kernel/locking/lockdep.c:2187 check_prev_add kernel/locking/lockdep.c:3134 [inline] check_prevs_add kernel/locking/lockdep.c:3253 [inline] validate_chain+0x18cb/0x58e0 kernel/locking/lockdep.c:3869 __lock_acquire+0x1346/0x1fd0 kernel/locking/lockdep.c:5137 lock_acquire+0x1ed/0x550 kernel/locking/lockdep.c:5754 __raw_spin_lock_bh include/linux/spinlock_api_smp.h:126 [inline] _raw_spin_lock_bh+0x35/0x50 kernel/locking/spinlock.c:178 spin_lock_bh include/linux/spinlock.h:356 [inline] nr_node_lock include/net/netrom.h:152 [inline] nr_dec_obs net/netrom/nr_route.c:464 [inline] nr_rt_ioctl+0x1bb/0x1090 net/netrom/nr_route.c:697 sock_do_ioctl+0x158/0x460 net/socket.c:1222 sock_ioctl+0x629/0x8e0 net/socket.c:1341 vfs_ioctl fs/ioctl.c:51 [inline] __do_sys_ioctl fs/ioctl.c:904 [inline] __se_sys_ioctl+0xfc/0x170 fs/ioctl.c:890 do_syscall_x64 arch/x86/entry/common.c:52 [inline] do_syscall_64+0xf5/0x240 arch/x86/entry/common.c:83 entry_SYSCALL_64_after_hwframe+0x77/0x7f Fixes: 1da177e ("Linux-2.6.12-rc2") Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://lore.kernel.org/r/20240515142934.3708038-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>

[ Upstream commit ffbf4fb9b5c12ff878a10ea17997147ea4ebea6f ] When CONFIG_DEBUG_BUGVERBOSE=n, we fail to add necessary padding bytes to bug_table entries, and as a result the last entry in a bug table will be ignored, potentially leading to an unexpected panic(). All prior entries in the table will be handled correctly. The arm64 ABI requires that struct fields of up to 8 bytes are naturally-aligned, with padding added within a struct such that struct are suitably aligned within arrays. When CONFIG_DEBUG_BUGVERPOSE=y, the layout of a bug_entry is: struct bug_entry { signed int bug_addr_disp; // 4 bytes signed int file_disp; // 4 bytes unsigned short line; // 2 bytes unsigned short flags; // 2 bytes } ... with 12 bytes total, requiring 4-byte alignment. When CONFIG_DEBUG_BUGVERBOSE=n, the layout of a bug_entry is: struct bug_entry { signed int bug_addr_disp; // 4 bytes unsigned short flags; // 2 bytes < implicit padding > // 2 bytes } ... with 8 bytes total, with 6 bytes of data and 2 bytes of trailing padding, requiring 4-byte alginment. When we create a bug_entry in assembly, we align the start of the entry to 4 bytes, which implicitly handles padding for any prior entries. However, we do not align the end of the entry, and so when CONFIG_DEBUG_BUGVERBOSE=n, the final entry lacks the trailing padding bytes. For the main kernel image this is not a problem as find_bug() doesn't depend on the trailing padding bytes when searching for entries: for (bug = __start___bug_table; bug < __stop___bug_table; ++bug) if (bugaddr == bug_addr(bug)) return bug; However for modules, module_bug_finalize() depends on the trailing bytes when calculating the number of entries: mod->num_bugs = sechdrs[i].sh_size / sizeof(struct bug_entry); ... and as the last bug_entry lacks the necessary padding bytes, this entry will not be counted, e.g. in the case of a single entry: sechdrs[i].sh_size == 6 sizeof(struct bug_entry) == 8; sechdrs[i].sh_size / sizeof(struct bug_entry) == 0; Consequently module_find_bug() will miss the last bug_entry when it does: for (i = 0; i < mod->num_bugs; ++i, ++bug) if (bugaddr == bug_addr(bug)) goto out; ... which can lead to a kenrel panic due to an unhandled bug. This can be demonstrated with the following module: static int __init buginit(void) { WARN(1, "hello\n"); return 0; } static void __exit bugexit(void) { } module_init(buginit); module_exit(bugexit); MODULE_LICENSE("GPL"); ... which will trigger a kernel panic when loaded: ------------[ cut here ]------------ hello Unexpected kernel BRK exception at EL1 Internal error: BRK handler: 00000000f2000800 [#1] PREEMPT SMP Modules linked in: hello(O+) CPU: 0 PID: 50 Comm: insmod Tainted: G O 6.9.1 #8 Hardware name: linux,dummy-virt (DT) pstate: 60400005 (nZCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--) pc : buginit+0x18/0x1000 [hello] lr : buginit+0x18/0x1000 [hello] sp : ffff800080533ae0 x29: ffff800080533ae0 x28: 0000000000000000 x27: 0000000000000000 x26: ffffaba8c4e70510 x25: ffff800080533c30 x24: ffffaba8c4a28a58 x23: 0000000000000000 x22: 0000000000000000 x21: ffff3947c0eab3c0 x20: ffffaba8c4e3f000 x19: ffffaba846464000 x18: 0000000000000006 x17: 0000000000000000 x16: ffffaba8c2492834 x15: 0720072007200720 x14: 0720072007200720 x13: ffffaba8c49b27c8 x12: 0000000000000312 x11: 0000000000000106 x10: ffffaba8c4a0a7c8 x9 : ffffaba8c49b27c8 x8 : 00000000ffffefff x7 : ffffaba8c4a0a7c8 x6 : 80000000fffff000 x5 : 0000000000000107 x4 : 0000000000000000 x3 : 0000000000000000 x2 : 0000000000000000 x1 : 0000000000000000 x0 : ffff3947c0eab3c0 Call trace: buginit+0x18/0x1000 [hello] do_one_initcall+0x80/0x1c8 do_init_module+0x60/0x218 load_module+0x1ba4/0x1d70 __do_sys_init_module+0x198/0x1d0 __arm64_sys_init_module+0x1c/0x28 invoke_syscall+0x48/0x114 el0_svc_common.constprop.0+0x40/0xe0 do_el0_svc+0x1c/0x28 el0_svc+0x34/0xd8 el0t_64_sync_handler+0x120/0x12c el0t_64_sync+0x190/0x194 Code: d0ffffe0 910003fd 91000000 9400000b (d4210000) ---[ end trace 0000000000000000 ]--- Kernel panic - not syncing: BRK handler: Fatal exception Fix this by always aligning the end of a bug_entry to 4 bytes, which is correct regardless of CONFIG_DEBUG_BUGVERBOSE. Fixes: 9fb7410 ("arm64/BUG: Use BRK instruction for generic BUG traps") Signed-off-by: Yuanbin Xie <xieyuanbin1@huawei.com> Signed-off-by: Jiangfeng Xiao <xiaojiangfeng@huawei.com> Reviewed-by: Mark Rutland <mark.rutland@arm.com> Link: https://lore.kernel.org/r/1716212077-43826-1-git-send-email-xiaojiangfeng@huawei.com Signed-off-by: Will Deacon <will@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>

…oy_rcu() [ Upstream commit dc21c6cc3d6986d938efbf95de62473982c98dec ] syzbot reported that nf_reinject() could be called without rcu_read_lock() : WARNING: suspicious RCU usage 6.9.0-rc7-syzkaller-02060-g5c1672705a1a #0 Not tainted net/netfilter/nfnetlink_queue.c:263 suspicious rcu_dereference_check() usage! other info that might help us debug this: rcu_scheduler_active = 2, debug_locks = 1 2 locks held by syz-executor.4/13427: #0: ffffffff8e334f60 (rcu_callback){....}-{0:0}, at: rcu_lock_acquire include/linux/rcupdate.h:329 [inline] #0: ffffffff8e334f60 (rcu_callback){....}-{0:0}, at: rcu_do_batch kernel/rcu/tree.c:2190 [inline] #0: ffffffff8e334f60 (rcu_callback){....}-{0:0}, at: rcu_core+0xa86/0x1830 kernel/rcu/tree.c:2471 #1: ffff88801ca92958 (&inst->lock){+.-.}-{2:2}, at: spin_lock_bh include/linux/spinlock.h:356 [inline] #1: ffff88801ca92958 (&inst->lock){+.-.}-{2:2}, at: nfqnl_flush net/netfilter/nfnetlink_queue.c:405 [inline] #1: ffff88801ca92958 (&inst->lock){+.-.}-{2:2}, at: instance_destroy_rcu+0x30/0x220 net/netfilter/nfnetlink_queue.c:172 stack backtrace: CPU: 0 PID: 13427 Comm: syz-executor.4 Not tainted 6.9.0-rc7-syzkaller-02060-g5c1672705a1a #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 04/02/2024 Call Trace: <IRQ> __dump_stack lib/dump_stack.c:88 [inline] dump_stack_lvl+0x241/0x360 lib/dump_stack.c:114 lockdep_rcu_suspicious+0x221/0x340 kernel/locking/lockdep.c:6712 nf_reinject net/netfilter/nfnetlink_queue.c:323 [inline] nfqnl_reinject+0x6ec/0x1120 net/netfilter/nfnetlink_queue.c:397 nfqnl_flush net/netfilter/nfnetlink_queue.c:410 [inline] instance_destroy_rcu+0x1ae/0x220 net/netfilter/nfnetlink_queue.c:172 rcu_do_batch kernel/rcu/tree.c:2196 [inline] rcu_core+0xafd/0x1830 kernel/rcu/tree.c:2471 handle_softirqs+0x2d6/0x990 kernel/softirq.c:554 __do_softirq kernel/softirq.c:588 [inline] invoke_softirq kernel/softirq.c:428 [inline] __irq_exit_rcu+0xf4/0x1c0 kernel/softirq.c:637 irq_exit_rcu+0x9/0x30 kernel/softirq.c:649 instr_sysvec_apic_timer_interrupt arch/x86/kernel/apic/apic.c:1043 [inline] sysvec_apic_timer_interrupt+0xa6/0xc0 arch/x86/kernel/apic/apic.c:1043 </IRQ> <TASK> Fixes: 9872bec ("[NETFILTER]: nfnetlink: use RCU for queue instances hash") Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Sasha Levin <sashal@kernel.org>

commit 61ae320a29b0540c16931816299eb86bf2b66c08 upstream. There is no guarantee that rb_prev() will not return NULL in nft_rbtree_gc_elem(): general protection fault, probably for non-canonical address 0xdffffc0000000003: 0000 [#1] PREEMPT SMP KASAN KASAN: null-ptr-deref in range [0x0000000000000018-0x000000000000001f] nft_add_set_elem+0x14b0/0x2990 nf_tables_newsetelem+0x528/0xb30 Furthermore, there is a possible use-after-free while iterating, 'node' can be free'd so we need to cache the next value to use. Fixes: c9e6978e2725 ("netfilter: nft_set_rbtree: Switch to node list walk for overlap detection") Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit d4f9d5a99a3fd1b1c691b7a1a6f8f3f25f4116c9 upstream. A system crash like this Failing address: 200000cb7df6f000 TEID: 200000cb7df6f403 Fault in home space mode while using kernel ASCE. AS:00000002d71bc007 R3:00000003fe5b8007 S:000000011a446000 P:000000015660c13d Oops: 0038 ilc:3 [#1] PREEMPT SMP Modules linked in: mlx5_ib ... CPU: 8 PID: 7556 Comm: bash Not tainted 6.9.0-rc7 #8 Hardware name: IBM 3931 A01 704 (LPAR) Krnl PSW : 0704e00180000000 0000014b75e7b606 (ap_parse_bitmap_str+0x10e/0x1f8) R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:3 CC:2 PM:0 RI:0 EA:3 Krnl GPRS: 0000000000000001 ffffffffffffffc0 0000000000000001 00000048f96b75d3 000000cb00000100 ffffffffffffffff ffffffffffffffff 000000cb7df6fce0 000000cb7df6fce0 00000000ffffffff 000000000000002b 00000048ffffffff 000003ff9b2dbc80 200000cb7df6fcd8 0000014bffffffc0 000000cb7df6fbc8 Krnl Code: 0000014b75e7b5fc: a7840047 brc 8,0000014b75e7b68a 0000014b75e7b600: 18b2 lr %r11,%r2 #0000014b75e7b602: a7f4000a brc 15,0000014b75e7b616 >0000014b75e7b606: eb22d00000e6 laog %r2,%r2,0(%r13) 0000014b75e7b60c: a7680001 lhi %r6,1 0000014b75e7b610: 187b lr %r7,%r11 0000014b75e7b612: 84960021 brxh %r9,%r6,0000014b75e7b654 0000014b75e7b616: 18e9 lr %r14,%r9 Call Trace: [<0000014b75e7b606>] ap_parse_bitmap_str+0x10e/0x1f8 ([<0000014b75e7b5dc>] ap_parse_bitmap_str+0xe4/0x1f8) [<0000014b75e7b758>] apmask_store+0x68/0x140 [<0000014b75679196>] kernfs_fop_write_iter+0x14e/0x1e8 [<0000014b75598524>] vfs_write+0x1b4/0x448 [<0000014b7559894c>] ksys_write+0x74/0x100 [<0000014b7618a440>] __do_syscall+0x268/0x328 [<0000014b761a3558>] system_call+0x70/0x98 INFO: lockdep is turned off. Last Breaking-Event-Address: [<0000014b75e7b636>] ap_parse_bitmap_str+0x13e/0x1f8 Kernel panic - not syncing: Fatal exception: panic_on_oops occured when /sys/bus/ap/a[pq]mask was updated with a relative mask value (like +0x10-0x12,+60,-90) with one of the numeric values exceeding INT_MAX. The fix is simple: use unsigned long values for the internal variables. The correct checks are already in place in the function but a simple int for the internal variables was used with the possibility to overflow. Reported-by: Marc Hartmayer <mhartmay@linux.ibm.com> Signed-off-by: Harald Freudenberger <freude@linux.ibm.com> Tested-by: Marc Hartmayer <mhartmay@linux.ibm.com> Reviewed-by: Holger Dengler <dengler@linux.ibm.com> Cc: <stable@vger.kernel.org> Signed-off-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit 22f00812862564b314784167a89f27b444f82a46 upstream. The syzbot fuzzer found that the interrupt-URB completion callback in the cdc-wdm driver was taking too long, and the driver's immediate resubmission of interrupt URBs with -EPROTO status combined with the dummy-hcd emulation to cause a CPU lockup: cdc_wdm 1-1:1.0: nonzero urb status received: -71 cdc_wdm 1-1:1.0: wdm_int_callback - 0 bytes watchdog: BUG: soft lockup - CPU#0 stuck for 26s! [syz-executor782:6625] CPU#0 Utilization every 4s during lockup: #1: 98% system, 0% softirq, 3% hardirq, 0% idle #2: 98% system, 0% softirq, 3% hardirq, 0% idle #3: 98% system, 0% softirq, 3% hardirq, 0% idle #4: 98% system, 0% softirq, 3% hardirq, 0% idle #5: 98% system, 1% softirq, 3% hardirq, 0% idle Modules linked in: irq event stamp: 73096 hardirqs last enabled at (73095): [<ffff80008037bc00>] console_emit_next_record kernel/printk/printk.c:2935 [inline] hardirqs last enabled at (73095): [<ffff80008037bc00>] console_flush_all+0x650/0xb74 kernel/printk/printk.c:2994 hardirqs last disabled at (73096): [<ffff80008af10b00>] __el1_irq arch/arm64/kernel/entry-common.c:533 [inline] hardirqs last disabled at (73096): [<ffff80008af10b00>] el1_interrupt+0x24/0x68 arch/arm64/kernel/entry-common.c:551 softirqs last enabled at (73048): [<ffff8000801ea530>] softirq_handle_end kernel/softirq.c:400 [inline] softirqs last enabled at (73048): [<ffff8000801ea530>] handle_softirqs+0xa60/0xc34 kernel/softirq.c:582 softirqs last disabled at (73043): [<ffff800080020de8>] __do_softirq+0x14/0x20 kernel/softirq.c:588 CPU: 0 PID: 6625 Comm: syz-executor782 Tainted: G W 6.10.0-rc2-syzkaller-g8867bbd4a056 #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 04/02/2024 Testing showed that the problem did not occur if the two error messages -- the first two lines above -- were removed; apparently adding material to the kernel log takes a surprisingly large amount of time. In any case, the best approach for preventing these lockups and to avoid spamming the log with thousands of error messages per second is to ratelimit the two dev_err() calls. Therefore we replace them with dev_err_ratelimited(). Signed-off-by: Alan Stern <stern@rowland.harvard.edu> Suggested-by: Greg KH <gregkh@linuxfoundation.org> Reported-and-tested-by: syzbot+5f996b83575ef4058638@syzkaller.appspotmail.com Closes: https://lore.kernel.org/linux-usb/00000000000073d54b061a6a1c65@google.com/ Reported-and-tested-by: syzbot+1b2abad17596ad03dcff@syzkaller.appspotmail.com Closes: https://lore.kernel.org/linux-usb/000000000000f45085061aa9b37e@google.com/ Fixes: 9908a32 ("USB: remove err() macro from usb class drivers") Link: https://lore.kernel.org/linux-usb/40dfa45b-5f21-4eef-a8c1-51a2f320e267@rowland.harvard.edu/ Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/29855215-52f5-4385-b058-91f42c2bee18@rowland.harvard.edu Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit c0a40097f0bc81deafc15f9195d1fb54595cd6d0 upstream. Synchronize the dev->driver usage in really_probe() and dev_uevent(). These can run in different threads, what can result in the following race condition for dev->driver uninitialization: Thread #1: ========== really_probe() { ... probe_failed: ... device_unbind_cleanup(dev) { ... dev->driver = NULL; // <= Failed probe sets dev->driver to NULL ... } ... } Thread #2: ========== dev_uevent() { ... if (dev->driver) // If dev->driver is NULLed from really_probe() from here on, // after above check, the system crashes add_uevent_var(env, "DRIVER=%s", dev->driver->name); ... } really_probe() holds the lock, already. So nothing needs to be done there. dev_uevent() is called with lock held, often, too. But not always. What implies that we can't add any locking in dev_uevent() itself. So fix this race by adding the lock to the non-protected path. This is the path where above race is observed: dev_uevent+0x235/0x380 uevent_show+0x10c/0x1f0 <= Add lock here dev_attr_show+0x3a/0xa0 sysfs_kf_seq_show+0x17c/0x250 kernfs_seq_show+0x7c/0x90 seq_read_iter+0x2d7/0x940 kernfs_fop_read_iter+0xc6/0x310 vfs_read+0x5bc/0x6b0 ksys_read+0xeb/0x1b0 __x64_sys_read+0x42/0x50 x64_sys_call+0x27ad/0x2d30 do_syscall_64+0xcd/0x1d0 entry_SYSCALL_64_after_hwframe+0x77/0x7f Similar cases are reported by syzkaller in https://syzkaller.appspot.com/bug?extid=ffa8143439596313a85a But these are regarding the *initialization* of dev->driver dev->driver = drv; As this switches dev->driver to non-NULL these reports can be considered to be false-positives (which should be "fixed" by this commit, as well, though). The same issue was reported and tried to be fixed back in 2015 in https://lore.kernel.org/lkml/1421259054-2574-1-git-send-email-a.sangwan@samsung.com/ already. Fixes: 239378f ("Driver core: add uevent vars for devices of a class") Cc: stable <stable@kernel.org> Cc: syzbot+ffa8143439596313a85a@syzkaller.appspotmail.com Cc: Ashish Sangwan <a.sangwan@samsung.com> Cc: Namjae Jeon <namjae.jeon@samsung.com> Signed-off-by: Dirk Behme <dirk.behme@de.bosch.com> Link: https://lore.kernel.org/r/20240513050634.3964461-1-dirk.behme@de.bosch.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit a4ca369ca221bb7e06c725792ac107f0e48e82e7 upstream. Destructive writes to a block device on which nilfs2 is mounted can cause a kernel bug in the folio/page writeback start routine or writeback end routine (__folio_start_writeback in the log below): kernel BUG at mm/page-writeback.c:3070! Oops: invalid opcode: 0000 [#1] PREEMPT SMP KASAN PTI ... RIP: 0010:__folio_start_writeback+0xbaa/0x10e0 Code: 25 ff 0f 00 00 0f 84 18 01 00 00 e8 40 ca c6 ff e9 17 f6 ff ff e8 36 ca c6 ff 4c 89 f7 48 c7 c6 80 c0 12 84 e8 e7 b3 0f 00 90 <0f> 0b e8 1f ca c6 ff 4c 89 f7 48 c7 c6 a0 c6 12 84 e8 d0 b3 0f 00 ... Call Trace: <TASK> nilfs_segctor_do_construct+0x4654/0x69d0 [nilfs2] nilfs_segctor_construct+0x181/0x6b0 [nilfs2] nilfs_segctor_thread+0x548/0x11c0 [nilfs2] kthread+0x2f0/0x390 ret_from_fork+0x4b/0x80 ret_from_fork_asm+0x1a/0x30 </TASK> This is because when the log writer starts a writeback for segment summary blocks or a super root block that use the backing device's page cache, it does not wait for the ongoing folio/page writeback, resulting in an inconsistent writeback state. Fix this issue by waiting for ongoing writebacks when putting folios/pages on the backing device into writeback state. Link: https://lkml.kernel.org/r/20240530141556.4411-1-konishi.ryusuke@gmail.com Fixes: 9ff0512 ("nilfs2: segment constructor") Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> Tested-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

[ Upstream commit 28027ec8e32ecbadcd67623edb290dad61e735b5 ] The qedi_dbg_do_not_recover_cmd_read() function invokes sprintf() directly on a __user pointer, which results into the crash. To fix this issue, use a small local stack buffer for sprintf() and then call simple_read_from_buffer(), which in turns make the copy_to_user() call. BUG: unable to handle page fault for address: 00007f4801111000 PGD 8000000864df6067 P4D 8000000864df6067 PUD 864df7067 PMD 846028067 PTE 0 Oops: 0002 [#1] PREEMPT SMP PTI Hardware name: HPE ProLiant DL380 Gen10/ProLiant DL380 Gen10, BIOS U30 06/15/2023 RIP: 0010:memcpy_orig+0xcd/0x130 RSP: 0018:ffffb7a18c3ffc40 EFLAGS: 00010202 RAX: 00007f4801111000 RBX: 00007f4801111000 RCX: 000000000000000f RDX: 000000000000000f RSI: ffffffffc0bfd7a0 RDI: 00007f4801111000 RBP: ffffffffc0bfd7a0 R08: 725f746f6e5f6f64 R09: 3d7265766f636572 R10: ffffb7a18c3ffd08 R11: 0000000000000000 R12: 00007f4881110fff R13: 000000007fffffff R14: ffffb7a18c3ffca0 R15: ffffffffc0bfd7af FS: 00007f480118a740(0000) GS:ffff98e38af00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007f4801111000 CR3: 0000000864b8e001 CR4: 00000000007706e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 PKRU: 55555554 Call Trace: <TASK> ? __die_body+0x1a/0x60 ? page_fault_oops+0x183/0x510 ? exc_page_fault+0x69/0x150 ? asm_exc_page_fault+0x22/0x30 ? memcpy_orig+0xcd/0x130 vsnprintf+0x102/0x4c0 sprintf+0x51/0x80 qedi_dbg_do_not_recover_cmd_read+0x2f/0x50 [qedi 6bcfdeeecdea037da47069eca2ba717c84a77324] full_proxy_read+0x50/0x80 vfs_read+0xa5/0x2e0 ? folio_add_new_anon_rmap+0x44/0xa0 ? set_pte_at+0x15/0x30 ? do_pte_missing+0x426/0x7f0 ksys_read+0xa5/0xe0 do_syscall_64+0x58/0x80 ? __count_memcg_events+0x46/0x90 ? count_memcg_event_mm+0x3d/0x60 ? handle_mm_fault+0x196/0x2f0 ? do_user_addr_fault+0x267/0x890 ? exc_page_fault+0x69/0x150 entry_SYSCALL_64_after_hwframe+0x72/0xdc RIP: 0033:0x7f4800f20b4d Tested-by: Martin Hoyer <mhoyer@redhat.com> Reviewed-by: John Meneghini <jmeneghi@redhat.com> Signed-off-by: Manish Rangankar <mrangankar@marvell.com> Link: https://lore.kernel.org/r/20240415072155.30840-1-mrangankar@marvell.com Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>

[ Upstream commit b86762dbe19a62e785c189f313cda5b989931f37 ] syzbot caught a NULL dereference in rt6_probe() [1] Bail out if __in6_dev_get() returns NULL. [1] Oops: general protection fault, probably for non-canonical address 0xdffffc00000000cb: 0000 [#1] PREEMPT SMP KASAN PTI KASAN: null-ptr-deref in range [0x0000000000000658-0x000000000000065f] CPU: 1 PID: 22444 Comm: syz-executor.0 Not tainted 6.10.0-rc2-syzkaller-00383-gb8481381d4e2 #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 04/02/2024 RIP: 0010:rt6_probe net/ipv6/route.c:656 [inline] RIP: 0010:find_match+0x8c4/0xf50 net/ipv6/route.c:758 Code: 14 fd f7 48 8b 85 38 ff ff ff 48 c7 45 b0 00 00 00 00 48 8d b8 5c 06 00 00 48 b8 00 00 00 00 00 fc ff df 48 89 fa 48 c1 ea 03 <0f> b6 14 02 48 89 f8 83 e0 07 83 c0 03 38 d0 7c 08 84 d2 0f 85 19 RSP: 0018:ffffc900034af070 EFLAGS: 00010203 RAX: dffffc0000000000 RBX: 0000000000000000 RCX: ffffc90004521000 RDX: 00000000000000cb RSI: ffffffff8990d0cd RDI: 000000000000065c RBP: ffffc900034af150 R08: 0000000000000005 R09: 0000000000000000 R10: 0000000000000001 R11: 0000000000000002 R12: 000000000000000a R13: 1ffff92000695e18 R14: ffff8880244a1d20 R15: 0000000000000000 FS: 00007f4844a5a6c0(0000) GS:ffff8880b9300000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000001b31b27000 CR3: 000000002d42c000 CR4: 00000000003506f0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: <TASK> rt6_nh_find_match+0xfa/0x1a0 net/ipv6/route.c:784 nexthop_for_each_fib6_nh+0x26d/0x4a0 net/ipv4/nexthop.c:1496 __find_rr_leaf+0x6e7/0xe00 net/ipv6/route.c:825 find_rr_leaf net/ipv6/route.c:853 [inline] rt6_select net/ipv6/route.c:897 [inline] fib6_table_lookup+0x57e/0xa30 net/ipv6/route.c:2195 ip6_pol_route+0x1cd/0x1150 net/ipv6/route.c:2231 pol_lookup_func include/net/ip6_fib.h:616 [inline] fib6_rule_lookup+0x386/0x720 net/ipv6/fib6_rules.c:121 ip6_route_output_flags_noref net/ipv6/route.c:2639 [inline] ip6_route_output_flags+0x1d0/0x640 net/ipv6/route.c:2651 ip6_dst_lookup_tail.constprop.0+0x961/0x1760 net/ipv6/ip6_output.c:1147 ip6_dst_lookup_flow+0x99/0x1d0 net/ipv6/ip6_output.c:1250 rawv6_sendmsg+0xdab/0x4340 net/ipv6/raw.c:898 inet_sendmsg+0x119/0x140 net/ipv4/af_inet.c:853 sock_sendmsg_nosec net/socket.c:730 [inline] __sock_sendmsg net/socket.c:745 [inline] sock_write_iter+0x4b8/0x5c0 net/socket.c:1160 new_sync_write fs/read_write.c:497 [inline] vfs_write+0x6b6/0x1140 fs/read_write.c:590 ksys_write+0x1f8/0x260 fs/read_write.c:643 do_syscall_x64 arch/x86/entry/common.c:52 [inline] do_syscall_64+0xcd/0x250 arch/x86/entry/common.c:83 entry_SYSCALL_64_after_hwframe+0x77/0x7f Fixes: 52e1635 ("[IPV6]: ROUTE: Add router_probe_interval sysctl.") Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Jason Xing <kerneljasonxing@gmail.com> Reviewed-by: David Ahern <dsahern@kernel.org> Link: https://lore.kernel.org/r/20240615151454.166404-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>

[ Upstream commit d46401052c2d5614da8efea5788532f0401cb164 ] ip6_dst_idev() can return NULL, xfrm6_get_saddr() must act accordingly. syzbot reported: Oops: general protection fault, probably for non-canonical address 0xdffffc0000000000: 0000 [#1] PREEMPT SMP KASAN PTI KASAN: null-ptr-deref in range [0x0000000000000000-0x0000000000000007] CPU: 1 PID: 12 Comm: kworker/u8:1 Not tainted 6.10.0-rc2-syzkaller-00383-gb8481381d4e2 #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 04/02/2024 Workqueue: wg-kex-wg1 wg_packet_handshake_send_worker RIP: 0010:xfrm6_get_saddr+0x93/0x130 net/ipv6/xfrm6_policy.c:64 Code: df 48 89 fa 48 c1 ea 03 80 3c 02 00 0f 85 97 00 00 00 4c 8b ab d8 00 00 00 48 b8 00 00 00 00 00 fc ff df 4c 89 ea 48 c1 ea 03 <80> 3c 02 00 0f 85 86 00 00 00 4d 8b 6d 00 e8 ca 13 47 01 48 b8 00 RSP: 0018:ffffc90000117378 EFLAGS: 00010246 RAX: dffffc0000000000 RBX: ffff88807b079dc0 RCX: ffffffff89a0d6d7 RDX: 0000000000000000 RSI: ffffffff89a0d6e9 RDI: ffff88807b079e98 RBP: ffff88807ad73248 R08: 0000000000000007 R09: fffffffffffff000 R10: ffff88807b079dc0 R11: 0000000000000007 R12: ffffc90000117480 R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000 FS: 0000000000000000(0000) GS:ffff8880b9300000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007f4586d00440 CR3: 0000000079042000 CR4: 00000000003506f0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: <TASK> xfrm_get_saddr net/xfrm/xfrm_policy.c:2452 [inline] xfrm_tmpl_resolve_one net/xfrm/xfrm_policy.c:2481 [inline] xfrm_tmpl_resolve+0xa26/0xf10 net/xfrm/xfrm_policy.c:2541 xfrm_resolve_and_create_bundle+0x140/0x2570 net/xfrm/xfrm_policy.c:2835 xfrm_bundle_lookup net/xfrm/xfrm_policy.c:3070 [inline] xfrm_lookup_with_ifid+0x4d1/0x1e60 net/xfrm/xfrm_policy.c:3201 xfrm_lookup net/xfrm/xfrm_policy.c:3298 [inline] xfrm_lookup_route+0x3b/0x200 net/xfrm/xfrm_policy.c:3309 ip6_dst_lookup_flow+0x15c/0x1d0 net/ipv6/ip6_output.c:1256 send6+0x611/0xd20 drivers/net/wireguard/socket.c:139 wg_socket_send_skb_to_peer+0xf9/0x220 drivers/net/wireguard/socket.c:178 wg_socket_send_buffer_to_peer+0x12b/0x190 drivers/net/wireguard/socket.c:200 wg_packet_send_handshake_initiation+0x227/0x360 drivers/net/wireguard/send.c:40 wg_packet_handshake_send_worker+0x1c/0x30 drivers/net/wireguard/send.c:51 process_one_work+0x9fb/0x1b60 kernel/workqueue.c:3231 process_scheduled_works kernel/workqueue.c:3312 [inline] worker_thread+0x6c8/0xf70 kernel/workqueue.c:3393 kthread+0x2c1/0x3a0 kernel/kthread.c:389 ret_from_fork+0x45/0x80 arch/x86/kernel/process.c:147 ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:244 Fixes: 1da177e ("Linux-2.6.12-rc2") Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: David Ahern <dsahern@kernel.org> Link: https://lore.kernel.org/r/20240615154231.234442-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>

commit ab9e0c529eb7cafebdd31fe1644524e80a48b05d upstream. If e.g. the ata_port_alloc() call in ata_host_alloc() fails, we will jump to the err_out label, which will call devres_release_group(). devres_release_group() will trigger a call to ata_host_release(). ata_host_release() calls kfree(host), so executing the kfree(host) in ata_host_alloc() will lead to a double free: kernel BUG at mm/slub.c:553! Oops: invalid opcode: 0000 [#1] PREEMPT SMP NOPTI CPU: 11 PID: 599 Comm: (udev-worker) Not tainted 6.10.0-rc5 #47 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.3-2.fc40 04/01/2014 RIP: 0010:kfree+0x2cf/0x2f0 Code: 5d 41 5e 41 5f 5d e9 80 d6 ff ff 4d 89 f1 41 b8 01 00 00 00 48 89 d9 48 89 da RSP: 0018:ffffc90000f377f0 EFLAGS: 00010246 RAX: ffff888112b1f2c0 RBX: ffff888112b1f2c0 RCX: ffff888112b1f320 RDX: 000000000000400b RSI: ffffffffc02c9de5 RDI: ffff888112b1f2c0 RBP: ffffc90000f37830 R08: 0000000000000000 R09: 0000000000000000 R10: ffffc90000f37610 R11: 617461203a736b6e R12: ffffea00044ac780 R13: ffff888100046400 R14: ffffffffc02c9de5 R15: 0000000000000006 FS: 00007f2f1cabe980(0000) GS:ffff88813b380000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007f2f1c3acf75 CR3: 0000000111724000 CR4: 0000000000750ef0 PKRU: 55555554 Call Trace: <TASK> ? __die_body.cold+0x19/0x27 ? die+0x2e/0x50 ? do_trap+0xca/0x110 ? do_error_trap+0x6a/0x90 ? kfree+0x2cf/0x2f0 ? exc_invalid_op+0x50/0x70 ? kfree+0x2cf/0x2f0 ? asm_exc_invalid_op+0x1a/0x20 ? ata_host_alloc+0xf5/0x120 [libata] ? ata_host_alloc+0xf5/0x120 [libata] ? kfree+0x2cf/0x2f0 ata_host_alloc+0xf5/0x120 [libata] ata_host_alloc_pinfo+0x14/0xa0 [libata] ahci_init_one+0x6c9/0xd20 [ahci] Ensure that we will not call kfree(host) twice, by performing the kfree() only if the devres_open_group() call failed. Fixes: dafd6c4 ("libata: ensure host is free'd on error exit paths") Cc: stable@vger.kernel.org Reviewed-by: Damien Le Moal <dlemoal@kernel.org> Reviewed-by: Hannes Reinecke <hare@suse.de> Link: https://lore.kernel.org/r/20240629124210.181537-9-cassel@kernel.org Signed-off-by: Niklas Cassel <cassel@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

[ Upstream commit af9a8730ddb6a4b2edd779ccc0aceb994d616830 ] During the stress testing of the jffs2 file system,the following abnormal printouts were found: [ 2430.649000] Unable to handle kernel paging request at virtual address 0069696969696948 [ 2430.649622] Mem abort info: [ 2430.649829] ESR = 0x96000004 [ 2430.650115] EC = 0x25: DABT (current EL), IL = 32 bits [ 2430.650564] SET = 0, FnV = 0 [ 2430.650795] EA = 0, S1PTW = 0 [ 2430.651032] FSC = 0x04: level 0 translation fault [ 2430.651446] Data abort info: [ 2430.651683] ISV = 0, ISS = 0x00000004 [ 2430.652001] CM = 0, WnR = 0 [ 2430.652558] [0069696969696948] address between user and kernel address ranges [ 2430.653265] Internal error: Oops: 96000004 [#1] PREEMPT SMP [ 2430.654512] CPU: 2 PID: 20919 Comm: cat Not tainted 5.15.25-g512f31242bf6 #33 [ 2430.655008] Hardware name: linux,dummy-virt (DT) [ 2430.655517] pstate: 20000005 (nzCv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--) [ 2430.656142] pc : kfree+0x78/0x348 [ 2430.656630] lr : jffs2_free_inode+0x24/0x48 [ 2430.657051] sp : ffff800009eebd10 [ 2430.657355] x29: ffff800009eebd10 x28: 0000000000000001 x27: 0000000000000000 [ 2430.658327] x26: ffff000038f09d80 x25: 0080000000000000 x24: ffff800009d38000 [ 2430.658919] x23: 5a5a5a5a5a5a5a5a x22: ffff000038f09d80 x21: ffff8000084f0d14 [ 2430.659434] x20: ffff0000bf9a6ac0 x19: 0169696969696940 x18: 0000000000000000 [ 2430.659969] x17: ffff8000b6506000 x16: ffff800009eec000 x15: 0000000000004000 [ 2430.660637] x14: 0000000000000000 x13: 00000001000820a1 x12: 00000000000d1b19 [ 2430.661345] x11: 0004000800000000 x10: 0000000000000001 x9 : ffff8000084f0d14 [ 2430.662025] x8 : ffff0000bf9a6b40 x7 : ffff0000bf9a6b48 x6 : 0000000003470302 [ 2430.662695] x5 : ffff00002e41dcc0 x4 : ffff0000bf9aa3b0 x3 : 0000000003470342 [ 2430.663486] x2 : 0000000000000000 x1 : ffff8000084f0d14 x0 : fffffc0000000000 [ 2430.664217] Call trace: [ 2430.664528] kfree+0x78/0x348 [ 2430.664855] jffs2_free_inode+0x24/0x48 [ 2430.665233] i_callback+0x24/0x50 [ 2430.665528] rcu_do_batch+0x1ac/0x448 [ 2430.665892] rcu_core+0x28c/0x3c8 [ 2430.666151] rcu_core_si+0x18/0x28 [ 2430.666473] __do_softirq+0x138/0x3cc [ 2430.666781] irq_exit+0xf0/0x110 [ 2430.667065] handle_domain_irq+0x6c/0x98 [ 2430.667447] gic_handle_irq+0xac/0xe8 [ 2430.667739] call_on_irq_stack+0x28/0x54 The parameter passed to kfree was 5a5a5a5a, which corresponds to the target field of the jffs_inode_info structure. It was found that all variables in the jffs_inode_info structure were 5a5a5a5a, except for the first member sem. It is suspected that these variables are not initialized because they were set to 5a5a5a5a during memory testing, which is meant to detect uninitialized memory.The sem variable is initialized in the function jffs2_i_init_once, while other members are initialized in the function jffs2_init_inode_info. The function jffs2_init_inode_info is called after iget_locked, but in the iget_locked function, the destroy_inode process is triggered, which releases the inode and consequently, the target member of the inode is not initialized.In concurrent high pressure scenarios, iget_locked may enter the destroy_inode branch as described in the code. Since the destroy_inode functionality of jffs2 only releases the target, the fix method is to set target to NULL in jffs2_i_init_once. Signed-off-by: Wang Yong <wang.yong12@zte.com.cn> Reviewed-by: Lu Zhongjun <lu.zhongjun@zte.com.cn> Reviewed-by: Yang Tao <yang.tao172@zte.com.cn> Cc: Xu Xin <xu.xin16@zte.com.cn> Cc: Yang Yang <yang.yang29@zte.com.cn> Signed-off-by: Richard Weinberger <richard@nod.at> Signed-off-by: Sasha Levin <sashal@kernel.org>

[ Upstream commit e271ff53807e8f2c628758290f0e499dbe51cb3d ] In function bond_option_arp_ip_targets_set(), if newval->string is an empty string, newval->string+1 will point to the byte after the string, causing an out-of-bound read. BUG: KASAN: slab-out-of-bounds in strlen+0x7d/0xa0 lib/string.c:418 Read of size 1 at addr ffff8881119c4781 by task syz-executor665/8107 CPU: 1 PID: 8107 Comm: syz-executor665 Not tainted 6.7.0-rc7 #1 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1 04/01/2014 Call Trace: <TASK> __dump_stack lib/dump_stack.c:88 [inline] dump_stack_lvl+0xd9/0x150 lib/dump_stack.c:106 print_address_description mm/kasan/report.c:364 [inline] print_report+0xc1/0x5e0 mm/kasan/report.c:475 kasan_report+0xbe/0xf0 mm/kasan/report.c:588 strlen+0x7d/0xa0 lib/string.c:418 __fortify_strlen include/linux/fortify-string.h:210 [inline] in4_pton+0xa3/0x3f0 net/core/utils.c:130 bond_option_arp_ip_targets_set+0xc2/0x910 drivers/net/bonding/bond_options.c:1201 __bond_opt_set+0x2a4/0x1030 drivers/net/bonding/bond_options.c:767 __bond_opt_set_notify+0x48/0x150 drivers/net/bonding/bond_options.c:792 bond_opt_tryset_rtnl+0xda/0x160 drivers/net/bonding/bond_options.c:817 bonding_sysfs_store_option+0xa1/0x120 drivers/net/bonding/bond_sysfs.c:156 dev_attr_store+0x54/0x80 drivers/base/core.c:2366 sysfs_kf_write+0x114/0x170 fs/sysfs/file.c:136 kernfs_fop_write_iter+0x337/0x500 fs/kernfs/file.c:334 call_write_iter include/linux/fs.h:2020 [inline] new_sync_write fs/read_write.c:491 [inline] vfs_write+0x96a/0xd80 fs/read_write.c:584 ksys_write+0x122/0x250 fs/read_write.c:637 do_syscall_x64 arch/x86/entry/common.c:52 [inline] do_syscall_64+0x40/0x110 arch/x86/entry/common.c:83 entry_SYSCALL_64_after_hwframe+0x63/0x6b ---[ end trace ]--- Fix it by adding a check of string length before using it. Fixes: f9de11a ("bonding: add ip checks when store ip target") Signed-off-by: Yue Sun <samsun1006219@gmail.com> Signed-off-by: Simon Horman <horms@kernel.org> Acked-by: Jay Vosburgh <jay.vosburgh@canonical.com> Reviewed-by: Hangbin Liu <liuhangbin@gmail.com> Link: https://patch.msgid.link/20240702-bond-oob-v6-1-2dfdba195c19@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>

libxzr added 2 commits November 1, 2021 08:33

qmi_rmnet: Make powersave workqueue unbound and freezable

4f5d8f3

* After commit "Revert "dfc: Use alarm timer to trigger powersave work", this workqueue is now is use for qmi_rmnet_check_stats. Make it unbound and freezable to save power. Signed-off-by: LibXZR <xzr467706992@163.com>

ClassfulNick closed this Nov 2, 2021

ClassfulNick mentioned this pull request Nov 4, 2021

Prevent hard reset on modem crash #2

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Attempt to avoid random kernel panic reboots #1

Attempt to avoid random kernel panic reboots #1

ClassfulNick commented Nov 1, 2021 •

edited

Loading

ClassfulNick commented Nov 2, 2021

Attempt to avoid random kernel panic reboots #1

Attempt to avoid random kernel panic reboots #1

Conversation

ClassfulNick commented Nov 1, 2021 • edited Loading

ClassfulNick commented Nov 2, 2021

ClassfulNick commented Nov 1, 2021 •

edited

Loading