forked from torvalds/linux
-
Notifications
You must be signed in to change notification settings - Fork 5
mm patches #1
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Closed
Closed
mm patches #1
Conversation
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
What watermark boosting does is preemptively fire up kswapd to free memory when there hasn't been an allocation failure. It does this by increasing kswapd's high watermark goal and then firing up kswapd. The reason why this causes freezes is because, with the increased high watermark goal, kswapd will steal memory from processes that need it in order to make forward progress. These processes will, in turn, try to allocate memory again, which will cause kswapd to steal necessary pages from those processes again, in a positive feedback loop known as page thrashing. When page thrashing occurs, your system is essentially livelocked until the necessary forward progress can be made to stop processes from trying to continuously allocate memory and trigger kswapd to steal it back. This problem already occurs with kswapd *without* watermark boosting, but it's usually only encountered on machines with a small amount of memory and/or a slow CPU. Watermark boosting just makes the existing problem worse enough to notice on higher spec'd machines. Disable watermark boosting by default since it's a total dumpster fire. I can't imagine why anyone would want to explicitly enable it, but the option is there in case someone does. Signed-off-by: Sultan Alsawaf <sultan@kerneltoast.com>
Keeping kswapd running when all the failed allocations that invoked it are satisfied incurs a high overhead due to unnecessary page eviction and writeback, as well as spurious VM pressure events to various registered shrinkers. When kswapd doesn't need to work to make an allocation succeed anymore, stop it prematurely to save resources. Signed-off-by: Sultan Alsawaf <sultan@kerneltoast.com>
The page allocator wakes all kswapds in an allocation context's allowed nodemask in the slow path, so it doesn't make sense to have the kswapd- waiter count per each NUMA node. Instead, it should be a global counter to stop all kswapds when there are no failed allocation requests. Signed-off-by: Sultan Alsawaf <sultan@kerneltoast.com>
Throttled direct reclaimers will wake up kswapd and wait for kswapd to satisfy their page allocation request, even when the failed allocation lacks the __GFP_KSWAPD_RECLAIM flag in its gfp mask. As a result, kswapd may think that there are no waiters and thus exit prematurely, causing throttled direct reclaimers lacking __GFP_KSWAPD_RECLAIM to stall on waiting for kswapd to wake them up. Incrementing the kswapd_waiters counter when such direct reclaimers become throttled fixes the problem. Signed-off-by: Sultan Alsawaf <sultan@kerneltoast.com>
On-demand compaction works fine assuming that you don't have a need to spam the page allocator nonstop for large order page allocations. Signed-off-by: Sultan Alsawaf <sultan@kerneltoast.com>
There is noticeable scheduling latency and heavy zone lock contention stemming from rmqueue_bulk's single hold of the zone lock while doing its work, as seen with the preemptoff tracer. There's no actual need for rmqueue_bulk() to hold the zone lock the entire time; it only does so for supposed efficiency. As such, we can relax the zone lock and even reschedule when IRQs are enabled in order to keep the scheduling delays and zone lock contention at bay. Forward progress is still guaranteed, as the zone lock can only be relaxed after page removal. With this change, rmqueue_bulk() no longer appears as a serious offender in the preemptoff tracer, and system latency is noticeably improved. Signed-off-by: Sultan Alsawaf <sultan@kerneltoast.com>
Allocating pages with __get_free_page is slower than going through the
slab allocator to grab free pages out from a pool.
These are the results from running the code at the bottom of this
message:
[ 1.278602] speedtest: __get_free_page: 9 us
[ 1.278606] speedtest: kmalloc: 4 us
[ 1.278609] speedtest: kmem_cache_alloc: 4 us
[ 1.278611] speedtest: vmalloc: 13 us
kmalloc and kmem_cache_alloc (which is what kmalloc uses for common
sizes behind the scenes) are the fastest choices. Use kmalloc to speed
up sg list allocation.
This is the code used to produce the above measurements:
static int speedtest(void *data)
{
static const struct sched_param sched_max_rt_prio = {
.sched_priority = MAX_RT_PRIO - 1
};
volatile s64 ctotal = 0, gtotal = 0, ktotal = 0, vtotal = 0;
struct kmem_cache *page_pool;
int i, j, trials = 1000;
volatile ktime_t start;
void *ptr[100];
sched_setscheduler_nocheck(current, SCHED_FIFO, &sched_max_rt_prio);
page_pool = kmem_cache_create("pages", PAGE_SIZE, PAGE_SIZE, SLAB_PANIC,
NULL);
for (i = 0; i < trials; i++) {
start = ktime_get();
for (j = 0; j < ARRAY_SIZE(ptr); j++)
while (!(ptr[j] = kmem_cache_alloc(page_pool, GFP_KERNEL)));
ctotal += ktime_us_delta(ktime_get(), start);
for (j = 0; j < ARRAY_SIZE(ptr); j++)
kmem_cache_free(page_pool, ptr[j]);
start = ktime_get();
for (j = 0; j < ARRAY_SIZE(ptr); j++)
while (!(ptr[j] = (void *)__get_free_page(GFP_KERNEL)));
gtotal += ktime_us_delta(ktime_get(), start);
for (j = 0; j < ARRAY_SIZE(ptr); j++)
free_page((unsigned long)ptr[j]);
start = ktime_get();
for (j = 0; j < ARRAY_SIZE(ptr); j++)
while (!(ptr[j] = __kmalloc(PAGE_SIZE, GFP_KERNEL)));
ktotal += ktime_us_delta(ktime_get(), start);
for (j = 0; j < ARRAY_SIZE(ptr); j++)
kfree(ptr[j]);
start = ktime_get();
*ptr = vmalloc(ARRAY_SIZE(ptr) * PAGE_SIZE);
vtotal += ktime_us_delta(ktime_get(), start);
vfree(*ptr);
}
kmem_cache_destroy(page_pool);
printk("%s: __get_free_page: %lld us\n", __func__, gtotal / trials);
printk("%s: __kmalloc: %lld us\n", __func__, ktotal / trials);
printk("%s: kmem_cache_alloc: %lld us\n", __func__, ctotal / trials);
printk("%s: vmalloc: %lld us\n", __func__, vtotal / trials);
complete(data);
return 0;
}
static int __init start_test(void)
{
DECLARE_COMPLETION_ONSTACK(done);
BUG_ON(IS_ERR(kthread_run(speedtest, &done, "malloc_test")));
wait_for_completion(&done);
return 0;
}
late_initcall(start_test);
Signed-off-by: Sultan Alsawaf <sultan@kerneltoast.com>
The RCU read lock isn't necessary in list_lru_count_one() when the condition that requires RCU (CONFIG_MEMCG && !CONFIG_SLOB) isn't met. The highly-frequent RCU lock and unlock adds measurable overhead to the shrink_slab() path when it isn't needed. As such, we can simply omit the RCU read lock in this case to improve performance. Signed-off-by: Sultan Alsawaf <sultan@kerneltoast.com> Signed-off-by: Kazuki Hashimoto <kazukih@tuta.io>
ptr1337
pushed a commit
that referenced
this pull request
Sep 23, 2023
[ Upstream commit 7962ef1 ] In 3cb4d5e ("perf trace: Free syscall tp fields in evsel->priv") it only was freeing if strcmp(evsel->tp_format->system, "syscalls") returned zero, while the corresponding initialization of evsel->priv was being performed if it was _not_ zero, i.e. if the tp system wasn't 'syscalls'. Just stop looking for that and free it if evsel->priv was set, which should be equivalent. Also use the pre-existing evsel_trace__delete() function. This resolves these leaks, detected with: $ make EXTRA_CFLAGS="-fsanitize=address" BUILD_BPF_SKEL=1 CORESIGHT=1 O=/tmp/build/perf-tools-next -C tools/perf install-bin ================================================================= ==481565==ERROR: LeakSanitizer: detected memory leaks Direct leak of 40 byte(s) in 1 object(s) allocated from: #0 0x7f7343cba097 in calloc (/lib64/libasan.so.8+0xba097) #1 0x987966 in zalloc (/home/acme/bin/perf+0x987966) #2 0x52f9b9 in evsel_trace__new /home/acme/git/perf-tools-next/tools/perf/builtin-trace.c:307 #3 0x52f9b9 in evsel__syscall_tp /home/acme/git/perf-tools-next/tools/perf/builtin-trace.c:333 #4 0x52f9b9 in evsel__init_raw_syscall_tp /home/acme/git/perf-tools-next/tools/perf/builtin-trace.c:458 #5 0x52f9b9 in perf_evsel__raw_syscall_newtp /home/acme/git/perf-tools-next/tools/perf/builtin-trace.c:480 #6 0x540e8b in trace__add_syscall_newtp /home/acme/git/perf-tools-next/tools/perf/builtin-trace.c:3212 torvalds#7 0x540e8b in trace__run /home/acme/git/perf-tools-next/tools/perf/builtin-trace.c:3891 torvalds#8 0x540e8b in cmd_trace /home/acme/git/perf-tools-next/tools/perf/builtin-trace.c:5156 torvalds#9 0x5ef262 in run_builtin /home/acme/git/perf-tools-next/tools/perf/perf.c:323 torvalds#10 0x4196da in handle_internal_command /home/acme/git/perf-tools-next/tools/perf/perf.c:377 torvalds#11 0x4196da in run_argv /home/acme/git/perf-tools-next/tools/perf/perf.c:421 torvalds#12 0x4196da in main /home/acme/git/perf-tools-next/tools/perf/perf.c:537 torvalds#13 0x7f7342c4a50f in __libc_start_call_main (/lib64/libc.so.6+0x2750f) Direct leak of 40 byte(s) in 1 object(s) allocated from: #0 0x7f7343cba097 in calloc (/lib64/libasan.so.8+0xba097) #1 0x987966 in zalloc (/home/acme/bin/perf+0x987966) #2 0x52f9b9 in evsel_trace__new /home/acme/git/perf-tools-next/tools/perf/builtin-trace.c:307 #3 0x52f9b9 in evsel__syscall_tp /home/acme/git/perf-tools-next/tools/perf/builtin-trace.c:333 #4 0x52f9b9 in evsel__init_raw_syscall_tp /home/acme/git/perf-tools-next/tools/perf/builtin-trace.c:458 #5 0x52f9b9 in perf_evsel__raw_syscall_newtp /home/acme/git/perf-tools-next/tools/perf/builtin-trace.c:480 #6 0x540dd1 in trace__add_syscall_newtp /home/acme/git/perf-tools-next/tools/perf/builtin-trace.c:3205 torvalds#7 0x540dd1 in trace__run /home/acme/git/perf-tools-next/tools/perf/builtin-trace.c:3891 torvalds#8 0x540dd1 in cmd_trace /home/acme/git/perf-tools-next/tools/perf/builtin-trace.c:5156 torvalds#9 0x5ef262 in run_builtin /home/acme/git/perf-tools-next/tools/perf/perf.c:323 torvalds#10 0x4196da in handle_internal_command /home/acme/git/perf-tools-next/tools/perf/perf.c:377 torvalds#11 0x4196da in run_argv /home/acme/git/perf-tools-next/tools/perf/perf.c:421 torvalds#12 0x4196da in main /home/acme/git/perf-tools-next/tools/perf/perf.c:537 torvalds#13 0x7f7342c4a50f in __libc_start_call_main (/lib64/libc.so.6+0x2750f) SUMMARY: AddressSanitizer: 80 byte(s) leaked in 2 allocation(s). [root@quaco ~]# With this we plug all leaks with "perf trace sleep 1". Fixes: 3cb4d5e ("perf trace: Free syscall tp fields in evsel->priv") Acked-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Riccardo Mancini <rickyman7@gmail.com> Link: https://lore.kernel.org/lkml/20230719202951.534582-5-acme@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
ptr1337
pushed a commit
that referenced
this pull request
Sep 23, 2023
[ Upstream commit ef23cb5 ] While debugging a segfault on 'perf lock contention' without an available perf.data file I noticed that it was basically calling: perf_session__delete(ERR_PTR(-1)) Resulting in: (gdb) run lock contention Starting program: /root/bin/perf lock contention [Thread debugging using libthread_db enabled] Using host libthread_db library "/lib64/libthread_db.so.1". failed to open perf.data: No such file or directory (try 'perf record' first) Initializing perf session failed Program received signal SIGSEGV, Segmentation fault. 0x00000000005e7515 in auxtrace__free (session=0xffffffffffffffff) at util/auxtrace.c:2858 2858 if (!session->auxtrace) (gdb) p session $1 = (struct perf_session *) 0xffffffffffffffff (gdb) bt #0 0x00000000005e7515 in auxtrace__free (session=0xffffffffffffffff) at util/auxtrace.c:2858 #1 0x000000000057bb4d in perf_session__delete (session=0xffffffffffffffff) at util/session.c:300 #2 0x000000000047c421 in __cmd_contention (argc=0, argv=0x7fffffffe200) at builtin-lock.c:2161 #3 0x000000000047dc95 in cmd_lock (argc=0, argv=0x7fffffffe200) at builtin-lock.c:2604 #4 0x0000000000501466 in run_builtin (p=0xe597a8 <commands+552>, argc=2, argv=0x7fffffffe200) at perf.c:322 #5 0x00000000005016d5 in handle_internal_command (argc=2, argv=0x7fffffffe200) at perf.c:375 #6 0x0000000000501824 in run_argv (argcp=0x7fffffffe02c, argv=0x7fffffffe020) at perf.c:419 torvalds#7 0x0000000000501b11 in main (argc=2, argv=0x7fffffffe200) at perf.c:535 (gdb) So just set it to NULL after using PTR_ERR(session) to decode the error as perf_session__delete(NULL) is supported. The same problem was found in 'perf top' after an audit of all perf_session__new() failure handling. Fixes: 6ef81c5 ("perf session: Return error code for perf_session__new() function on failure") Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Alexey Budankov <alexey.budankov@linux.intel.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Jeremie Galarneau <jeremie.galarneau@efficios.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kate Stewart <kstewart@linuxfoundation.org> Cc: Mamatha Inamdar <mamatha4@linux.vnet.ibm.com> Cc: Mukesh Ojha <mojha@codeaurora.org> Cc: Nageswara R Sastry <rnsastry@linux.vnet.ibm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ravi Bangoria <ravi.bangoria@linux.ibm.com> Cc: Shawn Landden <shawn@git.icu> Cc: Song Liu <songliubraving@fb.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tzvetomir Stoyanov <tstoyanov@vmware.com> Link: https://lore.kernel.org/lkml/ZN4Q2rxxsL08A8rd@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
ptr1337
pushed a commit
that referenced
this pull request
Sep 23, 2023
[ Upstream commit abaf1e0 ] While debugging a segfault on 'perf lock contention' without an available perf.data file I noticed that it was basically calling: perf_session__delete(ERR_PTR(-1)) Resulting in: (gdb) run lock contention Starting program: /root/bin/perf lock contention [Thread debugging using libthread_db enabled] Using host libthread_db library "/lib64/libthread_db.so.1". failed to open perf.data: No such file or directory (try 'perf record' first) Initializing perf session failed Program received signal SIGSEGV, Segmentation fault. 0x00000000005e7515 in auxtrace__free (session=0xffffffffffffffff) at util/auxtrace.c:2858 2858 if (!session->auxtrace) (gdb) p session $1 = (struct perf_session *) 0xffffffffffffffff (gdb) bt #0 0x00000000005e7515 in auxtrace__free (session=0xffffffffffffffff) at util/auxtrace.c:2858 #1 0x000000000057bb4d in perf_session__delete (session=0xffffffffffffffff) at util/session.c:300 #2 0x000000000047c421 in __cmd_contention (argc=0, argv=0x7fffffffe200) at builtin-lock.c:2161 #3 0x000000000047dc95 in cmd_lock (argc=0, argv=0x7fffffffe200) at builtin-lock.c:2604 #4 0x0000000000501466 in run_builtin (p=0xe597a8 <commands+552>, argc=2, argv=0x7fffffffe200) at perf.c:322 #5 0x00000000005016d5 in handle_internal_command (argc=2, argv=0x7fffffffe200) at perf.c:375 #6 0x0000000000501824 in run_argv (argcp=0x7fffffffe02c, argv=0x7fffffffe020) at perf.c:419 torvalds#7 0x0000000000501b11 in main (argc=2, argv=0x7fffffffe200) at perf.c:535 (gdb) So just set it to NULL after using PTR_ERR(session) to decode the error as perf_session__delete(NULL) is supported. Fixes: eef4fee ("perf lock: Dynamically allocate lockhash_table") Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: K Prateek Nayak <kprateek.nayak@amd.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Leo Yan <leo.yan@linaro.org> Cc: Mamatha Inamdar <mamatha4@linux.vnet.ibm.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ravi Bangoria <ravi.bangoria@amd.com> Cc: Ross Zwisler <zwisler@chromium.org> Cc: Sean Christopherson <seanjc@google.com> Cc: Steven Rostedt (VMware) <rostedt@goodmis.org> Cc: Tiezhu Yang <yangtiezhu@loongson.cn> Cc: Yang Jihong <yangjihong1@huawei.com> Link: https://lore.kernel.org/lkml/ZN4R1AYfsD2J8lRs@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
ptr1337
pushed a commit
that referenced
this pull request
Sep 23, 2023
[ Upstream commit 82ba0ff ] We should not call trace_handshake_cmd_done_err() if socket lookup has failed. Also we should call trace_handshake_cmd_done_err() before releasing the file, otherwise dereferencing sock->sk can return garbage. This also reverts 7afc6d0 ("net/handshake: Fix uninitialized local variable") Unable to handle kernel paging request at virtual address dfff800000000003 KASAN: null-ptr-deref in range [0x0000000000000018-0x000000000000001f] Mem abort info: ESR = 0x0000000096000005 EC = 0x25: DABT (current EL), IL = 32 bits SET = 0, FnV = 0 EA = 0, S1PTW = 0 FSC = 0x05: level 1 translation fault Data abort info: ISV = 0, ISS = 0x00000005, ISS2 = 0x00000000 CM = 0, WnR = 0, TnD = 0, TagAccess = 0 GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0 [dfff800000000003] address between user and kernel address ranges Internal error: Oops: 0000000096000005 [#1] PREEMPT SMP Modules linked in: CPU: 1 PID: 5986 Comm: syz-executor292 Not tainted 6.5.0-rc7-syzkaller-gfe4469582053 #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 07/26/2023 pstate: 80400005 (Nzcv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--) pc : handshake_nl_done_doit+0x198/0x9c8 net/handshake/netlink.c:193 lr : handshake_nl_done_doit+0x180/0x9c8 sp : ffff800096e37180 x29: ffff800096e37200 x28: 1ffff00012dc6e34 x27: dfff800000000000 x26: ffff800096e373d0 x25: 0000000000000000 x24: 00000000ffffffa8 x23: ffff800096e373f0 x22: 1ffff00012dc6e38 x21: 0000000000000000 x20: ffff800096e371c0 x19: 0000000000000018 x18: 0000000000000000 x17: 0000000000000000 x16: ffff800080516cc4 x15: 0000000000000001 x14: 1fffe0001b14aa3b x13: 0000000000000000 x12: 0000000000000000 x11: 0000000000000000 x10: 0000000000000000 x9 : 0000000000000003 x8 : 0000000000000003 x7 : ffff800080afe47c x6 : 0000000000000000 x5 : 0000000000000000 x4 : 0000000000000000 x3 : ffff800080a88078 x2 : 0000000000000001 x1 : 00000000ffffffa8 x0 : 0000000000000000 Call trace: handshake_nl_done_doit+0x198/0x9c8 net/handshake/netlink.c:193 genl_family_rcv_msg_doit net/netlink/genetlink.c:970 [inline] genl_family_rcv_msg net/netlink/genetlink.c:1050 [inline] genl_rcv_msg+0x96c/0xc50 net/netlink/genetlink.c:1067 netlink_rcv_skb+0x214/0x3c4 net/netlink/af_netlink.c:2549 genl_rcv+0x38/0x50 net/netlink/genetlink.c:1078 netlink_unicast_kernel net/netlink/af_netlink.c:1339 [inline] netlink_unicast+0x660/0x8d4 net/netlink/af_netlink.c:1365 netlink_sendmsg+0x834/0xb18 net/netlink/af_netlink.c:1914 sock_sendmsg_nosec net/socket.c:725 [inline] sock_sendmsg net/socket.c:748 [inline] ____sys_sendmsg+0x56c/0x840 net/socket.c:2494 ___sys_sendmsg net/socket.c:2548 [inline] __sys_sendmsg+0x26c/0x33c net/socket.c:2577 __do_sys_sendmsg net/socket.c:2586 [inline] __se_sys_sendmsg net/socket.c:2584 [inline] __arm64_sys_sendmsg+0x80/0x94 net/socket.c:2584 __invoke_syscall arch/arm64/kernel/syscall.c:37 [inline] invoke_syscall+0x98/0x2b8 arch/arm64/kernel/syscall.c:51 el0_svc_common+0x130/0x23c arch/arm64/kernel/syscall.c:136 do_el0_svc+0x48/0x58 arch/arm64/kernel/syscall.c:155 el0_svc+0x58/0x16c arch/arm64/kernel/entry-common.c:678 el0t_64_sync_handler+0x84/0xfc arch/arm64/kernel/entry-common.c:696 el0t_64_sync+0x190/0x194 arch/arm64/kernel/entry.S:591 Code: 12800108 b90043e8 910062b3 d343fe68 (387b6908) Fixes: 3b3009e ("net/handshake: Create a NETLINK service for handling handshake requests") Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Chuck Lever <chuck.lever@oracle.com> Reviewed-by: Michal Kubiak <michal.kubiak@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
ptr1337
pushed a commit
that referenced
this pull request
Sep 23, 2023
[ Upstream commit a96a44a ] './test_progs -t test_local_storage' reported a splat: [ 27.137569] ============================= [ 27.138122] [ BUG: Invalid wait context ] [ 27.138650] 6.5.0-03980-gd11ae1b16b0a torvalds#247 Tainted: G O [ 27.139542] ----------------------------- [ 27.140106] test_progs/1729 is trying to lock: [ 27.140713] ffff8883ef047b88 (stock_lock){-.-.}-{3:3}, at: local_lock_acquire+0x9/0x130 [ 27.141834] other info that might help us debug this: [ 27.142437] context-{5:5} [ 27.142856] 2 locks held by test_progs/1729: [ 27.143352] #0: ffffffff84bcd9c0 (rcu_read_lock){....}-{1:3}, at: rcu_lock_acquire+0x4/0x40 [ 27.144492] #1: ffff888107deb2c0 (&storage->lock){..-.}-{2:2}, at: bpf_local_storage_update+0x39e/0x8e0 [ 27.145855] stack backtrace: [ 27.146274] CPU: 0 PID: 1729 Comm: test_progs Tainted: G O 6.5.0-03980-gd11ae1b16b0a torvalds#247 [ 27.147550] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014 [ 27.149127] Call Trace: [ 27.149490] <TASK> [ 27.149867] dump_stack_lvl+0x130/0x1d0 [ 27.152609] dump_stack+0x14/0x20 [ 27.153131] __lock_acquire+0x1657/0x2220 [ 27.153677] lock_acquire+0x1b8/0x510 [ 27.157908] local_lock_acquire+0x29/0x130 [ 27.159048] obj_cgroup_charge+0xf4/0x3c0 [ 27.160794] slab_pre_alloc_hook+0x28e/0x2b0 [ 27.161931] __kmem_cache_alloc_node+0x51/0x210 [ 27.163557] __kmalloc+0xaa/0x210 [ 27.164593] bpf_map_kzalloc+0xbc/0x170 [ 27.165147] bpf_selem_alloc+0x130/0x510 [ 27.166295] bpf_local_storage_update+0x5aa/0x8e0 [ 27.167042] bpf_fd_sk_storage_update_elem+0xdb/0x1a0 [ 27.169199] bpf_map_update_value+0x415/0x4f0 [ 27.169871] map_update_elem+0x413/0x550 [ 27.170330] __sys_bpf+0x5e9/0x640 [ 27.174065] __x64_sys_bpf+0x80/0x90 [ 27.174568] do_syscall_64+0x48/0xa0 [ 27.175201] entry_SYSCALL_64_after_hwframe+0x6e/0xd8 [ 27.175932] RIP: 0033:0x7effb40e41ad [ 27.176357] Code: ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d8 [ 27.179028] RSP: 002b:00007ffe64c21fc8 EFLAGS: 00000202 ORIG_RAX: 0000000000000141 [ 27.180088] RAX: ffffffffffffffda RBX: 00007ffe64c22768 RCX: 00007effb40e41ad [ 27.181082] RDX: 0000000000000020 RSI: 00007ffe64c22008 RDI: 0000000000000002 [ 27.182030] RBP: 00007ffe64c21ff0 R08: 0000000000000000 R09: 00007ffe64c22788 [ 27.183038] R10: 0000000000000064 R11: 0000000000000202 R12: 0000000000000000 [ 27.184006] R13: 00007ffe64c22788 R14: 00007effb42a1000 R15: 0000000000000000 [ 27.184958] </TASK> It complains about acquiring a local_lock while holding a raw_spin_lock. It means it should not allocate memory while holding a raw_spin_lock since it is not safe for RT. raw_spin_lock is needed because bpf_local_storage supports tracing context. In particular for task local storage, it is easy to get a "current" task PTR_TO_BTF_ID in tracing bpf prog. However, task (and cgroup) local storage has already been moved to bpf mem allocator which can be used after raw_spin_lock. The splat is for the sk storage. For sk (and inode) storage, it has not been moved to bpf mem allocator. Using raw_spin_lock or not, kzalloc(GFP_ATOMIC) could theoretically be unsafe in tracing context. However, the local storage helper requires a verifier accepted sk pointer (PTR_TO_BTF_ID), it is hypothetical if that (mean running a bpf prog in a kzalloc unsafe context and also able to hold a verifier accepted sk pointer) could happen. This patch avoids kzalloc after raw_spin_lock to silent the splat. There is an existing kzalloc before the raw_spin_lock. At that point, a kzalloc is very likely required because a lookup has just been done before. Thus, this patch always does the kzalloc before acquiring the raw_spin_lock and remove the later kzalloc usage after the raw_spin_lock. After this change, it will have a charge and then uncharge during the syscall bpf_map_update_elem() code path. This patch opts for simplicity and not continue the old optimization to save one charge and uncharge. This issue is dated back to the very first commit of bpf_sk_storage which had been refactored multiple times to create task, inode, and cgroup storage. This patch uses a Fixes tag with a more recent commit that should be easier to do backport. Fixes: b00fa38 ("bpf: Enable non-atomic allocations in local storage") Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20230901231129.578493-2-martin.lau@linux.dev Signed-off-by: Sasha Levin <sashal@kernel.org>
ptr1337
pushed a commit
that referenced
this pull request
Sep 23, 2023
[ Upstream commit f4f8a78 ] The opt_num field is controlled by user mode and is not currently validated inside the kernel. An attacker can take advantage of this to trigger an OOB read and potentially leak information. BUG: KASAN: slab-out-of-bounds in nf_osf_match_one+0xbed/0xd10 net/netfilter/nfnetlink_osf.c:88 Read of size 2 at addr ffff88804bc64272 by task poc/6431 CPU: 1 PID: 6431 Comm: poc Not tainted 6.0.0-rc4 #1 Call Trace: nf_osf_match_one+0xbed/0xd10 net/netfilter/nfnetlink_osf.c:88 nf_osf_find+0x186/0x2f0 net/netfilter/nfnetlink_osf.c:281 nft_osf_eval+0x37f/0x590 net/netfilter/nft_osf.c:47 expr_call_ops_eval net/netfilter/nf_tables_core.c:214 nft_do_chain+0x2b0/0x1490 net/netfilter/nf_tables_core.c:264 nft_do_chain_ipv4+0x17c/0x1f0 net/netfilter/nft_chain_filter.c:23 [..] Also add validation to genre, subtype and version fields. Fixes: 11eeef4 ("netfilter: passive OS fingerprint xtables match") Reported-by: Lucas Leong <wmliang@infosec.exchange> Signed-off-by: Wander Lairson Costa <wander@redhat.com> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Sasha Levin <sashal@kernel.org>
ptr1337
pushed a commit
that referenced
this pull request
Sep 23, 2023
[ Upstream commit 9b5ba5c ] Deliver audit log from __nf_tables_dump_rules(), table dereference at the end of the table list loop might point to the list head, leading to this crash. [ 4137.407349] BUG: unable to handle page fault for address: 00000000001f3c50 [ 4137.407357] #PF: supervisor read access in kernel mode [ 4137.407359] #PF: error_code(0x0000) - not-present page [ 4137.407360] PGD 0 P4D 0 [ 4137.407363] Oops: 0000 [#1] PREEMPT SMP PTI [ 4137.407365] CPU: 4 PID: 500177 Comm: nft Not tainted 6.5.0+ torvalds#277 [ 4137.407369] RIP: 0010:string+0x49/0xd0 [ 4137.407374] Code: ff 77 36 45 89 d1 31 f6 49 01 f9 66 45 85 d2 75 19 eb 1e 49 39 f8 76 02 88 07 48 83 c7 01 83 c6 01 48 83 c2 01 4c 39 cf 74 07 <0f> b6 02 84 c0 75 e2 4c 89 c2 e9 58 e5 ff ff 48 c7 c0 0e b2 ff 81 [ 4137.407377] RSP: 0018:ffff8881179737f0 EFLAGS: 00010286 [ 4137.407379] RAX: 00000000001f2c50 RBX: ffff888117973848 RCX: ffff0a00ffffff04 [ 4137.407380] RDX: 00000000001f3c50 RSI: 0000000000000000 RDI: 0000000000000000 [ 4137.407381] RBP: 0000000000000000 R08: 0000000000000000 R09: 00000000ffffffff [ 4137.407383] R10: ffffffffffffffff R11: ffff88813584d200 R12: 0000000000000000 [ 4137.407384] R13: ffffffffa15cf709 R14: 0000000000000000 R15: ffffffffa15cf709 [ 4137.407385] FS: 00007fcfc18bb580(0000) GS:ffff88840e700000(0000) knlGS:0000000000000000 [ 4137.407387] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 4137.407388] CR2: 00000000001f3c50 CR3: 00000001055b2001 CR4: 00000000001706e0 [ 4137.407390] Call Trace: [ 4137.407392] <TASK> [ 4137.407393] ? __die+0x1b/0x60 [ 4137.407397] ? page_fault_oops+0x6b/0xa0 [ 4137.407399] ? exc_page_fault+0x60/0x120 [ 4137.407403] ? asm_exc_page_fault+0x22/0x30 [ 4137.407408] ? string+0x49/0xd0 [ 4137.407410] vsnprintf+0x257/0x4f0 [ 4137.407414] kvasprintf+0x3e/0xb0 [ 4137.407417] kasprintf+0x3e/0x50 [ 4137.407419] nf_tables_dump_rules+0x1c0/0x360 [nf_tables] [ 4137.407439] ? __alloc_skb+0xc3/0x170 [ 4137.407442] netlink_dump+0x170/0x330 [ 4137.407447] __netlink_dump_start+0x227/0x300 [ 4137.407449] nf_tables_getrule+0x205/0x390 [nf_tables] Deliver audit log only once at the end of the rule dump+reset for consistency with the set dump+reset. Ensure audit reset access to table under rcu read side lock. The table list iteration holds rcu read lock side, but recent audit code dereferences table object out of the rcu read lock side. Fixes: ea078ae ("netfilter: nf_tables: Audit log rule reset") Fixes: 7e9be11 ("netfilter: nf_tables: Audit log setelem reset") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Acked-by: Phil Sutter <phil@nwl.cc> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Sasha Levin <sashal@kernel.org>
ptr1337
pushed a commit
that referenced
this pull request
Sep 23, 2023
commit 768d612 upstream. Yikebaer reported an issue: ================================================================== BUG: KASAN: slab-use-after-free in ext4_es_insert_extent+0xc68/0xcb0 fs/ext4/extents_status.c:894 Read of size 4 at addr ffff888112ecc1a4 by task syz-executor/8438 CPU: 1 PID: 8438 Comm: syz-executor Not tainted 6.5.0-rc5 #1 Call Trace: [...] kasan_report+0xba/0xf0 mm/kasan/report.c:588 ext4_es_insert_extent+0xc68/0xcb0 fs/ext4/extents_status.c:894 ext4_map_blocks+0x92a/0x16f0 fs/ext4/inode.c:680 ext4_alloc_file_blocks.isra.0+0x2df/0xb70 fs/ext4/extents.c:4462 ext4_zero_range fs/ext4/extents.c:4622 [inline] ext4_fallocate+0x251c/0x3ce0 fs/ext4/extents.c:4721 [...] Allocated by task 8438: [...] kmem_cache_zalloc include/linux/slab.h:693 [inline] __es_alloc_extent fs/ext4/extents_status.c:469 [inline] ext4_es_insert_extent+0x672/0xcb0 fs/ext4/extents_status.c:873 ext4_map_blocks+0x92a/0x16f0 fs/ext4/inode.c:680 ext4_alloc_file_blocks.isra.0+0x2df/0xb70 fs/ext4/extents.c:4462 ext4_zero_range fs/ext4/extents.c:4622 [inline] ext4_fallocate+0x251c/0x3ce0 fs/ext4/extents.c:4721 [...] Freed by task 8438: [...] kmem_cache_free+0xec/0x490 mm/slub.c:3823 ext4_es_try_to_merge_right fs/ext4/extents_status.c:593 [inline] __es_insert_extent+0x9f4/0x1440 fs/ext4/extents_status.c:802 ext4_es_insert_extent+0x2ca/0xcb0 fs/ext4/extents_status.c:882 ext4_map_blocks+0x92a/0x16f0 fs/ext4/inode.c:680 ext4_alloc_file_blocks.isra.0+0x2df/0xb70 fs/ext4/extents.c:4462 ext4_zero_range fs/ext4/extents.c:4622 [inline] ext4_fallocate+0x251c/0x3ce0 fs/ext4/extents.c:4721 [...] ================================================================== The flow of issue triggering is as follows: 1. remove es raw es es removed es1 |-------------------| -> |----|.......|------| 2. insert es es insert es1 merge with es es1 merge with es and free es1 |----|.......|------| -> |------------|------| -> |-------------------| es merges with newes, then merges with es1, frees es1, then determines if es1->es_len is 0 and triggers a UAF. The code flow is as follows: ext4_es_insert_extent es1 = __es_alloc_extent(true); es2 = __es_alloc_extent(true); __es_remove_extent(inode, lblk, end, NULL, es1) __es_insert_extent(inode, &newes, es1) ---> insert es1 to es tree __es_insert_extent(inode, &newes, es2) ext4_es_try_to_merge_right ext4_es_free_extent(inode, es1) ---> es1 is freed if (es1 && !es1->es_len) // Trigger UAF by determining if es1 is used. We determine whether es1 or es2 is used immediately after calling __es_remove_extent() or __es_insert_extent() to avoid triggering a UAF if es1 or es2 is freed. Reported-by: Yikebaer Aizezi <yikebaer61@gmail.com> Closes: https://lore.kernel.org/lkml/CALcu4raD4h9coiyEBL4Bm0zjDwxC2CyPiTwsP3zFuhot6y9Beg@mail.gmail.com Fixes: 2a69c45 ("ext4: using nofail preallocation in ext4_es_insert_extent()") Cc: stable@kernel.org Signed-off-by: Baokun Li <libaokun1@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20230815070808.3377171-1-libaokun1@huawei.com Signed-off-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
ptr1337
pushed a commit
that referenced
this pull request
Sep 23, 2023
commit a3ab557 upstream. Let's flush the inode being aborted atomic operation to avoid stale dirty inode during eviction in this call stack: f2fs_mark_inode_dirty_sync+0x22/0x40 [f2fs] f2fs_abort_atomic_write+0xc4/0xf0 [f2fs] f2fs_evict_inode+0x3f/0x690 [f2fs] ? sugov_start+0x140/0x140 evict+0xc3/0x1c0 evict_inodes+0x17b/0x210 generic_shutdown_super+0x32/0x120 kill_block_super+0x21/0x50 deactivate_locked_super+0x31/0x90 cleanup_mnt+0x100/0x160 task_work_run+0x59/0x90 do_exit+0x33b/0xa50 do_group_exit+0x2d/0x80 __x64_sys_exit_group+0x14/0x20 do_syscall_64+0x3b/0x90 entry_SYSCALL_64_after_hwframe+0x63/0xcd This triggers f2fs_bug_on() in f2fs_evict_inode: f2fs_bug_on(sbi, is_inode_flag_set(inode, FI_DIRTY_INODE)); This fixes the syzbot report: loop0: detected capacity change from 0 to 131072 F2FS-fs (loop0): invalid crc value F2FS-fs (loop0): Found nat_bits in checkpoint F2FS-fs (loop0): Mounted with checkpoint version = 48b305e4 ------------[ cut here ]------------ kernel BUG at fs/f2fs/inode.c:869! invalid opcode: 0000 [#1] PREEMPT SMP KASAN CPU: 0 PID: 5014 Comm: syz-executor220 Not tainted 6.4.0-syzkaller-11479-g6cd06ab12d1a #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 05/27/2023 RIP: 0010:f2fs_evict_inode+0x172d/0x1e00 fs/f2fs/inode.c:869 Code: ff df 48 c1 ea 03 80 3c 02 00 0f 85 6a 06 00 00 8b 75 40 ba 01 00 00 00 4c 89 e7 e8 6d ce 06 00 e9 aa fc ff ff e8 63 22 e2 fd <0f> 0b e8 5c 22 e2 fd 48 c7 c0 a8 3a 18 8d 48 ba 00 00 00 00 00 fc RSP: 0018:ffffc90003a6fa00 EFLAGS: 00010293 RAX: 0000000000000000 RBX: 0000000000000001 RCX: 0000000000000000 RDX: ffff8880273b8000 RSI: ffffffff83a2bd0d RDI: 0000000000000007 RBP: ffff888077db91b0 R08: 0000000000000007 R09: 0000000000000000 R10: 0000000000000001 R11: 0000000000000001 R12: ffff888029a3c000 R13: ffff888077db9660 R14: ffff888029a3c0b8 R15: ffff888077db9c50 FS: 0000000000000000(0000) GS:ffff8880b9800000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007f1909bb9000 CR3: 00000000276a9000 CR4: 0000000000350ef0 Call Trace: <TASK> evict+0x2ed/0x6b0 fs/inode.c:665 dispose_list+0x117/0x1e0 fs/inode.c:698 evict_inodes+0x345/0x440 fs/inode.c:748 generic_shutdown_super+0xaf/0x480 fs/super.c:478 kill_block_super+0x64/0xb0 fs/super.c:1417 kill_f2fs_super+0x2af/0x3c0 fs/f2fs/super.c:4704 deactivate_locked_super+0x98/0x160 fs/super.c:330 deactivate_super+0xb1/0xd0 fs/super.c:361 cleanup_mnt+0x2ae/0x3d0 fs/namespace.c:1254 task_work_run+0x16f/0x270 kernel/task_work.c:179 exit_task_work include/linux/task_work.h:38 [inline] do_exit+0xa9a/0x29a0 kernel/exit.c:874 do_group_exit+0xd4/0x2a0 kernel/exit.c:1024 __do_sys_exit_group kernel/exit.c:1035 [inline] __se_sys_exit_group kernel/exit.c:1033 [inline] __x64_sys_exit_group+0x3e/0x50 kernel/exit.c:1033 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x39/0xb0 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x63/0xcd RIP: 0033:0x7f309be71a09 Code: Unable to access opcode bytes at 0x7f309be719df. RSP: 002b:00007fff171df518 EFLAGS: 00000246 ORIG_RAX: 00000000000000e7 RAX: ffffffffffffffda RBX: 00007f309bef7330 RCX: 00007f309be71a09 RDX: 000000000000003c RSI: 00000000000000e7 RDI: 0000000000000001 RBP: 0000000000000001 R08: ffffffffffffffc0 R09: 00007f309bef1e40 R10: 0000000000010600 R11: 0000000000000246 R12: 00007f309bef7330 R13: 0000000000000001 R14: 0000000000000000 R15: 0000000000000001 </TASK> Modules linked in: ---[ end trace 0000000000000000 ]--- RIP: 0010:f2fs_evict_inode+0x172d/0x1e00 fs/f2fs/inode.c:869 Code: ff df 48 c1 ea 03 80 3c 02 00 0f 85 6a 06 00 00 8b 75 40 ba 01 00 00 00 4c 89 e7 e8 6d ce 06 00 e9 aa fc ff ff e8 63 22 e2 fd <0f> 0b e8 5c 22 e2 fd 48 c7 c0 a8 3a 18 8d 48 ba 00 00 00 00 00 fc RSP: 0018:ffffc90003a6fa00 EFLAGS: 00010293 RAX: 0000000000000000 RBX: 0000000000000001 RCX: 0000000000000000 RDX: ffff8880273b8000 RSI: ffffffff83a2bd0d RDI: 0000000000000007 RBP: ffff888077db91b0 R08: 0000000000000007 R09: 0000000000000000 R10: 0000000000000001 R11: 0000000000000001 R12: ffff888029a3c000 R13: ffff888077db9660 R14: ffff888029a3c0b8 R15: ffff888077db9c50 FS: 0000000000000000(0000) GS:ffff8880b9800000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007f1909bb9000 CR3: 00000000276a9000 CR4: 0000000000350ef0 Cc: <stable@vger.kernel.org> Reported-and-tested-by: syzbot+e1246909d526a9d470fa@syzkaller.appspotmail.com Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
ptr1337
pushed a commit
that referenced
this pull request
Sep 23, 2023
commit 5c13e23 upstream. ====================================================== WARNING: possible circular locking dependency detected 6.5.0-rc5-syzkaller-00353-gae545c3283dc #0 Not tainted ------------------------------------------------------ syz-executor273/5027 is trying to acquire lock: ffff888077fe1fb0 (&fi->i_sem){+.+.}-{3:3}, at: f2fs_down_write fs/f2fs/f2fs.h:2133 [inline] ffff888077fe1fb0 (&fi->i_sem){+.+.}-{3:3}, at: f2fs_add_inline_entry+0x300/0x6f0 fs/f2fs/inline.c:644 but task is already holding lock: ffff888077fe07c8 (&fi->i_xattr_sem){.+.+}-{3:3}, at: f2fs_down_read fs/f2fs/f2fs.h:2108 [inline] ffff888077fe07c8 (&fi->i_xattr_sem){.+.+}-{3:3}, at: f2fs_add_dentry+0x92/0x230 fs/f2fs/dir.c:783 which lock already depends on the new lock. the existing dependency chain (in reverse order) is: -> #1 (&fi->i_xattr_sem){.+.+}-{3:3}: down_read+0x9c/0x470 kernel/locking/rwsem.c:1520 f2fs_down_read fs/f2fs/f2fs.h:2108 [inline] f2fs_getxattr+0xb1e/0x12c0 fs/f2fs/xattr.c:532 __f2fs_get_acl+0x5a/0x900 fs/f2fs/acl.c:179 f2fs_acl_create fs/f2fs/acl.c:377 [inline] f2fs_init_acl+0x15c/0xb30 fs/f2fs/acl.c:420 f2fs_init_inode_metadata+0x159/0x1290 fs/f2fs/dir.c:558 f2fs_add_regular_entry+0x79e/0xb90 fs/f2fs/dir.c:740 f2fs_add_dentry+0x1de/0x230 fs/f2fs/dir.c:788 f2fs_do_add_link+0x190/0x280 fs/f2fs/dir.c:827 f2fs_add_link fs/f2fs/f2fs.h:3554 [inline] f2fs_mkdir+0x377/0x620 fs/f2fs/namei.c:781 vfs_mkdir+0x532/0x7e0 fs/namei.c:4117 do_mkdirat+0x2a9/0x330 fs/namei.c:4140 __do_sys_mkdir fs/namei.c:4160 [inline] __se_sys_mkdir fs/namei.c:4158 [inline] __x64_sys_mkdir+0xf2/0x140 fs/namei.c:4158 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x38/0xb0 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x63/0xcd -> #0 (&fi->i_sem){+.+.}-{3:3}: check_prev_add kernel/locking/lockdep.c:3142 [inline] check_prevs_add kernel/locking/lockdep.c:3261 [inline] validate_chain kernel/locking/lockdep.c:3876 [inline] __lock_acquire+0x2e3d/0x5de0 kernel/locking/lockdep.c:5144 lock_acquire kernel/locking/lockdep.c:5761 [inline] lock_acquire+0x1ae/0x510 kernel/locking/lockdep.c:5726 down_write+0x93/0x200 kernel/locking/rwsem.c:1573 f2fs_down_write fs/f2fs/f2fs.h:2133 [inline] f2fs_add_inline_entry+0x300/0x6f0 fs/f2fs/inline.c:644 f2fs_add_dentry+0xa6/0x230 fs/f2fs/dir.c:784 f2fs_do_add_link+0x190/0x280 fs/f2fs/dir.c:827 f2fs_add_link fs/f2fs/f2fs.h:3554 [inline] f2fs_mkdir+0x377/0x620 fs/f2fs/namei.c:781 vfs_mkdir+0x532/0x7e0 fs/namei.c:4117 ovl_do_mkdir fs/overlayfs/overlayfs.h:196 [inline] ovl_mkdir_real+0xb5/0x370 fs/overlayfs/dir.c:146 ovl_workdir_create+0x3de/0x820 fs/overlayfs/super.c:309 ovl_make_workdir fs/overlayfs/super.c:711 [inline] ovl_get_workdir fs/overlayfs/super.c:864 [inline] ovl_fill_super+0xdab/0x6180 fs/overlayfs/super.c:1400 vfs_get_super+0xf9/0x290 fs/super.c:1152 vfs_get_tree+0x88/0x350 fs/super.c:1519 do_new_mount fs/namespace.c:3335 [inline] path_mount+0x1492/0x1ed0 fs/namespace.c:3662 do_mount fs/namespace.c:3675 [inline] __do_sys_mount fs/namespace.c:3884 [inline] __se_sys_mount fs/namespace.c:3861 [inline] __x64_sys_mount+0x293/0x310 fs/namespace.c:3861 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x38/0xb0 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x63/0xcd other info that might help us debug this: Possible unsafe locking scenario: CPU0 CPU1 ---- ---- rlock(&fi->i_xattr_sem); lock(&fi->i_sem); lock(&fi->i_xattr_sem); lock(&fi->i_sem); Cc: <stable@vger.kernel.org> Reported-and-tested-by: syzbot+e5600587fa9cbf8e3826@syzkaller.appspotmail.com Fixes: 5eda1ad "f2fs: fix deadlock in i_xattr_sem and inode page lock" Tested-by: Guenter Roeck <linux@roeck-us.net> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
ptr1337
pushed a commit
that referenced
this pull request
Sep 23, 2023
commit 6f0df8e upstream. In the eviction recency check, we attempt to retrieve the memcg to which the folio belonged when it was evicted, by the memcg id stored in the shadow entry. However, there is a chance that the retrieved memcg is not the original memcg that has been killed, but a new one which happens to have the same id. This is a somewhat unfortunate, but acceptable and rare inaccuracy in the heuristics. However, if we retrieve this new memcg between its allocation and when it is properly attached to the memcg hierarchy, we could run into the following NULL pointer exception during the memcg hierarchy traversal done in mem_cgroup_get_nr_swap_pages(): [ 155757.793456] BUG: kernel NULL pointer dereference, address: 00000000000000c0 [ 155757.807568] #PF: supervisor read access in kernel mode [ 155757.818024] #PF: error_code(0x0000) - not-present page [ 155757.828482] PGD 401f77067 P4D 401f77067 PUD 401f76067 PMD 0 [ 155757.839985] Oops: 0000 [#1] SMP [ 155757.887870] RIP: 0010:mem_cgroup_get_nr_swap_pages+0x3d/0xb0 [ 155757.899377] Code: 29 19 4a 02 48 39 f9 74 63 48 8b 97 c0 00 00 00 48 8b b7 58 02 00 00 48 2b b7 c0 01 00 00 48 39 f0 48 0f 4d c6 48 39 d1 74 42 <48> 8b b2 c0 00 00 00 48 8b ba 58 02 00 00 48 2b ba c0 01 00 00 48 [ 155757.937125] RSP: 0018:ffffc9002ecdfbc8 EFLAGS: 00010286 [ 155757.947755] RAX: 00000000003a3b1c RBX: 000007ffffffffff RCX: ffff888280183000 [ 155757.962202] RDX: 0000000000000000 RSI: 0007ffffffffffff RDI: ffff888bbc2d1000 [ 155757.976648] RBP: 0000000000000001 R08: 000000000000000b R09: ffff888ad9cedba0 [ 155757.991094] R10: ffffea0039c07900 R11: 0000000000000010 R12: ffff888b23a7b000 [ 155758.005540] R13: 0000000000000000 R14: ffff888bbc2d1000 R15: 000007ffffc71354 [ 155758.019991] FS: 00007f6234c68640(0000) GS:ffff88903f9c0000(0000) knlGS:0000000000000000 [ 155758.036356] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 155758.048023] CR2: 00000000000000c0 CR3: 0000000a83eb8004 CR4: 00000000007706e0 [ 155758.062473] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 155758.076924] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 155758.091376] PKRU: 55555554 [ 155758.096957] Call Trace: [ 155758.102016] <TASK> [ 155758.106502] ? __die+0x78/0xc0 [ 155758.112793] ? page_fault_oops+0x286/0x380 [ 155758.121175] ? exc_page_fault+0x5d/0x110 [ 155758.129209] ? asm_exc_page_fault+0x22/0x30 [ 155758.137763] ? mem_cgroup_get_nr_swap_pages+0x3d/0xb0 [ 155758.148060] workingset_test_recent+0xda/0x1b0 [ 155758.157133] workingset_refault+0xca/0x1e0 [ 155758.165508] filemap_add_folio+0x4d/0x70 [ 155758.173538] page_cache_ra_unbounded+0xed/0x190 [ 155758.182919] page_cache_sync_ra+0xd6/0x1e0 [ 155758.191738] filemap_read+0x68d/0xdf0 [ 155758.199495] ? mlx5e_napi_poll+0x123/0x940 [ 155758.207981] ? __napi_schedule+0x55/0x90 [ 155758.216095] __x64_sys_pread64+0x1d6/0x2c0 [ 155758.224601] do_syscall_64+0x3d/0x80 [ 155758.232058] entry_SYSCALL_64_after_hwframe+0x46/0xb0 [ 155758.242473] RIP: 0033:0x7f62c29153b5 [ 155758.249938] Code: e8 48 89 75 f0 89 7d f8 48 89 4d e0 e8 b4 e6 f7 ff 41 89 c0 4c 8b 55 e0 48 8b 55 e8 48 8b 75 f0 8b 7d f8 b8 11 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 33 44 89 c7 48 89 45 f8 e8 e7 e6 f7 ff 48 8b [ 155758.288005] RSP: 002b:00007f6234c5ffd0 EFLAGS: 00000293 ORIG_RAX: 0000000000000011 [ 155758.303474] RAX: ffffffffffffffda RBX: 00007f628c4e70c0 RCX: 00007f62c29153b5 [ 155758.318075] RDX: 000000000003c041 RSI: 00007f61d2986000 RDI: 0000000000000076 [ 155758.332678] RBP: 00007f6234c5fff0 R08: 0000000000000000 R09: 0000000064d5230c [ 155758.347452] R10: 000000000027d450 R11: 0000000000000293 R12: 000000000003c041 [ 155758.362044] R13: 00007f61d2986000 R14: 00007f629e11b060 R15: 000000000027d450 [ 155758.376661] </TASK> This patch fixes the issue by moving the memcg's id publication from the alloc stage to online stage, ensuring that any memcg acquired via id must be connected to the memcg tree. Link: https://lkml.kernel.org/r/20230823225430.166925-1-nphamcs@gmail.com Fixes: f78dfc7 ("workingset: fix confusion around eviction vs refault container") Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> Co-developed-by: Nhat Pham <nphamcs@gmail.com> Signed-off-by: Nhat Pham <nphamcs@gmail.com> Acked-by: Shakeel Butt <shakeelb@google.com> Cc: Yosry Ahmed <yosryahmed@google.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Roman Gushchin <roman.gushchin@linux.dev> Cc: Muchun Song <songmuchun@bytedance.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
ptr1337
pushed a commit
that referenced
this pull request
Sep 23, 2023
commit e7f1326 upstream. One of the CI runs triggered the following panic assertion failed: PagePrivate(page) && page->private, in fs/btrfs/subpage.c:229 ------------[ cut here ]------------ kernel BUG at fs/btrfs/subpage.c:229! Internal error: Oops - BUG: 00000000f2000800 [#1] SMP CPU: 0 PID: 923660 Comm: btrfs Not tainted 6.5.0-rc3+ #1 pstate: 6140000 (nZCv daif +PAN -UAO -TCO +DIT -SSBS BTYPE=--) pc : btrfs_subpage_assert+0xbc/0xf0 lr : btrfs_subpage_assert+0xbc/0xf0 sp : ffff800093213720 x29: ffff800093213720 x28: ffff8000932138b4 x27: 000000000c280000 x26: 00000001b5d00000 x25: 000000000c281000 x24: 000000000c281fff x23: 0000000000001000 x22: 0000000000000000 x21: ffffff42b95bf880 x20: ffff42b9528e0000 x19: 0000000000001000 x18: ffffffffffffffff x17: 667274622f736620 x16: 6e69202c65746176 x15: 0000000000000028 x14: 0000000000000003 x13: 00000000002672d7 x12: 0000000000000000 x11: ffffcd3f0ccd9204 x10: ffffcd3f0554ae50 x9 : ffffcd3f0379528c x8 : ffff800093213428 x7 : 0000000000000000 x6 : ffffcd3f091771e8 x5 : ffff42b97f333948 x4 : 0000000000000000 x3 : 0000000000000000 x2 : 0000000000000000 x1 : ffff42b9556cde80 x0 : 000000000000004f Call trace: btrfs_subpage_assert+0xbc/0xf0 btrfs_subpage_set_dirty+0x38/0xa0 btrfs_page_set_dirty+0x58/0x88 relocate_one_page+0x204/0x5f0 relocate_file_extent_cluster+0x11c/0x180 relocate_data_extent+0xd0/0xf8 relocate_block_group+0x3d0/0x4e8 btrfs_relocate_block_group+0x2d8/0x490 btrfs_relocate_chunk+0x54/0x1a8 btrfs_balance+0x7f4/0x1150 btrfs_ioctl+0x10f0/0x20b8 __arm64_sys_ioctl+0x120/0x11d8 invoke_syscall.constprop.0+0x80/0xd8 do_el0_svc+0x6c/0x158 el0_svc+0x50/0x1b0 el0t_64_sync_handler+0x120/0x130 el0t_64_sync+0x194/0x198 Code: 91098021 b0007fa0 91346000 97e9c6d2 (d4210000) This is the same problem outlined in 17b17fc ("btrfs: set_page_extent_mapped after read_folio in btrfs_cont_expand") , and the fix is the same. I originally looked for the same pattern elsewhere in our code, but mistakenly skipped over this code because I saw the page cache readahead before we set_page_extent_mapped, not realizing that this was only in the !page case, that we can still end up with a !uptodate page and then do the btrfs_read_folio further down. The fix here is the same as the above mentioned patch, move the set_page_extent_mapped call to after the btrfs_read_folio() block to make sure that we have the subpage blocksize stuff setup properly before using the page. CC: stable@vger.kernel.org # 6.1+ Reviewed-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
ptr1337
pushed a commit
that referenced
this pull request
Sep 23, 2023
commit f1187ef upstream. Fix a goof where KVM tries to grab source vCPUs from the destination VM when doing intrahost migration. Grabbing the wrong vCPU not only hoses the guest, it also crashes the host due to the VMSA pointer being left NULL. BUG: unable to handle page fault for address: ffffe38687000000 #PF: supervisor read access in kernel mode #PF: error_code(0x0000) - not-present page PGD 0 P4D 0 Oops: 0000 [#1] SMP NOPTI CPU: 39 PID: 17143 Comm: sev_migrate_tes Tainted: GO 6.5.0-smp--fff2e47e6c3b-next torvalds#151 Hardware name: Google, Inc. Arcadia_IT_80/Arcadia_IT_80, BIOS 34.28.0 07/10/2023 RIP: 0010:__free_pages+0x15/0xd0 RSP: 0018:ffff923fcf6e3c78 EFLAGS: 00010246 RAX: 0000000000000000 RBX: ffffe38687000000 RCX: 0000000000000100 RDX: 0000000000000100 RSI: 0000000000000000 RDI: ffffe38687000000 RBP: ffff923fcf6e3c88 R08: ffff923fcafb0000 R09: 0000000000000000 R10: 0000000000000000 R11: ffffffff83619b90 R12: ffff923fa9540000 R13: 0000000000080007 R14: ffff923f6d35d000 R15: 0000000000000000 FS: 0000000000000000(0000) GS:ffff929d0d7c0000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: ffffe38687000000 CR3: 0000005224c34005 CR4: 0000000000770ee0 PKRU: 55555554 Call Trace: <TASK> sev_free_vcpu+0xcb/0x110 [kvm_amd] svm_vcpu_free+0x75/0xf0 [kvm_amd] kvm_arch_vcpu_destroy+0x36/0x140 [kvm] kvm_destroy_vcpus+0x67/0x100 [kvm] kvm_arch_destroy_vm+0x161/0x1d0 [kvm] kvm_put_kvm+0x276/0x560 [kvm] kvm_vm_release+0x25/0x30 [kvm] __fput+0x106/0x280 ____fput+0x12/0x20 task_work_run+0x86/0xb0 do_exit+0x2e3/0x9c0 do_group_exit+0xb1/0xc0 __x64_sys_exit_group+0x1b/0x20 do_syscall_64+0x41/0x90 entry_SYSCALL_64_after_hwframe+0x63/0xcd </TASK> CR2: ffffe38687000000 Fixes: 6defa24 ("KVM: SEV: Init target VMCBs in sev_migrate_from") Cc: stable@vger.kernel.org Cc: Peter Gonda <pgonda@google.com> Reviewed-by: Peter Gonda <pgonda@google.com> Reviewed-by: Pankaj Gupta <pankaj.gupta@amd.com> Link: https://lore.kernel.org/r/20230825022357.2852133-2-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
ptr1337
pushed a commit
that referenced
this pull request
Sep 23, 2023
[ Upstream commit 2810c1e ] Inject fault while probing kunit-example-test.ko, if kstrdup() fails in mod_sysfs_setup() in load_module(), the mod->state will switch from MODULE_STATE_COMING to MODULE_STATE_GOING instead of from MODULE_STATE_LIVE to MODULE_STATE_GOING, so only kunit_module_exit() will be called without kunit_module_init(), and the mod->kunit_suites is no set correctly and the free in kunit_free_suite_set() will cause below wild-memory-access bug. The mod->state state machine when load_module() succeeds: MODULE_STATE_UNFORMED ---> MODULE_STATE_COMING ---> MODULE_STATE_LIVE ^ | | | delete_module +---------------- MODULE_STATE_GOING <---------+ The mod->state state machine when load_module() fails at mod_sysfs_setup(): MODULE_STATE_UNFORMED ---> MODULE_STATE_COMING ---> MODULE_STATE_GOING ^ | | | +-----------------------------------------------+ Call kunit_module_init() at MODULE_STATE_COMING state to fix the issue because MODULE_STATE_LIVE is transformed from it. Unable to handle kernel paging request at virtual address ffffff341e942a88 KASAN: maybe wild-memory-access in range [0x0003f9a0f4a15440-0x0003f9a0f4a15447] Mem abort info: ESR = 0x0000000096000004 EC = 0x25: DABT (current EL), IL = 32 bits SET = 0, FnV = 0 EA = 0, S1PTW = 0 FSC = 0x04: level 0 translation fault Data abort info: ISV = 0, ISS = 0x00000004, ISS2 = 0x00000000 CM = 0, WnR = 0, TnD = 0, TagAccess = 0 GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0 swapper pgtable: 4k pages, 48-bit VAs, pgdp=00000000441ea000 [ffffff341e942a88] pgd=0000000000000000, p4d=0000000000000000 Internal error: Oops: 0000000096000004 [#1] PREEMPT SMP Modules linked in: kunit_example_test(-) cfg80211 rfkill 8021q garp mrp stp llc ipv6 [last unloaded: kunit_example_test] CPU: 3 PID: 2035 Comm: modprobe Tainted: G W N 6.5.0-next-20230828+ torvalds#136 Hardware name: linux,dummy-virt (DT) pstate: a0000005 (NzCv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--) pc : kfree+0x2c/0x70 lr : kunit_free_suite_set+0xcc/0x13c sp : ffff8000829b75b0 x29: ffff8000829b75b0 x28: ffff8000829b7b90 x27: 0000000000000000 x26: dfff800000000000 x25: ffffcd07c82a7280 x24: ffffcd07a50ab300 x23: ffffcd07a50ab2e8 x22: 1ffff00010536ec0 x21: dfff800000000000 x20: ffffcd07a50ab2f0 x19: ffffcd07a50ab2f0 x18: 0000000000000000 x17: 0000000000000000 x16: 0000000000000000 x15: ffffcd07c24b6764 x14: ffffcd07c24b63c0 x13: ffffcd07c4cebb94 x12: ffff700010536ec7 x11: 1ffff00010536ec6 x10: ffff700010536ec6 x9 : dfff800000000000 x8 : 00008fffefac913a x7 : 0000000041b58ab3 x6 : 0000000000000000 x5 : 1ffff00010536ec5 x4 : ffff8000829b7628 x3 : dfff800000000000 x2 : ffffff341e942a80 x1 : ffffcd07a50aa000 x0 : fffffc0000000000 Call trace: kfree+0x2c/0x70 kunit_free_suite_set+0xcc/0x13c kunit_module_notify+0xd8/0x360 blocking_notifier_call_chain+0xc4/0x128 load_module+0x382c/0x44a4 init_module_from_file+0xd4/0x128 idempotent_init_module+0x2c8/0x524 __arm64_sys_finit_module+0xac/0x100 invoke_syscall+0x6c/0x258 el0_svc_common.constprop.0+0x160/0x22c do_el0_svc+0x44/0x5c el0_svc+0x38/0x78 el0t_64_sync_handler+0x13c/0x158 el0t_64_sync+0x190/0x194 Code: aa0003e1 b25657e0 d34cfc42 8b021802 (f9400440) ---[ end trace 0000000000000000 ]--- Kernel panic - not syncing: Oops: Fatal exception SMP: stopping secondary CPUs Kernel Offset: 0x4d0742200000 from 0xffff800080000000 PHYS_OFFSET: 0xffffee43c0000000 CPU features: 0x88000203,3c020000,1000421b Memory Limit: none Rebooting in 1 seconds.. Fixes: 3d6e446 ("kunit: unify module and builtin suite definitions") Signed-off-by: Jinjie Ruan <ruanjinjie@huawei.com> Reviewed-by: Rae Moar <rmoar@google.com> Reviewed-by: David Gow <davidgow@google.com> Signed-off-by: Shuah Khan <skhan@linuxfoundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
ptr1337
pushed a commit
that referenced
this pull request
Sep 23, 2023
…n smcr_port_add [ Upstream commit f5146e3 ] While doing smcr_port_add, there maybe linkgroup add into or delete from smc_lgr_list.list at the same time, which may result kernel crash. So, use smc_lgr_list.lock to protect smc_lgr_list.list iterate in smcr_port_add. The crash calltrace show below: BUG: kernel NULL pointer dereference, address: 0000000000000000 PGD 0 P4D 0 Oops: 0000 [#1] SMP NOPTI CPU: 0 PID: 559726 Comm: kworker/0:92 Kdump: loaded Tainted: G Hardware name: Alibaba Cloud Alibaba Cloud ECS, BIOS 449e491 04/01/2014 Workqueue: events smc_ib_port_event_work [smc] RIP: 0010:smcr_port_add+0xa6/0xf0 [smc] RSP: 0000:ffffa5a2c8f67de0 EFLAGS: 00010297 RAX: 0000000000000001 RBX: ffff9935e0650000 RCX: 0000000000000000 RDX: 0000000000000010 RSI: ffff9935e0654290 RDI: ffff9935c8560000 RBP: 0000000000000000 R08: 0000000000000000 R09: ffff9934c0401918 R10: 0000000000000000 R11: ffffffffb4a5c278 R12: ffff99364029aae4 R13: ffff99364029aa00 R14: 00000000ffffffed R15: ffff99364029ab08 FS: 0000000000000000(0000) GS:ffff994380600000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000000 CR3: 0000000f06a10003 CR4: 0000000002770ef0 PKRU: 55555554 Call Trace: smc_ib_port_event_work+0x18f/0x380 [smc] process_one_work+0x19b/0x340 worker_thread+0x30/0x370 ? process_one_work+0x340/0x340 kthread+0x114/0x130 ? __kthread_cancel_work+0x50/0x50 ret_from_fork+0x1f/0x30 Fixes: 1f90a05 ("net/smc: add smcr_port_add() and smcr_link_up() processing") Signed-off-by: Guangguan Wang <guangguan.wang@linux.alibaba.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
ptr1337
pushed a commit
that referenced
this pull request
Sep 23, 2023
[ Upstream commit 403f0e7 ] macb_set_tx_clk() is called under a spinlock but itself calls clk_set_rate() which can sleep. This results in: | BUG: sleeping function called from invalid context at kernel/locking/mutex.c:580 | pps pps1: new PPS source ptp1 | in_atomic(): 1, irqs_disabled(): 1, non_block: 0, pid: 40, name: kworker/u4:3 | preempt_count: 1, expected: 0 | RCU nest depth: 0, expected: 0 | 4 locks held by kworker/u4:3/40: | #0: ffff000003409148 | macb ff0c000.ethernet: gem-ptp-timer ptp clock registered. | ((wq_completion)events_power_efficient){+.+.}-{0:0}, at: process_one_work+0x14c/0x51c | #1: ffff8000833cbdd8 ((work_completion)(&pl->resolve)){+.+.}-{0:0}, at: process_one_work+0x14c/0x51c | #2: ffff000004f01578 (&pl->state_mutex){+.+.}-{4:4}, at: phylink_resolve+0x44/0x4e8 | #3: ffff000004f06f50 (&bp->lock){....}-{3:3}, at: macb_mac_link_up+0x40/0x2ac | irq event stamp: 113998 | hardirqs last enabled at (113997): [<ffff800080e8503c>] _raw_spin_unlock_irq+0x30/0x64 | hardirqs last disabled at (113998): [<ffff800080e84478>] _raw_spin_lock_irqsave+0xac/0xc8 | softirqs last enabled at (113608): [<ffff800080010630>] __do_softirq+0x430/0x4e4 | softirqs last disabled at (113597): [<ffff80008001614c>] ____do_softirq+0x10/0x1c | CPU: 0 PID: 40 Comm: kworker/u4:3 Not tainted 6.5.0-11717-g9355ce8b2f50-dirty torvalds#368 | Hardware name: ... ZynqMP ... (DT) | Workqueue: events_power_efficient phylink_resolve | Call trace: | dump_backtrace+0x98/0xf0 | show_stack+0x18/0x24 | dump_stack_lvl+0x60/0xac | dump_stack+0x18/0x24 | __might_resched+0x144/0x24c | __might_sleep+0x48/0x98 | __mutex_lock+0x58/0x7b0 | mutex_lock_nested+0x24/0x30 | clk_prepare_lock+0x4c/0xa8 | clk_set_rate+0x24/0x8c | macb_mac_link_up+0x25c/0x2ac | phylink_resolve+0x178/0x4e8 | process_one_work+0x1ec/0x51c | worker_thread+0x1ec/0x3e4 | kthread+0x120/0x124 | ret_from_fork+0x10/0x20 The obvious fix is to move the call to macb_set_tx_clk() out of the protected area. This seems safe as rx and tx are both disabled anyway at this point. It is however not entirely clear what the spinlock shall protect. It could be the read-modify-write access to the NCFGR register, but this is accessed in macb_set_rx_mode() and macb_set_rxcsum_feature() as well without holding the spinlock. It could also be the register accesses done in mog_init_rings() or macb_init_buffers(), but again these functions are called without holding the spinlock in macb_hresp_error_task(). The locking seems fishy in this driver and it might deserve another look before this patch is applied. Fixes: 633e98a ("net: macb: use resolved link config in mac_link_up()") Signed-off-by: Sascha Hauer <s.hauer@pengutronix.de> Link: https://lore.kernel.org/r/20230908112913.1701766-1-s.hauer@pengutronix.de Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
ptr1337
pushed a commit
that referenced
this pull request
Sep 23, 2023
[ Upstream commit 910e230 ] Macro symbol_put() is defined as __symbol_put(__stringify(x)) ksym_name = "jiffies" symbol_put(ksym_name) will be resolved as __symbol_put("ksym_name") which is clearly wrong. So symbol_put must be replaced with __symbol_put. When we uninstall hw_breakpoint.ko (rmmod), a kernel bug occurs with the following error: [11381.854152] kernel BUG at kernel/module/main.c:779! [11381.854159] invalid opcode: 0000 [#2] PREEMPT SMP PTI [11381.854163] CPU: 8 PID: 59623 Comm: rmmod Tainted: G D OE 6.2.9-200.fc37.x86_64 #1 [11381.854167] Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./B360M-HDV, BIOS P3.20 10/23/2018 [11381.854169] RIP: 0010:__symbol_put+0xa2/0xb0 [11381.854175] Code: 00 e8 92 d2 f7 ff 65 8b 05 c3 2f e6 78 85 c0 74 1b 48 8b 44 24 30 65 48 2b 04 25 28 00 00 00 75 12 48 83 c4 38 c3 cc cc cc cc <0f> 0b 0f 1f 44 00 00 eb de e8 c0 df d8 00 90 90 90 90 90 90 90 90 [11381.854178] RSP: 0018:ffffad8ec6ae7dd0 EFLAGS: 00010246 [11381.854181] RAX: 0000000000000000 RBX: ffffffffc1fd1240 RCX: 000000000000000c [11381.854184] RDX: 000000000000006b RSI: ffffffffc02bf7c7 RDI: ffffffffc1fd001c [11381.854186] RBP: 000055a38b76e7c8 R08: ffffffff871ccfe0 R09: 0000000000000000 [11381.854188] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000 [11381.854190] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000 [11381.854192] FS: 00007fbf7c62c740(0000) GS:ffff8c5badc00000(0000) knlGS:0000000000000000 [11381.854195] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [11381.854197] CR2: 000055a38b7793f8 CR3: 0000000363e1e001 CR4: 00000000003726e0 [11381.854200] DR0: ffffffffb3407980 DR1: 0000000000000000 DR2: 0000000000000000 [11381.854202] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 [11381.854204] Call Trace: [11381.854207] <TASK> [11381.854212] s_module_exit+0xc/0xff0 [symbol_getput] [11381.854219] __do_sys_delete_module.constprop.0+0x198/0x2f0 [11381.854225] do_syscall_64+0x58/0x80 [11381.854231] ? exit_to_user_mode_prepare+0x180/0x1f0 [11381.854237] ? syscall_exit_to_user_mode+0x17/0x40 [11381.854241] ? do_syscall_64+0x67/0x80 [11381.854245] ? syscall_exit_to_user_mode+0x17/0x40 [11381.854248] ? do_syscall_64+0x67/0x80 [11381.854252] ? exc_page_fault+0x70/0x170 [11381.854256] entry_SYSCALL_64_after_hwframe+0x72/0xdc Signed-off-by: Rong Tao <rongtao@cestc.cn> Reviewed-by: Petr Mladek <pmladek@suse.com> Signed-off-by: Luis Chamberlain <mcgrof@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
ptr1337
pushed a commit
that referenced
this pull request
Sep 23, 2023
[ Upstream commit b97f96e ] When compiling the kernel with clang and having HARDENED_USERCOPY enabled, the liburing openat2.t test case fails during request setup: usercopy: Kernel memory overwrite attempt detected to SLUB object 'io_kiocb' (offset 24, size 24)! ------------[ cut here ]------------ kernel BUG at mm/usercopy.c:102! invalid opcode: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC CPU: 3 PID: 413 Comm: openat2.t Tainted: G N 6.4.3-g6995e2de6891-dirty torvalds#19 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.1-0-g3208b098f51a-prebuilt.qemu.org 04/01/2014 RIP: 0010:usercopy_abort+0x84/0x90 Code: ce 49 89 ce 48 c7 c3 68 48 98 82 48 0f 44 de 48 c7 c7 56 c6 94 82 4c 89 de 48 89 c1 41 52 41 56 53 e8 e0 51 c5 00 48 83 c4 18 <0f> 0b 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 55 41 57 41 56 RSP: 0018:ffffc900016b3da0 EFLAGS: 00010296 RAX: 0000000000000062 RBX: ffffffff82984868 RCX: 4e9b661ac6275b00 RDX: ffff8881b90ec580 RSI: ffffffff82949a64 RDI: 00000000ffffffff RBP: 0000000000000018 R08: 0000000000000000 R09: 0000000000000000 R10: ffffc900016b3c88 R11: ffffc900016b3c30 R12: 00007ffe549659e0 R13: ffff888119014000 R14: 0000000000000018 R15: 0000000000000018 FS: 00007f862e3ca680(0000) GS:ffff8881b90c0000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00005571483542a8 CR3: 0000000118c11000 CR4: 00000000003506e0 Call Trace: <TASK> ? __die_body+0x63/0xb0 ? die+0x9d/0xc0 ? do_trap+0xa7/0x180 ? usercopy_abort+0x84/0x90 ? do_error_trap+0xc6/0x110 ? usercopy_abort+0x84/0x90 ? handle_invalid_op+0x2c/0x40 ? usercopy_abort+0x84/0x90 ? exc_invalid_op+0x2f/0x40 ? asm_exc_invalid_op+0x16/0x20 ? usercopy_abort+0x84/0x90 __check_heap_object+0xe2/0x110 __check_object_size+0x142/0x3d0 io_openat2_prep+0x68/0x140 io_submit_sqes+0x28a/0x680 __se_sys_io_uring_enter+0x120/0x580 do_syscall_64+0x3d/0x80 entry_SYSCALL_64_after_hwframe+0x46/0xb0 RIP: 0033:0x55714834de26 Code: ca 01 0f b6 82 d0 00 00 00 8b ba cc 00 00 00 45 31 c0 31 d2 41 b9 08 00 00 00 83 e0 01 c1 e0 04 41 09 c2 b8 aa 01 00 00 0f 05 <c3> 66 0f 1f 84 00 00 00 00 00 89 30 eb 89 0f 1f 40 00 8b 00 a8 06 RSP: 002b:00007ffe549659c8 EFLAGS: 00000246 ORIG_RAX: 00000000000001aa RAX: ffffffffffffffda RBX: 00007ffe54965a50 RCX: 000055714834de26 RDX: 0000000000000000 RSI: 0000000000000001 RDI: 0000000000000003 RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000008 R10: 0000000000000000 R11: 0000000000000246 R12: 000055714834f057 R13: 00007ffe54965a50 R14: 0000000000000001 R15: 0000557148351dd8 </TASK> Modules linked in: ---[ end trace 0000000000000000 ]--- when it tries to copy struct open_how from userspace into the per-command space in the io_kiocb. There's nothing wrong with the copy, but we're missing the appropriate annotations for allowing user copies to/from the io_kiocb slab. Allow copies in the per-command area, which is from the 'file' pointer to when 'opcode' starts. We do have existing user copies there, but they are not all annotated like the one that openat2_prep() uses, copy_struct_from_user(). But in practice opcodes should be allowed to copy data into their per-command area in the io_kiocb. Reported-by: Breno Leitao <leitao@debian.org> Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Sasha Levin <sashal@kernel.org>
ptr1337
pushed a commit
that referenced
this pull request
Sep 23, 2023
[ Upstream commit 2319b9c ] The device may be scheduled during the resume process, so this cannot appear in atomic operations. Since pm_runtime_set_active will resume suppliers, put set active outside the spin lock, which is only used to protect the struct cdns data structure, otherwise the kernel will report the following warning: BUG: sleeping function called from invalid context at drivers/base/power/runtime.c:1163 in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 651, name: sh preempt_count: 1, expected: 0 RCU nest depth: 0, expected: 0 CPU: 0 PID: 651 Comm: sh Tainted: G WC 6.1.20 #1 Hardware name: Freescale i.MX8QM MEK (DT) Call trace: dump_backtrace.part.0+0xe0/0xf0 show_stack+0x18/0x30 dump_stack_lvl+0x64/0x80 dump_stack+0x1c/0x38 __might_resched+0x1fc/0x240 __might_sleep+0x68/0xc0 __pm_runtime_resume+0x9c/0xe0 rpm_get_suppliers+0x68/0x1b0 __pm_runtime_set_status+0x298/0x560 cdns_resume+0xb0/0x1c0 cdns3_controller_resume.isra.0+0x1e0/0x250 cdns3_plat_resume+0x28/0x40 Signed-off-by: Xiaolei Wang <xiaolei.wang@windriver.com> Acked-by: Peter Chen <peter.chen@kernel.org> Link: https://lore.kernel.org/r/20230616021952.1025854-1-xiaolei.wang@windriver.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
ptr1337
pushed a commit
that referenced
this pull request
Sep 23, 2023
[ Upstream commit af42269 ] For cases where icc_bw_set() can be called in callbaths that could deadlock against shrinker/reclaim, such as runpm resume, we need to decouple the icc locking. Introduce a new icc_bw_lock for cases where we need to serialize bw aggregation and update to decouple that from paths that require memory allocation such as node/link creation/ destruction. Fixes this lockdep splat: ====================================================== WARNING: possible circular locking dependency detected 6.2.0-rc8-debug+ torvalds#554 Not tainted ------------------------------------------------------ ring0/132 is trying to acquire lock: ffffff80871916d0 (&gmu->lock){+.+.}-{3:3}, at: a6xx_pm_resume+0xf0/0x234 but task is already holding lock: ffffffdb5aee57e8 (dma_fence_map){++++}-{0:0}, at: msm_job_run+0x68/0x150 which lock already depends on the new lock. the existing dependency chain (in reverse order) is: -> #4 (dma_fence_map){++++}-{0:0}: __dma_fence_might_wait+0x74/0xc0 dma_resv_lockdep+0x1f4/0x2f4 do_one_initcall+0x104/0x2bc kernel_init_freeable+0x344/0x34c kernel_init+0x30/0x134 ret_from_fork+0x10/0x20 -> #3 (mmu_notifier_invalidate_range_start){+.+.}-{0:0}: fs_reclaim_acquire+0x80/0xa8 slab_pre_alloc_hook.constprop.0+0x40/0x25c __kmem_cache_alloc_node+0x60/0x1cc __kmalloc+0xd8/0x100 topology_parse_cpu_capacity+0x8c/0x178 get_cpu_for_node+0x88/0xc4 parse_cluster+0x1b0/0x28c parse_cluster+0x8c/0x28c init_cpu_topology+0x168/0x188 smp_prepare_cpus+0x24/0xf8 kernel_init_freeable+0x18c/0x34c kernel_init+0x30/0x134 ret_from_fork+0x10/0x20 -> #2 (fs_reclaim){+.+.}-{0:0}: __fs_reclaim_acquire+0x3c/0x48 fs_reclaim_acquire+0x54/0xa8 slab_pre_alloc_hook.constprop.0+0x40/0x25c __kmem_cache_alloc_node+0x60/0x1cc __kmalloc+0xd8/0x100 kzalloc.constprop.0+0x14/0x20 icc_node_create_nolock+0x4c/0xc4 icc_node_create+0x38/0x58 qcom_icc_rpmh_probe+0x1b8/0x248 platform_probe+0x70/0xc4 really_probe+0x158/0x290 __driver_probe_device+0xc8/0xe0 driver_probe_device+0x44/0x100 __driver_attach+0xf8/0x108 bus_for_each_dev+0x78/0xc4 driver_attach+0x2c/0x38 bus_add_driver+0xd0/0x1d8 driver_register+0xbc/0xf8 __platform_driver_register+0x30/0x3c qnoc_driver_init+0x24/0x30 do_one_initcall+0x104/0x2bc kernel_init_freeable+0x344/0x34c kernel_init+0x30/0x134 ret_from_fork+0x10/0x20 -> #1 (icc_lock){+.+.}-{3:3}: __mutex_lock+0xcc/0x3c8 mutex_lock_nested+0x30/0x44 icc_set_bw+0x88/0x2b4 _set_opp_bw+0x8c/0xd8 _set_opp+0x19c/0x300 dev_pm_opp_set_opp+0x84/0x94 a6xx_gmu_resume+0x18c/0x804 a6xx_pm_resume+0xf8/0x234 adreno_runtime_resume+0x2c/0x38 pm_generic_runtime_resume+0x30/0x44 __rpm_callback+0x15c/0x174 rpm_callback+0x78/0x7c rpm_resume+0x318/0x524 __pm_runtime_resume+0x78/0xbc adreno_load_gpu+0xc4/0x17c msm_open+0x50/0x120 drm_file_alloc+0x17c/0x228 drm_open_helper+0x74/0x118 drm_open+0xa0/0x144 drm_stub_open+0xd4/0xe4 chrdev_open+0x1b8/0x1e4 do_dentry_open+0x2f8/0x38c vfs_open+0x34/0x40 path_openat+0x64c/0x7b4 do_filp_open+0x54/0xc4 do_sys_openat2+0x9c/0x100 do_sys_open+0x50/0x7c __arm64_sys_openat+0x28/0x34 invoke_syscall+0x8c/0x128 el0_svc_common.constprop.0+0xa0/0x11c do_el0_svc+0xac/0xbc el0_svc+0x48/0xa0 el0t_64_sync_handler+0xac/0x13c el0t_64_sync+0x190/0x194 -> #0 (&gmu->lock){+.+.}-{3:3}: __lock_acquire+0xe00/0x1060 lock_acquire+0x1e0/0x2f8 __mutex_lock+0xcc/0x3c8 mutex_lock_nested+0x30/0x44 a6xx_pm_resume+0xf0/0x234 adreno_runtime_resume+0x2c/0x38 pm_generic_runtime_resume+0x30/0x44 __rpm_callback+0x15c/0x174 rpm_callback+0x78/0x7c rpm_resume+0x318/0x524 __pm_runtime_resume+0x78/0xbc pm_runtime_get_sync.isra.0+0x14/0x20 msm_gpu_submit+0x58/0x178 msm_job_run+0x78/0x150 drm_sched_main+0x290/0x370 kthread+0xf0/0x100 ret_from_fork+0x10/0x20 other info that might help us debug this: Chain exists of: &gmu->lock --> mmu_notifier_invalidate_range_start --> dma_fence_map Possible unsafe locking scenario: CPU0 CPU1 ---- ---- lock(dma_fence_map); lock(mmu_notifier_invalidate_range_start); lock(dma_fence_map); lock(&gmu->lock); *** DEADLOCK *** 2 locks held by ring0/132: #0: ffffff8087191170 (&gpu->lock){+.+.}-{3:3}, at: msm_job_run+0x64/0x150 #1: ffffffdb5aee57e8 (dma_fence_map){++++}-{0:0}, at: msm_job_run+0x68/0x150 stack backtrace: CPU: 7 PID: 132 Comm: ring0 Not tainted 6.2.0-rc8-debug+ torvalds#554 Hardware name: Google Lazor (rev1 - 2) with LTE (DT) Call trace: dump_backtrace.part.0+0xb4/0xf8 show_stack+0x20/0x38 dump_stack_lvl+0x9c/0xd0 dump_stack+0x18/0x34 print_circular_bug+0x1b4/0x1f0 check_noncircular+0x78/0xac __lock_acquire+0xe00/0x1060 lock_acquire+0x1e0/0x2f8 __mutex_lock+0xcc/0x3c8 mutex_lock_nested+0x30/0x44 a6xx_pm_resume+0xf0/0x234 adreno_runtime_resume+0x2c/0x38 pm_generic_runtime_resume+0x30/0x44 __rpm_callback+0x15c/0x174 rpm_callback+0x78/0x7c rpm_resume+0x318/0x524 __pm_runtime_resume+0x78/0xbc pm_runtime_get_sync.isra.0+0x14/0x20 msm_gpu_submit+0x58/0x178 msm_job_run+0x78/0x150 drm_sched_main+0x290/0x370 kthread+0xf0/0x100 ret_from_fork+0x10/0x20 Signed-off-by: Rob Clark <robdclark@chromium.org> Link: https://lore.kernel.org/r/20230807171148.210181-7-robdclark@gmail.com Signed-off-by: Georgi Djakov <djakov@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
ptr1337
pushed a commit
that referenced
this pull request
Sep 23, 2023
[ Upstream commit bc056e7 ] When we calculate the end position of ext4_free_extent, this position may be exactly where ext4_lblk_t (i.e. uint) overflows. For example, if ac_g_ex.fe_logical is 4294965248 and ac_orig_goal_len is 2048, then the computed end is 0x100000000, which is 0. If ac->ac_o_ex.fe_logical is not the first case of adjusting the best extent, that is, new_bex_end > 0, the following BUG_ON will be triggered: ========================================================= kernel BUG at fs/ext4/mballoc.c:5116! invalid opcode: 0000 [#1] PREEMPT SMP PTI CPU: 3 PID: 673 Comm: xfs_io Tainted: G E 6.5.0-rc1+ torvalds#279 RIP: 0010:ext4_mb_new_inode_pa+0xc5/0x430 Call Trace: <TASK> ext4_mb_use_best_found+0x203/0x2f0 ext4_mb_try_best_found+0x163/0x240 ext4_mb_regular_allocator+0x158/0x1550 ext4_mb_new_blocks+0x86a/0xe10 ext4_ext_map_blocks+0xb0c/0x13a0 ext4_map_blocks+0x2cd/0x8f0 ext4_iomap_begin+0x27b/0x400 iomap_iter+0x222/0x3d0 __iomap_dio_rw+0x243/0xcb0 iomap_dio_rw+0x16/0x80 ========================================================= A simple reproducer demonstrating the problem: mkfs.ext4 -F /dev/sda -b 4096 100M mount /dev/sda /tmp/test fallocate -l1M /tmp/test/tmp fallocate -l10M /tmp/test/file fallocate -i -o 1M -l16777203M /tmp/test/file fsstress -d /tmp/test -l 0 -n 100000 -p 8 & sleep 10 && killall -9 fsstress rm -f /tmp/test/tmp xfs_io -c "open -ad /tmp/test/file" -c "pwrite -S 0xff 0 8192" We simply refactor the logic for adjusting the best extent by adding a temporary ext4_free_extent ex and use extent_logical_end() to avoid overflow, which also simplifies the code. Cc: stable@kernel.org # 6.4 Fixes: 93cdf49 ("ext4: Fix best extent lstart adjustment logic in ext4_mb_new_inode_pa()") Signed-off-by: Baokun Li <libaokun1@huawei.com> Reviewed-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com> Link: https://lore.kernel.org/r/20230724121059.11834-3-libaokun1@huawei.com Signed-off-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Sasha Levin <sashal@kernel.org>
ptr1337
pushed a commit
that referenced
this pull request
Sep 23, 2023
… tree" commit 3c70de9 upstream. This reverts commit 06f4543. John Ogness reports the case that the allocation is in atomic context under acquired spin-lock. [ 12.555784] BUG: sleeping function called from invalid context at include/linux/sched/mm.h:306 [ 12.555808] in_atomic(): 1, irqs_disabled(): 1, non_block: 0, pid: 70, name: kworker/1:2 [ 12.555814] preempt_count: 1, expected: 0 [ 12.555820] INFO: lockdep is turned off. [ 12.555824] irq event stamp: 208 [ 12.555828] hardirqs last enabled at (207): [<c00000000111e414>] ._raw_spin_unlock_irq+0x44/0x80 [ 12.555850] hardirqs last disabled at (208): [<c00000000110ff94>] .__schedule+0x854/0xfe0 [ 12.555859] softirqs last enabled at (188): [<c000000000f73504>] .addrconf_verify_rtnl+0x2c4/0xb70 [ 12.555872] softirqs last disabled at (182): [<c000000000f732b0>] .addrconf_verify_rtnl+0x70/0xb70 [ 12.555884] CPU: 1 PID: 70 Comm: kworker/1:2 Tainted: G S 6.6.0-rc1 #1 [ 12.555893] Hardware name: PowerMac7,2 PPC970 0x390202 PowerMac [ 12.555898] Workqueue: firewire_ohci .bus_reset_work [firewire_ohci] [ 12.555939] Call Trace: [ 12.555944] [c000000009677830] [c0000000010d83c0] .dump_stack_lvl+0x8c/0xd0 (unreliable) [ 12.555963] [c0000000096778b0] [c000000000140270] .__might_resched+0x320/0x340 [ 12.555978] [c000000009677940] [c000000000497600] .__kmem_cache_alloc_node+0x390/0x460 [ 12.555993] [c000000009677a10] [c0000000003fe620] .__kmalloc+0x70/0x310 [ 12.556007] [c000000009677ac0] [c0003d00004e2268] .fw_core_handle_bus_reset+0x2c8/0xba0 [firewire_core] [ 12.556060] [c000000009677c20] [c0003d0000491190] .bus_reset_work+0x330/0x9b0 [firewire_ohci] [ 12.556079] [c000000009677d10] [c00000000011d0d0] .process_one_work+0x280/0x6f0 [ 12.556094] [c000000009677e10] [c00000000011d8a0] .worker_thread+0x360/0x500 [ 12.556107] [c000000009677ef0] [c00000000012e3b4] .kthread+0x154/0x160 [ 12.556120] [c000000009677f90] [c00000000000bfa8] .start_kernel_thread+0x10/0x14 Cc: stable@kernel.org Reported-by: John Ogness <john.ogness@linutronix.de> Link: https://lore.kernel.org/lkml/87jzsuv1xk.fsf@jogness.linutronix.de/raw Signed-off-by: Takashi Sakamoto <o-takashi@sakamocchi.jp> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
ptr1337
pushed a commit
that referenced
this pull request
Nov 24, 2025
commit 00fbff7 upstream. When crashkernel is configured with a high reservation, shrinking its value below the low crashkernel reservation causes two issues: 1. Invalid crashkernel resource objects 2. Kernel crash if crashkernel shrinking is done twice For example, with crashkernel=200M,high, the kernel reserves 200MB of high memory and some default low memory (say 256MB). The reservation appears as: cat /proc/iomem | grep -i crash af000000-beffffff : Crash kernel 433000000-43f7fffff : Crash kernel If crashkernel is then shrunk to 50MB (echo 52428800 > /sys/kernel/kexec_crash_size), /proc/iomem still shows 256MB reserved: af000000-beffffff : Crash kernel Instead, it should show 50MB: af000000-b21fffff : Crash kernel Further shrinking crashkernel to 40MB causes a kernel crash with the following trace (x86): BUG: kernel NULL pointer dereference, address: 0000000000000038 PGD 0 P4D 0 Oops: 0000 [#1] PREEMPT SMP NOPTI <snip...> Call Trace: <TASK> ? __die_body.cold+0x19/0x27 ? page_fault_oops+0x15a/0x2f0 ? search_module_extables+0x19/0x60 ? search_bpf_extables+0x5f/0x80 ? exc_page_fault+0x7e/0x180 ? asm_exc_page_fault+0x26/0x30 ? __release_resource+0xd/0xb0 release_resource+0x26/0x40 __crash_shrink_memory+0xe5/0x110 crash_shrink_memory+0x12a/0x190 kexec_crash_size_store+0x41/0x80 kernfs_fop_write_iter+0x141/0x1f0 vfs_write+0x294/0x460 ksys_write+0x6d/0xf0 <snip...> This happens because __crash_shrink_memory()/kernel/crash_core.c incorrectly updates the crashk_res resource object even when crashk_low_res should be updated. Fix this by ensuring the correct crashkernel resource object is updated when shrinking crashkernel memory. Link: https://lkml.kernel.org/r/20251101193741.289252-1-sourabhjain@linux.ibm.com Fixes: 16c6006 ("kexec: enable kexec_crash_size to support two crash kernel regions") Signed-off-by: Sourabh Jain <sourabhjain@linux.ibm.com> Acked-by: Baoquan He <bhe@redhat.com> Cc: Zhen Lei <thunder.leizhen@huawei.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
pongo1231
pushed a commit
to pongo1231/linux
that referenced
this pull request
Nov 27, 2025
This reverts commit d02c2e4. Mauro reports that this breaks APEI notifications on his QEMU setup because the "reserved for firmware" region still needs to be writable by Linux in order to signal _back_ to the firmware after processing the reported error: | {1}[Hardware Error]: Hardware error from APEI Generic Hardware Error Source: 1 | ... | [Firmware Warn]: GHES: Unhandled processor error type 0x02: cache error | Unable to handle kernel write to read-only memory at virtual address ffff800080035018 | Mem abort info: | ESR = 0x000000009600004f | EC = 0x25: DABT (current EL), IL = 32 bits | SET = 0, FnV = 0 | EA = 0, S1PTW = 0 | FSC = 0x0f: level 3 permission fault | Data abort info: | ISV = 0, ISS = 0x0000004f, ISS2 = 0x00000000 | CM = 0, WnR = 1, TnD = 0, TagAccess = 0 | GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0 | swapper pgtable: 4k pages, 52-bit VAs, pgdp=00000000505d7000 | pgd=10000000510bc003, p4d=1000000100229403, pud=100000010022a403, pmd=100000010022b403, pte=0060000139b90483 | Internal error: Oops: 000000009600004f [CachyOS#1] SMP For now, revert the offending commit. We can presumably switch back to PAGE_KERNEL when bringing this back in the future. Link: https://lore.kernel.org/r/20251121224611.07efa95a@foz.lan Reported-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org> Signed-off-by: Will Deacon <will@kernel.org>
pongo1231
pushed a commit
to pongo1231/linux
that referenced
this pull request
Nov 28, 2025
Since commit 30f241f ("xsk: Fix immature cq descriptor production"), the descriptor number is stored in skb control block and xsk_cq_submit_addr_locked() relies on it to put the umem addrs onto pool's completion queue. skb control block shouldn't be used for this purpose as after transmit xsk doesn't have control over it and other subsystems could use it. This leads to the following kernel panic due to a NULL pointer dereference. BUG: kernel NULL pointer dereference, address: 0000000000000000 #PF: supervisor read access in kernel mode #PF: error_code(0x0000) - not-present page PGD 0 P4D 0 Oops: Oops: 0000 [CachyOS#1] SMP NOPTI CPU: 2 UID: 1 PID: 927 Comm: p4xsk.bin Not tainted 6.16.12+deb14-cloud-amd64 CachyOS#1 PREEMPT(lazy) Debian 6.16.12-1 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.17.0-debian-1.17.0-1 04/01/2014 RIP: 0010:xsk_destruct_skb+0xd0/0x180 [...] Call Trace: <IRQ> ? napi_complete_done+0x7a/0x1a0 ip_rcv_core+0x1bb/0x340 ip_rcv+0x30/0x1f0 __netif_receive_skb_one_core+0x85/0xa0 process_backlog+0x87/0x130 __napi_poll+0x28/0x180 net_rx_action+0x339/0x420 handle_softirqs+0xdc/0x320 ? handle_edge_irq+0x90/0x1e0 do_softirq.part.0+0x3b/0x60 </IRQ> <TASK> __local_bh_enable_ip+0x60/0x70 __dev_direct_xmit+0x14e/0x1f0 __xsk_generic_xmit+0x482/0xb70 ? __remove_hrtimer+0x41/0xa0 ? __xsk_generic_xmit+0x51/0xb70 ? _raw_spin_unlock_irqrestore+0xe/0x40 xsk_sendmsg+0xda/0x1c0 __sys_sendto+0x1ee/0x200 __x64_sys_sendto+0x24/0x30 do_syscall_64+0x84/0x2f0 ? __pfx_pollwake+0x10/0x10 ? __rseq_handle_notify_resume+0xad/0x4c0 ? restore_fpregs_from_fpstate+0x3c/0x90 ? switch_fpu_return+0x5b/0xe0 ? do_syscall_64+0x204/0x2f0 ? do_syscall_64+0x204/0x2f0 ? do_syscall_64+0x204/0x2f0 entry_SYSCALL_64_after_hwframe+0x76/0x7e </TASK> [...] Kernel panic - not syncing: Fatal exception in interrupt Kernel Offset: 0x1c000000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff) Instead use the skb destructor_arg pointer along with pointer tagging. As pointers are always aligned to 8B, use the bottom bit to indicate whether this a single address or an allocated struct containing several addresses. Fixes: 30f241f ("xsk: Fix immature cq descriptor production") Closes: https://lore.kernel.org/netdev/0435b904-f44f-48f8-afb0-68868474bf1c@nop.hu/ Suggested-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Fernando Fernandez Mancera <fmancera@suse.de> Reviewed-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Reviewed-by: Jason Xing <kerneljasonxing@gmail.com> Link: https://patch.msgid.link/20251124171409.3845-1-fmancera@suse.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>
pongo1231
pushed a commit
to pongo1231/linux
that referenced
this pull request
Nov 28, 2025
…ptcp_do_fastclose(). syzbot reported divide-by-zero in __tcp_select_window() by MPTCP socket. [0] We had a similar issue for the bare TCP and fixed in commit 499350a ("tcp: initialize rcv_mss to TCP_MIN_MSS instead of 0"). Let's apply the same fix to mptcp_do_fastclose(). [0]: Oops: divide error: 0000 [CachyOS#1] SMP KASAN PTI CPU: 0 UID: 0 PID: 6068 Comm: syz.0.17 Not tainted syzkaller #0 PREEMPT(full) Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 10/25/2025 RIP: 0010:__tcp_select_window+0x824/0x1320 net/ipv4/tcp_output.c:3336 Code: ff ff ff 44 89 f1 d3 e0 89 c1 f7 d1 41 01 cc 41 21 c4 e9 a9 00 00 00 e8 ca 49 01 f8 e9 9c 00 00 00 e8 c0 49 01 f8 44 89 e0 99 <f7> 7c 24 1c 41 29 d4 48 bb 00 00 00 00 00 fc ff df e9 80 00 00 00 RSP: 0018:ffffc90003017640 EFLAGS: 00010293 RAX: 0000000000000000 RBX: 0000000000000000 RCX: ffff88807b469e40 RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000 RBP: ffffc90003017730 R08: ffff888033268143 R09: 1ffff1100664d028 R10: dffffc0000000000 R11: ffffed100664d029 R12: 0000000000000000 R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000 FS: 000055557faa0500(0000) GS:ffff888126135000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007f64a1912ff8 CR3: 0000000072122000 CR4: 00000000003526f0 Call Trace: <TASK> tcp_select_window net/ipv4/tcp_output.c:281 [inline] __tcp_transmit_skb+0xbc7/0x3aa0 net/ipv4/tcp_output.c:1568 tcp_transmit_skb net/ipv4/tcp_output.c:1649 [inline] tcp_send_active_reset+0x2d1/0x5b0 net/ipv4/tcp_output.c:3836 mptcp_do_fastclose+0x27e/0x380 net/mptcp/protocol.c:2793 mptcp_disconnect+0x238/0x710 net/mptcp/protocol.c:3253 mptcp_sendmsg_fastopen+0x2f8/0x580 net/mptcp/protocol.c:1776 mptcp_sendmsg+0x1774/0x1980 net/mptcp/protocol.c:1855 sock_sendmsg_nosec net/socket.c:727 [inline] __sock_sendmsg+0xe5/0x270 net/socket.c:742 __sys_sendto+0x3bd/0x520 net/socket.c:2244 __do_sys_sendto net/socket.c:2251 [inline] __se_sys_sendto net/socket.c:2247 [inline] __x64_sys_sendto+0xde/0x100 net/socket.c:2247 do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline] do_syscall_64+0xfa/0xfa0 arch/x86/entry/syscall_64.c:94 entry_SYSCALL_64_after_hwframe+0x77/0x7f RIP: 0033:0x7f66e998f749 Code: ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 a8 ff ff ff f7 d8 64 89 01 48 RSP: 002b:00007ffff9acedb8 EFLAGS: 00000246 ORIG_RAX: 000000000000002c RAX: ffffffffffffffda RBX: 00007f66e9be5fa0 RCX: 00007f66e998f749 RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000003 RBP: 00007ffff9acee10 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000001 R13: 00007f66e9be5fa0 R14: 00007f66e9be5fa0 R15: 0000000000000006 </TASK> Fixes: ae15506 ("mptcp: fix duplicate reset on fastclose") Reported-by: syzbot+3a92d359bc2ec6255a33@syzkaller.appspotmail.com Closes: https://lore.kernel.org/netdev/69260882.a70a0220.d98e3.00b4.GAE@google.com/ Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com> Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Cc: stable@vger.kernel.org Link: https://patch.msgid.link/20251125195331.309558-1-kuniyu@google.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
pongo1231
pushed a commit
to pongo1231/linux
that referenced
this pull request
Nov 28, 2025
The crash in process_v2_sparse_read() for fscrypt-encrypted directories has been reported. Issue takes place for Ceph msgr2 protocol in secure mode. It can be reproduced by the steps: sudo mount -t ceph :/ /mnt/cephfs/ -o name=admin,fs=cephfs,ms_mode=secure (1) mkdir /mnt/cephfs/fscrypt-test-3 (2) cp area_decrypted.tar /mnt/cephfs/fscrypt-test-3 (3) fscrypt encrypt --source=raw_key --key=./my.key /mnt/cephfs/fscrypt-test-3 (4) fscrypt lock /mnt/cephfs/fscrypt-test-3 (5) fscrypt unlock --key=my.key /mnt/cephfs/fscrypt-test-3 (6) cat /mnt/cephfs/fscrypt-test-3/area_decrypted.tar (7) Issue has been triggered [ 408.072247] ------------[ cut here ]------------ [ 408.072251] WARNING: CPU: 1 PID: 392 at net/ceph/messenger_v2.c:865 ceph_con_v2_try_read+0x4b39/0x72f0 [ 408.072267] Modules linked in: intel_rapl_msr intel_rapl_common intel_uncore_frequency_common intel_pmc_core pmt_telemetry pmt_discovery pmt_class intel_pmc_ssram_telemetry intel_vsec kvm_intel joydev kvm irqbypass polyval_clmulni ghash_clmulni_intel aesni_intel rapl input_leds psmouse serio_raw i2c_piix4 vga16fb bochs vgastate i2c_smbus floppy mac_hid qemu_fw_cfg pata_acpi sch_fq_codel rbd msr parport_pc ppdev lp parport efi_pstore [ 408.072304] CPU: 1 UID: 0 PID: 392 Comm: kworker/1:3 Not tainted 6.17.0-rc7+ [ 408.072307] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.17.0-5.fc42 04/01/2014 [ 408.072310] Workqueue: ceph-msgr ceph_con_workfn [ 408.072314] RIP: 0010:ceph_con_v2_try_read+0x4b39/0x72f0 [ 408.072317] Code: c7 c1 20 f0 d4 ae 50 31 d2 48 c7 c6 60 27 d5 ae 48 c7 c7 f8 8e 6f b0 68 60 38 d5 ae e8 00 47 61 fe 48 83 c4 18 e9 ac fc ff ff <0f> 0b e9 06 fe ff ff 4c 8b 9d 98 fd ff ff 0f 84 64 e7 ff ff 89 85 [ 408.072319] RSP: 0018:ffff88811c3e7a30 EFLAGS: 00010246 [ 408.072322] RAX: ffffed1024874c6f RBX: ffffea00042c2b40 RCX: 0000000000000f38 [ 408.072324] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000 [ 408.072325] RBP: ffff88811c3e7ca8 R08: 0000000000000000 R09: 00000000000000c8 [ 408.072326] R10: 00000000000000c8 R11: 0000000000000000 R12: 00000000000000c8 [ 408.072327] R13: dffffc0000000000 R14: ffff8881243a6030 R15: 0000000000003000 [ 408.072329] FS: 0000000000000000(0000) GS:ffff88823eadf000(0000) knlGS:0000000000000000 [ 408.072331] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 408.072332] CR2: 000000c0003c6000 CR3: 000000010c106005 CR4: 0000000000772ef0 [ 408.072336] PKRU: 55555554 [ 408.072337] Call Trace: [ 408.072338] <TASK> [ 408.072340] ? sched_clock_noinstr+0x9/0x10 [ 408.072344] ? __pfx_ceph_con_v2_try_read+0x10/0x10 [ 408.072347] ? _raw_spin_unlock+0xe/0x40 [ 408.072349] ? finish_task_switch.isra.0+0x15d/0x830 [ 408.072353] ? __kasan_check_write+0x14/0x30 [ 408.072357] ? mutex_lock+0x84/0xe0 [ 408.072359] ? __pfx_mutex_lock+0x10/0x10 [ 408.072361] ceph_con_workfn+0x27e/0x10e0 [ 408.072364] ? metric_delayed_work+0x311/0x2c50 [ 408.072367] process_one_work+0x611/0xe20 [ 408.072371] ? __kasan_check_write+0x14/0x30 [ 408.072373] worker_thread+0x7e3/0x1580 [ 408.072375] ? __pfx__raw_spin_lock_irqsave+0x10/0x10 [ 408.072378] ? __pfx_worker_thread+0x10/0x10 [ 408.072381] kthread+0x381/0x7a0 [ 408.072383] ? __pfx__raw_spin_lock_irq+0x10/0x10 [ 408.072385] ? __pfx_kthread+0x10/0x10 [ 408.072387] ? __kasan_check_write+0x14/0x30 [ 408.072389] ? recalc_sigpending+0x160/0x220 [ 408.072392] ? _raw_spin_unlock_irq+0xe/0x50 [ 408.072394] ? calculate_sigpending+0x78/0xb0 [ 408.072395] ? __pfx_kthread+0x10/0x10 [ 408.072397] ret_from_fork+0x2b6/0x380 [ 408.072400] ? __pfx_kthread+0x10/0x10 [ 408.072402] ret_from_fork_asm+0x1a/0x30 [ 408.072406] </TASK> [ 408.072407] ---[ end trace 0000000000000000 ]--- [ 408.072418] Oops: general protection fault, probably for non-canonical address 0xdffffc0000000000: 0000 [CachyOS#1] SMP KASAN NOPTI [ 408.072984] KASAN: null-ptr-deref in range [0x0000000000000000- 0x0000000000000007] [ 408.073350] CPU: 1 UID: 0 PID: 392 Comm: kworker/1:3 Tainted: G W 6.17.0-rc7+ CachyOS#1 PREEMPT(voluntary) [ 408.073886] Tainted: [W]=WARN [ 408.074042] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.17.0-5.fc42 04/01/2014 [ 408.074468] Workqueue: ceph-msgr ceph_con_workfn [ 408.074694] RIP: 0010:ceph_msg_data_advance+0x79/0x1a80 [ 408.074976] Code: fc ff df 49 8d 77 08 48 c1 ee 03 80 3c 16 00 0f 85 07 11 00 00 48 ba 00 00 00 00 00 fc ff df 49 8b 5f 08 48 89 de 48 c1 ee 03 <0f> b6 14 16 84 d2 74 09 80 fa 03 0f 8e 0f 0e 00 00 8b 13 83 fa 03 [ 408.075884] RSP: 0018:ffff88811c3e7990 EFLAGS: 00010246 [ 408.076305] RAX: ffff8881243a6388 RBX: 0000000000000000 RCX: 0000000000000000 [ 408.076909] RDX: dffffc0000000000 RSI: 0000000000000000 RDI: ffff8881243a6378 [ 408.077466] RBP: ffff88811c3e7a20 R08: 0000000000000000 R09: 00000000000000c8 [ 408.078034] R10: ffff8881243a6388 R11: 0000000000000000 R12: ffffed1024874c71 [ 408.078575] R13: dffffc0000000000 R14: ffff8881243a6030 R15: ffff8881243a6378 [ 408.079159] FS: 0000000000000000(0000) GS:ffff88823eadf000(0000) knlGS:0000000000000000 [ 408.079736] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 408.080039] CR2: 000000c0003c6000 CR3: 000000010c106005 CR4: 0000000000772ef0 [ 408.080376] PKRU: 55555554 [ 408.080513] Call Trace: [ 408.080630] <TASK> [ 408.080729] ceph_con_v2_try_read+0x49b9/0x72f0 [ 408.081115] ? __pfx_ceph_con_v2_try_read+0x10/0x10 [ 408.081348] ? _raw_spin_unlock+0xe/0x40 [ 408.081538] ? finish_task_switch.isra.0+0x15d/0x830 [ 408.081768] ? __kasan_check_write+0x14/0x30 [ 408.081986] ? mutex_lock+0x84/0xe0 [ 408.082160] ? __pfx_mutex_lock+0x10/0x10 [ 408.082343] ceph_con_workfn+0x27e/0x10e0 [ 408.082529] ? metric_delayed_work+0x311/0x2c50 [ 408.082737] process_one_work+0x611/0xe20 [ 408.082948] ? __kasan_check_write+0x14/0x30 [ 408.083156] worker_thread+0x7e3/0x1580 [ 408.083331] ? __pfx__raw_spin_lock_irqsave+0x10/0x10 [ 408.083557] ? __pfx_worker_thread+0x10/0x10 [ 408.083751] kthread+0x381/0x7a0 [ 408.083922] ? __pfx__raw_spin_lock_irq+0x10/0x10 [ 408.084139] ? __pfx_kthread+0x10/0x10 [ 408.084310] ? __kasan_check_write+0x14/0x30 [ 408.084510] ? recalc_sigpending+0x160/0x220 [ 408.084708] ? _raw_spin_unlock_irq+0xe/0x50 [ 408.084917] ? calculate_sigpending+0x78/0xb0 [ 408.085138] ? __pfx_kthread+0x10/0x10 [ 408.085335] ret_from_fork+0x2b6/0x380 [ 408.085525] ? __pfx_kthread+0x10/0x10 [ 408.085720] ret_from_fork_asm+0x1a/0x30 [ 408.085922] </TASK> [ 408.086036] Modules linked in: intel_rapl_msr intel_rapl_common intel_uncore_frequency_common intel_pmc_core pmt_telemetry pmt_discovery pmt_class intel_pmc_ssram_telemetry intel_vsec kvm_intel joydev kvm irqbypass polyval_clmulni ghash_clmulni_intel aesni_intel rapl input_leds psmouse serio_raw i2c_piix4 vga16fb bochs vgastate i2c_smbus floppy mac_hid qemu_fw_cfg pata_acpi sch_fq_codel rbd msr parport_pc ppdev lp parport efi_pstore [ 408.087778] ---[ end trace 0000000000000000 ]--- [ 408.088007] RIP: 0010:ceph_msg_data_advance+0x79/0x1a80 [ 408.088260] Code: fc ff df 49 8d 77 08 48 c1 ee 03 80 3c 16 00 0f 85 07 11 00 00 48 ba 00 00 00 00 00 fc ff df 49 8b 5f 08 48 89 de 48 c1 ee 03 <0f> b6 14 16 84 d2 74 09 80 fa 03 0f 8e 0f 0e 00 00 8b 13 83 fa 03 [ 408.089118] RSP: 0018:ffff88811c3e7990 EFLAGS: 00010246 [ 408.089357] RAX: ffff8881243a6388 RBX: 0000000000000000 RCX: 0000000000000000 [ 408.089678] RDX: dffffc0000000000 RSI: 0000000000000000 RDI: ffff8881243a6378 [ 408.090020] RBP: ffff88811c3e7a20 R08: 0000000000000000 R09: 00000000000000c8 [ 408.090360] R10: ffff8881243a6388 R11: 0000000000000000 R12: ffffed1024874c71 [ 408.090687] R13: dffffc0000000000 R14: ffff8881243a6030 R15: ffff8881243a6378 [ 408.091035] FS: 0000000000000000(0000) GS:ffff88823eadf000(0000) knlGS:0000000000000000 [ 408.091452] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 408.092015] CR2: 000000c0003c6000 CR3: 000000010c106005 CR4: 0000000000772ef0 [ 408.092530] PKRU: 55555554 [ 417.112915] ================================================================== [ 417.113491] BUG: KASAN: slab-use-after-free in __mutex_lock.constprop.0+0x1522/0x1610 [ 417.114014] Read of size 4 at addr ffff888124870034 by task kworker/2:0/4951 [ 417.114587] CPU: 2 UID: 0 PID: 4951 Comm: kworker/2:0 Tainted: G D W 6.17.0-rc7+ CachyOS#1 PREEMPT(voluntary) [ 417.114592] Tainted: [D]=DIE, [W]=WARN [ 417.114593] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.17.0-5.fc42 04/01/2014 [ 417.114596] Workqueue: events handle_timeout [ 417.114601] Call Trace: [ 417.114602] <TASK> [ 417.114604] dump_stack_lvl+0x5c/0x90 [ 417.114610] print_report+0x171/0x4dc [ 417.114613] ? __pfx__raw_spin_lock_irqsave+0x10/0x10 [ 417.114617] ? kasan_complete_mode_report_info+0x80/0x220 [ 417.114621] kasan_report+0xbd/0x100 [ 417.114625] ? __mutex_lock.constprop.0+0x1522/0x1610 [ 417.114628] ? __mutex_lock.constprop.0+0x1522/0x1610 [ 417.114630] __asan_report_load4_noabort+0x14/0x30 [ 417.114633] __mutex_lock.constprop.0+0x1522/0x1610 [ 417.114635] ? queue_con_delay+0x8d/0x200 [ 417.114638] ? __pfx___mutex_lock.constprop.0+0x10/0x10 [ 417.114641] ? __send_subscribe+0x529/0xb20 [ 417.114644] __mutex_lock_slowpath+0x13/0x20 [ 417.114646] mutex_lock+0xd4/0xe0 [ 417.114649] ? __pfx_mutex_lock+0x10/0x10 [ 417.114652] ? ceph_monc_renew_subs+0x2a/0x40 [ 417.114654] ceph_con_keepalive+0x22/0x110 [ 417.114656] handle_timeout+0x6b3/0x11d0 [ 417.114659] ? _raw_spin_unlock_irq+0xe/0x50 [ 417.114662] ? __pfx_handle_timeout+0x10/0x10 [ 417.114664] ? queue_delayed_work_on+0x8e/0xa0 [ 417.114669] process_one_work+0x611/0xe20 [ 417.114672] ? __kasan_check_write+0x14/0x30 [ 417.114676] worker_thread+0x7e3/0x1580 [ 417.114678] ? __pfx__raw_spin_lock_irqsave+0x10/0x10 [ 417.114682] ? __pfx_sched_setscheduler_nocheck+0x10/0x10 [ 417.114687] ? __pfx_worker_thread+0x10/0x10 [ 417.114689] kthread+0x381/0x7a0 [ 417.114692] ? __pfx__raw_spin_lock_irq+0x10/0x10 [ 417.114694] ? __pfx_kthread+0x10/0x10 [ 417.114697] ? __kasan_check_write+0x14/0x30 [ 417.114699] ? recalc_sigpending+0x160/0x220 [ 417.114703] ? _raw_spin_unlock_irq+0xe/0x50 [ 417.114705] ? calculate_sigpending+0x78/0xb0 [ 417.114707] ? __pfx_kthread+0x10/0x10 [ 417.114710] ret_from_fork+0x2b6/0x380 [ 417.114713] ? __pfx_kthread+0x10/0x10 [ 417.114715] ret_from_fork_asm+0x1a/0x30 [ 417.114720] </TASK> [ 417.125171] Allocated by task 2: [ 417.125333] kasan_save_stack+0x26/0x60 [ 417.125522] kasan_save_track+0x14/0x40 [ 417.125742] kasan_save_alloc_info+0x39/0x60 [ 417.125945] __kasan_slab_alloc+0x8b/0xb0 [ 417.126133] kmem_cache_alloc_node_noprof+0x13b/0x460 [ 417.126381] copy_process+0x320/0x6250 [ 417.126595] kernel_clone+0xb7/0x840 [ 417.126792] kernel_thread+0xd6/0x120 [ 417.126995] kthreadd+0x85c/0xbe0 [ 417.127176] ret_from_fork+0x2b6/0x380 [ 417.127378] ret_from_fork_asm+0x1a/0x30 [ 417.127692] Freed by task 0: [ 417.127851] kasan_save_stack+0x26/0x60 [ 417.128057] kasan_save_track+0x14/0x40 [ 417.128267] kasan_save_free_info+0x3b/0x60 [ 417.128491] __kasan_slab_free+0x6c/0xa0 [ 417.128708] kmem_cache_free+0x182/0x550 [ 417.128906] free_task+0xeb/0x140 [ 417.129070] __put_task_struct+0x1d2/0x4f0 [ 417.129259] __put_task_struct_rcu_cb+0x15/0x20 [ 417.129480] rcu_do_batch+0x3d3/0xe70 [ 417.129681] rcu_core+0x549/0xb30 [ 417.129839] rcu_core_si+0xe/0x20 [ 417.130005] handle_softirqs+0x160/0x570 [ 417.130190] __irq_exit_rcu+0x189/0x1e0 [ 417.130369] irq_exit_rcu+0xe/0x20 [ 417.130531] sysvec_apic_timer_interrupt+0x9f/0xd0 [ 417.130768] asm_sysvec_apic_timer_interrupt+0x1b/0x20 [ 417.131082] Last potentially related work creation: [ 417.131305] kasan_save_stack+0x26/0x60 [ 417.131484] kasan_record_aux_stack+0xae/0xd0 [ 417.131695] __call_rcu_common+0xcd/0x14b0 [ 417.131909] call_rcu+0x31/0x50 [ 417.132071] delayed_put_task_struct+0x128/0x190 [ 417.132295] rcu_do_batch+0x3d3/0xe70 [ 417.132478] rcu_core+0x549/0xb30 [ 417.132658] rcu_core_si+0xe/0x20 [ 417.132808] handle_softirqs+0x160/0x570 [ 417.132993] __irq_exit_rcu+0x189/0x1e0 [ 417.133181] irq_exit_rcu+0xe/0x20 [ 417.133353] sysvec_apic_timer_interrupt+0x9f/0xd0 [ 417.133584] asm_sysvec_apic_timer_interrupt+0x1b/0x20 [ 417.133921] Second to last potentially related work creation: [ 417.134183] kasan_save_stack+0x26/0x60 [ 417.134362] kasan_record_aux_stack+0xae/0xd0 [ 417.134566] __call_rcu_common+0xcd/0x14b0 [ 417.134782] call_rcu+0x31/0x50 [ 417.134929] put_task_struct_rcu_user+0x58/0xb0 [ 417.135143] finish_task_switch.isra.0+0x5d3/0x830 [ 417.135366] __schedule+0xd30/0x5100 [ 417.135534] schedule_idle+0x5a/0x90 [ 417.135712] do_idle+0x25f/0x410 [ 417.135871] cpu_startup_entry+0x53/0x70 [ 417.136053] start_secondary+0x216/0x2c0 [ 417.136233] common_startup_64+0x13e/0x141 [ 417.136894] The buggy address belongs to the object at ffff888124870000 which belongs to the cache task_struct of size 10504 [ 417.138122] The buggy address is located 52 bytes inside of freed 10504-byte region [ffff888124870000, ffff888124872908) [ 417.139465] The buggy address belongs to the physical page: [ 417.140016] page: refcount:0 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x124870 [ 417.140789] head: order:3 mapcount:0 entire_mapcount:0 nr_pages_mapped:0 pincount:0 [ 417.141519] memcg:ffff88811aa20e01 [ 417.141874] anon flags: 0x17ffffc0000040(head|node=0|zone=2|lastcpupid=0x1fffff) [ 417.142600] page_type: f5(slab) [ 417.142922] raw: 0017ffffc0000040 ffff88810094f040 0000000000000000 dead000000000001 [ 417.143554] raw: 0000000000000000 0000000000030003 00000000f5000000 ffff88811aa20e01 [ 417.143954] head: 0017ffffc0000040 ffff88810094f040 0000000000000000 dead000000000001 [ 417.144329] head: 0000000000000000 0000000000030003 00000000f5000000 ffff88811aa20e01 [ 417.144710] head: 0017ffffc0000003 ffffea0004921c01 00000000ffffffff 00000000ffffffff [ 417.145106] head: ffffffffffffffff 0000000000000000 00000000ffffffff 0000000000000008 [ 417.145485] page dumped because: kasan: bad access detected [ 417.145859] Memory state around the buggy address: [ 417.146094] ffff88812486ff00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc [ 417.146439] ffff88812486ff80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc [ 417.146791] >ffff888124870000: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb [ 417.147145] ^ [ 417.147387] ffff888124870080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb [ 417.147751] ffff888124870100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb [ 417.148123] ================================================================== First of all, we have warning in get_bvec_at() because cursor->total_resid contains zero value. And, finally, we have crash in ceph_msg_data_advance() because cursor->data is NULL. It means that get_bvec_at() receives not initialized ceph_msg_data_cursor structure because data is NULL and total_resid contains zero. Moreover, we don't have likewise issue for the case of Ceph msgr1 protocol because ceph_msg_data_cursor_init() has been called before reading sparse data. This patch adds calling of ceph_msg_data_cursor_init() in the beginning of process_v2_sparse_read() with the goal to guarantee that logic of reading sparse data works correctly for the case of Ceph msgr2 protocol. Cc: stable@vger.kernel.org Link: https://tracker.ceph.com/issues/73152 Signed-off-by: Viacheslav Dubeyko <Slava.Dubeyko@ibm.com> Reviewed-by: Ilya Dryomov <idryomov@gmail.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
pongo1231
pushed a commit
to pongo1231/linux
that referenced
this pull request
Nov 28, 2025
[WHAT] IGT kms_cursor_legacy's long-nonblocking-modeset-vs-cursor-atomic fails with NULL pointer dereference. This can be reproduced with both an eDP panel and a DP monitors connected. BUG: kernel NULL pointer dereference, address: 0000000000000000 #PF: supervisor read access in kernel mode #PF: error_code(0x0000) - not-present page PGD 0 P4D 0 Oops: Oops: 0000 [CachyOS#1] SMP NOPTI CPU: 13 UID: 0 PID: 2960 Comm: kms_cursor_lega Not tainted 6.16.0-99-custom torvalds#8 PREEMPT(voluntary) Hardware name: AMD ........ RIP: 0010:dc_stream_get_scanoutpos+0x34/0x130 [amdgpu] Code: 57 4d 89 c7 41 56 49 89 ce 41 55 49 89 d5 41 54 49 89 fc 53 48 83 ec 18 48 8b 87 a0 64 00 00 48 89 75 d0 48 c7 c6 e0 41 30 c2 <48> 8b 38 48 8b 9f 68 06 00 00 e8 8d d7 fd ff 31 c0 48 81 c3 e0 02 RSP: 0018:ffffd0f3c2bd7608 EFLAGS: 00010292 RAX: 0000000000000000 RBX: 0000000000000000 RCX: ffffd0f3c2bd7668 RDX: ffffd0f3c2bd7664 RSI: ffffffffc23041e0 RDI: ffff8b32494b8000 RBP: ffffd0f3c2bd7648 R08: ffffd0f3c2bd766c R09: ffffd0f3c2bd7760 R10: ffffd0f3c2bd7820 R11: 0000000000000000 R12: ffff8b32494b8000 R13: ffffd0f3c2bd7664 R14: ffffd0f3c2bd7668 R15: ffffd0f3c2bd766c FS: 000071f631b68700(0000) GS:ffff8b399f114000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000000 CR3: 00000001b8105000 CR4: 0000000000f50ef0 PKRU: 55555554 Call Trace: <TASK> dm_crtc_get_scanoutpos+0xd7/0x180 [amdgpu] amdgpu_display_get_crtc_scanoutpos+0x86/0x1c0 [amdgpu] ? __pfx_amdgpu_crtc_get_scanout_position+0x10/0x10[amdgpu] amdgpu_crtc_get_scanout_position+0x27/0x50 [amdgpu] drm_crtc_vblank_helper_get_vblank_timestamp_internal+0xf7/0x400 drm_crtc_vblank_helper_get_vblank_timestamp+0x1c/0x30 drm_crtc_get_last_vbltimestamp+0x55/0x90 drm_crtc_next_vblank_start+0x45/0xa0 drm_atomic_helper_wait_for_fences+0x81/0x1f0 ... Cc: Mario Limonciello <mario.limonciello@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Aurabindo Pillai <aurabindo.pillai@amd.com> Signed-off-by: Alex Hung <alex.hung@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 621e55f) Cc: stable@vger.kernel.org
pongo1231
pushed a commit
to pongo1231/linux
that referenced
this pull request
Nov 28, 2025
A synchronous external abort occurs on the Renesas RZ/G3S SoC if unbind is
executed after the configuration sequence described above:
modprobe usb_f_ecm
modprobe libcomposite
modprobe configfs
cd /sys/kernel/config/usb_gadget
mkdir -p g1
cd g1
echo "0x1d6b" > idVendor
echo "0x0104" > idProduct
mkdir -p strings/0x409
echo "0123456789" > strings/0x409/serialnumber
echo "Renesas." > strings/0x409/manufacturer
echo "Ethernet Gadget" > strings/0x409/product
mkdir -p functions/ecm.usb0
mkdir -p configs/c.1
mkdir -p configs/c.1/strings/0x409
echo "ECM" > configs/c.1/strings/0x409/configuration
if [ ! -L configs/c.1/ecm.usb0 ]; then
ln -s functions/ecm.usb0 configs/c.1
fi
echo 11e20000.usb > UDC
echo 11e20000.usb > /sys/bus/platform/drivers/renesas_usbhs/unbind
The displayed trace is as follows:
Internal error: synchronous external abort: 0000000096000010 [CachyOS#1] SMP
CPU: 0 UID: 0 PID: 188 Comm: sh Tainted: G M 6.17.0-rc7-next-20250922-00010-g41050493b2bd torvalds#55 PREEMPT
Tainted: [M]=MACHINE_CHECK
Hardware name: Renesas SMARC EVK version 2 based on r9a08g045s33 (DT)
pstate: 604000c5 (nZCv daIF +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
pc : usbhs_sys_function_pullup+0x10/0x40 [renesas_usbhs]
lr : usbhsg_update_pullup+0x3c/0x68 [renesas_usbhs]
sp : ffff8000838b3920
x29: ffff8000838b3920 x28: ffff00000d585780 x27: 0000000000000000
x26: 0000000000000000 x25: 0000000000000000 x24: ffff00000c3e3810
x23: ffff00000d5e5c80 x22: ffff00000d5e5d40 x21: 0000000000000000
x20: 0000000000000000 x19: ffff00000d5e5c80 x18: 0000000000000020
x17: 2e30303230316531 x16: 312d7968703a7968 x15: 3d454d414e5f4344
x14: 000000000000002c x13: 0000000000000000 x12: 0000000000000000
x11: ffff00000f358f38 x10: ffff00000f358db0 x9 : ffff00000b41f418
x8 : 0101010101010101 x7 : 7f7f7f7f7f7f7f7f x6 : fefefeff6364626d
x5 : 8080808000000000 x4 : 000000004b5ccb9d x3 : 0000000000000000
x2 : 0000000000000000 x1 : ffff800083790000 x0 : ffff00000d5e5c80
Call trace:
usbhs_sys_function_pullup+0x10/0x40 [renesas_usbhs] (P)
usbhsg_pullup+0x4c/0x7c [renesas_usbhs]
usb_gadget_disconnect_locked+0x48/0xd4
gadget_unbind_driver+0x44/0x114
device_remove+0x4c/0x80
device_release_driver_internal+0x1c8/0x224
device_release_driver+0x18/0x24
bus_remove_device+0xcc/0x10c
device_del+0x14c/0x404
usb_del_gadget+0x88/0xc0
usb_del_gadget_udc+0x18/0x30
usbhs_mod_gadget_remove+0x24/0x44 [renesas_usbhs]
usbhs_mod_remove+0x20/0x30 [renesas_usbhs]
usbhs_remove+0x98/0xdc [renesas_usbhs]
platform_remove+0x20/0x30
device_remove+0x4c/0x80
device_release_driver_internal+0x1c8/0x224
device_driver_detach+0x18/0x24
unbind_store+0xb4/0xb8
drv_attr_store+0x24/0x38
sysfs_kf_write+0x7c/0x94
kernfs_fop_write_iter+0x128/0x1b8
vfs_write+0x2ac/0x350
ksys_write+0x68/0xfc
__arm64_sys_write+0x1c/0x28
invoke_syscall+0x48/0x110
el0_svc_common.constprop.0+0xc0/0xe0
do_el0_svc+0x1c/0x28
el0_svc+0x34/0xf0
el0t_64_sync_handler+0xa0/0xe4
el0t_64_sync+0x198/0x19c
Code: 7100003f 1a9f07e1 531c6c22 f9400001 (79400021)
---[ end trace 0000000000000000 ]---
note: sh[188] exited with irqs disabled
note: sh[188] exited with preempt_count 1
The issue occurs because usbhs_sys_function_pullup(), which accesses the IP
registers, is executed after the USBHS clocks have been disabled. The
problem is reproducible on the Renesas RZ/G3S SoC starting with the
addition of module stop in the clock enable/disable APIs. With module stop
functionality enabled, a bus error is expected if a master accesses a
module whose clock has been stopped and module stop activated.
Disable the IP clocks at the end of remove.
Cc: stable <stable@kernel.org>
Fixes: f1407d5 ("usb: renesas_usbhs: Add Renesas USBHS common code")
Signed-off-by: Claudiu Beznea <claudiu.beznea.uj@bp.renesas.com>
Link: https://patch.msgid.link/20251027140741.557198-1-claudiu.beznea.uj@bp.renesas.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
pongo1231
pushed a commit
to pongo1231/linux
that referenced
this pull request
Dec 2, 2025
The CMT device can be used as both a clocksource and a clockevent
provider. The driver tries to be smart and power itself on and off, as
well as enabling and disabling its clock when it's not in operation.
This behavior is slightly altered if the CMT is used as an early
platform device in which case the device is left powered on after probe,
but the clock is still enabled and disabled at runtime.
This has worked for a long time, but recent improvements in PREEMPT_RT
and PROVE_LOCKING have highlighted an issue. As the CMT registers itself
as a clockevent provider, clockevents_register_device(), it needs to use
raw spinlocks internally as this is the context of which the clockevent
framework interacts with the CMT driver. However in the context of
holding a raw spinlock the CMT driver can't really manage its power
state or clock with calls to pm_runtime_*() and clk_*() as these calls
end up in other platform drivers using regular spinlocks to control
power and clocks.
This mix of spinlock contexts trips a lockdep warning.
=============================
[ BUG: Invalid wait context ]
6.17.0-rc3-arm64-renesas-03071-gb3c4f4122b28-dirty torvalds#21 Not tainted
-----------------------------
swapper/1/0 is trying to lock:
ffff00000898d180 (&dev->power.lock){-...}-{3:3}, at: __pm_runtime_resume+0x38/0x88
ccree e6601000.crypto: ARM CryptoCell 630P Driver: HW version 0xAF400001/0xDCC63000, Driver version 5.0
other info that might help us debug this:
ccree e6601000.crypto: ARM ccree device initialized
context-{5:5}
2 locks held by swapper/1/0:
#0: ffff80008173c298 (tick_broadcast_lock){-...}-{2:2}, at: __tick_broadcast_oneshot_control+0xa4/0x3a8
CachyOS#1: ffff0000089a5858 (&ch->lock){....}-{2:2}
usbcore: registered new interface driver usbhid
, at: sh_cmt_start+0x30/0x364
stack backtrace:
CPU: 1 UID: 0 PID: 0 Comm: swapper/1 Not tainted 6.17.0-rc3-arm64-renesas-03071-gb3c4f4122b28-dirty torvalds#21 PREEMPT
Hardware name: Renesas Salvator-X 2nd version board based on r8a77965 (DT)
Call trace:
show_stack+0x14/0x1c (C)
dump_stack_lvl+0x6c/0x90
dump_stack+0x14/0x1c
__lock_acquire+0x904/0x1584
lock_acquire+0x220/0x34c
_raw_spin_lock_irqsave+0x58/0x80
__pm_runtime_resume+0x38/0x88
sh_cmt_start+0x54/0x364
sh_cmt_clock_event_set_oneshot+0x64/0xb8
clockevents_switch_state+0xfc/0x13c
tick_broadcast_set_event+0x30/0xa4
__tick_broadcast_oneshot_control+0x1e0/0x3a8
tick_broadcast_oneshot_control+0x30/0x40
cpuidle_enter_state+0x40c/0x680
cpuidle_enter+0x30/0x40
do_idle+0x1f4/0x26c
cpu_startup_entry+0x34/0x40
secondary_start_kernel+0x11c/0x13c
__secondary_switched+0x74/0x78
For non-PREEMPT_RT builds this is not really an issue, but for
PREEMPT_RT builds where normal spinlocks can sleep this might be an
issue. Be cautious and always leave the power and clock running after
probe.
Signed-off-by: Niklas Söderlund <niklas.soderlund+renesas@ragnatech.se>
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Tested-by: Geert Uytterhoeven <geert+renesas@glider.be>
Link: https://patch.msgid.link/20251016182022.1837417-1-niklas.soderlund+renesas@ragnatech.se
pongo1231
pushed a commit
to pongo1231/linux
that referenced
this pull request
Dec 5, 2025
…ockdep While developing IPPROTO_SMBDIRECT support for the code under fs/smb/common/smbdirect [1], I noticed false positives like this: [T79] ====================================================== [T79] WARNING: possible circular locking dependency detected [T79] 6.18.0-rc4-metze-kasan-lockdep.01+ CachyOS#1 Tainted: G OE [T79] ------------------------------------------------------ [T79] kworker/2:0/79 is trying to acquire lock: [T79] ffff88801f968278 (sk_lock-AF_INET){+.+.}-{0:0}, at: sock_set_reuseaddr+0x14/0x70 [T79] but task is already holding lock: [T79] ffffffffc10f7230 (lock#9){+.+.}-{4:4}, at: rdma_listen+0x3d2/0x740 [rdma_cm] [T79] which lock already depends on the new lock. [T79] the existing dependency chain (in reverse order) is: [T79] -> CachyOS#1 (lock#9){+.+.}-{4:4}: [T79] __lock_acquire+0x535/0xc30 [T79] lock_acquire.part.0+0xb3/0x240 [T79] lock_acquire+0x60/0x140 [T79] __mutex_lock+0x1af/0x1c10 [T79] mutex_lock_nested+0x1b/0x30 [T79] cma_get_port+0xba/0x7d0 [rdma_cm] [T79] rdma_bind_addr_dst+0x598/0x9a0 [rdma_cm] [T79] cma_bind_addr+0x107/0x320 [rdma_cm] [T79] rdma_resolve_addr+0xa3/0x830 [rdma_cm] [T79] destroy_lease_table+0x12b/0x420 [ksmbd] [T79] ksmbd_NTtimeToUnix+0x3e/0x80 [ksmbd] [T79] ndr_encode_posix_acl+0x6e9/0xab0 [ksmbd] [T79] ndr_encode_v4_ntacl+0x53/0x870 [ksmbd] [T79] __sys_connect_file+0x131/0x1c0 [T79] __sys_connect+0x111/0x140 [T79] __x64_sys_connect+0x72/0xc0 [T79] x64_sys_call+0xe7d/0x26a0 [T79] do_syscall_64+0x93/0xff0 [T79] entry_SYSCALL_64_after_hwframe+0x76/0x7e [T79] -> #0 (sk_lock-AF_INET){+.+.}-{0:0}: [T79] check_prev_add+0xf3/0xcd0 [T79] validate_chain+0x466/0x590 [T79] __lock_acquire+0x535/0xc30 [T79] lock_acquire.part.0+0xb3/0x240 [T79] lock_acquire+0x60/0x140 [T79] lock_sock_nested+0x3b/0xf0 [T79] sock_set_reuseaddr+0x14/0x70 [T79] siw_create_listen+0x145/0x1540 [siw] [T79] iw_cm_listen+0x313/0x5b0 [iw_cm] [T79] cma_iw_listen+0x271/0x3c0 [rdma_cm] [T79] rdma_listen+0x3b1/0x740 [rdma_cm] [T79] cma_listen_on_dev+0x46a/0x750 [rdma_cm] [T79] rdma_listen+0x4b0/0x740 [rdma_cm] [T79] ksmbd_rdma_init+0x12b/0x270 [ksmbd] [T79] ksmbd_conn_transport_init+0x26/0x70 [ksmbd] [T79] server_ctrl_handle_work+0x1e5/0x280 [ksmbd] [T79] process_one_work+0x86c/0x1930 [T79] worker_thread+0x6f0/0x11f0 [T79] kthread+0x3ec/0x8b0 [T79] ret_from_fork+0x314/0x400 [T79] ret_from_fork_asm+0x1a/0x30 [T79] other info that might help us debug this: [T79] Possible unsafe locking scenario: [T79] CPU0 CPU1 [T79] ---- ---- [T79] lock(lock#9); [T79] lock(sk_lock-AF_INET); [T79] lock(lock#9); [T79] lock(sk_lock-AF_INET); [T79] *** DEADLOCK *** [T79] 5 locks held by kworker/2:0/79: [T79] #0: ffff88800120b158 ((wq_completion)events_long){+.+.}-{0:0}, at: process_one_work+0xfca/0x1930 [T79] CachyOS#1: ffffc9000474fd00 ((work_completion)(&ctrl->ctrl_work)) {+.+.}-{0:0}, at: process_one_work+0x804/0x1930 [T79] CachyOS#2: ffffffffc11307d0 (ctrl_lock){+.+.}-{4:4}, at: server_ctrl_handle_work+0x21/0x280 [ksmbd] [T79] CachyOS#3: ffffffffc11347b0 (init_lock){+.+.}-{4:4}, at: ksmbd_conn_transport_init+0x18/0x70 [ksmbd] [T79] CachyOS#4: ffffffffc10f7230 (lock#9){+.+.}-{4:4}, at: rdma_listen+0x3d2/0x740 [rdma_cm] [T79] stack backtrace: [T79] CPU: 2 UID: 0 PID: 79 Comm: kworker/2:0 Kdump: loaded Tainted: G OE 6.18.0-rc4-metze-kasan-lockdep.01+ CachyOS#1 PREEMPT(voluntary) [T79] Tainted: [O]=OOT_MODULE, [E]=UNSIGNED_MODULE [T79] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006 [T79] Workqueue: events_long server_ctrl_handle_work [ksmbd] ... [T79] print_circular_bug+0xfd/0x130 [T79] check_noncircular+0x150/0x170 [T79] check_prev_add+0xf3/0xcd0 [T79] validate_chain+0x466/0x590 [T79] __lock_acquire+0x535/0xc30 [T79] ? srso_alias_return_thunk+0x5/0xfbef5 [T79] lock_acquire.part.0+0xb3/0x240 [T79] ? sock_set_reuseaddr+0x14/0x70 [T79] ? srso_alias_return_thunk+0x5/0xfbef5 [T79] ? __kasan_check_write+0x14/0x30 [T79] ? srso_alias_return_thunk+0x5/0xfbef5 [T79] ? apparmor_socket_post_create+0x180/0x700 [T79] lock_acquire+0x60/0x140 [T79] ? sock_set_reuseaddr+0x14/0x70 [T79] lock_sock_nested+0x3b/0xf0 [T79] ? sock_set_reuseaddr+0x14/0x70 [T79] sock_set_reuseaddr+0x14/0x70 [T79] siw_create_listen+0x145/0x1540 [siw] [T79] ? srso_alias_return_thunk+0x5/0xfbef5 [T79] ? local_clock_noinstr+0xe/0xd0 [T79] ? __pfx_siw_create_listen+0x10/0x10 [siw] [T79] ? trace_preempt_on+0x4c/0x130 [T79] ? __raw_spin_unlock_irqrestore+0x4a/0x90 [T79] ? srso_alias_return_thunk+0x5/0xfbef5 [T79] ? preempt_count_sub+0x52/0x80 [T79] iw_cm_listen+0x313/0x5b0 [iw_cm] [T79] cma_iw_listen+0x271/0x3c0 [rdma_cm] [T79] ? srso_alias_return_thunk+0x5/0xfbef5 [T79] rdma_listen+0x3b1/0x740 [rdma_cm] [T79] ? _raw_spin_unlock+0x2c/0x60 [T79] ? __pfx_rdma_listen+0x10/0x10 [rdma_cm] [T79] ? rdma_restrack_add+0x12c/0x630 [ib_core] [T79] ? srso_alias_return_thunk+0x5/0xfbef5 [T79] cma_listen_on_dev+0x46a/0x750 [rdma_cm] [T79] rdma_listen+0x4b0/0x740 [rdma_cm] [T79] ? __pfx_rdma_listen+0x10/0x10 [rdma_cm] [T79] ? cma_get_port+0x30d/0x7d0 [rdma_cm] [T79] ? srso_alias_return_thunk+0x5/0xfbef5 [T79] ? rdma_bind_addr_dst+0x598/0x9a0 [rdma_cm] [T79] ksmbd_rdma_init+0x12b/0x270 [ksmbd] [T79] ? __pfx_ksmbd_rdma_init+0x10/0x10 [ksmbd] [T79] ? srso_alias_return_thunk+0x5/0xfbef5 [T79] ? srso_alias_return_thunk+0x5/0xfbef5 [T79] ? register_netdevice_notifier+0x1dc/0x240 [T79] ksmbd_conn_transport_init+0x26/0x70 [ksmbd] [T79] server_ctrl_handle_work+0x1e5/0x280 [ksmbd] [T79] process_one_work+0x86c/0x1930 [T79] ? __pfx_process_one_work+0x10/0x10 [T79] ? srso_alias_return_thunk+0x5/0xfbef5 [T79] ? assign_work+0x16f/0x280 [T79] worker_thread+0x6f0/0x11f0 I was not able to reproduce this as I was testing with various runs switching siw and rxe as well as IPPROTO_SMBDIRECT sockets, while the above stack used siw with the non IPPROTO_SMBDIRECT patches [1]. Even if this patch doesn't solve the above I think it's a good idea to reclassify the sockets used by siw, I also send patches for rxe to reclassify, as well as my IPPROTO_SMBDIRECT socket patches [1] will do it, this should minimize potential false positives. [1] https://git.samba.org/?p=metze/linux/wip.git;a=shortlog;h=refs/heads/master-ipproto-smbdirect Cc: Bernard Metzler <bernard.metzler@linux.dev> Cc: Jason Gunthorpe <jgg@ziepe.ca> Cc: Leon Romanovsky <leon@kernel.org> Cc: linux-rdma@vger.kernel.org Cc: netdev@vger.kernel.org Cc: linux-cifs@vger.kernel.org Signed-off-by: Stefan Metzmacher <metze@samba.org> Link: https://patch.msgid.link/20251126150842.1837072-1-metze@samba.org Acked-by: Bernard Metzler <bernard.metzler@linux.dev> Signed-off-by: Leon Romanovsky <leon@kernel.org>
pongo1231
pushed a commit
to pongo1231/linux
that referenced
this pull request
Dec 5, 2025
…ockdep While developing IPPROTO_SMBDIRECT support for the code under fs/smb/common/smbdirect [1], I noticed false positives like this: [+0,003927] ============================================ [+0,000532] WARNING: possible recursive locking detected [+0,000611] 6.18.0-rc5-metze-kasan-lockdep.02+ CachyOS#1 Tainted: G OE [+0,000835] -------------------------------------------- [+0,000729] ksmbd:r5445/3609 is trying to acquire lock: [+0,000709] ffff88800b9570f8 (k-sk_lock-AF_INET){+.+.}-{0:0}, at: inet_shutdown+0x52/0x360 [+0,000831] but task is already holding lock: [+0,000684] ffff88800654af78 (k-sk_lock-AF_INET){+.+.}-{0:0}, at: smbdirect_sk_close+0x122/0x790 [smbdirect] [+0,000928] other info that might help us debug this: [+0,005552] Possible unsafe locking scenario: [+0,000723] CPU0 [+0,000359] ---- [+0,000377] lock(k-sk_lock-AF_INET); [+0,000478] lock(k-sk_lock-AF_INET); [+0,000498] *** DEADLOCK *** [+0,001012] May be due to missing lock nesting notation [+0,000831] 3 locks held by ksmbd:r5445/3609: [+0,000484] #0: ffff88800654af78 (k-sk_lock-AF_INET){+.+.}-{0:0}, at: smbdirect_sk_close+0x122/0x790 [smbdirect] [+0,001000] CachyOS#1: ffff888020a40458 (&id_priv->handler_mutex){+.+.}-{4:4}, at: rdma_lock_handler+0x17/0x30 [rdma_cm] [+0,000982] CachyOS#2: ffff888020a40350 (&id_priv->qp_mutex){+.+.}-{4:4}, at: rdma_destroy_qp+0x5d/0x1f0 [rdma_cm] [+0,000934] stack backtrace: [+0,000589] CPU: 0 UID: 0 PID: 3609 Comm: ksmbd:r5445 Kdump: loaded Tainted: G OE 6.18.0-rc5-metze-kasan-lockdep.02+ CachyOS#1 PREEMPT(voluntary) [+0,000023] Tainted: [O]=OOT_MODULE, [E]=UNSIGNED_MODULE [+0,000004] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006 ... [+0,000010] print_deadlock_bug+0x245/0x330 [+0,000014] validate_chain+0x32a/0x590 [+0,000012] __lock_acquire+0x535/0xc30 [+0,000013] lock_acquire.part.0+0xb3/0x240 [+0,000017] ? inet_shutdown+0x52/0x360 [+0,000013] ? srso_alias_return_thunk+0x5/0xfbef5 [+0,000007] ? mark_held_locks+0x46/0x90 [+0,000012] lock_acquire+0x60/0x140 [+0,000006] ? inet_shutdown+0x52/0x360 [+0,000028] lock_sock_nested+0x3b/0xf0 [+0,000009] ? inet_shutdown+0x52/0x360 [+0,000008] inet_shutdown+0x52/0x360 [+0,000010] kernel_sock_shutdown+0x5b/0x90 [+0,000011] rxe_qp_do_cleanup+0x4ef/0x810 [rdma_rxe] [+0,000043] ? __pfx_rxe_qp_do_cleanup+0x10/0x10 [rdma_rxe] [+0,000030] execute_in_process_context+0x2b/0x170 [+0,000013] rxe_qp_cleanup+0x1c/0x30 [rdma_rxe] [+0,000021] __rxe_cleanup+0x1cf/0x2e0 [rdma_rxe] [+0,000036] ? __pfx___rxe_cleanup+0x10/0x10 [rdma_rxe] [+0,000020] ? srso_alias_return_thunk+0x5/0xfbef5 [+0,000006] ? __kasan_check_read+0x11/0x20 [+0,000012] rxe_destroy_qp+0xe1/0x230 [rdma_rxe] [+0,000035] ib_destroy_qp_user+0x217/0x450 [ib_core] [+0,000074] rdma_destroy_qp+0x83/0x1f0 [rdma_cm] [+0,000034] smbdirect_connection_destroy_qp+0x98/0x2e0 [smbdirect] [+0,000017] ? __pfx_smb_direct_logging_needed+0x10/0x10 [ksmbd] [+0,000044] smbdirect_connection_destroy+0x698/0xed0 [smbdirect] [+0,000023] ? __pfx_smbdirect_connection_destroy+0x10/0x10 [smbdirect] [+0,000033] ? __pfx_smb_direct_logging_needed+0x10/0x10 [ksmbd] [+0,000031] smbdirect_connection_destroy_sync+0x42b/0x9f0 [smbdirect] [+0,000029] ? mark_held_locks+0x46/0x90 [+0,000012] ? __pfx_smbdirect_connection_destroy_sync+0x10/0x10 [smbdirect] [+0,000019] ? srso_alias_return_thunk+0x5/0xfbef5 [+0,000007] ? trace_hardirqs_on+0x64/0x70 [+0,000029] ? srso_alias_return_thunk+0x5/0xfbef5 [+0,000010] ? srso_alias_return_thunk+0x5/0xfbef5 [+0,000006] ? __smbdirect_connection_schedule_disconnect+0x339/0x4b0 [+0,000021] smbdirect_sk_destroy+0xb0/0x680 [smbdirect] [+0,000024] ? srso_alias_return_thunk+0x5/0xfbef5 [+0,000006] ? trace_hardirqs_on+0x64/0x70 [+0,000006] ? srso_alias_return_thunk+0x5/0xfbef5 [+0,000005] ? __local_bh_enable_ip+0xba/0x150 [+0,000011] sk_common_release+0x66/0x340 [+0,000010] smbdirect_sk_close+0x12a/0x790 [smbdirect] [+0,000023] ? ip_mc_drop_socket+0x1e/0x240 [+0,000013] inet_release+0x10a/0x240 [+0,000011] smbdirect_sock_release+0x502/0xe80 [smbdirect] [+0,000015] ? srso_alias_return_thunk+0x5/0xfbef5 [+0,000024] sock_release+0x91/0x1c0 [+0,000010] smb_direct_free_transport+0x31/0x50 [ksmbd] [+0,000025] ksmbd_conn_free+0x1d0/0x240 [ksmbd] [+0,000040] smb_direct_disconnect+0xb2/0x120 [ksmbd] [+0,000023] ? srso_alias_return_thunk+0x5/0xfbef5 [+0,000018] ksmbd_conn_handler_loop+0x94e/0xf10 [ksmbd] ... I'll also add reclassify to the smbdirect socket code [1], but I think it's better to have it in both direction (below and above the RDMA layer). [1] https://git.samba.org/?p=metze/linux/wip.git;a=shortlog;h=refs/heads/master-ipproto-smbdirect Cc: Zhu Yanjun <zyjzyj2000@gmail.com> Reviewed-by: Zhu Yanjun <yanjun.zhu@linux.dev> Cc: Jason Gunthorpe <jgg@ziepe.ca> Cc: Leon Romanovsky <leon@kernel.org> Cc: linux-rdma@vger.kernel.org Cc: netdev@vger.kernel.org Cc: linux-cifs@vger.kernel.org Signed-off-by: Stefan Metzmacher <metze@samba.org> Link: https://patch.msgid.link/20251127105614.2040922-1-metze@samba.org Signed-off-by: Leon Romanovsky <leon@kernel.org>
pongo1231
pushed a commit
to pongo1231/linux
that referenced
this pull request
Dec 5, 2025
rcu_read_lock() is not needed in fprobe_entry, but rcu_dereference_check() is used in rhltable_lookup(), which causes suspicious RCU usage warning: WARNING: suspicious RCU usage 6.17.0-rc1-00001-gdfe0d675df82 CachyOS#1 Tainted: G S ----------------------------- include/linux/rhashtable.h:602 suspicious rcu_dereference_check() usage! ...... stack backtrace: CPU: 1 UID: 0 PID: 4652 Comm: ftracetest Tainted: G S Tainted: [S]=CPU_OUT_OF_SPEC, [I]=FIRMWARE_WORKAROUND Hardware name: Dell Inc. OptiPlex 7040/0Y7WYT, BIOS 1.1.1 10/07/2015 Call Trace: <TASK> dump_stack_lvl+0x7c/0x90 lockdep_rcu_suspicious+0x14f/0x1c0 __rhashtable_lookup+0x1e0/0x260 ? __pfx_kernel_clone+0x10/0x10 fprobe_entry+0x9a/0x450 ? __lock_acquire+0x6b0/0xca0 ? find_held_lock+0x2b/0x80 ? __pfx_fprobe_entry+0x10/0x10 ? __pfx_kernel_clone+0x10/0x10 ? lock_acquire+0x14c/0x2d0 ? __might_fault+0x74/0xc0 function_graph_enter_regs+0x2a0/0x550 ? __do_sys_clone+0xb5/0x100 ? __pfx_function_graph_enter_regs+0x10/0x10 ? _copy_to_user+0x58/0x70 ? __pfx_kernel_clone+0x10/0x10 ? __x64_sys_rt_sigprocmask+0x114/0x180 ? __pfx___x64_sys_rt_sigprocmask+0x10/0x10 ? __pfx_kernel_clone+0x10/0x10 ftrace_graph_func+0x87/0xb0 As we discussed in [1], fix this by using guard(rcu)() in fprobe_entry() to protect the rhltable_lookup() and rhl_for_each_entry_rcu() with rcu_read_lock and suppress this warning. Link: https://lore.kernel.org/all/20250904062729.151931-1-dongml2@chinatelecom.cn/ Link: https://lore.kernel.org/all/20250829021436.19982-1-dongml2@chinatelecom.cn/ [1] Reported-by: kernel test robot <oliver.sang@intel.com> Closes: https://lore.kernel.org/oe-lkp/202508281655.54c87330-lkp@intel.com Fixes: dfe0d67 ("tracing: fprobe: use rhltable for fprobe_ip_table") Signed-off-by: Menglong Dong <dongml2@chinatelecom.cn> Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
pongo1231
pushed a commit
to pongo1231/linux
that referenced
this pull request
Dec 6, 2025
Bart Van Assche <bvanassche@acm.org> says: Hi Martin, This patch series includes two bug fixes for this development cycle and six small patches that are intended for the next merge window. If applying the first two patches only during the current development cycle would be inconvenient, postponing all patches until the next merge window is fine with me. Please consider including these patches in the upstream kernel. Thanks, Bart. [mkp: Applied patches CachyOS#1 and CachyOS#2 to 6.18/scsi-fixes] Link: https://patch.msgid.link/20251014200118.3390839-1-bvanassche@acm.org Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
pongo1231
pushed a commit
to pongo1231/linux
that referenced
this pull request
Dec 6, 2025
…bort path" This reverts commit 0367076. The commit being reverted added code to __qla2x00_abort_all_cmds() to call sp->done() without holding a spinlock. But unlike the older code below it, this new code failed to check sp->cmd_type and just assumed TYPE_SRB, which results in a jump to an invalid pointer in target-mode with TYPE_TGT_CMD: qla2xxx [0000:65:00.0]-d034:8: qla24xx_do_nack_work create sess success 0000000009f7a79b qla2xxx [0000:65:00.0]-5003:8: ISP System Error - mbx1=1ff5h mbx2=10h mbx3=0h mbx4=0h mbx5=191h mbx6=0h mbx7=0h. qla2xxx [0000:65:00.0]-d01e:8: -> fwdump no buffer qla2xxx [0000:65:00.0]-f03a:8: qla_target(0): System error async event 0x8002 occurred qla2xxx [0000:65:00.0]-00af:8: Performing ISP error recovery - ha=0000000058183fda. BUG: kernel NULL pointer dereference, address: 0000000000000000 PF: supervisor instruction fetch in kernel mode PF: error_code(0x0010) - not-present page PGD 0 P4D 0 Oops: 0010 [CachyOS#1] SMP CPU: 2 PID: 9446 Comm: qla2xxx_8_dpc Tainted: G O 6.1.133 CachyOS#1 Hardware name: Supermicro Super Server/X11SPL-F, BIOS 4.2 12/15/2023 RIP: 0010:0x0 Code: Unable to access opcode bytes at 0xffffffffffffffd6. RSP: 0018:ffffc90001f93dc8 EFLAGS: 00010206 RAX: 0000000000000282 RBX: 0000000000000355 RCX: ffff88810d16a000 RDX: ffff88810dbadaa8 RSI: 0000000000080000 RDI: ffff888169dc38c0 RBP: ffff888169dc38c0 R08: 0000000000000001 R09: 0000000000000045 R10: ffffffffa034bdf0 R11: 0000000000000000 R12: ffff88810800bb40 R13: 0000000000001aa8 R14: ffff888100136610 R15: ffff8881070f7400 FS: 0000000000000000(0000) GS:ffff88bf80080000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: ffffffffffffffd6 CR3: 000000010c8ff006 CR4: 00000000003706e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: <TASK> ? __die+0x4d/0x8b ? page_fault_oops+0x91/0x180 ? trace_buffer_unlock_commit_regs+0x38/0x1a0 ? exc_page_fault+0x391/0x5e0 ? asm_exc_page_fault+0x22/0x30 __qla2x00_abort_all_cmds+0xcb/0x3e0 [qla2xxx_scst] qla2x00_abort_all_cmds+0x50/0x70 [qla2xxx_scst] qla2x00_abort_isp_cleanup+0x3b7/0x4b0 [qla2xxx_scst] qla2x00_abort_isp+0xfd/0x860 [qla2xxx_scst] qla2x00_do_dpc+0x581/0xa40 [qla2xxx_scst] kthread+0xa8/0xd0 </TASK> Then commit 4475afa ("scsi: qla2xxx: Complete command early within lock") added the spinlock back, because not having the lock caused a race and a crash. But qla2x00_abort_srb() in the switch below already checks for qla2x00_chip_is_down() and handles it the same way, so the code above the switch is now redundant and still buggy in target-mode. Remove it. Cc: stable@vger.kernel.org Signed-off-by: Tony Battersby <tonyb@cybernetics.com> Link: https://patch.msgid.link/3a8022dc-bcfd-4b01-9f9b-7a9ec61fa2a3@cybernetics.com Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
pongo1231
pushed a commit
to pongo1231/linux
that referenced
this pull request
Dec 9, 2025
With the generic crashkernel reservation, the kernel emits the following warning on powerpc: WARNING: CPU: 0 PID: 1 at arch/powerpc/mm/mem.c:341 add_system_ram_resources+0xfc/0x180 Modules linked in: CPU: 0 UID: 0 PID: 1 Comm: swapper/0 Not tainted 6.17.0-auto-12607-g5472d60c129f CachyOS#1 VOLUNTARY Hardware name: IBM,9080-HEX Power11 (architected) 0x820200 0xf000007 of:IBM,FW1110.01 (NH1110_069) hv:phyp pSeries NIP: c00000000201de3c LR: c00000000201de34 CTR: 0000000000000000 REGS: c000000127cef8a0 TRAP: 0700 Not tainted (6.17.0-auto-12607-g5472d60c129f) MSR: 8000000002029033 <SF,VEC,EE,ME,IR,DR,RI,LE> CR: 84000840 XER: 20040010 CFAR: c00000000017eed0 IRQMASK: 0 GPR00: c00000000201de34 c000000127cefb40 c0000000016a8100 0000000000000001 GPR04: c00000012005aa00 0000000020000000 c000000002b705c8 0000000000000000 GPR08: 000000007fffffff fffffffffffffff0 c000000002db8100 000000011fffffff GPR12: c00000000201dd40 c000000002ff0000 c0000000000112bc 0000000000000000 GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000 GPR20: 0000000000000000 0000000000000000 0000000000000000 c0000000015a3808 GPR24: c00000000200468c c000000001699888 0000000000000106 c0000000020d1950 GPR28: c0000000014683f8 0000000081000200 c0000000015c1868 c000000002b9f710 NIP [c00000000201de3c] add_system_ram_resources+0xfc/0x180 LR [c00000000201de34] add_system_ram_resources+0xf4/0x180 Call Trace: add_system_ram_resources+0xf4/0x180 (unreliable) do_one_initcall+0x60/0x36c do_initcalls+0x120/0x220 kernel_init_freeable+0x23c/0x390 kernel_init+0x34/0x26c ret_from_kernel_user_thread+0x14/0x1c This warning occurs due to a conflict between crashkernel and System RAM iomem resources. The generic crashkernel reservation adds the crashkernel memory range to /proc/iomem during early initialization. Later, all memblock ranges are added to /proc/iomem as System RAM. If the crashkernel region overlaps with any memblock range, it causes a conflict while adding those memblock regions as iomem resources, triggering the above warning. The conflicting memblock regions are then omitted from /proc/iomem. For example, if the following crashkernel region is added to /proc/iomem: 20000000-11fffffff : Crash kernel then the following memblock regions System RAM regions fail to be inserted: 00000000-7fffffff : System RAM 80000000-257fffffff : System RAM Fix this by not adding the crashkernel memory to /proc/iomem on powerpc. Introduce an architecture hook to let each architecture decide whether to export the crashkernel region to /proc/iomem. For more info checkout commit c40dd2f ("powerpc: Add System RAM to /proc/iomem") and commit bce074b ("powerpc: insert System RAM resource to prevent crashkernel conflict") Note: Before switching to the generic crashkernel reservation, powerpc never exported the crashkernel region to /proc/iomem. Link: https://lkml.kernel.org/r/20251016142831.144515-1-sourabhjain@linux.ibm.com Fixes: e3185ee ("powerpc/crash: use generic crashkernel reservation"). Signed-off-by: Sourabh Jain <sourabhjain@linux.ibm.com> Reported-by: Venkat Rao Bagalkote <venkat88@linux.ibm.com> Closes: https://lore.kernel.org/all/90937fe0-2e76-4c82-b27e-7b8a7fe3ac69@linux.ibm.com/ Tested-by: Venkat Rao Bagalkote <venkat88@linux.ibm.com> Cc: Baoquan he <bhe@redhat.com> Cc: Hari Bathini <hbathini@linux.ibm.com> Cc: Madhavan Srinivasan <maddy@linux.ibm.com> Cc: Mahesh Salgaonkar <mahesh@linux.ibm.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Ritesh Harjani (IBM) <ritesh.list@gmail.com> Cc: Vivek Goyal <vgoyal@redhat.com> Cc: Dave Young <dyoung@redhat.com> Cc: Mike Rapoport <rppt@kernel.org> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
pongo1231
pushed a commit
to pongo1231/linux
that referenced
this pull request
Dec 9, 2025
The chain allocator field cl_bpc (blocks per cluster) is read from disk and used in division operations without validation. A corrupted filesystem image with cl_bpc=0 causes a divide-by-zero crash in the kernel: divide error: 0000 [CachyOS#1] PREEMPT SMP KASAN RIP: 0010:ocfs2_bg_discontig_add_extent fs/ocfs2/suballoc.c:335 [inline] RIP: 0010:ocfs2_block_group_fill+0x5bd/0xa70 fs/ocfs2/suballoc.c:386 Call Trace: ocfs2_block_group_alloc+0x7e9/0x1330 fs/ocfs2/suballoc.c:703 ocfs2_reserve_suballoc_bits+0x20a6/0x4640 fs/ocfs2/suballoc.c:834 ocfs2_reserve_new_inode+0x4f4/0xcc0 fs/ocfs2/suballoc.c:1074 ocfs2_mknod+0x83c/0x2050 fs/ocfs2/namei.c:306 This patch adds validation in ocfs2_validate_inode_block() to ensure cl_bpc matches the expected value calculated from the superblock's cluster size and block size for chain allocator inodes (identified by OCFS2_CHAIN_FL). Moving the validation to inode validation time (rather than allocation time) has several benefits: - Validates once when the inode is read, rather than on every allocation - Protects all code paths that use cl_bpc (allocation, resize, etc.) - Follows the existing pattern of inode validation in OCFS2 - Centralizes validation logic The validation catches both: - Zero values that cause divide-by-zero crashes - Non-zero but incorrect values indicating filesystem corruption or mismatched filesystem geometry With this fix, mounting a corrupted filesystem produces: OCFS2: ERROR (device loop0): ocfs2_validate_inode_block: Inode 74 has corrupted cl_bpc: ondisk=0 expected=16 instead of a kernel crash. [dmantipov@yandex.ru: combine into the series and tweak the message to fit the commonly used style] Link: https://lkml.kernel.org/r/20251030153003.1934585-2-dmantipov@yandex.ru Link: https://lore.kernel.org/ocfs2-devel/20251026132625.12348-1-kartikey406@gmail.com/T/#u [v1] Link: https://lore.kernel.org/all/20251027124131.10002-1-kartikey406@gmail.com/T/ [v2] Signed-off-by: Deepanshu Kartikey <kartikey406@gmail.com> Signed-off-by: Dmitry Antipov <dmantipov@yandex.ru> Reported-by: syzbot+fd8af97c7227fe605d95@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=fd8af97c7227fe605d95 Tested-by: syzbot+fd8af97c7227fe605d95@syzkaller.appspotmail.com Suggested-by: Joseph Qi <joseph.qi@linux.alibaba.com> Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com> Cc: Heming Zhao <heming.zhao@suse.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Jun Piao <piaojun@huawei.com> Cc: Junxiao Bi <junxiao.bi@oracle.com> Cc: Mark Fasheh <mark@fasheh.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
pongo1231
pushed a commit
to pongo1231/linux
that referenced
this pull request
Dec 9, 2025
Rust Binder contains the following unsafe operation:
// SAFETY: A `NodeDeath` is never inserted into the death list
// of any node other than its owner, so it is either in this
// death list or in no death list.
unsafe { node_inner.death_list.remove(self) };
This operation is unsafe because when touching the prev/next pointers of
a list element, we have to ensure that no other thread is also touching
them in parallel. If the node is present in the list that `remove` is
called on, then that is fine because we have exclusive access to that
list. If the node is not in any list, then it's also ok. But if it's
present in a different list that may be accessed in parallel, then that
may be a data race on the prev/next pointers.
And unfortunately that is exactly what is happening here. In
Node::release, we:
1. Take the lock.
2. Move all items to a local list on the stack.
3. Drop the lock.
4. Iterate the local list on the stack.
Combined with threads using the unsafe remove method on the original
list, this leads to memory corruption of the prev/next pointers. This
leads to crashes like this one:
Unable to handle kernel paging request at virtual address 000bb9841bcac70e
Mem abort info:
ESR = 0x0000000096000044
EC = 0x25: DABT (current EL), IL = 32 bits
SET = 0, FnV = 0
EA = 0, S1PTW = 0
FSC = 0x04: level 0 translation fault
Data abort info:
ISV = 0, ISS = 0x00000044, ISS2 = 0x00000000
CM = 0, WnR = 1, TnD = 0, TagAccess = 0
GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0
[000bb9841bcac70e] address between user and kernel address ranges
Internal error: Oops: 0000000096000044 [CachyOS#1] PREEMPT SMP
google-cdd 538c004.gcdd: context saved(CPU:1)
item - log_kevents is disabled
Modules linked in: ... rust_binder
CPU: 1 UID: 0 PID: 2092 Comm: kworker/1:178 Tainted: G S W OE 6.12.52-android16-5-g98debd5df505-4k CachyOS#1 f94a6367396c5488d635708e43ee0c888d230b0b
Tainted: [S]=CPU_OUT_OF_SPEC, [W]=WARN, [O]=OOT_MODULE, [E]=UNSIGNED_MODULE
Hardware name: MUSTANG PVT 1.0 based on LGA (DT)
Workqueue: events _RNvXs6_NtCsdfZWD8DztAw_6kernel9workqueueINtNtNtB7_4sync3arc3ArcNtNtCs8QPsHWIn21X_16rust_binder_main7process7ProcessEINtB5_15WorkItemPointerKy0_E3runB13_ [rust_binder]
pstate: 23400005 (nzCv daif +PAN -UAO +TCO +DIT -SSBS BTYPE=--)
pc : _RNvXs3_NtCs8QPsHWIn21X_16rust_binder_main7processNtB5_7ProcessNtNtCsdfZWD8DztAw_6kernel9workqueue8WorkItem3run+0x450/0x11f8 [rust_binder]
lr : _RNvXs3_NtCs8QPsHWIn21X_16rust_binder_main7processNtB5_7ProcessNtNtCsdfZWD8DztAw_6kernel9workqueue8WorkItem3run+0x464/0x11f8 [rust_binder]
sp : ffffffc09b433ac0
x29: ffffffc09b433d30 x28: ffffff8821690000 x27: ffffffd40cbaa448
x26: ffffff8821690000 x25: 00000000ffffffff x24: ffffff88d0376578
x23: 0000000000000001 x22: ffffffc09b433c78 x21: ffffff88e8f9bf40
x20: ffffff88e8f9bf40 x19: ffffff882692b000 x18: ffffffd40f10bf00
x17: 00000000c006287d x16: 00000000c006287d x15: 00000000000003b0
x14: 0000000000000100 x13: 000000201cb79ae0 x12: fffffffffffffff0
x11: 0000000000000000 x10: 0000000000000001 x9 : 0000000000000000
x8 : b80bb9841bcac706 x7 : 0000000000000001 x6 : fffffffebee63f30
x5 : 0000000000000000 x4 : 0000000000000001 x3 : 0000000000000000
x2 : 0000000000004c31 x1 : ffffff88216900c0 x0 : ffffff88e8f9bf00
Call trace:
_RNvXs3_NtCs8QPsHWIn21X_16rust_binder_main7processNtB5_7ProcessNtNtCsdfZWD8DztAw_6kernel9workqueue8WorkItem3run+0x450/0x11f8 [rust_binder bbc172b53665bbc815363b22e97e3f7e3fe971fc]
process_scheduled_works+0x1c4/0x45c
worker_thread+0x32c/0x3e8
kthread+0x11c/0x1c8
ret_from_fork+0x10/0x20
Code: 94218d85 b4000155 a94026a8 d10102a0 (f9000509)
---[ end trace 0000000000000000 ]---
Thus, modify Node::release to pop items directly off the original list.
Cc: stable@vger.kernel.org
Fixes: eafedbc ("rust_binder: add Rust Binder driver")
Signed-off-by: Alice Ryhl <aliceryhl@google.com>
Acked-by: Miguel Ojeda <ojeda@kernel.org>
Link: https://patch.msgid.link/20251111-binder-fix-list-remove-v1-1-8ed14a0da63d@google.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
pongo1231
pushed a commit
to pongo1231/linux
that referenced
this pull request
Dec 9, 2025
Syzbot identified an issue [1] in pcl818_ai_cancel(), which stems from the fact that in case of early device detach via pcl818_detach(), subdevice dev->read_subdev may not have initialized its pointer to &struct comedi_async as intended. Thus, any such dereferencing of &s->async->cmd will lead to general protection fault and kernel crash. Mitigate this problem by removing a call to pcl818_ai_cancel() from pcl818_detach() altogether. This way, if the subdevice setups its support for async commands, everything async-related will be handled via subdevice's own ->cancel() function in comedi_device_detach_locked() even before pcl818_detach(). If no support for asynchronous commands is provided, there is no need to cancel anything either. [1] Syzbot crash: Oops: general protection fault, probably for non-canonical address 0xdffffc0000000005: 0000 [CachyOS#1] SMP KASAN PTI KASAN: null-ptr-deref in range [0x0000000000000028-0x000000000000002f] CPU: 1 UID: 0 PID: 6050 Comm: syz.0.18 Not tainted syzkaller #0 PREEMPT(full) Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 08/18/2025 RIP: 0010:pcl818_ai_cancel+0x69/0x3f0 drivers/comedi/drivers/pcl818.c:762 ... Call Trace: <TASK> pcl818_detach+0x66/0xd0 drivers/comedi/drivers/pcl818.c:1115 comedi_device_detach_locked+0x178/0x750 drivers/comedi/drivers.c:207 do_devconfig_ioctl drivers/comedi/comedi_fops.c:848 [inline] comedi_unlocked_ioctl+0xcde/0x1020 drivers/comedi/comedi_fops.c:2178 vfs_ioctl fs/ioctl.c:51 [inline] __do_sys_ioctl fs/ioctl.c:597 [inline] ... Reported-by: syzbot+fce5d9d5bd067d6fbe9b@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=fce5d9d5bd067d6fbe9b Fixes: 00aba6e ("staging: comedi: pcl818: remove 'neverending_ai' from private data") Cc: stable <stable@kernel.org> Signed-off-by: Nikita Zhandarovich <n.zhandarovich@fintech.ru> Reviewed-by: Ian Abbott <abbotti@mev.co.uk> Link: https://patch.msgid.link/20251023141457.398685-1-n.zhandarovich@fintech.ru Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
pongo1231
pushed a commit
to pongo1231/linux
that referenced
this pull request
Dec 9, 2025
… 'T'
When perf report with annotation for a symbol, press 's' and 'T', then exit
the annotate browser. Once annotate the same symbol, the annotate browser
will crash.
The browser.arch was required to be correctly updated when data type
feature was enabled by 'T'. Usually it was initialized by symbol__annotate2
function. If a symbol has already been correctly annotated at the first
time, it should not call the symbol__annotate2 function again, thus the
browser.arch will not get initialized. Then at the second time to show the
annotate browser, the data type needs to be displayed but the browser.arch
is empty.
Stack trace as below:
Perf: Segmentation fault
-------- backtrace --------
#0 0x55d365 in ui__signal_backtrace setup.c:0
CachyOS#1 0x7f5ff1a3e930 in __restore_rt libc.so.6[3e930]
CachyOS#2 0x570f08 in arch__is perf[570f08]
CachyOS#3 0x562186 in annotate_get_insn_location perf[562186]
CachyOS#4 0x562626 in __hist_entry__get_data_type annotate.c:0
CachyOS#5 0x56476d in annotation_line__write perf[56476d]
CachyOS#6 0x54e2db in annotate_browser__write annotate.c:0
torvalds#7 0x54d061 in ui_browser__list_head_refresh perf[54d061]
torvalds#8 0x54dc9e in annotate_browser__refresh annotate.c:0
torvalds#9 0x54c03d in __ui_browser__refresh browser.c:0
torvalds#10 0x54ccf8 in ui_browser__run perf[54ccf8]
torvalds#11 0x54eb92 in __hist_entry__tui_annotate perf[54eb92]
torvalds#12 0x552293 in do_annotate hists.c:0
torvalds#13 0x55941c in evsel__hists_browse hists.c:0
torvalds#14 0x55b00f in evlist__tui_browse_hists perf[55b00f]
torvalds#15 0x42ff02 in cmd_report perf[42ff02]
torvalds#16 0x494008 in run_builtin perf.c:0
torvalds#17 0x494305 in handle_internal_command perf.c:0
torvalds#18 0x410547 in main perf[410547]
torvalds#19 0x7f5ff1a295d0 in __libc_start_call_main libc.so.6[295d0]
torvalds#20 0x7f5ff1a29680 in __libc_start_main@@GLIBC_2.34 libc.so.6[29680]
torvalds#21 0x410b75 in _start perf[410b75]
Fixes: 1d4374a ("perf annotate: Add 'T' hot key to toggle data type display")
Reviewed-by: James Clark <james.clark@linaro.org>
Tested-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Tianyou Li <tianyou.li@intel.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
pongo1231
pushed a commit
to pongo1231/linux
that referenced
this pull request
Dec 9, 2025
When using perf record with the `--overwrite` option, a segmentation fault
occurs if an event fails to open. For example:
perf record -e cycles-ct -F 1000 -a --overwrite
Error:
cycles-ct:H: PMU Hardware doesn't support sampling/overflow-interrupts. Try 'perf stat'
perf: Segmentation fault
#0 0x6466b6 in dump_stack debug.c:366
CachyOS#1 0x646729 in sighandler_dump_stack debug.c:378
CachyOS#2 0x453fd1 in sigsegv_handler builtin-record.c:722
CachyOS#3 0x7f8454e65090 in __restore_rt libc-2.32.so[54090]
CachyOS#4 0x6c5671 in __perf_event__synthesize_id_index synthetic-events.c:1862
CachyOS#5 0x6c5ac0 in perf_event__synthesize_id_index synthetic-events.c:1943
CachyOS#6 0x458090 in record__synthesize builtin-record.c:2075
torvalds#7 0x45a85a in __cmd_record builtin-record.c:2888
torvalds#8 0x45deb6 in cmd_record builtin-record.c:4374
torvalds#9 0x4e5e33 in run_builtin perf.c:349
torvalds#10 0x4e60bf in handle_internal_command perf.c:401
torvalds#11 0x4e6215 in run_argv perf.c:448
torvalds#12 0x4e653a in main perf.c:555
torvalds#13 0x7f8454e4fa72 in __libc_start_main libc-2.32.so[3ea72]
torvalds#14 0x43a3ee in _start ??:0
The --overwrite option implies --tail-synthesize, which collects non-sample
events reflecting the system status when recording finishes. However, when
evsel opening fails (e.g., unsupported event 'cycles-ct'), session->evlist
is not initialized and remains NULL. The code unconditionally calls
record__synthesize() in the error path, which iterates through the NULL
evlist pointer and causes a segfault.
To fix it, move the record__synthesize() call inside the error check block, so
it's only called when there was no error during recording, ensuring that evlist
is properly initialized.
Fixes: 4ea648a ("perf record: Add --tail-synthesize option")
Signed-off-by: Shuai Xue <xueshuai@linux.alibaba.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
pongo1231
pushed a commit
to pongo1231/linux
that referenced
this pull request
Dec 9, 2025
When interrupting perf stat in repeat mode with a signal the signal is passed to the child process but the repeat doesn't terminate: ``` $ perf stat -v --null --repeat 10 sleep 1 Control descriptor is not initialized [ perf stat: executing run CachyOS#1 ... ] [ perf stat: executing run CachyOS#2 ... ] ^Csleep: Interrupt [ perf stat: executing run CachyOS#3 ... ] [ perf stat: executing run CachyOS#4 ... ] [ perf stat: executing run CachyOS#5 ... ] [ perf stat: executing run CachyOS#6 ... ] [ perf stat: executing run torvalds#7 ... ] [ perf stat: executing run torvalds#8 ... ] [ perf stat: executing run torvalds#9 ... ] [ perf stat: executing run torvalds#10 ... ] Performance counter stats for 'sleep 1' (10 runs): 0.9500 +- 0.0512 seconds time elapsed ( +- 5.39% ) 0.01user 0.02system 0:09.53elapsed 0%CPU (0avgtext+0avgdata 18940maxresident)k 29944inputs+0outputs (0major+2629minor)pagefaults 0swaps ``` Terminate the repeated run and give a reasonable exit value: ``` $ perf stat -v --null --repeat 10 sleep 1 Control descriptor is not initialized [ perf stat: executing run CachyOS#1 ... ] [ perf stat: executing run CachyOS#2 ... ] [ perf stat: executing run CachyOS#3 ... ] ^Csleep: Interrupt Performance counter stats for 'sleep 1' (10 runs): 0.680 +- 0.321 seconds time elapsed ( +- 47.16% ) Command exited with non-zero status 130 0.00user 0.01system 0:02.05elapsed 0%CPU (0avgtext+0avgdata 70688maxresident)k 0inputs+0outputs (0major+5002minor)pagefaults 0swaps ``` Note, this also changes the exit value for non-repeat runs when interrupted by a signal. Reported-by: Ingo Molnar <mingo@kernel.org> Closes: https://lore.kernel.org/lkml/aS5wjmbAM9ka3M2g@gmail.com/ Signed-off-by: Ian Rogers <irogers@google.com> Tested-by: Thomas Richter <tmricht@linux.ibm.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org>
pongo1231
pushed a commit
to pongo1231/linux
that referenced
this pull request
Dec 9, 2025
When booting with KASAN enabled the following splat is encountered during probe of the k1 clock driver: UBSAN: array-index-out-of-bounds in drivers/clk/spacemit/ccu-k1.c:1044:16 index 0 is out of range for type 'clk_hw *[*]' CPU: 0 UID: 0 PID: 1 Comm: swapper/0 Not tainted 6.18.0-rc5+ CachyOS#1 PREEMPT(lazy) Hardware name: Unknown Unknown Product/Unknown Product, BIOS 2022.10spacemit 10/01/2022 Call Trace: [<ffffffff8002b628>] dump_backtrace+0x28/0x38 [<ffffffff800027d2>] show_stack+0x3a/0x50 [<ffffffff800220c2>] dump_stack_lvl+0x5a/0x80 [<ffffffff80022100>] dump_stack+0x18/0x20 [<ffffffff800164b8>] ubsan_epilogue+0x10/0x48 [<ffffffff8099034e>] __ubsan_handle_out_of_bounds+0xa6/0xa8 [<ffffffff80acbfa6>] k1_ccu_probe+0x37e/0x420 [<ffffffff80b79e6e>] platform_probe+0x56/0x98 [<ffffffff80b76a7e>] really_probe+0x9e/0x350 [<ffffffff80b76db0>] __driver_probe_device+0x80/0x138 [<ffffffff80b76f52>] driver_probe_device+0x3a/0xd0 [<ffffffff80b771c4>] __driver_attach+0xac/0x1b8 [<ffffffff80b742fc>] bus_for_each_dev+0x6c/0xc8 [<ffffffff80b76296>] driver_attach+0x26/0x38 [<ffffffff80b759ae>] bus_add_driver+0x13e/0x268 [<ffffffff80b7836a>] driver_register+0x52/0x100 [<ffffffff80b79a78>] __platform_driver_register+0x28/0x38 [<ffffffff814585da>] k1_ccu_driver_init+0x22/0x38 [<ffffffff80023a8a>] do_one_initcall+0x62/0x2a0 [<ffffffff81401c60>] do_initcalls+0x170/0x1a8 [<ffffffff81401e7a>] kernel_init_freeable+0x16a/0x1e0 [<ffffffff811f7534>] kernel_init+0x2c/0x180 [<ffffffff80025f56>] ret_from_fork_kernel+0x16/0x1d8 [<ffffffff81205336>] ret_from_fork_kernel_asm+0x16/0x18 ---[ end trace ]--- This is bogus and is simply a result of KASAN consulting the `.num` member of the struct for bounds information (as it should due to `__counted_by`) and finding 0 set by kzalloc() because it has not been initialized before the loop that fills in the array. The easy fix is to just move the line that sets `num` to before the loop that fills the array so that KASAN has the information it needs to accurately conclude that the access is valid. Fixes: 1b72c59 ("clk: spacemit: Add clock support for SpacemiT K1 SoC") Tested-by: Yanko Kaneti <yaneti@declera.com> Signed-off-by: Charles Mirabile <cmirabil@redhat.com> Reviewed-by: Alex Elder <elder@riscstar.com> Reviewed-by: Troy Mitchell <troy.mitchell@linux.spacemit.com> Reviewed-by: Yixun Lan <dlan@gentoo.org> Signed-off-by: Stephen Boyd <sboyd@kernel.org>
pongo1231
pushed a commit
to pongo1231/linux
that referenced
this pull request
Dec 9, 2025
Commit 1d2da79 ("pinctrl: renesas: rzg2l: Avoid configuring ISEL in gpio_irq_{en,dis}able*()") dropped the configuration of ISEL from struct irq_chip::{irq_enable, irq_disable} APIs and moved it to struct gpio_chip::irq::{child_to_parent_hwirq, child_irq_domain_ops::free} APIs to fix spurious IRQs. After commit 1d2da79 ("pinctrl: renesas: rzg2l: Avoid configuring ISEL in gpio_irq_{en,dis}able*()"), ISEL was no longer configured properly on resume. This is because the pinctrl resume code used struct irq_chip::irq_enable (called from rzg2l_gpio_irq_restore()) to reconfigure the wakeup interrupts. Some drivers (e.g. Ethernet) may also reconfigure non-wakeup interrupts on resume through their own code, eventually calling struct irq_chip::irq_enable. Fix this by adding ISEL configuration back into the struct irq_chip::irq_enable API and on resume path for wakeup interrupts. As struct irq_chip::irq_enable needs now to lock to update the ISEL, convert the struct rzg2l_pinctrl::lock to a raw spinlock and replace the locking API calls with the raw variants. Otherwise the lockdep reports invalid wait context when probing the adv7511 module on RZ/G2L: [ BUG: Invalid wait context ] 6.17.0-rc5-next-20250911-00001-gfcfac22533c9 torvalds#18 Not tainted ----------------------------- (udev-worker)/165 is trying to lock: ffff00000e3664a8 (&pctrl->lock){....}-{3:3}, at: rzg2l_gpio_irq_enable+0x38/0x78 other info that might help us debug this: context-{5:5} 3 locks held by (udev-worker)/165: #0: ffff00000e890108 (&dev->mutex){....}-{4:4}, at: __driver_attach+0x90/0x1ac CachyOS#1: ffff000011c07240 (request_class){+.+.}-{4:4}, at: __setup_irq+0xb4/0x6dc CachyOS#2: ffff000011c070c8 (lock_class){....}-{2:2}, at: __setup_irq+0xdc/0x6dc stack backtrace: CPU: 1 UID: 0 PID: 165 Comm: (udev-worker) Not tainted 6.17.0-rc5-next-20250911-00001-gfcfac22533c9 torvalds#18 PREEMPT Hardware name: Renesas SMARC EVK based on r9a07g044l2 (DT) Call trace: show_stack+0x18/0x24 (C) dump_stack_lvl+0x90/0xd0 dump_stack+0x18/0x24 __lock_acquire+0xa14/0x20b4 lock_acquire+0x1c8/0x354 _raw_spin_lock_irqsave+0x60/0x88 rzg2l_gpio_irq_enable+0x38/0x78 irq_enable+0x40/0x8c __irq_startup+0x78/0xa4 irq_startup+0x108/0x16c __setup_irq+0x3c0/0x6dc request_threaded_irq+0xec/0x1ac devm_request_threaded_irq+0x80/0x134 adv7511_probe+0x928/0x9a4 [adv7511] i2c_device_probe+0x22c/0x3dc really_probe+0xbc/0x2a0 __driver_probe_device+0x78/0x12c driver_probe_device+0x40/0x164 __driver_attach+0x9c/0x1ac bus_for_each_dev+0x74/0xd0 driver_attach+0x24/0x30 bus_add_driver+0xe4/0x208 driver_register+0x60/0x128 i2c_register_driver+0x48/0xd0 adv7511_init+0x5c/0x1000 [adv7511] do_one_initcall+0x64/0x30c do_init_module+0x58/0x23c load_module+0x1bcc/0x1d40 init_module_from_file+0x88/0xc4 idempotent_init_module+0x188/0x27c __arm64_sys_finit_module+0x68/0xac invoke_syscall+0x48/0x110 el0_svc_common.constprop.0+0xc0/0xe0 do_el0_svc+0x1c/0x28 el0_svc+0x4c/0x160 el0t_64_sync_handler+0xa0/0xe4 el0t_64_sync+0x198/0x19c Having ISEL configuration back into the struct irq_chip::irq_enable API should be safe with respect to spurious IRQs, as in the probe case IRQs are enabled anyway in struct gpio_chip::irq::child_to_parent_hwirq. No spurious IRQs were detected on suspend/resume, boot, ethernet link insert/remove tests (executed on RZ/G3S). Boot, ethernet link insert/remove tests were also executed successfully on RZ/G2L. Fixes: 1d2da79 ("pinctrl: renesas: rzg2l: Avoid configuring ISEL in gpio_irq_{en,dis}able*(") Cc: stable@vger.kernel.org Signed-off-by: Claudiu Beznea <claudiu.beznea.uj@bp.renesas.com> Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be> Link: https://patch.msgid.link/20250912095308.3603704-1-claudiu.beznea.uj@bp.renesas.com Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
pongo1231
pushed a commit
to pongo1231/linux
that referenced
this pull request
Dec 9, 2025
The problem is that scan_fast() allocate memory for ubi->fm and ubi->fm->e[x], but if the following attach process fails in ubi_wl_init or ubi_read_volume_table, the whole attach process will fail without executing ubi_wl_close to free the memory under ubi->fm. Fix this by add a new ubi_free_fastmap function in fastmap.c to free the memory allocated for fm. If SLUB_DEBUG and KUNIT are enabled, the following warning messages will show: ubi0: detaching mtd0 ubi0: mtd0 is detached ubi0: default fastmap pool size: 200 ubi0: default fastmap WL pool size: 100 ubi0: attaching mtd0 ubi0: attached by fastmap ubi0: fastmap pool size: 200 ubi0: fastmap WL pool size: 100 ubi0 error: ubi_wl_init [ubi]: no enough physical eraseblocks (4, need 203) ubi0 error: ubi_attach_mtd_dev [ubi]: failed to attach mtd0, error -28 UBI error: cannot attach mtd0 ================================================================= BUG ubi_wl_entry_slab (Tainted: G B O L ): Objects remaining in ubi_wl_entry_slab on __kmem_cache_shutdown() ----------------------------------------------------------------------------- Slab 0xffff2fd23a40cd00 objects=22 used=1 fp=0xffff2fd1d0334fd8 flags=0x883fffc010200(slab|head|section=34|node=0|zone=1|lastcpupid=0x7fff) CPU: 0 PID: 5884 Comm: insmod Tainted: G B O L 5.10.0 CachyOS#1 Hardware name: LS1043A RDB Board (DT) Call trace: dump_backtrace+0x0/0x198 show_stack+0x18/0x28 dump_stack+0xe8/0x15c slab_err+0x94/0xc0 __kmem_cache_shutdown+0x1fc/0x39c kmem_cache_destroy+0x48/0x138 ubi_init+0x1d4/0xf34 [ubi] do_one_initcall+0xb4/0x24c do_init_module+0x4c/0x1dc load_module+0x212c/0x2260 __se_sys_finit_module+0xb4/0xd8 __arm64_sys_finit_module+0x18/0x28 el0_svc_common.constprop.0+0x78/0x1a0 do_el0_svc+0x78/0x90 el0_svc+0x20/0x38 el0_sync_handler+0xf0/0x140 normal+0x3d8/0x400 Object 0xffff2fd1d0334e68 @offset=3688 Allocated in ubi_scan_fastmap+0xf04/0xf40 [ubi] age=80 cpu=0 pid=5884 __slab_alloc.isra.21+0x6c/0xb4 kmem_cache_alloc+0x1e4/0x80c ubi_scan_fastmap+0xf04/0xf40 [ubi] ubi_attach+0x1f0/0x3a8 [ubi] ubi_attach_mtd_dev+0x810/0xbc8 [ubi] ubi_init+0x238/0xf34 [ubi] do_one_initcall+0xb4/0x24c do_init_module+0x4c/0x1dc load_module+0x212c/0x2260 __se_sys_finit_module+0xb4/0xd8 __arm64_sys_finit_module+0x18/0x28 el0_svc_common.constprop.0+0x78/0x1a0 do_el0_svc+0x78/0x90 el0_svc+0x20/0x38 el0_sync_handler+0xf0/0x140 normal+0x3d8/0x400 Link: https://bugzilla.kernel.org/show_bug.cgi?id=220744 Signed-off-by: Liyuan Pang <pangliyuan1@huawei.com> Reviewed-by: Zhihao Cheng <chengzhihao1@huawei.com> Signed-off-by: Richard Weinberger <richard@nod.at>
pongo1231
pushed a commit
to pongo1231/linux
that referenced
this pull request
Dec 9, 2025
As Jiaming Zhang and syzbot reported, there is potential deadlock in
f2fs as below:
Chain exists of:
&sbi->cp_rwsem --> fs_reclaim --> sb_internal#2
Possible unsafe locking scenario:
CPU0 CPU1
---- ----
rlock(sb_internal#2);
lock(fs_reclaim);
lock(sb_internal#2);
rlock(&sbi->cp_rwsem);
*** DEADLOCK ***
3 locks held by kswapd0/73:
#0: ffffffff8e247a40 (fs_reclaim){+.+.}-{0:0}, at: balance_pgdat mm/vmscan.c:7015 [inline]
#0: ffffffff8e247a40 (fs_reclaim){+.+.}-{0:0}, at: kswapd+0x951/0x2800 mm/vmscan.c:7389
CachyOS#1: ffff8880118400e0 (&type->s_umount_key#50){.+.+}-{4:4}, at: super_trylock_shared fs/super.c:562 [inline]
CachyOS#1: ffff8880118400e0 (&type->s_umount_key#50){.+.+}-{4:4}, at: super_cache_scan+0x91/0x4b0 fs/super.c:197
CachyOS#2: ffff888011840610 (sb_internal#2){.+.+}-{0:0}, at: f2fs_evict_inode+0x8d9/0x1b60 fs/f2fs/inode.c:890
stack backtrace:
CPU: 0 UID: 0 PID: 73 Comm: kswapd0 Not tainted syzkaller #0 PREEMPT(full)
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-debian-1.16.3-2~bpo12+1 04/01/2014
Call Trace:
<TASK>
dump_stack_lvl+0x189/0x250 lib/dump_stack.c:120
print_circular_bug+0x2ee/0x310 kernel/locking/lockdep.c:2043
check_noncircular+0x134/0x160 kernel/locking/lockdep.c:2175
check_prev_add kernel/locking/lockdep.c:3165 [inline]
check_prevs_add kernel/locking/lockdep.c:3284 [inline]
validate_chain+0xb9b/0x2140 kernel/locking/lockdep.c:3908
__lock_acquire+0xab9/0xd20 kernel/locking/lockdep.c:5237
lock_acquire+0x120/0x360 kernel/locking/lockdep.c:5868
down_read+0x46/0x2e0 kernel/locking/rwsem.c:1537
f2fs_down_read fs/f2fs/f2fs.h:2278 [inline]
f2fs_lock_op fs/f2fs/f2fs.h:2357 [inline]
f2fs_do_truncate_blocks+0x21c/0x10c0 fs/f2fs/file.c:791
f2fs_truncate_blocks+0x10a/0x300 fs/f2fs/file.c:867
f2fs_truncate+0x489/0x7c0 fs/f2fs/file.c:925
f2fs_evict_inode+0x9f2/0x1b60 fs/f2fs/inode.c:897
evict+0x504/0x9c0 fs/inode.c:810
f2fs_evict_inode+0x1dc/0x1b60 fs/f2fs/inode.c:853
evict+0x504/0x9c0 fs/inode.c:810
dispose_list fs/inode.c:852 [inline]
prune_icache_sb+0x21b/0x2c0 fs/inode.c:1000
super_cache_scan+0x39b/0x4b0 fs/super.c:224
do_shrink_slab+0x6ef/0x1110 mm/shrinker.c:437
shrink_slab_memcg mm/shrinker.c:550 [inline]
shrink_slab+0x7ef/0x10d0 mm/shrinker.c:628
shrink_one+0x28a/0x7c0 mm/vmscan.c:4955
shrink_many mm/vmscan.c:5016 [inline]
lru_gen_shrink_node mm/vmscan.c:5094 [inline]
shrink_node+0x315d/0x3780 mm/vmscan.c:6081
kswapd_shrink_node mm/vmscan.c:6941 [inline]
balance_pgdat mm/vmscan.c:7124 [inline]
kswapd+0x147c/0x2800 mm/vmscan.c:7389
kthread+0x70e/0x8a0 kernel/kthread.c:463
ret_from_fork+0x4bc/0x870 arch/x86/kernel/process.c:158
ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:245
</TASK>
The root cause is deadlock among four locks as below:
kswapd
- fs_reclaim --- Lock A
- shrink_one
- evict
- f2fs_evict_inode
- sb_start_intwrite --- Lock B
- iput
- evict
- f2fs_evict_inode
- sb_start_intwrite --- Lock B
- f2fs_truncate
- f2fs_truncate_blocks
- f2fs_do_truncate_blocks
- f2fs_lock_op --- Lock C
ioctl
- f2fs_ioc_commit_atomic_write
- f2fs_lock_op --- Lock C
- __f2fs_commit_atomic_write
- __replace_atomic_write_block
- f2fs_get_dnode_of_data
- __get_node_folio
- f2fs_check_nid_range
- f2fs_handle_error
- f2fs_record_errors
- f2fs_down_write --- Lock D
open
- do_open
- do_truncate
- security_inode_need_killpriv
- f2fs_getxattr
- lookup_all_xattrs
- f2fs_handle_error
- f2fs_record_errors
- f2fs_down_write --- Lock D
- f2fs_commit_super
- read_mapping_folio
- filemap_alloc_folio_noprof
- prepare_alloc_pages
- fs_reclaim_acquire --- Lock A
In order to avoid such deadlock, we need to avoid grabbing sb_lock in
f2fs_handle_error(), so, let's use asynchronous method instead:
- remove f2fs_handle_error() implementation
- rename f2fs_handle_error_async() to f2fs_handle_error()
- spread f2fs_handle_error()
Fixes: 95fa90c ("f2fs: support recording errors into superblock")
Cc: stable@kernel.org
Reported-by: syzbot+14b90e1156b9f6fc1266@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/linux-f2fs-devel/68eae49b.050a0220.ac43.0001.GAE@google.com
Reported-by: Jiaming Zhang <r772577952@gmail.com>
Closes: https://lore.kernel.org/lkml/CANypQFa-Gy9sD-N35o3PC+FystOWkNuN8pv6S75HLT0ga-Tzgw@mail.gmail.com
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
pongo1231
pushed a commit
to pongo1231/linux
that referenced
this pull request
Dec 9, 2025
As syzbot reported: F2FS-fs (loop0): __update_extent_tree_range: extent len is zero, type: 0, extent [0, 0, 0], age [0, 0] ------------[ cut here ]------------ kernel BUG at fs/f2fs/extent_cache.c:678! Oops: invalid opcode: 0000 [CachyOS#1] SMP KASAN NOPTI CPU: 0 UID: 0 PID: 5336 Comm: syz.0.0 Not tainted syzkaller #0 PREEMPT(full) Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-debian-1.16.3-2~bpo12+1 04/01/2014 RIP: 0010:__update_extent_tree_range+0x13bc/0x1500 fs/f2fs/extent_cache.c:678 Call Trace: <TASK> f2fs_update_read_extent_cache_range+0x192/0x3e0 fs/f2fs/extent_cache.c:1085 f2fs_do_zero_range fs/f2fs/file.c:1657 [inline] f2fs_zero_range+0x10c1/0x1580 fs/f2fs/file.c:1737 f2fs_fallocate+0x583/0x990 fs/f2fs/file.c:2030 vfs_fallocate+0x669/0x7e0 fs/open.c:342 ioctl_preallocate fs/ioctl.c:289 [inline] file_ioctl+0x611/0x780 fs/ioctl.c:-1 do_vfs_ioctl+0xb33/0x1430 fs/ioctl.c:576 __do_sys_ioctl fs/ioctl.c:595 [inline] __se_sys_ioctl+0x82/0x170 fs/ioctl.c:583 do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline] do_syscall_64+0xfa/0x3b0 arch/x86/entry/syscall_64.c:94 entry_SYSCALL_64_after_hwframe+0x77/0x7f RIP: 0033:0x7f07bc58eec9 In error path of f2fs_zero_range(), it may add a zero-sized extent into extent cache, it should be avoided. Fixes: 6e96194 ("f2fs: support in batch fzero in dnode page") Cc: stable@kernel.org Reported-by: syzbot+24124df3170c3638b35f@syzkaller.appspotmail.com Closes: https://lore.kernel.org/linux-f2fs-devel/68e5d698.050a0220.256323.0032.GAE@google.com Signed-off-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
pongo1231
pushed a commit
to pongo1231/linux
that referenced
this pull request
Dec 9, 2025
Bai, Shuangpeng <sjb7183@psu.edu> reported a bug as below: Oops: divide error: 0000 [CachyOS#1] SMP KASAN PTI CPU: 0 UID: 0 PID: 11441 Comm: syz.0.46 Not tainted 6.17.0 CachyOS#1 PREEMPT(full) Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1 04/01/2014 RIP: 0010:f2fs_all_cluster_page_ready+0x106/0x550 fs/f2fs/compress.c:857 Call Trace: <TASK> f2fs_write_cache_pages fs/f2fs/data.c:3078 [inline] __f2fs_write_data_pages fs/f2fs/data.c:3290 [inline] f2fs_write_data_pages+0x1c19/0x3600 fs/f2fs/data.c:3317 do_writepages+0x38e/0x640 mm/page-writeback.c:2634 filemap_fdatawrite_wbc mm/filemap.c:386 [inline] __filemap_fdatawrite_range mm/filemap.c:419 [inline] file_write_and_wait_range+0x2ba/0x3e0 mm/filemap.c:794 f2fs_do_sync_file+0x6e6/0x1b00 fs/f2fs/file.c:294 generic_write_sync include/linux/fs.h:3043 [inline] f2fs_file_write_iter+0x76e/0x2700 fs/f2fs/file.c:5259 new_sync_write fs/read_write.c:593 [inline] vfs_write+0x7e9/0xe00 fs/read_write.c:686 ksys_write+0x19d/0x2d0 fs/read_write.c:738 do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline] do_syscall_64+0xf7/0x470 arch/x86/entry/syscall_64.c:94 entry_SYSCALL_64_after_hwframe+0x77/0x7f The bug was triggered w/ below race condition: fsync setattr ioctl - f2fs_do_sync_file - file_write_and_wait_range - f2fs_write_cache_pages : inode is non-compressed : cc.cluster_size = F2FS_I(inode)->i_cluster_size = 0 - tag_pages_for_writeback - f2fs_setattr - truncate_setsize - f2fs_truncate - f2fs_fileattr_set - f2fs_setflags_common - set_compress_context : F2FS_I(inode)->i_cluster_size = 4 : set_inode_flag(inode, FI_COMPRESSED_FILE) - f2fs_compressed_file : return true - f2fs_all_cluster_page_ready : "pgidx % cc->cluster_size" trigger dividing 0 issue Let's change as below to fix this issue: - introduce a new atomic type variable .writeback in structure f2fs_inode_info to track the number of threads which calling f2fs_write_cache_pages(). - use .i_sem lock to protect .writeback update. - check .writeback before update compression context in f2fs_setflags_common() to avoid race w/ ->writepages. Fixes: 4c8ff70 ("f2fs: support data compression") Cc: stable@kernel.org Reported-by: Bai, Shuangpeng <sjb7183@psu.edu> Tested-by: Bai, Shuangpeng <sjb7183@psu.edu> Closes: https://lore.kernel.org/lkml/44D8F7B3-68AD-425F-9915-65D27591F93F@psu.edu Signed-off-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
pongo1231
pushed a commit
to pongo1231/linux
that referenced
this pull request
Dec 9, 2025
As Hong Yun reported in mailing list: loop7: detected capacity change from 0 to 131072 ------------[ cut here ]------------ kmem_cache of name 'f2fs_xattr_entry-7:7' already exists WARNING: CPU: 0 PID: 24426 at mm/slab_common.c:110 kmem_cache_sanity_check mm/slab_common.c:109 [inline] WARNING: CPU: 0 PID: 24426 at mm/slab_common.c:110 __kmem_cache_create_args+0xa6/0x320 mm/slab_common.c:307 CPU: 0 UID: 0 PID: 24426 Comm: syz.7.1370 Not tainted 6.17.0-rc4 CachyOS#1 PREEMPT(full) Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-1ubuntu1.1 04/01/2014 RIP: 0010:kmem_cache_sanity_check mm/slab_common.c:109 [inline] RIP: 0010:__kmem_cache_create_args+0xa6/0x320 mm/slab_common.c:307 Call Trace: __kmem_cache_create include/linux/slab.h:353 [inline] f2fs_kmem_cache_create fs/f2fs/f2fs.h:2943 [inline] f2fs_init_xattr_caches+0xa5/0xe0 fs/f2fs/xattr.c:843 f2fs_fill_super+0x1645/0x2620 fs/f2fs/super.c:4918 get_tree_bdev_flags+0x1fb/0x260 fs/super.c:1692 vfs_get_tree+0x43/0x140 fs/super.c:1815 do_new_mount+0x201/0x550 fs/namespace.c:3808 do_mount fs/namespace.c:4136 [inline] __do_sys_mount fs/namespace.c:4347 [inline] __se_sys_mount+0x298/0x2f0 fs/namespace.c:4324 do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline] do_syscall_64+0x8e/0x3a0 arch/x86/entry/syscall_64.c:94 entry_SYSCALL_64_after_hwframe+0x76/0x7e The bug can be reproduced w/ below scripts: - mount /dev/vdb /mnt1 - mount /dev/vdc /mnt2 - umount /mnt1 - mounnt /dev/vdb /mnt1 The reason is if we created two slab caches, named f2fs_xattr_entry-7:3 and f2fs_xattr_entry-7:7, and they have the same slab size. Actually, slab system will only create one slab cache core structure which has slab name of "f2fs_xattr_entry-7:3", and two slab caches share the same structure and cache address. So, if we destroy f2fs_xattr_entry-7:3 cache w/ cache address, it will decrease reference count of slab cache, rather than release slab cache entirely, since there is one more user has referenced the cache. Then, if we try to create slab cache w/ name "f2fs_xattr_entry-7:3" again, slab system will find that there is existed cache which has the same name and trigger the warning. Let's changes to use global inline_xattr_slab instead of per-sb slab cache for fixing. Fixes: a999150 ("f2fs: use kmem_cache pool during inline xattr lookups") Cc: stable@kernel.org Reported-by: Hong Yun <yhong@link.cuhk.edu.hk> Tested-by: Hong Yun <yhong@link.cuhk.edu.hk> Signed-off-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
pongo1231
pushed a commit
to pongo1231/linux
that referenced
this pull request
Dec 9, 2025
Xfstests generic/335, generic/336 sometimes crash with the following message: F2FS-fs (dm-0): detect filesystem reference count leak during umount, type: 9, count: 1 ------------[ cut here ]------------ kernel BUG at fs/f2fs/super.c:1939! Oops: invalid opcode: 0000 [CachyOS#1] SMP NOPTI CPU: 1 UID: 0 PID: 609351 Comm: umount Tainted: G W 6.17.0-rc5-xfstests-g9dd1835ecda5 CachyOS#1 PREEMPT(none) Tainted: [W]=WARN Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-debian-1.16.3-2 04/01/2014 RIP: 0010:f2fs_put_super+0x3b3/0x3c0 Call Trace: <TASK> generic_shutdown_super+0x7e/0x190 kill_block_super+0x1a/0x40 kill_f2fs_super+0x9d/0x190 deactivate_locked_super+0x30/0xb0 cleanup_mnt+0xba/0x150 task_work_run+0x5c/0xa0 exit_to_user_mode_loop+0xb7/0xc0 do_syscall_64+0x1ae/0x1c0 entry_SYSCALL_64_after_hwframe+0x76/0x7e </TASK> ---[ end trace 0000000000000000 ]--- It appears that sometimes it is possible that f2fs_put_super() is called before all node page reads are completed. Adding a call to f2fs_wait_on_all_pages() for F2FS_RD_NODE fixes the problem. Cc: stable@kernel.org Fixes: 2087258 ("f2fs: fix to drop all dirty meta/node pages during umount()") Signed-off-by: Jan Prusakowski <jprusakowski@google.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
pongo1231
pushed a commit
to pongo1231/linux
that referenced
this pull request
Dec 9, 2025
With below scripts, it will trigger panic in f2fs: mkfs.f2fs -f /dev/vdd mount /dev/vdd /mnt/f2fs touch /mnt/f2fs/foo sync echo 111 >> /mnt/f2fs/foo f2fs_io fsync /mnt/f2fs/foo f2fs_io shutdown 2 /mnt/f2fs umount /mnt/f2fs mount -o ro,norecovery /dev/vdd /mnt/f2fs or mount -o ro,disable_roll_forward /dev/vdd /mnt/f2fs F2FS-fs (vdd): f2fs_recover_fsync_data: recovery fsync data, check_only: 0 F2FS-fs (vdd): Mounted with checkpoint version = 7f5c361f F2FS-fs (vdd): Stopped filesystem due to reason: 0 F2FS-fs (vdd): f2fs_recover_fsync_data: recovery fsync data, check_only: 1 Filesystem f2fs get_tree() didn't set fc->root, returned 1 ------------[ cut here ]------------ kernel BUG at fs/super.c:1761! Oops: invalid opcode: 0000 [CachyOS#1] SMP PTI CPU: 3 UID: 0 PID: 722 Comm: mount Not tainted 6.18.0-rc2+ torvalds#721 PREEMPT(voluntary) Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.3-debian-1.16.3-2 04/01/2014 RIP: 0010:vfs_get_tree.cold+0x18/0x1a Call Trace: <TASK> fc_mount+0x13/0xa0 path_mount+0x34e/0xc50 __x64_sys_mount+0x121/0x150 do_syscall_64+0x84/0x800 entry_SYSCALL_64_after_hwframe+0x76/0x7e RIP: 0033:0x7fa6cc126cfe The root cause is we missed to handle error number returned from f2fs_recover_fsync_data() when mounting image w/ ro,norecovery or ro,disable_roll_forward mount option, result in returning a positive error number to vfs_get_tree(), fix it. Cc: stable@kernel.org Fixes: 6781eab ("f2fs: give -EINVAL for norecovery and rw mount") Signed-off-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
pongo1231
pushed a commit
to pongo1231/linux
that referenced
this pull request
Dec 9, 2025
kbd_led_set() can sleep, and so may not be used as the brightness_set() callback. Otherwise using this led with a trigger leads to system hangs accompanied by: BUG: scheduling while atomic: acpi_fakekeyd/2588/0x00000003 CPU: 4 UID: 0 PID: 2588 Comm: acpi_fakekeyd Not tainted 6.17.9+deb14-amd64 CachyOS#1 PREEMPT(lazy) Debian 6.17.9-1 Hardware name: ASUSTeK COMPUTER INC. ASUS EXPERTBOOK B9403CVAR/B9403CVAR, BIOS B9403CVAR.311 12/24/2024 Call Trace: <TASK> [...] schedule_timeout+0xbd/0x100 __down_common+0x175/0x290 down_timeout+0x67/0x70 acpi_os_wait_semaphore+0x57/0x90 [...] asus_wmi_evaluate_method3+0x87/0x190 [asus_wmi] led_trigger_event+0x3f/0x60 [...] Fixes: 9fe44fc ("platform/x86: asus-wmi: Simplify the keyboard brightness updating process") Signed-off-by: Anton Khirnov <anton@khirnov.net> Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Reviewed-by: Denis Benato <benato.denis96@gmail.com> Link: https://patch.msgid.link/20251129101307.18085-3-anton@khirnov.net Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
No description provided.