perf: stop using deprecated bpf_program__title() #21
Master branch: bc0b5a0 patch https://patchwork.ozlabs.org/project/netdev/patch/20200908180127.1249-1-andriin@fb.com/ applied successfully
bpf_program__section_name(). Also drop unnecessary error checks because
neither bpf_program__title() nor bpf_program__section_name() can fail or
return NULL.

Fixes: 5210958 ("libbpf: Deprecate notion of BPF program "title" in favor of "section name"")
Signed-off-by: Andrii Nakryiko <andriin@fb.com>
---
tools/perf/util/bpf-loader.c | 12 ++----------
1 file changed, 2 insertions(+), 10 deletions(-)
Master branch: e6054fc patch https://patchwork.ozlabs.org/project/netdev/patch/20200908180127.1249-1-andriin@fb.com/ applied successfully
4b8ccb5 to 920f49b
At least one diff in series https://patchwork.ozlabs.org/project/netdev/list/?series=200279 irrelevant now. Closing PR.
This fix is for a failure that occurred in the DWARF unwind perf test.
Stack unwinders may probe memory when looking for frames.
Memory sanitizer will poison and track uninitialized memory on the
stack, and on the heap if the value is copied to the heap.
This can lead to false memory sanitizer failures for the use of an
uninitialized value.
Avoid this problem by removing the poison on the copied stack.
The full msan failure with track origins looks like:
==2168==WARNING: MemorySanitizer: use-of-uninitialized-value
#0 0x559ceb10755b in handle_cfi elfutils/libdwfl/frame_unwind.c:648:8
#1 0x559ceb105448 in __libdwfl_frame_unwind elfutils/libdwfl/frame_unwind.c:741:4
#2 0x559ceb0ece90 in dwfl_thread_getframes elfutils/libdwfl/dwfl_frame.c:435:7
#3 0x559ceb0ec6b7 in get_one_thread_frames_cb elfutils/libdwfl/dwfl_frame.c:379:10
#4 0x559ceb0ec6b7 in get_one_thread_cb elfutils/libdwfl/dwfl_frame.c:308:17
#5 0x559ceb0ec6b7 in dwfl_getthreads elfutils/libdwfl/dwfl_frame.c:283:17
#6 0x559ceb0ec6b7 in getthread elfutils/libdwfl/dwfl_frame.c:354:14
#7 0x559ceb0ec6b7 in dwfl_getthread_frames elfutils/libdwfl/dwfl_frame.c:388:10
#8 0x559ceaff6ae6 in unwind__get_entries tools/perf/util/unwind-libdw.c:236:8
#9 0x559ceabc9dbc in test_dwarf_unwind__thread tools/perf/tests/dwarf-unwind.c:111:8
#10 0x559ceabca5cf in test_dwarf_unwind__compare tools/perf/tests/dwarf-unwind.c:138:26
#11 0x7f812a6865b0 in bsearch (libc.so.6+0x4e5b0)
#12 0x559ceabca871 in test_dwarf_unwind__krava_3 tools/perf/tests/dwarf-unwind.c:162:2
#13 0x559ceabca926 in test_dwarf_unwind__krava_2 tools/perf/tests/dwarf-unwind.c:169:9
#14 0x559ceabca946 in test_dwarf_unwind__krava_1 tools/perf/tests/dwarf-unwind.c:174:9
#15 0x559ceabcae12 in test__dwarf_unwind tools/perf/tests/dwarf-unwind.c:211:8
#16 0x559ceabbc4ab in run_test tools/perf/tests/builtin-test.c:418:9
#17 0x559ceabbc4ab in test_and_print tools/perf/tests/builtin-test.c:448:9
#18 0x559ceabbac70 in __cmd_test tools/perf/tests/builtin-test.c:669:4
#19 0x559ceabbac70 in cmd_test tools/perf/tests/builtin-test.c:815:9
#20 0x559cea960e30 in run_builtin tools/perf/perf.c:313:11
#21 0x559cea95fbce in handle_internal_command tools/perf/perf.c:365:8
#22 0x559cea95fbce in run_argv tools/perf/perf.c:409:2
#23 0x559cea95fbce in main tools/perf/perf.c:539:3
Uninitialized value was stored to memory at
#0 0x559ceb106acf in __libdwfl_frame_reg_set elfutils/libdwfl/frame_unwind.c:77:22
#1 0x559ceb106acf in handle_cfi elfutils/libdwfl/frame_unwind.c:627:13
#2 0x559ceb105448 in __libdwfl_frame_unwind elfutils/libdwfl/frame_unwind.c:741:4
#3 0x559ceb0ece90 in dwfl_thread_getframes elfutils/libdwfl/dwfl_frame.c:435:7
#4 0x559ceb0ec6b7 in get_one_thread_frames_cb elfutils/libdwfl/dwfl_frame.c:379:10
#5 0x559ceb0ec6b7 in get_one_thread_cb elfutils/libdwfl/dwfl_frame.c:308:17
#6 0x559ceb0ec6b7 in dwfl_getthreads elfutils/libdwfl/dwfl_frame.c:283:17
#7 0x559ceb0ec6b7 in getthread elfutils/libdwfl/dwfl_frame.c:354:14
#8 0x559ceb0ec6b7 in dwfl_getthread_frames elfutils/libdwfl/dwfl_frame.c:388:10
#9 0x559ceaff6ae6 in unwind__get_entries tools/perf/util/unwind-libdw.c:236:8
#10 0x559ceabc9dbc in test_dwarf_unwind__thread tools/perf/tests/dwarf-unwind.c:111:8
#11 0x559ceabca5cf in test_dwarf_unwind__compare tools/perf/tests/dwarf-unwind.c:138:26
#12 0x7f812a6865b0 in bsearch (libc.so.6+0x4e5b0)
#13 0x559ceabca871 in test_dwarf_unwind__krava_3 tools/perf/tests/dwarf-unwind.c:162:2
#14 0x559ceabca926 in test_dwarf_unwind__krava_2 tools/perf/tests/dwarf-unwind.c:169:9
#15 0x559ceabca946 in test_dwarf_unwind__krava_1 tools/perf/tests/dwarf-unwind.c:174:9
#16 0x559ceabcae12 in test__dwarf_unwind tools/perf/tests/dwarf-unwind.c:211:8
#17 0x559ceabbc4ab in run_test tools/perf/tests/builtin-test.c:418:9
#18 0x559ceabbc4ab in test_and_print tools/perf/tests/builtin-test.c:448:9
#19 0x559ceabbac70 in __cmd_test tools/perf/tests/builtin-test.c:669:4
#20 0x559ceabbac70 in cmd_test tools/perf/tests/builtin-test.c:815:9
#21 0x559cea960e30 in run_builtin tools/perf/perf.c:313:11
#22 0x559cea95fbce in handle_internal_command tools/perf/perf.c:365:8
#23 0x559cea95fbce in run_argv tools/perf/perf.c:409:2
#24 0x559cea95fbce in main tools/perf/perf.c:539:3
Uninitialized value was stored to memory at
#0 0x559ceb106a54 in handle_cfi elfutils/libdwfl/frame_unwind.c:613:9
#1 0x559ceb105448 in __libdwfl_frame_unwind elfutils/libdwfl/frame_unwind.c:741:4
#2 0x559ceb0ece90 in dwfl_thread_getframes elfutils/libdwfl/dwfl_frame.c:435:7
#3 0x559ceb0ec6b7 in get_one_thread_frames_cb elfutils/libdwfl/dwfl_frame.c:379:10
#4 0x559ceb0ec6b7 in get_one_thread_cb elfutils/libdwfl/dwfl_frame.c:308:17
#5 0x559ceb0ec6b7 in dwfl_getthreads elfutils/libdwfl/dwfl_frame.c:283:17
#6 0x559ceb0ec6b7 in getthread elfutils/libdwfl/dwfl_frame.c:354:14
#7 0x559ceb0ec6b7 in dwfl_getthread_frames elfutils/libdwfl/dwfl_frame.c:388:10
#8 0x559ceaff6ae6 in unwind__get_entries tools/perf/util/unwind-libdw.c:236:8
#9 0x559ceabc9dbc in test_dwarf_unwind__thread tools/perf/tests/dwarf-unwind.c:111:8
#10 0x559ceabca5cf in test_dwarf_unwind__compare tools/perf/tests/dwarf-unwind.c:138:26
#11 0x7f812a6865b0 in bsearch (libc.so.6+0x4e5b0)
#12 0x559ceabca871 in test_dwarf_unwind__krava_3 tools/perf/tests/dwarf-unwind.c:162:2
#13 0x559ceabca926 in test_dwarf_unwind__krava_2 tools/perf/tests/dwarf-unwind.c:169:9
#14 0x559ceabca946 in test_dwarf_unwind__krava_1 tools/perf/tests/dwarf-unwind.c:174:9
#15 0x559ceabcae12 in test__dwarf_unwind tools/perf/tests/dwarf-unwind.c:211:8
#16 0x559ceabbc4ab in run_test tools/perf/tests/builtin-test.c:418:9
#17 0x559ceabbc4ab in test_and_print tools/perf/tests/builtin-test.c:448:9
#18 0x559ceabbac70 in __cmd_test tools/perf/tests/builtin-test.c:669:4
#19 0x559ceabbac70 in cmd_test tools/perf/tests/builtin-test.c:815:9
#20 0x559cea960e30 in run_builtin tools/perf/perf.c:313:11
#21 0x559cea95fbce in handle_internal_command tools/perf/perf.c:365:8
#22 0x559cea95fbce in run_argv tools/perf/perf.c:409:2
#23 0x559cea95fbce in main tools/perf/perf.c:539:3
Uninitialized value was stored to memory at
#0 0x559ceaff8800 in memory_read tools/perf/util/unwind-libdw.c:156:10
#1 0x559ceb10f053 in expr_eval elfutils/libdwfl/frame_unwind.c:501:13
#2 0x559ceb1060cc in handle_cfi elfutils/libdwfl/frame_unwind.c:603:18
#3 0x559ceb105448 in __libdwfl_frame_unwind elfutils/libdwfl/frame_unwind.c:741:4
#4 0x559ceb0ece90 in dwfl_thread_getframes elfutils/libdwfl/dwfl_frame.c:435:7
#5 0x559ceb0ec6b7 in get_one_thread_frames_cb elfutils/libdwfl/dwfl_frame.c:379:10
#6 0x559ceb0ec6b7 in get_one_thread_cb elfutils/libdwfl/dwfl_frame.c:308:17
#7 0x559ceb0ec6b7 in dwfl_getthreads elfutils/libdwfl/dwfl_frame.c:283:17
#8 0x559ceb0ec6b7 in getthread elfutils/libdwfl/dwfl_frame.c:354:14
#9 0x559ceb0ec6b7 in dwfl_getthread_frames elfutils/libdwfl/dwfl_frame.c:388:10
#10 0x559ceaff6ae6 in unwind__get_entries tools/perf/util/unwind-libdw.c:236:8
#11 0x559ceabc9dbc in test_dwarf_unwind__thread tools/perf/tests/dwarf-unwind.c:111:8
#12 0x559ceabca5cf in test_dwarf_unwind__compare tools/perf/tests/dwarf-unwind.c:138:26
#13 0x7f812a6865b0 in bsearch (libc.so.6+0x4e5b0)
#14 0x559ceabca871 in test_dwarf_unwind__krava_3 tools/perf/tests/dwarf-unwind.c:162:2
#15 0x559ceabca926 in test_dwarf_unwind__krava_2 tools/perf/tests/dwarf-unwind.c:169:9
#16 0x559ceabca946 in test_dwarf_unwind__krava_1 tools/perf/tests/dwarf-unwind.c:174:9
#17 0x559ceabcae12 in test__dwarf_unwind tools/perf/tests/dwarf-unwind.c:211:8
#18 0x559ceabbc4ab in run_test tools/perf/tests/builtin-test.c:418:9
#19 0x559ceabbc4ab in test_and_print tools/perf/tests/builtin-test.c:448:9
#20 0x559ceabbac70 in __cmd_test tools/perf/tests/builtin-test.c:669:4
#21 0x559ceabbac70 in cmd_test tools/perf/tests/builtin-test.c:815:9
#22 0x559cea960e30 in run_builtin tools/perf/perf.c:313:11
#23 0x559cea95fbce in handle_internal_command tools/perf/perf.c:365:8
#24 0x559cea95fbce in run_argv tools/perf/perf.c:409:2
#25 0x559cea95fbce in main tools/perf/perf.c:539:3
Uninitialized value was stored to memory at
#0 0x559cea9027d9 in __msan_memcpy llvm/llvm-project/compiler-rt/lib/msan/msan_interceptors.cpp:1558:3
#1 0x559cea9d2185 in sample_ustack tools/perf/arch/x86/tests/dwarf-unwind.c:41:2
#2 0x559cea9d202c in test__arch_unwind_sample tools/perf/arch/x86/tests/dwarf-unwind.c:72:9
#3 0x559ceabc9cbd in test_dwarf_unwind__thread tools/perf/tests/dwarf-unwind.c:106:6
#4 0x559ceabca5cf in test_dwarf_unwind__compare tools/perf/tests/dwarf-unwind.c:138:26
#5 0x7f812a6865b0 in bsearch (libc.so.6+0x4e5b0)
#6 0x559ceabca871 in test_dwarf_unwind__krava_3 tools/perf/tests/dwarf-unwind.c:162:2
#7 0x559ceabca926 in test_dwarf_unwind__krava_2 tools/perf/tests/dwarf-unwind.c:169:9
#8 0x559ceabca946 in test_dwarf_unwind__krava_1 tools/perf/tests/dwarf-unwind.c:174:9
#9 0x559ceabcae12 in test__dwarf_unwind tools/perf/tests/dwarf-unwind.c:211:8
#10 0x559ceabbc4ab in run_test tools/perf/tests/builtin-test.c:418:9
#11 0x559ceabbc4ab in test_and_print tools/perf/tests/builtin-test.c:448:9
#12 0x559ceabbac70 in __cmd_test tools/perf/tests/builtin-test.c:669:4
#13 0x559ceabbac70 in cmd_test tools/perf/tests/builtin-test.c:815:9
#14 0x559cea960e30 in run_builtin tools/perf/perf.c:313:11
#15 0x559cea95fbce in handle_internal_command tools/perf/perf.c:365:8
#16 0x559cea95fbce in run_argv tools/perf/perf.c:409:2
#17 0x559cea95fbce in main tools/perf/perf.c:539:3
Uninitialized value was created by an allocation of 'bf' in the stack frame of function 'perf_event__synthesize_mmap_events'
#0 0x559ceafc5f60 in perf_event__synthesize_mmap_events tools/perf/util/synthetic-events.c:445
SUMMARY: MemorySanitizer: use-of-uninitialized-value elfutils/libdwfl/frame_unwind.c:648:8 in handle_cfi
Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: clang-built-linux@googlegroups.com
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sandeep Dasgupta <sdasgup@google.com>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lore.kernel.org/lkml/20201113182053.754625-1-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
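The idea of removing the poison from the copied stack can be sketched as
follows. This is a minimal user-space illustration, not the patch itself:
copy_stack_for_unwind() and the UNPOISON() macro are hypothetical names, and
the only sanitizer call used is __msan_unpoison() from
<sanitizer/msan_interface.h>. Without MemorySanitizer the macro compiles to
nothing.

```c
#include <stddef.h>
#include <string.h>

/* Compilers without __has_feature simply see 0 here. */
#ifndef __has_feature
#define __has_feature(x) 0
#endif

#if __has_feature(memory_sanitizer)
#include <sanitizer/msan_interface.h>
/* Mark the copied bytes as initialized so later probing is not reported. */
#define UNPOISON(buf, len) __msan_unpoison((buf), (len))
#else
#define UNPOISON(buf, len) ((void)0)
#endif

/*
 * Hypothetical helper: copy a raw stack region into buf for an unwinder to
 * probe later. memcpy() under msan also copies the poison (shadow) state,
 * so the copy is unpoisoned right afterwards.
 */
static size_t copy_stack_for_unwind(void *buf, const void *sp, size_t len)
{
	memcpy(buf, sp, len);
	UNPOISON(buf, len);
	return len;
}
```

Unpoisoning only the private copy keeps msan tracking intact for the live
stack while the unwinder is free to read any byte of the sample.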
When using kprobes on a powerpc booke series processor, an Oops happens as shown below:

/ # echo "p:myprobe do_nanosleep" > /sys/kernel/debug/tracing/kprobe_events
/ # echo 1 > /sys/kernel/debug/tracing/events/kprobes/myprobe/enable
/ # sleep 1
[ 50.076730] Oops: Exception in kernel mode, sig: 5 [#1]
[ 50.077017] BE PAGE_SIZE=4K SMP NR_CPUS=24 QEMU e500
[ 50.077221] Modules linked in:
[ 50.077462] CPU: 0 PID: 77 Comm: sleep Not tainted 5.14.0-rc4-00022-g251a1524293d #21
[ 50.077887] NIP: c0b9c4e0 LR: c00ebecc CTR: 00000000
[ 50.078067] REGS: c3883de0 TRAP: 0700 Not tainted (5.14.0-rc4-00022-g251a1524293d)
[ 50.078349] MSR: 00029000 <CE,EE,ME> CR: 24000228 XER: 20000000
[ 50.078675]
[ 50.078675] GPR00: c00ebdf0 c3883e90 c313e300 c3883ea0 00000001 00000000 c3883ecc 00000001
[ 50.078675] GPR08: c100598c c00ea250 00000004 00000000 24000222 102490c2 bff4180c 101e60d4
[ 50.078675] GPR16: 00000000 102454ac 00000040 10240000 10241100 102410f8 10240000 00500000
[ 50.078675] GPR24: 00000002 00000000 c3883ea0 00000001 00000000 0000c350 3b9b8d50 00000000
[ 50.080151] NIP [c0b9c4e0] do_nanosleep+0x0/0x190
[ 50.080352] LR [c00ebecc] hrtimer_nanosleep+0x14c/0x1e0
[ 50.080638] Call Trace:
[ 50.080801] [c3883e90] [c00ebdf0] hrtimer_nanosleep+0x70/0x1e0 (unreliable)
[ 50.081110] [c3883f00] [c00ec004] sys_nanosleep_time32+0xa4/0x110
[ 50.081336] [c3883f40] [c001509c] ret_from_syscall+0x0/0x28
[ 50.081541] --- interrupt: c00 at 0x100a4d08
[ 50.081749] NIP: 100a4d08 LR: 101b5234 CTR: 00000003
[ 50.081931] REGS: c3883f50 TRAP: 0c00 Not tainted (5.14.0-rc4-00022-g251a1524293d)
[ 50.082183] MSR: 0002f902 <CE,EE,PR,FP,ME> CR: 24000222 XER: 00000000
[ 50.082457]
[ 50.082457] GPR00: 000000a2 bf980040 1024b4d0 bf980084 bf980084 64000000 00555345 fefefeff
[ 50.082457] GPR08: 7f7f7f7f 101e0000 00000069 00000003 28000422 102490c2 bff4180c 101e60d4
[ 50.082457] GPR16: 00000000 102454ac 00000040 10240000 10241100 102410f8 10240000 00500000
[ 50.082457] GPR24: 00000002 bf9803f4 10240000 00000000 00000000 100039e0 00000000 102444e8
[ 50.083789] NIP [100a4d08] 0x100a4d08
[ 50.083917] LR [101b5234] 0x101b5234
[ 50.084042] --- interrupt: c00
[ 50.084238] Instruction dump:
[ 50.084483] 4bfffc40 60000000 60000000 60000000 9421fff0 39400402 914200c0 38210010
[ 50.084841] 4bfffc20 00000000 00000000 00000000 <7fe00008> 7c0802a6 7c892378 93c10048
[ 50.085487] ---[ end trace f6fffe98e2fa8f3e ]---
[ 50.085678] Trace/breakpoint trap

The booke architecture has no real mode; MMU translation is always on. On booke, the corresponding MSR_IS/MSR_DS bits select the address space and do not indicate real mode.

Fixes: 21f8b2f ("powerpc/kprobes: Ignore traps that happened in real mode")
Signed-off-by: Pu Lehui <pulehui@huawei.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210809023658.218915-1-pulehui@huawei.com
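To see why a real-mode check misfires on booke, the MSR value from the Oops
above (00029000) can be fed through a small model. This is a sketch under
assumptions, not the kernel code: the bit values 0x20 and 0x10 are assumed to
match MSR_IR/MSR_DR (named MSR_IS/MSR_DS on booke), and the is_booke
parameter stands in for a build-time check such as
IS_ENABLED(CONFIG_BOOKE). Both function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed bit values; on booke the same positions are MSR_IS and MSR_DS. */
#define MSR_IR 0x20u	/* instruction relocation (booke: instruction space) */
#define MSR_DR 0x10u	/* data relocation (booke: data space) */

/* True when either relocation bit is clear, i.e. "looks like" real mode. */
static bool msr_looks_like_real_mode(uint32_t msr)
{
	return !(msr & MSR_IR) || !(msr & MSR_DR);
}

/*
 * Hypothetical decision helper: a trap is ignored only when the CPU can
 * actually be in real mode. On booke the bits merely pick an address
 * space, so the trap must always be handled.
 */
static bool kprobe_should_ignore_trap(uint32_t msr, bool is_booke)
{
	if (is_booke)
		return false;
	return msr_looks_like_real_mode(msr);
}
```

With msr = 0x00029000 both bits are clear, so a check that ignores booke
would skip the kprobe trap and leave the breakpoint unhandled, which matches
the Trace/breakpoint trap in the log.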
Commit c240ba2 ("selftests/bpf: Add a test with a bpf program with btf_tag attributes") added the btf_tag selftest to test BTF_KIND_TAG generation from C source code, and to test kernel validation of generated BTF types. But if an old clang (clang 13 or earlier) is used, the following compiler warning may be seen:

progs/tag.c:23:20: warning: unknown attribute 'btf_tag' ignored

and the test itself is marked OK. The compiler warning is bad and the test itself shouldn't be marked OK.

This patch adds a check for btf_tag attribute support. If clang does not support btf_tag, the attribute is not used in the code and the test is marked as skipped. For example, with clang 13:

./test_progs -t btf_tag
#21 btf_tag:SKIP
Summary: 1/0 PASSED, 1 SKIPPED, 0 FAILED

The selftests/README.rst is updated to clarify when the btf_tag test may be skipped.

Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20210915061036.2577971-1-yhs@fb.com
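A compile-time check of this kind might look like the sketch below. It
assumes the compiler provides __has_attribute (with a fallback to 0 when it
does not); TAG1, TAG2, HAVE_BTF_TAG, struct tagged_key and
btf_tag_test_should_skip() are hypothetical names chosen for illustration
and are not taken from the selftest.

```c
#include <stdbool.h>

/* Compilers without __has_attribute are treated as lacking the attribute. */
#ifndef __has_attribute
#define __has_attribute(x) 0
#endif

#if __has_attribute(btf_tag)
#define TAG1 __attribute__((btf_tag("tag1")))
#define TAG2 __attribute__((btf_tag("tag2")))
#define HAVE_BTF_TAG 1
#else
/* Expand to nothing so no "unknown attribute" warning is emitted. */
#define TAG1
#define TAG2
#define HAVE_BTF_TAG 0
#endif

/* The tags are attached only when the compiler understands them. */
struct tagged_key {
	int a;
	int b;
	int c;
} TAG1 TAG2;

/* A test runner could consult this to report SKIP instead of OK. */
static bool btf_tag_test_should_skip(void)
{
	return !HAVE_BTF_TAG;
}
```

Recording the detection result in a value the test harness can read is what
lets the run be reported as skipped rather than silently passing.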
The id of the slv_cnoc_mnoc_cfg node is mistakenly coded as the id of slv_blsp_1. This causes the following warning when the slv_blsp_1 node is added. Correct the id of the slv_cnoc_mnoc_cfg node.

[ 1.948180] ------------[ cut here ]------------
[ 1.954122] WARNING: CPU: 2 PID: 7 at drivers/interconnect/core.c:962 icc_node_add+0xe4/0xf8
[ 1.958994] Modules linked in:
[ 1.967399] CPU: 2 PID: 7 Comm: kworker/u16:0 Not tainted 5.14.0-rc6-next-20210818 #21
[ 1.970275] Hardware name: Xiaomi Redmi Note 7 (DT)
[ 1.978169] Workqueue: events_unbound deferred_probe_work_func
[ 1.982945] pstate: 60000005 (nZCv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[ 1.988849] pc : icc_node_add+0xe4/0xf8
[ 1.995699] lr : qnoc_probe+0x350/0x438
[ 1.999519] sp : ffff80001008bb10
[ 2.003337] x29: ffff80001008bb10 x28: 000000000000001a x27: ffffb83ddc61ee28
[ 2.006818] x26: ffff2fe341d44080 x25: ffff2fe340f3aa80 x24: ffffb83ddc98f0e8
[ 2.013938] x23: 0000000000000024 x22: ffff2fe3408b7400 x21: 0000000000000000
[ 2.021054] x20: ffff2fe3408b7410 x19: ffff2fe341d44080 x18: 0000000000000010
[ 2.028173] x17: ffff2fe3bdd0aac0 x16: 0000000000000281 x15: ffff2fe3400f5528
[ 2.035290] x14: 000000000000013f x13: ffff2fe3400f5528 x12: 00000000ffffffea
[ 2.042410] x11: ffffb83ddc9109d0 x10: ffffb83ddc8f8990 x9 : ffffb83ddc8f89e8
[ 2.049527] x8 : 0000000000017fe8 x7 : c0000000ffffefff x6 : 0000000000000001
[ 2.056645] x5 : 0000000000057fa8 x4 : 0000000000000000 x3 : ffffb83ddc9903b0
[ 2.063764] x2 : 1a1f6fde34d45500 x1 : ffff2fe340f3a880 x0 : ffff2fe340f3a880
[ 2.070882] Call trace:
[ 2.077989] icc_node_add+0xe4/0xf8
[ 2.080247] qnoc_probe+0x350/0x438
[ 2.083718] platform_probe+0x68/0xd8
[ 2.087191] really_probe+0xb8/0x300
[ 2.091011] __driver_probe_device+0x78/0xe0
[ 2.094659] driver_probe_device+0x80/0x110
[ 2.098911] __device_attach_driver+0x90/0xe0
[ 2.102818] bus_for_each_drv+0x78/0xc8
[ 2.107331] __device_attach+0xf0/0x150
[ 2.110977] device_initial_probe+0x14/0x20
[ 2.114796] bus_probe_device+0x9c/0xa8
[ 2.118963] deferred_probe_work_func+0x88/0xc0
[ 2.122784] process_one_work+0x1a4/0x338
[ 2.127296] worker_thread+0x1f8/0x420
[ 2.131464] kthread+0x150/0x160
[ 2.135107] ret_from_fork+0x10/0x20
[ 2.138495] ---[ end trace 5eea8768cb620e87 ]---

Signed-off-by: Shawn Guo <shawn.guo@linaro.org>
Reviewed-by: Bjorn Andersson <bjorn.andersson@linaro.org>
Fixes: f80a1d4 ("interconnect: qcom: Add SDM660 interconnect provider driver")
Link: https://lore.kernel.org/r/20210823014003.31391-1-shawn.guo@linaro.org
Signed-off-by: Georgi Djakov <djakov@kernel.org>
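A copy-and-paste id clash like this one could be caught by a simple scan
over the node table. The sketch below is only an illustration of that idea:
struct node_desc and find_duplicate_id() are hypothetical and are not part
of the interconnect framework or the SDM660 driver, and any numeric ids used
with it are made up.

```c
#include <stddef.h>

/* Hypothetical, simplified description of an interconnect node. */
struct node_desc {
	const char *name;
	unsigned int id;
};

/*
 * Return the index of the first node whose id repeats an earlier entry,
 * or -1 when all ids are unique. Quadratic, which is fine for the few
 * dozen nodes of a typical provider table.
 */
static int find_duplicate_id(const struct node_desc *nodes, size_t n)
{
	for (size_t i = 1; i < n; i++)
		for (size_t j = 0; j < i; j++)
			if (nodes[i].id == nodes[j].id)
				return (int)i;
	return -1;
}
```

Run over a table where slv_cnoc_mnoc_cfg and slv_blsp_1 share an id, the
helper would point at the second of the two entries, the same node whose
addition triggers the warning above.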
Change value type in progs/tag.c to a typedef with a btf_decl_tag.
With `bpftool btf dump file tag.o`, we have
...
[14] TYPEDEF 'value_t' type_id=17
[15] DECL_TAG 'tag1' type_id=14 component_idx=-1
[16] DECL_TAG 'tag2' type_id=14 component_idx=-1
[17] STRUCT '(anon)' size=8 vlen=2
'a' type_id=2 bits_offset=0
'b' type_id=2 bits_offset=32
...
The btf_tag selftest also succeeded:
$ ./test_progs -t tag
#21 btf_tag:OK
Summary: 1/0 PASSED, 0 SKIPPED, 0 FAILED
Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20211021195643.4020315-1-yhs@fb.com
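The typedef change described in this commit can be pictured with the sketch
below. It assumes a compiler that exposes __has_attribute and, where
available, the btf_decl_tag attribute; TAG1 and TAG2 are hypothetical macro
names, and the struct layout (two 4-byte ints) is inferred from the BTF dump
(size=8, 'b' at bits_offset=32) rather than copied from progs/tag.c.

```c
#include <stddef.h>

/* Compilers without __has_attribute are treated as lacking the attribute. */
#ifndef __has_attribute
#define __has_attribute(x) 0
#endif

#if __has_attribute(btf_decl_tag)
#define TAG1 __attribute__((btf_decl_tag("tag1")))
#define TAG2 __attribute__((btf_decl_tag("tag2")))
#else
#define TAG1
#define TAG2
#endif

/*
 * An anonymous struct named through a typedef. The tags sit on the typedef
 * declaration, which is why the dump shows both DECL_TAG entries pointing
 * at the TYPEDEF (type_id=14) with component_idx=-1.
 */
typedef struct {
	int a;
	int b;
} value_t TAG1 TAG2;
```

component_idx=-1 in the dump indicates the tag applies to the declaration
as a whole rather than to one of its members.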
Attempting to defragment a Btrfs file containing a transparent huge page immediately deadlocks with the following stack trace:

#0 context_switch (kernel/sched/core.c:4940:2)
#1 __schedule (kernel/sched/core.c:6287:8)
#2 schedule (kernel/sched/core.c:6366:3)
#3 io_schedule (kernel/sched/core.c:8389:2)
#4 wait_on_page_bit_common (mm/filemap.c:1356:4)
#5 __lock_page (mm/filemap.c:1648:2)
#6 lock_page (./include/linux/pagemap.h:625:3)
#7 pagecache_get_page (mm/filemap.c:1910:4)
#8 find_or_create_page (./include/linux/pagemap.h:420:9)
#9 defrag_prepare_one_page (fs/btrfs/ioctl.c:1068:9)
#10 defrag_one_range (fs/btrfs/ioctl.c:1326:14)
#11 defrag_one_cluster (fs/btrfs/ioctl.c:1421:9)
#12 btrfs_defrag_file (fs/btrfs/ioctl.c:1523:9)
#13 btrfs_ioctl_defrag (fs/btrfs/ioctl.c:3117:9)
#14 btrfs_ioctl (fs/btrfs/ioctl.c:4872:10)
#15 vfs_ioctl (fs/ioctl.c:51:10)
#16 __do_sys_ioctl (fs/ioctl.c:874:11)
#17 __se_sys_ioctl (fs/ioctl.c:860:1)
#18 __x64_sys_ioctl (fs/ioctl.c:860:1)
#19 do_syscall_x64 (arch/x86/entry/common.c:50:14)
#20 do_syscall_64 (arch/x86/entry/common.c:80:7)
#21 entry_SYSCALL_64+0x7c/0x15b (arch/x86/entry/entry_64.S:113)

A huge page is represented by a compound page, which consists of a struct page for each PAGE_SIZE page within the huge page. The first struct page is the "head page", and the remaining are "tail pages".

Defragmentation attempts to lock each page in the range. However, lock_page() on a tail page actually locks the corresponding head page. So, if defragmentation tries to lock more than one struct page in a compound page, it tries to lock the same head page twice and deadlocks with itself.

Ideally, we should be able to defragment transparent huge pages. However, THP for filesystems is currently read-only, so a lot of code is not ready to use huge pages for I/O. For now, let's just return ETXTBSY.

This can be reproduced with the following on a kernel with CONFIG_READ_ONLY_THP_FOR_FS=y:

$ cat create_thp_file.c
#include <fcntl.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdint.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/mman.h>

static const char zeroes[1024 * 1024];
static const size_t FILE_SIZE = 2 * 1024 * 1024;

int main(int argc, char **argv)
{
    if (argc != 2) {
        fprintf(stderr, "usage: %s PATH\n", argv[0]);
        return EXIT_FAILURE;
    }

    int fd = creat(argv[1], 0777);
    if (fd == -1) {
        perror("creat");
        return EXIT_FAILURE;
    }
    size_t written = 0;
    while (written < FILE_SIZE) {
        ssize_t ret = write(fd, zeroes,
                            sizeof(zeroes) < FILE_SIZE - written ?
                            sizeof(zeroes) : FILE_SIZE - written);
        if (ret < 0) {
            perror("write");
            return EXIT_FAILURE;
        }
        written += ret;
    }
    close(fd);

    fd = open(argv[1], O_RDONLY);
    if (fd == -1) {
        perror("open");
        return EXIT_FAILURE;
    }

    /*
     * Reserve some address space so that we can align the file mapping to
     * the huge page size.
     */
    void *placeholder_map = mmap(NULL, FILE_SIZE * 2, PROT_NONE,
                                 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (placeholder_map == MAP_FAILED) {
        perror("mmap (placeholder)");
        return EXIT_FAILURE;
    }

    void *aligned_address =
        (void *)(((uintptr_t)placeholder_map + FILE_SIZE - 1) & ~(FILE_SIZE - 1));

    void *map = mmap(aligned_address, FILE_SIZE, PROT_READ | PROT_EXEC,
                     MAP_SHARED | MAP_FIXED, fd, 0);
    if (map == MAP_FAILED) {
        perror("mmap");
        return EXIT_FAILURE;
    }
    if (madvise(map, FILE_SIZE, MADV_HUGEPAGE) < 0) {
        perror("madvise");
        return EXIT_FAILURE;
    }

    char *line = NULL;
    size_t line_capacity = 0;
    FILE *smaps_file = fopen("/proc/self/smaps", "r");
    if (!smaps_file) {
        perror("fopen");
        return EXIT_FAILURE;
    }
    for (;;) {
        /* Touch every page so the mapping is faulted in. */
        for (size_t off = 0; off < FILE_SIZE; off += 4096)
            ((volatile char *)map)[off];

        ssize_t ret;
        bool this_mapping = false;
        while ((ret = getline(&line, &line_capacity, smaps_file)) > 0) {
            unsigned long start, end, huge;
            if (sscanf(line, "%lx-%lx", &start, &end) == 2) {
                this_mapping = (start <= (uintptr_t)map &&
                                (uintptr_t)map < end);
            } else if (this_mapping &&
                       sscanf(line, "FilePmdMapped: %lu", &huge) == 1 &&
                       huge > 0) {
                return EXIT_SUCCESS;
            }
        }

        sleep(6);
        rewind(smaps_file);
        fflush(smaps_file);
    }
}
$ ./create_thp_file huge
$ btrfs fi defrag -czstd ./huge

Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Omar Sandoval <osandov@fb.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
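The self-deadlock and the ETXTBSY guard can be modelled in user space. The
sketch below is a toy model, not kernel code: struct page here has only
three fields, trylock_page() fails where the real lock_page() would sleep
forever, and prepare_one_page() is a hypothetical stand-in for the page
preparation step in the defrag path.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy page: tail pages point at their head, others have head == NULL. */
struct page {
	struct page *head;
	bool is_compound;
	bool locked;
};

static struct page *compound_head(struct page *p)
{
	return p->head ? p->head : p;
}

/* Locking any page of a compound page really locks the head page. */
static bool trylock_page(struct page *p)
{
	struct page *h = compound_head(p);

	if (h->locked)
		return false;	/* lock_page() would block here */
	h->locked = true;
	return true;
}

static void unlock_page(struct page *p)
{
	compound_head(p)->locked = false;
}

/* Lock one page for defrag, refusing pages that belong to a huge page. */
static int prepare_one_page(struct page *p)
{
	if (!trylock_page(p))
		return -EDEADLK;
	if (compound_head(p)->is_compound) {
		unlock_page(p);
		return -ETXTBSY;
	}
	return 0;
}
```

In this model, locking the head and then a tail of the same compound page
fails on the second attempt, which is the self-deadlock from the trace.
Bailing out on the first compound page means a second lock is never tried.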
Commit 133466c ("net: stmmac: use per-queue 64 bit statistics where necessary") caused a regression, as found by Uwe. The backtrace looks like:

INFO: trying to register non-static key.
The code is fine but needs lockdep annotation, or maybe
you didn't initialize this object before use?
turning off the locking correctness validator.
CPU: 0 PID: 1 Comm: swapper/0 Not tainted 6.5.0-rc1-00449-g133466c3bbe1-dirty #21
Hardware name: STM32 (Device Tree Support)
unwind_backtrace from show_stack+0x18/0x1c
show_stack from dump_stack_lvl+0x60/0x90
dump_stack_lvl from register_lock_class+0x98c/0x99c
register_lock_class from __lock_acquire+0x74/0x293c
__lock_acquire from lock_acquire+0x134/0x398
lock_acquire from stmmac_get_stats64+0x2ac/0x2fc
stmmac_get_stats64 from dev_get_stats+0x44/0x130
dev_get_stats from rtnl_fill_stats+0x38/0x120
rtnl_fill_stats from rtnl_fill_ifinfo+0x834/0x17f4
rtnl_fill_ifinfo from rtmsg_ifinfo_build_skb+0xc0/0x144
rtmsg_ifinfo_build_skb from rtmsg_ifinfo+0x50/0x88
rtmsg_ifinfo from __dev_notify_flags+0xc0/0xec
__dev_notify_flags from dev_change_flags+0x50/0x5c
dev_change_flags from ip_auto_config+0x2f4/0x1260
ip_auto_config from do_one_initcall+0x70/0x35c
do_one_initcall from kernel_init_freeable+0x2ac/0x308
kernel_init_freeable from kernel_init+0x1c/0x138
kernel_init from ret_from_fork+0x14/0x2c

The reason is that the rxq|txq_stats structures are not in the expected state: in stmmac_open() -> __stmmac_open(), the structure is overwritten by "memcpy(&priv->dma_conf, dma_conf, sizeof(*dma_conf));". This unexpectedly overwrites the properly initialized syncp member of rxq|txq_stats, as pointed out by Johannes and Uwe.

Fix this issue by moving rxq|txq_stats back to stmmac_extra_stats. To be SMP cache friendly, also mark stmmac_txq_stats and stmmac_rxq_stats as ____cacheline_aligned_in_smp.

Fixes: 133466c ("net: stmmac: use per-queue 64 bit statistics where necessary")
Signed-off-by: Jisheng Zhang <jszhang@kernel.org>
Reported-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Tested-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Link: https://lore.kernel.org/r/20230917165328.3403-1-jszhang@kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
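The overwrite can be reproduced in miniature. In the sketch below every
type and function name is hypothetical, and a plain bool stands in for the
initialized state of the syncp member. It contrasts statistics embedded in
the structure that gets memcpy()'d on open with statistics kept outside it.

```c
#include <stdbool.h>
#include <string.h>

/* Stand-in for per-queue stats; syncp_ready models an initialized syncp. */
struct q_stats {
	bool syncp_ready;
	unsigned long packets;
};

/* Before: stats live inside the config that open() replaces wholesale. */
struct dma_conf_before {
	int ring_size;
	struct q_stats rxq_stats;
};
struct priv_before {
	struct dma_conf_before dma_conf;
};

/* After: stats live next to the config, outside the copied region. */
struct dma_conf_after {
	int ring_size;
};
struct priv_after {
	struct dma_conf_after dma_conf;
	struct q_stats rxq_stats;
};

/* The whole-struct copy also wipes the embedded, initialized stats. */
static void open_before(struct priv_before *priv,
			const struct dma_conf_before *fresh)
{
	memcpy(&priv->dma_conf, fresh, sizeof(*fresh));
}

/* The same copy no longer touches the stats. */
static void open_after(struct priv_after *priv,
		       const struct dma_conf_after *fresh)
{
	memcpy(&priv->dma_conf, fresh, sizeof(*fresh));
}
```

Keeping long-lived, separately initialized state out of a structure that is
replaced by value avoids having to re-initialize it after every copy.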
The following processes run into a deadlock. CPU 41 was waiting for CPU 29
to handle a CSD request while holding spinlock "crashdump_lock", but CPU 29
was hung by that spinlock with IRQs disabled.

PID: 17360 TASK: ffff95c1090c5c40 CPU: 41 COMMAND: "mrdiagd"
!# 0 [ffffb80edbf37b58] __read_once_size at ffffffff9b871a40 include/linux/compiler.h:185:0
!# 1 [ffffb80edbf37b58] atomic_read at ffffffff9b871a40 arch/x86/include/asm/atomic.h:27:0
!# 2 [ffffb80edbf37b58] dump_stack at ffffffff9b871a40 lib/dump_stack.c:54:0
 # 3 [ffffb80edbf37b78] csd_lock_wait_toolong at ffffffff9b131ad5 kernel/smp.c:364:0
 # 4 [ffffb80edbf37b78] __csd_lock_wait at ffffffff9b131ad5 kernel/smp.c:384:0
 # 5 [ffffb80edbf37bf8] csd_lock_wait at ffffffff9b13267a kernel/smp.c:394:0
 # 6 [ffffb80edbf37bf8] smp_call_function_many at ffffffff9b13267a kernel/smp.c:843:0
 # 7 [ffffb80edbf37c50] smp_call_function at ffffffff9b13279d kernel/smp.c:867:0
 # 8 [ffffb80edbf37c50] on_each_cpu at ffffffff9b13279d kernel/smp.c:976:0
 # 9 [ffffb80edbf37c78] flush_tlb_kernel_range at ffffffff9b085c4b arch/x86/mm/tlb.c:742:0
 #10 [ffffb80edbf37cb8] __purge_vmap_area_lazy at ffffffff9b23a1e0 mm/vmalloc.c:701:0
 #11 [ffffb80edbf37ce0] try_purge_vmap_area_lazy at ffffffff9b23a2cc mm/vmalloc.c:722:0
 #12 [ffffb80edbf37ce0] free_vmap_area_noflush at ffffffff9b23a2cc mm/vmalloc.c:754:0
 #13 [ffffb80edbf37cf8] free_unmap_vmap_area at ffffffff9b23bb3b mm/vmalloc.c:764:0
 #14 [ffffb80edbf37cf8] remove_vm_area at ffffffff9b23bb3b mm/vmalloc.c:1509:0
 #15 [ffffb80edbf37d18] __vunmap at ffffffff9b23bb8a mm/vmalloc.c:1537:0
 #16 [ffffb80edbf37d40] vfree at ffffffff9b23bc85 mm/vmalloc.c:1612:0
 #17 [ffffb80edbf37d58] megasas_free_host_crash_buffer [megaraid_sas] at ffffffffc020b7f2 drivers/scsi/megaraid/megaraid_sas_fusion.c:3932:0
 #18 [ffffb80edbf37d80] fw_crash_state_store [megaraid_sas] at ffffffffc01f804d drivers/scsi/megaraid/megaraid_sas_base.c:3291:0
 #19 [ffffb80edbf37dc0] dev_attr_store at ffffffff9b56dd7b drivers/base/core.c:758:0
 #20 [ffffb80edbf37dd0] sysfs_kf_write at ffffffff9b326acf fs/sysfs/file.c:144:0
 #21 [ffffb80edbf37de0] kernfs_fop_write at ffffffff9b325fd4 fs/kernfs/file.c:316:0
 #22 [ffffb80edbf37e20] __vfs_write at ffffffff9b29418a fs/read_write.c:480:0
 #23 [ffffb80edbf37ea8] vfs_write at ffffffff9b294462 fs/read_write.c:544:0
 #24 [ffffb80edbf37ee8] SYSC_write at ffffffff9b2946ec fs/read_write.c:590:0
 #25 [ffffb80edbf37ee8] SyS_write at ffffffff9b2946ec fs/read_write.c:582:0
 #26 [ffffb80edbf37f30] do_syscall_64 at ffffffff9b003ca9 arch/x86/entry/common.c:298:0
 #27 [ffffb80edbf37f58] entry_SYSCALL_64 at ffffffff9ba001b1 arch/x86/entry/entry_64.S:238:0

PID: 17355 TASK: ffff95c1090c3d80 CPU: 29 COMMAND: "mrdiagd"
!# 0 [ffffb80f2d3c7d30] __read_once_size at ffffffff9b0f2ab0 include/linux/compiler.h:185:0
!# 1 [ffffb80f2d3c7d30] native_queued_spin_lock_slowpath at ffffffff9b0f2ab0 kernel/locking/qspinlock.c:368:0
 # 2 [ffffb80f2d3c7d58] pv_queued_spin_lock_slowpath at ffffffff9b0f244b arch/x86/include/asm/paravirt.h:674:0
 # 3 [ffffb80f2d3c7d58] queued_spin_lock_slowpath at ffffffff9b0f244b arch/x86/include/asm/qspinlock.h:53:0
 # 4 [ffffb80f2d3c7d68] queued_spin_lock at ffffffff9b8961a6 include/asm-generic/qspinlock.h:90:0
 # 5 [ffffb80f2d3c7d68] do_raw_spin_lock_flags at ffffffff9b8961a6 include/linux/spinlock.h:173:0
 # 6 [ffffb80f2d3c7d68] __raw_spin_lock_irqsave at ffffffff9b8961a6 include/linux/spinlock_api_smp.h:122:0
 # 7 [ffffb80f2d3c7d68] _raw_spin_lock_irqsave at ffffffff9b8961a6 kernel/locking/spinlock.c:160:0
 # 8 [ffffb80f2d3c7d88] fw_crash_buffer_store [megaraid_sas] at ffffffffc01f8129 drivers/scsi/megaraid/megaraid_sas_base.c:3205:0
 # 9 [ffffb80f2d3c7dc0] dev_attr_store at ffffffff9b56dd7b drivers/base/core.c:758:0
 #10 [ffffb80f2d3c7dd0] sysfs_kf_write at ffffffff9b326acf fs/sysfs/file.c:144:0
 #11 [ffffb80f2d3c7de0] kernfs_fop_write at ffffffff9b325fd4 fs/kernfs/file.c:316:0
 #12 [ffffb80f2d3c7e20] __vfs_write at ffffffff9b29418a fs/read_write.c:480:0
 #13 [ffffb80f2d3c7ea8] vfs_write at ffffffff9b294462 fs/read_write.c:544:0
 #14 [ffffb80f2d3c7ee8] SYSC_write at ffffffff9b2946ec fs/read_write.c:590:0
 #15 [ffffb80f2d3c7ee8] SyS_write at ffffffff9b2946ec fs/read_write.c:582:0
 #16 [ffffb80f2d3c7f30] do_syscall_64 at ffffffff9b003ca9 arch/x86/entry/common.c:298:0
 #17 [ffffb80f2d3c7f58] entry_SYSCALL_64 at ffffffff9ba001b1 arch/x86/entry/entry_64.S:238:0

The lock is used to synchronize different sysfs operations; it doesn't
protect any resource that will be touched by an interrupt. Consequently
it's not required to disable IRQs. Replace the spinlock with a mutex to
fix the deadlock.

Signed-off-by: Junxiao Bi <junxiao.bi@oracle.com>
Link: https://lore.kernel.org/r/20230828221018.19471-1-junxiao.bi@oracle.com
Reviewed-by: Mike Christie <michael.christie@oracle.com>
Cc: stable@vger.kernel.org
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
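The locking change described above can be illustrated with a userspace analogy. This is only a sketch under stated assumptions, not the driver patch: it uses a pthread mutex as a stand-in for a kernel mutex, and the struct and function names (crash_dump, state_store_sketch, buffer_store_sketch) are hypothetical. The point it shows is that two store handlers can be serialized by a sleeping lock, so the holder may perform a blocking free while a contending caller sleeps instead of spinning with interrupts disabled.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical stand-in for the driver's crash dump bookkeeping. */
struct crash_dump {
	pthread_mutex_t lock;	/* sleeping lock, analogous to a kernel mutex */
	void *buffer;		/* analogous to the host crash buffer */
	int state;
};

static struct crash_dump dump = { PTHREAD_MUTEX_INITIALIZER, NULL, 0 };

/* Analog of a "state" store: frees the buffer while holding the lock. */
static void state_store_sketch(struct crash_dump *d, int new_state)
{
	pthread_mutex_lock(&d->lock);
	d->state = new_state;
	free(d->buffer);	/* free(NULL) is a no-op */
	d->buffer = NULL;
	pthread_mutex_unlock(&d->lock);
}

/* Analog of a "buffer" store: replaces the buffer under the same lock. */
static void buffer_store_sketch(struct crash_dump *d, size_t size)
{
	assert(size > 0);
	pthread_mutex_lock(&d->lock);
	free(d->buffer);
	d->buffer = malloc(size);
	pthread_mutex_unlock(&d->lock);
}
```

A caller contending for the lock in buffer_store_sketch() blocks in pthread_mutex_lock() and stays schedulable, which is the property the commit relies on when it replaces the spinlock with a mutex.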
Add test cases for the race between the destruction of an inner map due to
a map-in-map update and the access of that inner map in a bpf program. The
following 4 combinations are added:
(1) array map in map array + bpf program
(2) array map in map array + sleepable bpf program
(3) array map in map htab + bpf program
(4) array map in map htab + sleepable bpf program

Before applying the fixes, when running "./test_progs -a map_in_map" with
net.core.bpf_jit_enable=0, the following error was reported:

BUG: KASAN: slab-use-after-free in bpf_map_lookup_elem+0x25/0x60
Read of size 8 at addr ffff888162fbe000 by task test_progs/3282
CPU: 4 PID: 3282 Comm: test_progs Not tainted 6.6.0-rc5+ #21
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996) ......
Call Trace:
<TASK>
dump_stack_lvl+0x4b/0x80
print_report+0xcf/0x610
kasan_report+0x9d/0xd0
__asan_load8+0x7e/0xb0
bpf_map_lookup_elem+0x25/0x60
___bpf_prog_run+0x2569/0x3c50
__bpf_prog_run32+0xa1/0xe0
trace_call_bpf+0x1a9/0x5e0
kprobe_perf_func+0xce/0x450
kprobe_dispatcher+0xa1/0xb0
kprobe_ftrace_handler+0x27b/0x370
0xffffffffc02080f7
RIP: 0010:__x64_sys_getpgid+0x1/0x30
......
</TASK>

Allocated by task 3281:
kasan_save_stack+0x26/0x50
kasan_set_track+0x25/0x30
kasan_save_alloc_info+0x1b/0x30
__kasan_kmalloc+0x84/0xa0
__kmalloc_node+0x67/0x170
__bpf_map_area_alloc+0x13f/0x160
bpf_map_area_alloc+0x10/0x20
array_map_alloc+0x11d/0x2c0
map_create+0x285/0xc30
__sys_bpf+0xcff/0x3350
__x64_sys_bpf+0x45/0x60
do_syscall_64+0x33/0x60
entry_SYSCALL_64_after_hwframe+0x6e/0xd8

Freed by task 1328:
kasan_save_stack+0x26/0x50
kasan_set_track+0x25/0x30
kasan_save_free_info+0x2b/0x50
__kasan_slab_free+0x10f/0x1a0
__kmem_cache_free+0x1df/0x460
kfree+0x90/0x140
kvfree+0x2c/0x40
bpf_map_area_free+0xe/0x20
array_map_free+0x11f/0x270
bpf_map_free_deferred+0xda/0x200
process_scheduled_works+0x689/0xa20
worker_thread+0x2fd/0x5a0
kthread+0x1bf/0x200
ret_from_fork+0x39/0x70
ret_from_fork_asm+0x1b/0x30

Last potentially related work creation:
kasan_save_stack+0x26/0x50
__kasan_record_aux_stack+0x92/0xa0
kasan_record_aux_stack_noalloc+0xb/0x20
insert_work+0x2a/0xc0
__queue_work+0x2a6/0x8d0
queue_work_on+0x7c/0x80
__bpf_map_put+0x103/0x140
bpf_map_put+0x10/0x20
bpf_map_fd_put_ptr+0x1e/0x30
bpf_fd_array_map_update_elem+0x18a/0x1d0
bpf_map_update_value+0x2ca/0x4b0
__sys_bpf+0x26ba/0x3350
__x64_sys_bpf+0x45/0x60
do_syscall_64+0x33/0x60
entry_SYSCALL_64_after_hwframe+0x6e/0xd8

Signed-off-by: Hou Tao <houtao1@huawei.com>
When creating ceq_0 during probing irdma, cqp.sc_cqp will be sent as a
cqp_request to cqp->sc_cqp.sq_ring. If the request is pending when
removing the irdma driver or unplugging its aux device, cqp.sc_cqp will be
dereferenced as the wrong struct in irdma_free_pending_cqp_request().

PID: 3669 TASK: ffff88aef892c000 CPU: 28 COMMAND: "kworker/28:0"
 #0 [fffffe0000549e38] crash_nmi_callback at ffffffff810e3a34
 #1 [fffffe0000549e40] nmi_handle at ffffffff810788b2
 #2 [fffffe0000549ea0] default_do_nmi at ffffffff8107938f
 #3 [fffffe0000549eb8] do_nmi at ffffffff81079582
 #4 [fffffe0000549ef0] end_repeat_nmi at ffffffff82e016b4
    [exception RIP: native_queued_spin_lock_slowpath+1291]
    RIP: ffffffff8127e72b RSP: ffff88aa841ef778 RFLAGS: 00000046
    RAX: 0000000000000000 RBX: ffff88b01f849700 RCX: ffffffff8127e47e
    RDX: 0000000000000000 RSI: 0000000000000004 RDI: ffffffff83857ec0
    RBP: ffff88afe3e4efc8 R8: ffffed15fc7c9dfa R9: ffffed15fc7c9dfa
    R10: 0000000000000001 R11: ffffed15fc7c9df9 R12: 0000000000740000
    R13: ffff88b01f849708 R14: 0000000000000003 R15: ffffed1603f092e1
    ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0000
-- <NMI exception stack> --
 #5 [ffff88aa841ef778] native_queued_spin_lock_slowpath at ffffffff8127e72b
 #6 [ffff88aa841ef7b0] _raw_spin_lock_irqsave at ffffffff82c22aa4
 #7 [ffff88aa841ef7c8] __wake_up_common_lock at ffffffff81257363
 #8 [ffff88aa841ef888] irdma_free_pending_cqp_request at ffffffffa0ba12cc [irdma]
 #9 [ffff88aa841ef958] irdma_cleanup_pending_cqp_op at ffffffffa0ba1469 [irdma]
#10 [ffff88aa841ef9c0] irdma_ctrl_deinit_hw at ffffffffa0b2989f [irdma]
#11 [ffff88aa841efa28] irdma_remove at ffffffffa0b252df [irdma]
#12 [ffff88aa841efae8] auxiliary_bus_remove at ffffffff8219afdb
#13 [ffff88aa841efb00] device_release_driver_internal at ffffffff821882e6
#14 [ffff88aa841efb38] bus_remove_device at ffffffff82184278
#15 [ffff88aa841efb88] device_del at ffffffff82179d23
#16 [ffff88aa841efc48] ice_unplug_aux_dev at ffffffffa0eb1c14 [ice]
#17 [ffff88aa841efc68] ice_service_task at ffffffffa0d88201 [ice]
#18 [ffff88aa841efde8] process_one_work at ffffffff811c589a
#19 [ffff88aa841efe60] worker_thread at ffffffff811c71ff
#20 [ffff88aa841eff10] kthread at ffffffff811d87a0
#21 [ffff88aa841eff50] ret_from_fork at ffffffff82e0022f

Fixes: 44d9e52 ("RDMA/irdma: Implement device initialization definitions")
Link: https://lore.kernel.org/r/20231130081415.891006-1-lishifeng@sangfor.com.cn
Suggested-by: "Ismail, Mustafa" <mustafa.ismail@intel.com>
Signed-off-by: Shifeng Li <lishifeng@sangfor.com.cn>
Reviewed-by: Shiraz Saleem <shiraz.saleem@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
… non head_frag
The crashed kernel version is 5.16.20. I have not tested this patch
because I have not found a way to reproduce the crash; mainline may
have the same problem.
When using bpf-based NAT, the kernel hits a BUG_ON in skb_segment():
BUG_ON(skb_headlen(list_skb) > len). The bpf program calls
bpf_skb_adjust_room to decrease the gso_size, and then calls bpf_redirect
to send the packet out.
call stack:
...
[exception RIP: skb_segment+3016]
RIP: ffffffffb97df2a8 RSP: ffffa3f2cce08728 RFLAGS: 00010293
RAX: 000000000000007d RBX: 00000000fffff7b3 RCX: 0000000000000011
RDX: 0000000000000000 RSI: ffff895ea32c76c0 RDI: 00000000000008c1
RBP: ffffa3f2cce087f8 R8: 000000000000088f R9: 0000000000000011
R10: 000000000000090c R11: ffff895e47e68000 R12: ffff895eb2022f00
R13: 000000000000004b R14: ffff895ecdaf2000 R15: ffff895eb2023f00
ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018
 #9 [ffffa3f2cce08720] skb_segment at ffffffffb97ded63
#10 [ffffa3f2cce08800] tcp_gso_segment at ffffffffb98d0320
#11 [ffffa3f2cce08860] tcp4_gso_segment at ffffffffb98d07a3
#12 [ffffa3f2cce08880] inet_gso_segment at ffffffffb98e6de0
#13 [ffffa3f2cce088e0] skb_mac_gso_segment at ffffffffb97f3741
#14 [ffffa3f2cce08918] skb_udp_tunnel_segment at ffffffffb98daa59
#15 [ffffa3f2cce08980] udp4_ufo_fragment at ffffffffb98db471
#16 [ffffa3f2cce089b0] inet_gso_segment at ffffffffb98e6de0
#17 [ffffa3f2cce08a10] skb_mac_gso_segment at ffffffffb97f3741
#18 [ffffa3f2cce08a48] __skb_gso_segment at ffffffffb97f388e
#19 [ffffa3f2cce08a78] validate_xmit_skb at ffffffffb97f3d6e
#20 [ffffa3f2cce08ab8] __dev_queue_xmit at ffffffffb97f4614
#21 [ffffa3f2cce08b50] dev_queue_xmit at ffffffffb97f5030
#22 [ffffa3f2cce08b60] __bpf_redirect at ffffffffb98199a8
#23 [ffffa3f2cce08b88] skb_do_redirect at ffffffffb98205cd
...
The skb has the following properties:
doffset = 66
list_skb = skb_shinfo(skb)->frag_list
list_skb->head_frag = true
skb->len = 2441 && skb->data_len = 2250
skb_shinfo(skb)->nr_frags = 17
skb_shinfo(skb)->gso_size = 75
skb_shinfo(skb)->frags[0...16].bv_len = 125
list_skb->len = 125
list_skb->data_len = 0
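The properties listed above can be cross-checked with a small userspace sketch (not kernel code). It assumes that the segment length `len` compared in the BUG_ON equals gso_size for a full-sized segment; the struct skb_layout and the helper names are made up for illustration and only mirror the reported numbers.

```c
#include <assert.h>
#include <stdbool.h>

/* Values copied from the reported skb properties. */
struct skb_layout {
	unsigned int len;		/* skb->len */
	unsigned int data_len;		/* skb->data_len */
	unsigned int doffset;		/* header bytes in front of the payload */
	unsigned int nr_frags;		/* skb_shinfo(skb)->nr_frags */
	unsigned int frag_size;		/* bv_len of each page frag */
	unsigned int gso_size;		/* skb_shinfo(skb)->gso_size */
	unsigned int list_len;		/* list_skb->len */
	unsigned int list_data_len;	/* list_skb->data_len */
};

static const struct skb_layout reported = {
	.len = 2441, .data_len = 2250, .doffset = 66,
	.nr_frags = 17, .frag_size = 125, .gso_size = 75,
	.list_len = 125, .list_data_len = 0,
};

/* Linear bytes, using the same formula as the kernel's skb_headlen(). */
static unsigned int headlen(unsigned int len, unsigned int data_len)
{
	return len - data_len;
}

/* data_len should cover the page frags plus the single frag_list skb. */
static bool data_len_consistent(const struct skb_layout *s)
{
	return s->nr_frags * s->frag_size + s->list_len == s->data_len;
}

/* Condition of BUG_ON(skb_headlen(list_skb) > len), assuming len == gso_size. */
static bool bug_on_fires(const struct skb_layout *s)
{
	return headlen(s->list_len, s->list_data_len) > s->gso_size;
}

static void check_reported(void)
{
	/* 191 linear bytes minus 66 header bytes leaves 125 payload bytes. */
	assert(headlen(reported.len, reported.data_len) - reported.doffset == 125);
	assert(data_len_consistent(&reported));	/* 17 * 125 + 125 == 2250 */
	assert(bug_on_fires(&reported));	/* 125 > 75 */
}
```

Under that assumption, every chunk of this skb is 125 bytes while gso_size was lowered to 75, so the linear head of list_skb (125 bytes) exceeds `len` and the BUG_ON condition holds.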
3962 struct sk_buff *skb_segment(struct sk_buff *head_skb,
3963                             netdev_features_t features)
3964 {
3965         struct sk_buff *segs = NULL;
3966         struct sk_buff *tail = NULL;
...
4181                 while (pos < offset + len) {
4182                         if (i >= nfrags) {
4183                                 i = 0;
4184                                 nfrags = skb_shinfo(list_skb)->nr_frags;
4185                                 frag = skb_shinfo(list_skb)->frags;
4186                                 frag_skb = list_skb;
After the last frag of head_skb has been segmented, pos == offset + len, so
the while loop at line 4181 exits without segmenting the head_frag frag_list
skb, and the code then runs into this BUG_ON().
Since commit 13acc94 ("net: permit skb_segment on head_frag frag_list skb"),
segmenting a head_frag frag_list skb is allowed.
Commit 3dcbdb1 ("net: gso: Fix skb_segment splat when splitting gso_size
mangled skb having linear-headed frag_list") clears NETIF_F_SG if the
frag_list contains a non head_frag skb. NETIF_F_SG is not cleared when the
frag_list holds only a single head_frag skb.
Signed-off-by: Fred Li <dracodingfly@gmail.com>
Signed-off-by: NipaLocal <nipa@local>
… non head_frag
The crashed kernel version is 5.16.20, and I have not test this patch
because I dont find a way to reproduce it, and the mailine may be
has the same problem.
When using bpf based NAT, hits a kernel BUG_ON at function skb_segment(),
BUG_ON(skb_headlen(list_skb) > len). The bpf calls the bpf_skb_adjust_room
to decrease the gso_size, and then call bpf_redirect send packet out.
call stack:
...
[exception RIP: skb_segment+3016]
RIP: ffffffffb97df2a8 RSP: ffffa3f2cce08728 RFLAGS: 00010293
RAX: 000000000000007d RBX: 00000000fffff7b3 RCX: 0000000000000011
RDX: 0000000000000000 RSI: ffff895ea32c76c0 RDI: 00000000000008c1
RBP: ffffa3f2cce087f8 R8: 000000000000088f R9: 0000000000000011
R10: 000000000000090c R11: ffff895e47e68000 R12: ffff895eb2022f00
R13: 000000000000004b R14: ffff895ecdaf2000 R15: ffff895eb2023f00
ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018
kernel-patches#9 [ffffa3f2cce08720] skb_segment at ffffffffb97ded63
kernel-patches#10 [ffffa3f2cce08800] tcp_gso_segment at ffffffffb98d0320
kernel-patches#11 [ffffa3f2cce08860] tcp4_gso_segment at ffffffffb98d07a3
kernel-patches#12 [ffffa3f2cce08880] inet_gso_segment at ffffffffb98e6de0
kernel-patches#13 [ffffa3f2cce088e0] skb_mac_gso_segment at ffffffffb97f3741
kernel-patches#14 [ffffa3f2cce08918] skb_udp_tunnel_segment at ffffffffb98daa59
kernel-patches#15 [ffffa3f2cce08980] udp4_ufo_fragment at ffffffffb98db471
kernel-patches#16 [ffffa3f2cce089b0] inet_gso_segment at ffffffffb98e6de0
kernel-patches#17 [ffffa3f2cce08a10] skb_mac_gso_segment at ffffffffb97f3741
kernel-patches#18 [ffffa3f2cce08a48] __skb_gso_segment at ffffffffb97f388e
kernel-patches#19 [ffffa3f2cce08a78] validate_xmit_skb at ffffffffb97f3d6e
kernel-patches#20 [ffffa3f2cce08ab8] __dev_queue_xmit at ffffffffb97f4614
kernel-patches#21 [ffffa3f2cce08b50] dev_queue_xmit at ffffffffb97f5030
kernel-patches#22 [ffffa3f2cce08b60] __bpf_redirect at ffffffffb98199a8
kernel-patches#23 [ffffa3f2cce08b88] skb_do_redirect at ffffffffb98205cd
...
The skb has the following properties:
doffset = 66
list_skb = skb_shinfo(skb)->frag_list
list_skb->head_frag = true
skb->len = 2441 && skb->data_len = 2250
skb_shinfo(skb)->nr_frags = 17
skb_shinfo(skb)->gso_size = 75
skb_shinfo(skb)->frags[0...16].bv_len = 125
list_skb->len = 125
list_skb->data_len = 0
3962 struct sk_buff *skb_segment(struct sk_buff *head_skb,
3963 netdev_features_t features)
3964 {
3965 struct sk_buff *segs = NULL;
3966 struct sk_buff *tail = NULL;
...
4181 while (pos < offset + len) {
4182 if (i >= nfrags) {
4183 i = 0;
4184 nfrags = skb_shinfo(list_skb)->nr_frags;
4185 frag = skb_shinfo(list_skb)->frags;
4186 frag_skb = list_skb;
After the last frag of head_skb has been segmented, pos == offset + len, so the
while loop at line 4181 exits. Execution then runs into this BUG_ON() and the
head_frag frag_list skb is never segmented.
Since commit 13acc94 ("net: permit skb_segment on head_frag frag_list skb"),
segmenting a head_frag frag_list skb is allowed.
Commit 3dcbdb1 ("net: gso: Fix skb_segment splat when splitting gso_size
mangled skb having linear-headed frag_list") clears NETIF_F_SG if the
frag_list contains a non head_frag skb. It does not clear NETIF_F_SG when
there is only one head_frag frag_list skb.
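The numbers in the dump make the failing check concrete. The sketch below is a user-space model, not kernel code: `toy_skb`, `toy_headlen()` and `toy_bug_on_fires()` are hypothetical stand-ins, and it assumes that `len` at the BUG_ON() equals gso_size (75), which is consistent with R13 = 0x4b in the register dump, while RAX = 0x7d is 125.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical user-space stand-in for the two sk_buff fields used here. */
struct toy_skb {
	unsigned int len;      /* total bytes in the skb */
	unsigned int data_len; /* bytes held in paged frags */
};

/* Same arithmetic as skb_headlen(): bytes in the linear area. */
static unsigned int toy_headlen(const struct toy_skb *skb)
{
	assert(skb->len >= skb->data_len);
	return skb->len - skb->data_len;
}

/* True when BUG_ON(skb_headlen(list_skb) > len) would fire. */
static bool toy_bug_on_fires(const struct toy_skb *list_skb, unsigned int len)
{
	return toy_headlen(list_skb) > len;
}
```

With the values listed above, a head skb of `{2441, 2250}` has a 191-byte linear area, and a list_skb of `{125, 0}` has a 125-byte linear area. That is larger than a 75-byte segment, so the check fires.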
Signed-off-by: Fred Li <dracodingfly@gmail.com>
Signed-off-by: NipaLocal <nipa@local>
ui_browser__show() is capturing the input title that is stack allocated
memory in hist_browser__run().
Avoid a use after return by strdup-ing the string.
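A minimal sketch of the pattern, assuming a simplified browser object: `toy_browser`, `toy_browser_show()`, `toy_run()` and `toy_browser_exit()` are hypothetical names, not the perf functions. The show function keeps its own heap copy of the title, so the caller's stack buffer can go out of scope safely. Unlike struct ui_browser as described in this message, the toy type has an exit hook that frees the copy.

```c
#define _POSIX_C_SOURCE 200809L /* for strdup() */
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, simplified stand-in for a browser object that keeps a title. */
struct toy_browser {
	char *title; /* owned heap copy, or NULL */
};

/* Store a private copy of title so the caller's buffer may go away. */
static int toy_browser_show(struct toy_browser *b, const char *title)
{
	char *copy = strdup(title);

	assert(b != NULL);
	if (copy == NULL)
		return -1;
	free(b->title); /* drop the copy from an earlier call, if any */
	b->title = copy;
	return 0;
}

/* The title lives in this frame only; the browser must not keep the pointer. */
static int toy_run(struct toy_browser *b)
{
	char title[160];

	snprintf(title, sizeof(title), "Samples: %d", 42);
	return toy_browser_show(b, title);
}

/* Release the owned copy. */
static void toy_browser_exit(struct toy_browser *b)
{
	free(b->title);
	b->title = NULL;
}
```

After `toy_run()` returns, `b->title` still points at valid memory because it refers to the heap copy rather than the dead stack frame.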
Committer notes:
Further explanation from Ian Rogers:
My command line using tui is:
$ sudo bash -c 'rm /tmp/asan.log*; export ASAN_OPTIONS="log_path=/tmp/asan.log"; /tmp/perf/perf mem record -a sleep 1; /tmp/perf/perf mem report'
I then go to the perf annotate view and quit. This triggers the asan
error (from the log file):
```
==1254591==ERROR: AddressSanitizer: stack-use-after-return on address 0x7f2813331920 at pc 0x7f2818065991 bp 0x7fff0a21c750 sp 0x7fff0a21bf10
READ of size 80 at 0x7f2813331920 thread T0
#0 0x7f2818065990 in __interceptor_strlen ../../../../src/libsanitizer/sanitizer_common/sanitizer_common_interceptors.inc:461
#1 0x7f2817698251 in SLsmg_write_wrapped_string (/lib/x86_64-linux-gnu/libslang.so.2+0x98251)
#2 0x7f28176984b9 in SLsmg_write_nstring (/lib/x86_64-linux-gnu/libslang.so.2+0x984b9)
#3 0x55c94045b365 in ui_browser__write_nstring ui/browser.c:60
#4 0x55c94045c558 in __ui_browser__show_title ui/browser.c:266
#5 0x55c94045c776 in ui_browser__show ui/browser.c:288
#6 0x55c94045c06d in ui_browser__handle_resize ui/browser.c:206
#7 0x55c94047979b in do_annotate ui/browsers/hists.c:2458
#8 0x55c94047fb17 in evsel__hists_browse ui/browsers/hists.c:3412
#9 0x55c940480a0c in perf_evsel_menu__run ui/browsers/hists.c:3527
#10 0x55c940481108 in __evlist__tui_browse_hists ui/browsers/hists.c:3613
#11 0x55c9404813f7 in evlist__tui_browse_hists ui/browsers/hists.c:3661
#12 0x55c93ffa253f in report__browse_hists tools/perf/builtin-report.c:671
#13 0x55c93ffa58ca in __cmd_report tools/perf/builtin-report.c:1141
#14 0x55c93ffaf159 in cmd_report tools/perf/builtin-report.c:1805
#15 0x55c94000c05c in report_events tools/perf/builtin-mem.c:374
#16 0x55c94000d96d in cmd_mem tools/perf/builtin-mem.c:516
#17 0x55c9400e44ee in run_builtin tools/perf/perf.c:350
#18 0x55c9400e4a5a in handle_internal_command tools/perf/perf.c:403
#19 0x55c9400e4e22 in run_argv tools/perf/perf.c:447
#20 0x55c9400e53ad in main tools/perf/perf.c:561
#21 0x7f28170456c9 in __libc_start_call_main ../sysdeps/nptl/libc_start_call_main.h:58
#22 0x7f2817045784 in __libc_start_main_impl ../csu/libc-start.c:360
#23 0x55c93ff544c0 in _start (/tmp/perf/perf+0x19a4c0) (BuildId: 84899b0e8c7d3a3eaa67b2eb35e3d8b2f8cd4c93)
Address 0x7f2813331920 is located in stack of thread T0 at offset 32 in frame
#0 0x55c94046e85e in hist_browser__run ui/browsers/hists.c:746
This frame has 1 object(s):
[32, 192) 'title' (line 747) <== Memory access at offset 32 is inside this variable
HINT: this may be a false positive if your program uses some custom stack unwind mechanism, swapcontext or vfork
```
hist_browser__run isn't on the stack so the asan error looks legit.
There's no clean init/exit on struct ui_browser so I may be trading a
use-after-return for a memory leak, but that seems like a good trade
anyway.
Fixes: 05e8b08 ("perf ui browser: Stop using 'self'")
Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Cc: Ben Gainey <ben.gainey@arm.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@arm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kajol Jain <kjain@linux.ibm.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: K Prateek Nayak <kprateek.nayak@amd.com>
Cc: Li Dong <lidong@vivo.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Oliver Upton <oliver.upton@linux.dev>
Cc: Paran Lee <p4ranlee@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ravi Bangoria <ravi.bangoria@amd.com>
Cc: Sun Haiyong <sunhaiyong@loongson.cn>
Cc: Tim Chen <tim.c.chen@linux.intel.com>
Cc: Yanteng Si <siyanteng@loongson.cn>
Cc: Yicong Yang <yangyicong@hisilicon.com>
Link: https://lore.kernel.org/r/20240507183545.1236093-2-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
If function-like macros do not utilize a parameter, it might result in a
build warning. In our coding style guidelines, we advocate for utilizing
static inline functions to replace such macros. This patch verifies
compliance with the new rule.
For a macro such as the one below,
#define test(a) do { } while (0)
The test result is as follows.
WARNING: Argument 'a' is not used in function-like macro
#21: FILE: mm/init-mm.c:20:
+#define test(a) do { } while (0)
total: 0 errors, 1 warnings, 8 lines checked
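The difference between the two forms can be sketched as follows. All `toy_*` names are hypothetical and only illustrate the rule; they do not come from the patch. With the macro, the preprocessor drops the argument entirely, whereas the static inline stub keeps it as a real, type-checked parameter.

```c
#include <assert.h>

/*
 * Macro form: the argument is discarded by the preprocessor, so a variable
 * passed only to this macro can be reported as unused by the compiler.
 */
#define toy_trace_macro(a) do { } while (0)

/*
 * Static inline form: the argument is a real, type-checked parameter, so
 * the caller's variable counts as used even though the body is empty.
 */
static inline void toy_trace_inline(int a)
{
	(void)a; /* intentionally unused */
}

/* Hypothetical caller that passes a local variable to the stub. */
static int toy_caller(int x)
{
	int level = x * 2;

	assert(x >= 0); /* toy precondition */
	toy_trace_inline(level);
	return x + 1;
}
```

Replacing `toy_trace_inline(level)` with `toy_trace_macro(level)` would leave `level` set but never read after preprocessing, which is the situation the new check is meant to flag.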
Link: https://lkml.kernel.org/r/20240507032757.146386-3-21cnbao@gmail.com
Signed-off-by: Xining Xu <mac.xxn@outlook.com>
Tested-by: Barry Song <v-songbaohua@oppo.com>
Signed-off-by: Barry Song <v-songbaohua@oppo.com>
Acked-by: Joe Perches <joe@perches.com>
Cc: Chris Zankel <chris@zankel.net>
Cc: Huacai Chen <chenhuacai@loongson.cn>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: Guenter Roeck <linux@roeck-us.net>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Mark Brown <broonie@kernel.org>
Cc: Andy Whitcroft <apw@canonical.com>
Cc: Dwaipayan Ray <dwaipayanray1@gmail.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Lukas Bulwahn <lukas.bulwahn@gmail.com>
Cc: Max Filippov <jcmvbkbc@gmail.com>
Cc: Jeff Johnson <quic_jjohnson@quicinc.com>
Cc: Charlemagne Lasse <charlemagnelasse@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
The code in ocfs2_dio_end_io_write() estimates number of necessary
transaction credits using ocfs2_calc_extend_credits(). This however does
not take into account that the IO could be arbitrarily large and can
contain arbitrary number of extents.

Extent tree manipulations do often extend the current transaction but not
in all of the cases. For example if we have only single block extents in
the tree, ocfs2_mark_extent_written() will end up calling
ocfs2_replace_extent_rec() all the time and we will never extend the
current transaction and eventually exhaust all the transaction credits if
the IO contains many single block extents. Once that happens a
WARN_ON(jbd2_handle_buffer_credits(handle) <= 0) is triggered in
jbd2_journal_dirty_metadata() and subsequently OCFS2 aborts in response to
this error. This was actually triggered by one of our customers on a
heavily fragmented OCFS2 filesystem.

To fix the issue make sure the transaction always has enough credits for
one extent insert before each call of ocfs2_mark_extent_written().

Heming Zhao said:

------
PANIC: "Kernel panic - not syncing: OCFS2: (device dm-1): panic forced after error"

PID: xxx TASK: xxxx CPU: 5 COMMAND: "SubmitThread-CA"
#0 machine_kexec at ffffffff8c069932
#1 __crash_kexec at ffffffff8c1338fa
#2 panic at ffffffff8c1d69b9
#3 ocfs2_handle_error at ffffffffc0c86c0c [ocfs2]
#4 __ocfs2_abort at ffffffffc0c88387 [ocfs2]
#5 ocfs2_journal_dirty at ffffffffc0c51e98 [ocfs2]
#6 ocfs2_split_extent at ffffffffc0c27ea3 [ocfs2]
#7 ocfs2_change_extent_flag at ffffffffc0c28053 [ocfs2]
#8 ocfs2_mark_extent_written at ffffffffc0c28347 [ocfs2]
#9 ocfs2_dio_end_io_write at ffffffffc0c2bef9 [ocfs2]
#10 ocfs2_dio_end_io at ffffffffc0c2c0f5 [ocfs2]
#11 dio_complete at ffffffff8c2b9fa7
#12 do_blockdev_direct_IO at ffffffff8c2bc09f
#13 ocfs2_direct_IO at ffffffffc0c2b653 [ocfs2]
#14 generic_file_direct_write at ffffffff8c1dcf14
#15 __generic_file_write_iter at ffffffff8c1dd07b
#16 ocfs2_file_write_iter at ffffffffc0c49f1f [ocfs2]
#17 aio_write at ffffffff8c2cc72e
#18 kmem_cache_alloc at ffffffff8c248dde
#19 do_io_submit at ffffffff8c2ccada
#20 do_syscall_64 at ffffffff8c004984
#21 entry_SYSCALL_64_after_hwframe at ffffffff8c8000ba

Link: https://lkml.kernel.org/r/20240617095543.6971-1-jack@suse.cz
Link: https://lkml.kernel.org/r/20240614145243.8837-1-jack@suse.cz
Fixes: c15471f ("ocfs2: fix sparse file & data ordering issue in direct io")
Signed-off-by: Jan Kara <jack@suse.cz>
Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Reviewed-by: Heming Zhao <heming.zhao@suse.com>
Cc: Mark Fasheh <mark@fasheh.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Changwei Ge <gechangwei@live.cn>
Cc: Gang He <ghe@suse.com>
Cc: Jun Piao <piaojun@huawei.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
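The credit exhaustion described in the ocfs2 message above can be modeled with a toy handle. This is only a sketch under stated assumptions: `toy_handle` and every `toy_*` function are hypothetical and are not the jbd2 or ocfs2 API. It shows why a one-time estimate fails on many small extents, and why topping up before each call does not.

```c
#include <assert.h>

/* Hypothetical toy model of a journal handle; not the jbd2 or ocfs2 API. */
struct toy_handle {
	int credits; /* buffer credits still available */
	int extends; /* how many times the handle was topped up */
};

/* Top the handle up so that at least 'needed' credits are available. */
static void toy_assure_credits(struct toy_handle *h, int needed)
{
	assert(needed >= 0);
	if (h->credits < needed) {
		h->credits = needed;
		h->extends++;
	}
}

/* One extent update that consumes 'cost' credits; returns -1 when exhausted. */
static int toy_mark_extent_written(struct toy_handle *h, int cost)
{
	if (h->credits < cost)
		return -1; /* models the WARN_ON and the abort that follows */
	h->credits -= cost;
	return 0;
}

/* Process nr_extents extents, optionally reserving before every call. */
static int toy_end_io_write(struct toy_handle *h, int nr_extents, int cost,
			    int reserve_each_time)
{
	for (int i = 0; i < nr_extents; i++) {
		if (reserve_each_time)
			toy_assure_credits(h, cost);
		if (toy_mark_extent_written(h, cost) < 0)
			return -1;
	}
	return 0;
}
```

Starting with 10 credits and a cost of 2 per extent, the run without per-call reservation fails on the sixth extent, while the run with `reserve_each_time` set completes all 100.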
In the buffered write path, the dirty page owns the qgroup reserve until
it creates an ordered_extent.

Therefore, any errors that occur before the ordered_extent is created
must free that reservation, or else the space is leaked. The fstest
generic/475 exercises various IO error paths, and is able to trigger
errors in cow_file_range where we fail to get to allocating the ordered
extent. Note that because we *do* clear delalloc, we are likely to remove
the inode from the delalloc list, so the inodes/pages do not have
invalidate/launder called on them in the commit abort path.

This results in failures at the unmount stage of the test that look like:

  BTRFS: error (device dm-8 state EA) in cleanup_transaction:2018: errno=-5 IO failure
  BTRFS: error (device dm-8 state EA) in btrfs_replace_file_extents:2416: errno=-5 IO failure
  BTRFS warning (device dm-8 state EA): qgroup 0/5 has unreleased space, type 0 rsv 28672
  ------------[ cut here ]------------
  WARNING: CPU: 3 PID: 22588 at fs/btrfs/disk-io.c:4333 close_ctree+0x222/0x4d0 [btrfs]
  Modules linked in: btrfs blake2b_generic libcrc32c xor zstd_compress raid6_pq
  CPU: 3 PID: 22588 Comm: umount Kdump: loaded Tainted: G W 6.10.0-rc7-gab56fde445b8 #21
  Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Arch Linux 1.16.3-1-1 04/01/2014
  RIP: 0010:close_ctree+0x222/0x4d0 [btrfs]
  RSP: 0018:ffffb4465283be00 EFLAGS: 00010202
  RAX: 0000000000000001 RBX: ffffa1a1818e1000 RCX: 0000000000000001
  RDX: 0000000000000000 RSI: ffffb4465283bbe0 RDI: ffffa1a19374fcb8
  RBP: ffffa1a1818e13c0 R08: 0000000100028b16 R09: 0000000000000000
  R10: 0000000000000003 R11: 0000000000000003 R12: ffffa1a18ad7972c
  R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
  FS: 00007f9168312b80(0000) GS:ffffa1a4afcc0000(0000) knlGS:0000000000000000
  CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  CR2: 00007f91683c9140 CR3: 000000010acaa000 CR4: 00000000000006f0
  Call Trace:
  <TASK>
  ? close_ctree+0x222/0x4d0 [btrfs]
  ? __warn.cold+0x8e/0xea
  ? close_ctree+0x222/0x4d0 [btrfs]
  ? report_bug+0xff/0x140
  ? handle_bug+0x3b/0x70
  ? exc_invalid_op+0x17/0x70
  ? asm_exc_invalid_op+0x1a/0x20
  ? close_ctree+0x222/0x4d0 [btrfs]
  generic_shutdown_super+0x70/0x160
  kill_anon_super+0x11/0x40
  btrfs_kill_super+0x11/0x20 [btrfs]
  deactivate_locked_super+0x2e/0xa0
  cleanup_mnt+0xb5/0x150
  task_work_run+0x57/0x80
  syscall_exit_to_user_mode+0x121/0x130
  do_syscall_64+0xab/0x1a0
  entry_SYSCALL_64_after_hwframe+0x77/0x7f
  RIP: 0033:0x7f916847a887
  ---[ end trace 0000000000000000 ]---
  BTRFS error (device dm-8 state EA): qgroup reserved space leaked

Cases 2 and 3 in the out_reserve path both pertain to this type of leak
and must free the reserved qgroup data. Because it is already an error
path, I opted not to handle the possible errors in
btrfs_free_qgroup_data.

Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Boris Burkov <boris@bur.io>
Signed-off-by: David Sterba <dsterba@suse.com>
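The ownership rule in the btrfs message above (whoever fails before the hand-off must give the reservation back) can be illustrated with a toy accounting model. Everything here is hypothetical: `toy_qgroup` and the `toy_*` functions are not the btrfs qgroup API, and the model collapses reservation and write-out into a single function for brevity.

```c
#include <assert.h>

/* Hypothetical toy accounting; not the btrfs qgroup API. */
struct toy_qgroup {
	long reserved; /* bytes currently reserved */
};

static void toy_reserve(struct toy_qgroup *q, long bytes)
{
	q->reserved += bytes;
}

static void toy_free_reserved(struct toy_qgroup *q, long bytes)
{
	assert(q->reserved >= bytes);
	q->reserved -= bytes;
}

/*
 * Model of a write path: the reservation is handed over only once the
 * "ordered extent" step is reached. fail_early simulates an error before
 * that point; free_on_error selects whether the error path gives the
 * reservation back.
 */
static int toy_cow_range(struct toy_qgroup *q, long bytes, int fail_early,
			 int free_on_error)
{
	toy_reserve(q, bytes);
	if (fail_early) {
		if (free_on_error)
			toy_free_reserved(q, bytes);
		return -5; /* -EIO, matching errno=-5 in the log above */
	}
	/* Success: the new owner releases the space; modeled as immediate. */
	toy_free_reserved(q, bytes);
	return 0;
}
```

With `free_on_error` cleared, an early failure leaves 28672 bytes reserved, which mirrors the "unreleased space" warning. With it set, the counter returns to zero.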
iter_finish_branch_entry() doesn't put the branch_info from/to map
elements creating memory leaks. This can be seen with:
```
$ perf record -e cycles -b perf test -w noploop
$ perf report -D
...
Direct leak of 984344 byte(s) in 123043 object(s) allocated from:
#0 0x7fb2654f3bd7 in malloc libsanitizer/asan/asan_malloc_linux.cpp:69
#1 0x564d3400d10b in map__get util/map.h:186
#2 0x564d3400d10b in ip__resolve_ams util/machine.c:1981
#3 0x564d34014d81 in sample__resolve_bstack util/machine.c:2151
#4 0x564d34094790 in iter_prepare_branch_entry util/hist.c:898
#5 0x564d34098fa4 in hist_entry_iter__add util/hist.c:1238
#6 0x564d33d1f0c7 in process_sample_event tools/perf/builtin-report.c:334
#7 0x564d34031eb7 in perf_session__deliver_event util/session.c:1655
#8 0x564d3403ba52 in do_flush util/ordered-events.c:245
#9 0x564d3403ba52 in __ordered_events__flush util/ordered-events.c:324
#10 0x564d3402d32e in perf_session__process_user_event util/session.c:1708
#11 0x564d34032480 in perf_session__process_event util/session.c:1877
#12 0x564d340336ad in reader__read_event util/session.c:2399
#13 0x564d34033fdc in reader__process_events util/session.c:2448
#14 0x564d34033fdc in __perf_session__process_events util/session.c:2495
#15 0x564d34033fdc in perf_session__process_events util/session.c:2661
#16 0x564d33d27113 in __cmd_report tools/perf/builtin-report.c:1065
#17 0x564d33d27113 in cmd_report tools/perf/builtin-report.c:1805
#18 0x564d33e0ccb7 in run_builtin tools/perf/perf.c:350
#19 0x564d33e0d45e in handle_internal_command tools/perf/perf.c:403
#20 0x564d33cdd827 in run_argv tools/perf/perf.c:447
#21 0x564d33cdd827 in main tools/perf/perf.c:561
...
```
Clearing up the map_symbols properly creates maps reference count
issues so resolve those. Resolving this issue doesn't improve peak
heap consumption for the test above.
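The get/put pairing behind this leak can be sketched with a toy reference-counted object. The names below (`toy_map`, `toy_branch_info`, `toy_live_maps` and the `toy_*` functions) are hypothetical and only model the idea; they are not perf's struct map or struct branch_info.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical reference-counted object; not perf's struct map. */
struct toy_map {
	int refcnt;
};

static int toy_live_maps; /* number of objects not yet freed */

static struct toy_map *toy_map_new(void)
{
	struct toy_map *m = calloc(1, sizeof(*m));

	if (m != NULL) {
		m->refcnt = 1;
		toy_live_maps++;
	}
	return m;
}

/* Take an extra reference; every get must be paired with a put. */
static struct toy_map *toy_map_get(struct toy_map *m)
{
	assert(m != NULL && m->refcnt > 0);
	m->refcnt++;
	return m;
}

/* Drop one reference and free the object when the last one goes away. */
static void toy_map_put(struct toy_map *m)
{
	if (m == NULL)
		return;
	assert(m->refcnt > 0);
	if (--m->refcnt == 0) {
		free(m);
		toy_live_maps--;
	}
}

/* A branch entry holding references to its 'from' and 'to' maps. */
struct toy_branch_info {
	struct toy_map *from;
	struct toy_map *to;
};

/* Cleanup that puts both references, so nothing is left behind. */
static void toy_branch_info_exit(struct toy_branch_info *bi)
{
	toy_map_put(bi->from);
	toy_map_put(bi->to);
	bi->from = NULL;
	bi->to = NULL;
}
```

If `toy_branch_info_exit()` were skipped, the two extra references would keep the object alive after its creator drops the original one, which is the kind of leak the sanitizer report shows.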
Committer testing:
$ sudo dnf install libasan
$ make -k CORESIGHT=1 EXTRA_CFLAGS="-fsanitize=address" CC=clang O=/tmp/build/$(basename $PWD)/ -C tools/perf install-bin
Reviewed-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Ian Rogers <irogers@google.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sun Haiyong <sunhaiyong@loongson.cn>
Cc: Yanteng Si <siyanteng@loongson.cn>
Link: https://lore.kernel.org/r/20240807065136.1039977-1-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
…s_lock

For storing a value to a queue attribute, the queue_attr_store function
first freezes the queue (->q_usage_counter(io)) and then acquire
->sysfs_lock. This seems not correct as the usual ordering should be to
acquire ->sysfs_lock before freezing the queue. This incorrect ordering
causes the following lockdep splat which we are able to reproduce always
simply by accessing /sys/kernel/debug file using ls command:

[ 57.597146] WARNING: possible circular locking dependency detected
[ 57.597154] 6.12.0-10553-gb86545e02e8c #20 Tainted: G W
[ 57.597162] ------------------------------------------------------
[ 57.597168] ls/4605 is trying to acquire lock:
[ 57.597176] c00000003eb56710 (&mm->mmap_lock){++++}-{4:4}, at: __might_fault+0x58/0xc0
[ 57.597200] but task is already holding lock:
[ 57.597207] c0000018e27c6810 (&sb->s_type->i_mutex_key#3){++++}-{4:4}, at: iterate_dir+0x94/0x1d4
[ 57.597226] which lock already depends on the new lock.
[ 57.597233] the existing dependency chain (in reverse order) is:
[ 57.597241] -> #5 (&sb->s_type->i_mutex_key#3){++++}-{4:4}:
[ 57.597255]        down_write+0x6c/0x18c
[ 57.597264]        start_creating+0xb4/0x24c
[ 57.597274]        debugfs_create_dir+0x2c/0x1e8
[ 57.597283]        blk_register_queue+0xec/0x294
[ 57.597292]        add_disk_fwnode+0x2e4/0x548
[ 57.597302]        brd_alloc+0x2c8/0x338
[ 57.597309]        brd_init+0x100/0x178
[ 57.597317]        do_one_initcall+0x88/0x3e4
[ 57.597326]        kernel_init_freeable+0x3cc/0x6e0
[ 57.597334]        kernel_init+0x34/0x1cc
[ 57.597342]        ret_from_kernel_user_thread+0x14/0x1c
[ 57.597350] -> #4 (&q->debugfs_mutex){+.+.}-{4:4}:
[ 57.597362]        __mutex_lock+0xfc/0x12a0
[ 57.597370]        blk_register_queue+0xd4/0x294
[ 57.597379]        add_disk_fwnode+0x2e4/0x548
[ 57.597388]        brd_alloc+0x2c8/0x338
[ 57.597395]        brd_init+0x100/0x178
[ 57.597402]        do_one_initcall+0x88/0x3e4
[ 57.597410]        kernel_init_freeable+0x3cc/0x6e0
[ 57.597418]        kernel_init+0x34/0x1cc
[ 57.597426]        ret_from_kernel_user_thread+0x14/0x1c
[ 57.597434] -> #3 (&q->sysfs_lock){+.+.}-{4:4}:
[ 57.597446]        __mutex_lock+0xfc/0x12a0
[ 57.597454]        queue_attr_store+0x9c/0x110
[ 57.597462]        sysfs_kf_write+0x70/0xb0
[ 57.597471]        kernfs_fop_write_iter+0x1b0/0x2ac
[ 57.597480]        vfs_write+0x3dc/0x6e8
[ 57.597488]        ksys_write+0x84/0x140
[ 57.597495]        system_call_exception+0x130/0x360
[ 57.597504]        system_call_common+0x160/0x2c4
[ 57.597516] -> #2 (&q->q_usage_counter(io)#21){++++}-{0:0}:
[ 57.597530]        __submit_bio+0x5ec/0x828
[ 57.597538]        submit_bio_noacct_nocheck+0x1e4/0x4f0
[ 57.597547]        iomap_readahead+0x2a0/0x448
[ 57.597556]        xfs_vm_readahead+0x28/0x3c
[ 57.597564]        read_pages+0x88/0x41c
[ 57.597571]        page_cache_ra_unbounded+0x1ac/0x2d8
[ 57.597580]        filemap_get_pages+0x188/0x984
[ 57.597588]        filemap_read+0x13c/0x4bc
[ 57.597596]        xfs_file_buffered_read+0x88/0x17c
[ 57.597605]        xfs_file_read_iter+0xac/0x158
[ 57.597614]        vfs_read+0x2d4/0x3b4
[ 57.597622]        ksys_read+0x84/0x144
[ 57.597629]        system_call_exception+0x130/0x360
[ 57.597637]        system_call_common+0x160/0x2c4
[ 57.597647] -> #1 (mapping.invalidate_lock#2){++++}-{4:4}:
[ 57.597661]        down_read+0x6c/0x220
[ 57.597669]        filemap_fault+0x870/0x100c
[ 57.597677]        xfs_filemap_fault+0xc4/0x18c
[ 57.597684]        __do_fault+0x64/0x164
[ 57.597693]        __handle_mm_fault+0x1274/0x1dac
[ 57.597702]        handle_mm_fault+0x248/0x484
[ 57.597711]        ___do_page_fault+0x428/0xc0c
[ 57.597719]        hash__do_page_fault+0x30/0x68
[ 57.597727]        do_hash_fault+0x90/0x35c
[ 57.597736]        data_access_common_virt+0x210/0x220
[ 57.597745]        _copy_from_user+0xf8/0x19c
[ 57.597754]        sel_write_load+0x178/0xd54
[ 57.597762]        vfs_write+0x108/0x6e8
[ 57.597769]        ksys_write+0x84/0x140
[ 57.597777]        system_call_exception+0x130/0x360
[ 57.597785]        system_call_common+0x160/0x2c4
[ 57.597794] -> #0 (&mm->mmap_lock){++++}-{4:4}:
[ 57.597806]        __lock_acquire+0x17cc/0x2330
[ 57.597814]        lock_acquire+0x138/0x400
[ 57.597822]        __might_fault+0x7c/0xc0
[ 57.597830]        filldir64+0xe8/0x390
[ 57.597839]        dcache_readdir+0x80/0x2d4
[ 57.597846]        iterate_dir+0xd8/0x1d4
[ 57.597855]        sys_getdents64+0x88/0x2d4
[ 57.597864]        system_call_exception+0x130/0x360
[ 57.597872]        system_call_common+0x160/0x2c4
[ 57.597881] other info that might help us debug this:
[ 57.597888] Chain exists of: &mm->mmap_lock --> &q->debugfs_mutex --> &sb->s_type->i_mutex_key#3
[ 57.597905] Possible unsafe locking scenario:
[ 57.597911]        CPU0                    CPU1
[ 57.597917]        ----                    ----
[ 57.597922]   rlock(&sb->s_type->i_mutex_key#3);
[ 57.597932]                                lock(&q->debugfs_mutex);
[ 57.597940]                                lock(&sb->s_type->i_mutex_key#3);
[ 57.597950]   rlock(&mm->mmap_lock);
[ 57.597958] *** DEADLOCK ***
[ 57.597965] 2 locks held by ls/4605:
[ 57.597971] #0: c0000000137c12f8 (&f->f_pos_lock){+.+.}-{4:4}, at: fdget_pos+0xcc/0x154
[ 57.597989] #1: c0000018e27c6810 (&sb->s_type->i_mutex_key#3){++++}-{4:4}, at: iterate_dir+0x94/0x1d4

Prevent the above lockdep warning by acquiring ->sysfs_lock before
freezing the queue while storing a queue attribute in queue_attr_store
function. Later, we also found[1] another function
__blk_mq_update_nr_hw_queues where we first freeze queue and then acquire
the ->sysfs_lock. So we've also updated lock ordering in
__blk_mq_update_nr_hw_queues function and ensured that in all code paths
we follow the correct lock ordering i.e. acquire ->sysfs_lock before
freezing the queue.

[1] https://lore.kernel.org/all/CAFj5m9Ke8+EHKQBs_Nk6hqd=LGXtk4mUxZUN5==ZcCjnZSBwHw@mail.gmail.com/

Reported-by: kjain@linux.ibm.com
Fixes: af28141 ("block: freeze the queue in queue_attr_store")
Tested-by: kjain@linux.ibm.com
Cc: hch@lst.de
Cc: axboe@kernel.dk
Cc: ritesh.list@gmail.com
Cc: ming.lei@redhat.com
Cc: gjoyce@linux.ibm.com
Signed-off-by: Nilay Shroff <nilay@linux.ibm.com>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Link: https://lore.kernel.org/r/20241210144222.1066229-1-nilay@linux.ibm.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
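The ordering rule in the block-layer message above can be illustrated with a toy dependency tracker. It records "lock B was taken while lock A was held" edges and rejects an acquisition that would close a cycle. This is a sketch under stated assumptions: `toy_lockdep` and its functions are hypothetical and far simpler than the kernel's lockdep, and the two lock indices in the usage note merely stand in for ->sysfs_lock and the queue freeze.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define TOY_MAX_LOCKS 8

/* Hypothetical toy dependency tracker; far simpler than the kernel's lockdep. */
struct toy_lockdep {
	/* edge[a][b] is true once lock b was taken while lock a was held */
	bool edge[TOY_MAX_LOCKS][TOY_MAX_LOCKS];
};

/* Depth-first search: is 'to' reachable from 'from' along recorded edges? */
static bool toy_reachable(const struct toy_lockdep *ld, int from, int to,
			  bool *seen)
{
	if (from == to)
		return true;
	seen[from] = true;
	for (int next = 0; next < TOY_MAX_LOCKS; next++) {
		if (ld->edge[from][next] && !seen[next] &&
		    toy_reachable(ld, next, to, seen))
			return true;
	}
	return false;
}

/*
 * Record "taken is acquired while held is held". Returns false when the
 * opposite order is already known, i.e. the new edge would close a cycle.
 */
static bool toy_acquire_while_holding(struct toy_lockdep *ld, int held,
				      int taken)
{
	bool seen[TOY_MAX_LOCKS];

	assert(held >= 0 && held < TOY_MAX_LOCKS);
	assert(taken >= 0 && taken < TOY_MAX_LOCKS);
	memset(seen, 0, sizeof(seen));
	if (toy_reachable(ld, taken, held, seen))
		return false;
	ld->edge[held][taken] = true;
	return true;
}
```

Once one code path has recorded index 0 (standing in for ->sysfs_lock) before index 1 (the queue freeze), a second path that tries index 1 before index 0 is rejected, while further acquisitions in the agreed order keep succeeding.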
[Why & How]
Fix a false positive warning which occurs due to lack of correct checks when querying plane_id in DML21. This fixes the warning when performing a mode1 reset (cat /sys/kernel/debug/dri/1/amdgpu_gpu_recover):
[ 35.751250] WARNING: CPU: 11 PID: 326 at /tmp/amd.PHpyAl7v/amd/amdgpu/../display/dc/dml2/dml2_dc_resource_mgmt.c:91 dml2_map_dc_pipes+0x243d/0x3f40 [amdgpu]
[ 35.751434] Modules linked in: amdgpu(OE) amddrm_ttm_helper(OE) amdttm(OE) amddrm_buddy(OE) amdxcp(OE) amddrm_exec(OE) amd_sched(OE) amdkcl(OE) drm_suballoc_helper drm_ttm_helper ttm drm_display_helper cec rc_core i2c_algo_bit rfcomm qrtr cmac algif_hash algif_skcipher af_alg bnep amd_atl intel_rapl_msr intel_rapl_common snd_hda_codec_hdmi snd_hda_intel edac_mce_amd snd_intel_dspcfg snd_intel_sdw_acpi snd_hda_codec kvm_amd snd_hda_core snd_hwdep snd_pcm kvm snd_seq_midi snd_seq_midi_event snd_rawmidi crct10dif_pclmul polyval_clmulni polyval_generic btusb ghash_clmulni_intel sha256_ssse3 btrtl sha1_ssse3 snd_seq btintel aesni_intel btbcm btmtk snd_seq_device crypto_simd sunrpc cryptd bluetooth snd_timer ccp binfmt_misc rapl snd i2c_piix4 wmi_bmof gigabyte_wmi k10temp i2c_smbus soundcore gpio_amdpt mac_hid sch_fq_codel msr parport_pc ppdev lp parport efi_pstore nfnetlink dmi_sysfs ip_tables x_tables autofs4 hid_generic usbhid hid crc32_pclmul igc ahci xhci_pci libahci xhci_pci_renesas video wmi
[ 35.751501] CPU: 11 UID: 0 PID: 326 Comm: kworker/u64:9 Tainted: G OE 6.11.0-21-generic #21~24.04.1-Ubuntu
[ 35.751504] Tainted: [O]=OOT_MODULE, [E]=UNSIGNED_MODULE
[ 35.751505] Hardware name: Gigabyte Technology Co., Ltd. X670E AORUS PRO X/X670E AORUS PRO X, BIOS F30 05/22/2024
[ 35.751506] Workqueue: amdgpu-reset-dev amdgpu_debugfs_reset_work [amdgpu]
[ 35.751638] RIP: 0010:dml2_map_dc_pipes+0x243d/0x3f40 [amdgpu]
[ 35.751794] Code: 6d 0c 00 00 8b 84 24 88 00 00 00 41 3b 44 9c 20 0f 84 fc 07 00 00 48 83 c3 01 48 83 fb 06 75 b3 4c 8b 64 24 68 4c 8b 6c 24 40 <0f> 0b b8 06 00 00 00 49 8b 94 24 a0 49 00 00 89 c3 83 f8 07 0f 87
[ 35.751796] RSP: 0018:ffffbfa3805d7680 EFLAGS: 00010246
[ 35.751798] RAX: 0000000000010000 RBX: 0000000000000006 RCX: 0000000000000000
[ 35.751799] RDX: 0000000000000000 RSI: 0000000000000005 RDI: 0000000000000000
[ 35.751800] RBP: ffffbfa3805d78f0 R08: 0000000000000000 R09: 0000000000000000
[ 35.751801] R10: 0000000000000000 R11: 0000000000000000 R12: ffffbfa383249000
[ 35.751802] R13: ffffa0e68f280000 R14: ffffbfa383249658 R15: 0000000000000000
[ 35.751803] FS: 0000000000000000(0000) GS:ffffa0edbe580000(0000) knlGS:0000000000000000
[ 35.751804] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 35.751805] CR2: 00005d847ef96c58 CR3: 000000041de3e000 CR4: 0000000000f50ef0
[ 35.751806] PKRU: 55555554
[ 35.751807] Call Trace:
[ 35.751810] <TASK>
[ 35.751816] ? show_regs+0x6c/0x80
[ 35.751820] ? __warn+0x88/0x140
[ 35.751822] ? dml2_map_dc_pipes+0x243d/0x3f40 [amdgpu]
[ 35.751964] ? report_bug+0x182/0x1b0
[ 35.751969] ? handle_bug+0x6e/0xb0
[ 35.751972] ? exc_invalid_op+0x18/0x80
[ 35.751974] ? asm_exc_invalid_op+0x1b/0x20
[ 35.751978] ? dml2_map_dc_pipes+0x243d/0x3f40 [amdgpu]
[ 35.752117] ? math_pow+0x48/0xa0 [amdgpu]
[ 35.752256] ? srso_alias_return_thunk+0x5/0xfbef5
[ 35.752260] ? math_pow+0x48/0xa0 [amdgpu]
[ 35.752400] ? srso_alias_return_thunk+0x5/0xfbef5
[ 35.752403] ? math_pow+0x11/0xa0 [amdgpu]
[ 35.752524] ? srso_alias_return_thunk+0x5/0xfbef5
[ 35.752526] ? core_dcn4_mode_programming+0xe4d/0x20d0 [amdgpu]
[ 35.752663] ? srso_alias_return_thunk+0x5/0xfbef5
[ 35.752669] dml21_validate+0x3d4/0x980 [amdgpu]
Reviewed-by: Austin Zheng <austin.zheng@amd.com>
Signed-off-by: Aurabindo Pillai <aurabindo.pillai@amd.com>
Signed-off-by: Ray Wu <ray.wu@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit f8ad62c)
Without the change, `perf` hangs on character devices. On my system
it's enough to run a system-wide sampler for a few seconds to reproduce
the hang:
$ perf record -a -g --call-graph=dwarf
$ perf report
# hung
`strace` shows that the hang happens while reading a character device,
`/dev/dri/renderD128`:
$ strace -y -f -p 2780484
strace: Process 2780484 attached
pread64(101</dev/dri/renderD128>, strace: Process 2780484 detached
Its call trace descends into `elfutils`:
$ gdb -p 2780484
(gdb) bt
#0 0x00007f5e508f04b7 in __libc_pread64 (fd=101, buf=0x7fff9df7edb0, count=0, offset=0)
at ../sysdeps/unix/sysv/linux/pread64.c:25
#1 0x00007f5e52b79515 in read_file () from /<<NIX>>/elfutils-0.192/lib/libelf.so.1
#2 0x00007f5e52b25666 in libdw_open_elf () from /<<NIX>>/elfutils-0.192/lib/libdw.so.1
#3 0x00007f5e52b25907 in __libdw_open_file () from /<<NIX>>/elfutils-0.192/lib/libdw.so.1
#4 0x00007f5e52b120a9 in dwfl_report_elf@@ELFUTILS_0.156 ()
from /<<NIX>>/elfutils-0.192/lib/libdw.so.1
#5 0x000000000068bf20 in __report_module (al=al@entry=0x7fff9df80010, ip=ip@entry=139803237033216, ui=ui@entry=0x5369b5e0)
at util/dso.h:537
#6 0x000000000068c3d1 in report_module (ip=139803237033216, ui=0x5369b5e0) at util/unwind-libdw.c:114
#7 frame_callback (state=0x535aef10, arg=0x5369b5e0) at util/unwind-libdw.c:242
#8 0x00007f5e52b261d3 in dwfl_thread_getframes () from /<<NIX>>/elfutils-0.192/lib/libdw.so.1
#9 0x00007f5e52b25bdb in get_one_thread_cb () from /<<NIX>>/elfutils-0.192/lib/libdw.so.1
#10 0x00007f5e52b25faa in dwfl_getthreads () from /<<NIX>>/elfutils-0.192/lib/libdw.so.1
#11 0x00007f5e52b26514 in dwfl_getthread_frames () from /<<NIX>>/elfutils-0.192/lib/libdw.so.1
#12 0x000000000068c6ce in unwind__get_entries (cb=cb@entry=0x5d4620 <unwind_entry>, arg=arg@entry=0x10cd5fa0,
thread=thread@entry=0x1076a290, data=data@entry=0x7fff9df80540, max_stack=max_stack@entry=127,
best_effort=best_effort@entry=false) at util/thread.h:152
#13 0x00000000005dae95 in thread__resolve_callchain_unwind (evsel=0x106006d0, thread=0x1076a290, cursor=0x10cd5fa0,
sample=0x7fff9df80540, max_stack=127, symbols=true) at util/machine.c:2939
#14 thread__resolve_callchain_unwind (thread=0x1076a290, cursor=0x10cd5fa0, evsel=0x106006d0, sample=0x7fff9df80540,
max_stack=127, symbols=true) at util/machine.c:2920
#15 __thread__resolve_callchain (thread=0x1076a290, cursor=0x10cd5fa0, evsel=0x106006d0, evsel@entry=0x7fff9df80440,
sample=0x7fff9df80540, parent=parent@entry=0x7fff9df804a0, root_al=root_al@entry=0x7fff9df80440, max_stack=127, symbols=true)
at util/machine.c:2970
#16 0x00000000005d0cb2 in thread__resolve_callchain (thread=<optimized out>, cursor=<optimized out>, evsel=0x7fff9df80440,
sample=<optimized out>, parent=0x7fff9df804a0, root_al=0x7fff9df80440, max_stack=127) at util/machine.h:198
#17 sample__resolve_callchain (sample=<optimized out>, cursor=<optimized out>, parent=parent@entry=0x7fff9df804a0,
evsel=evsel@entry=0x106006d0, al=al@entry=0x7fff9df80440, max_stack=max_stack@entry=127) at util/callchain.c:1127
#18 0x0000000000617e08 in hist_entry_iter__add (iter=iter@entry=0x7fff9df80480, al=al@entry=0x7fff9df80440, max_stack_depth=127,
arg=arg@entry=0x7fff9df81ae0) at util/hist.c:1255
#19 0x000000000045d2d0 in process_sample_event (tool=0x7fff9df81ae0, event=<optimized out>, sample=0x7fff9df80540,
evsel=0x106006d0, machine=<optimized out>) at builtin-report.c:334
#20 0x00000000005e3bb1 in perf_session__deliver_event (session=0x105ff2c0, event=0x7f5c7d735ca0, tool=0x7fff9df81ae0,
file_offset=2914716832, file_path=0x105ffbf0 "perf.data") at util/session.c:1367
#21 0x00000000005e8d93 in do_flush (oe=0x105ffa50, show_progress=false) at util/ordered-events.c:245
#22 __ordered_events__flush (oe=0x105ffa50, how=OE_FLUSH__ROUND, timestamp=<optimized out>) at util/ordered-events.c:324
#23 0x00000000005e1f64 in perf_session__process_user_event (session=0x105ff2c0, event=0x7f5c7d752b18, file_offset=2914835224,
file_path=0x105ffbf0 "perf.data") at util/session.c:1419
#24 0x00000000005e47c7 in reader__read_event (rd=rd@entry=0x7fff9df81260, session=session@entry=0x105ff2c0,
prog=prog@entry=0x7fff9df81220) at util/session.c:2132
#25 0x00000000005e4b37 in reader__process_events (rd=0x7fff9df81260, session=0x105ff2c0, prog=0x7fff9df81220)
at util/session.c:2181
#26 __perf_session__process_events (session=0x105ff2c0) at util/session.c:2226
#27 perf_session__process_events (session=session@entry=0x105ff2c0) at util/session.c:2390
#28 0x0000000000460add in __cmd_report (rep=0x7fff9df81ae0) at builtin-report.c:1076
#29 cmd_report (argc=<optimized out>, argv=<optimized out>) at builtin-report.c:1827
#30 0x00000000004c5a40 in run_builtin (p=p@entry=0xd8f7f8 <commands+312>, argc=argc@entry=1, argv=argv@entry=0x7fff9df844b0)
at perf.c:351
#31 0x00000000004c5d63 in handle_internal_command (argc=argc@entry=1, argv=argv@entry=0x7fff9df844b0) at perf.c:404
#32 0x0000000000442de3 in run_argv (argcp=<synthetic pointer>, argv=<synthetic pointer>) at perf.c:448
#33 main (argc=<optimized out>, argv=0x7fff9df844b0) at perf.c:556
The hang happens because nothing in `perf` or `elfutils` checks whether a
mapped file can be read safely.
The change conservatively skips all non-regular files.
Signed-off-by: Sergei Trofimovich <slyich@gmail.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/r/20250505174419.2814857-1-slyich@gmail.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
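The "skip all non-regular files" idea from the commit above can be sketched in plain C with `stat()` and `S_ISREG()`. This is only an illustration under stated assumptions: `path_is_regular_file()` is a hypothetical helper name, not the function the patch adds to perf, and the actual change may perform the check at a different point.

```c
#include <stdbool.h>
#include <sys/stat.h>

/*
 * Hypothetical helper (not perf's actual function): report whether `path`
 * names a regular file, so a caller can skip character devices, FIFOs,
 * sockets and directories before handing the path to an ELF reader.
 */
static bool path_is_regular_file(const char *path)
{
	struct stat st;

	/* stat() follows symlinks; a failure is treated as "do not read". */
	if (stat(path, &st) != 0)
		return false;

	return S_ISREG(st.st_mode);
}
```

A caller would test the mapped file's path with this helper and skip the module report when it returns false, so a device node such as `/dev/dri/renderD128` is never opened for reading.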
Symbolize stack traces by creating a live machine. Add this
functionality to dump_stack and switch dump_stack users to use
it. Switch TUI to use it. Add stack traces to the child test function
which can be useful to diagnose blocked code.
Example output:
```
$ perf test -vv PERF_RECORD_
...
7: PERF_RECORD_* events & perf_sample fields:
7: PERF_RECORD_* events & perf_sample fields : Running (1 active)
^C
Signal (2) while running tests.
Terminating tests with the same signal
Internal test harness failure. Completing any started tests:
: 7: PERF_RECORD_* events & perf_sample fields:
---- unexpected signal (2) ----
#0 0x55788c6210a3 in child_test_sig_handler builtin-test.c:0
#1 0x7fc12fe49df0 in __restore_rt libc_sigaction.c:0
#2 0x7fc12fe99687 in __internal_syscall_cancel cancellation.c:64
#3 0x7fc12fee5f7a in clock_nanosleep@GLIBC_2.2.5 clock_nanosleep.c:72
#4 0x7fc12fef1393 in __nanosleep nanosleep.c:26
#5 0x7fc12ff02d68 in __sleep sleep.c:55
#6 0x55788c63196b in test__PERF_RECORD perf-record.c:0
#7 0x55788c620fb0 in run_test_child builtin-test.c:0
#8 0x55788c5bd18d in start_command run-command.c:127
#9 0x55788c621ef3 in __cmd_test builtin-test.c:0
#10 0x55788c6225bf in cmd_test ??:0
#11 0x55788c5afbd0 in run_builtin perf.c:0
#12 0x55788c5afeeb in handle_internal_command perf.c:0
#13 0x55788c52b383 in main ??:0
#14 0x7fc12fe33ca8 in __libc_start_call_main libc_start_call_main.h:74
#15 0x7fc12fe33d65 in __libc_start_main@@GLIBC_2.34 libc-start.c:128
#16 0x55788c52b9d1 in _start ??:0
---- unexpected signal (2) ----
#0 0x55788c6210a3 in child_test_sig_handler builtin-test.c:0
#1 0x7fc12fe49df0 in __restore_rt libc_sigaction.c:0
#2 0x7fc12fea3a14 in pthread_sigmask@GLIBC_2.2.5 pthread_sigmask.c:45
#3 0x7fc12fe49fd9 in __GI___sigprocmask sigprocmask.c:26
#4 0x7fc12ff2601b in __longjmp_chk longjmp.c:36
#5 0x55788c6210c0 in print_test_result.isra.0 builtin-test.c:0
#6 0x7fc12fe49df0 in __restore_rt libc_sigaction.c:0
#7 0x7fc12fe99687 in __internal_syscall_cancel cancellation.c:64
#8 0x7fc12fee5f7a in clock_nanosleep@GLIBC_2.2.5 clock_nanosleep.c:72
#9 0x7fc12fef1393 in __nanosleep nanosleep.c:26
#10 0x7fc12ff02d68 in __sleep sleep.c:55
#11 0x55788c63196b in test__PERF_RECORD perf-record.c:0
#12 0x55788c620fb0 in run_test_child builtin-test.c:0
#13 0x55788c5bd18d in start_command run-command.c:127
#14 0x55788c621ef3 in __cmd_test builtin-test.c:0
#15 0x55788c6225bf in cmd_test ??:0
#16 0x55788c5afbd0 in run_builtin perf.c:0
#17 0x55788c5afeeb in handle_internal_command perf.c:0
#18 0x55788c52b383 in main ??:0
#19 0x7fc12fe33ca8 in __libc_start_call_main libc_start_call_main.h:74
#20 0x7fc12fe33d65 in __libc_start_main@@GLIBC_2.34 libc-start.c:128
#21 0x55788c52b9d1 in _start ??:0
7: PERF_RECORD_* events & perf_sample fields : Skip (permissions)
```
Signed-off-by: Ian Rogers <irogers@google.com>
Link: https://lore.kernel.org/r/20250624210500.2121303-1-irogers@google.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
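For readers unfamiliar with in-process stack dumps, here is a minimal sketch of the general idea using glibc's `backtrace()` and `backtrace_symbols_fd()`. It is a stand-in technique, not the live-machine symbolization the commit above adds to perf, and `dump_stack_to_fd()` is a hypothetical name. It assumes glibc's `<execinfo.h>` is available.

```c
#include <execinfo.h>
#include <unistd.h>

#define MAX_FRAMES 64

/*
 * Hypothetical helper: write the current call stack to `fd`, one frame
 * per line, and return the number of frames captured.
 */
static int dump_stack_to_fd(int fd)
{
	void *frames[MAX_FRAMES];
	int nr = backtrace(frames, MAX_FRAMES);

	/* Unlike backtrace_symbols(), this variant does not call malloc(). */
	backtrace_symbols_fd(frames, nr, fd);
	return nr;
}
```

The symbol names glibc prints this way are limited to what the dynamic symbol table exposes, which is one reason a tool may prefer its own symbolization, as the commit describes.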
Sequence of ctx checks and rewrites in real world:
1. Possible CO-RE rewrites of access size & offset, maybe breaking #2
2. Verifier env->ops->is_valid_access(), testing access size == sizeof(u64)
3. Verifier env->ops->convert_ctx_access(), rewrite size & offset
Position of access check above is strange, really only works on 64-bit and likely unnoticed for lack of systematic 32-bit testing.
Test changing *is_valid_access() to always check size != sizeof(u64), but on 32-bit systems also check size != sizeof(u32)
On 32-bit armhf, test_progs hits ~100 instances of failures such as:
(NOTE: this results from CO-RE relocation patching to u32 load size)
libbpf: prog 'change_tcp_cc': -- BEGIN PROG LOAD LOG --
0: R1=ctx() R10=fp0
; int change_tcp_cc(struct bpf_iter__tcp *ctx) @ bpf_iter_setsockopt.c:40
0: (b4) w2 = 0 ; R2_w=0
; if (!bpf_tcp_sk(ctx->sk_common)) @ bpf_iter_setsockopt.c:46
1: (61) r1 = *(u32 *)(r1 +8)
func 'bpf_iter_tcp' size 4 must be 8
invalid bpf_context access off=8 size=4 is_valid_access=tracing_prog_is_valid_access
processed 2 insns (limit 1000000) max_states_per_insn 0 total_states 0 peak_states 0 mark_read 0
-- END PROG LOAD LOG --
libbpf: prog 'change_tcp_cc': failed to load: -EACCES
libbpf: failed to load object 'bpf_iter_setsockopt'
libbpf: failed to load BPF skeleton 'bpf_iter_setsockopt': -EACCES
serial_test_bpf_iter_setsockopt:FAIL:iter_skel unexpected error: -13
#21 bpf_iter_setsockopt:FAIL
(NOTE: no error for tests without CO-RE patching which have u64 load size)
This means *_is_valid_access() can't check ctx pointers simply using:
    if (size != sizeof(__u64))
        return false;
or
    if (size != sizeof(void *))
        return false;
And what's required instead is a combo like:
    if (size != sizeof(__u64) && size != sizeof(long))
        return false;
Implement above as convenience function and use in:
- btf_ctx_access()
- cg_sockopt_is_valid_access()
- bpf_skb_is_valid_access()
- sock_addr_is_valid_access()
- sock_ops_is_valid_access()
- sk_msg_is_valid_access()
- flow_dissector_is_valid_access()
This eliminates all 'invalid bpf_context access' errors on 32-bit armhf except one with nf_is_valid_access() which is fixed in the next patch.
Signed-off-by: Tony Ambardar <tony.ambardar@gmail.com>
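The combined size check quoted in the commit message above could be wrapped in a helper along these lines. This is a userspace sketch only: `ctx_ptr_access_size_ok()` is a hypothetical name (the commit message does not give the helper's name), and `uint64_t` stands in for the kernel's `__u64`.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: accept a ctx pointer load if it is either a full
 * 64-bit access or a native-long access. On LP64 targets both sizes are 8;
 * on ILP32 targets such as armhf, sizeof(long) is 4, so a 4-byte load
 * produced by CO-RE relocation is accepted as well.
 */
static bool ctx_ptr_access_size_ok(size_t size)
{
	return size == sizeof(uint64_t) || size == sizeof(long);
}
```

An `*_is_valid_access()` callback would then return false whenever this helper rejects the access size, which is the inverse of the `size != ... && size != ...` form shown in the commit message.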
Pull request for series with
subject: perf: stop using deprecated bpf_program__title()
version: 1
url: https://patchwork.ozlabs.org/project/netdev/list/?series=200279