perf: Support the deferred unwinding infrastructure #5545

Open
wants to merge 11 commits into base: bpf-next_base

Conversation

kernel-patches-daemon-bpf-rc[bot]

Pull request for series with
subject: perf: Support the deferred unwinding infrastructure
version: 12
url: https://patchwork.kernel.org/project/netdevbpf/list/?series=977859

@kernel-patches-daemon-bpf-rc

Upstream branch: c4b1be9
series: https://patchwork.kernel.org/project/netdevbpf/list/?series=977859
version: 12

@kernel-patches-daemon-bpf-rc

Upstream branch: 26d0e53
series: https://patchwork.kernel.org/project/netdevbpf/list/?series=977859
version: 12

@kernel-patches-daemon-bpf-rc

Upstream branch: 0df1a55
series: https://patchwork.kernel.org/project/netdevbpf/list/?series=977859
version: 12

@kernel-patches-daemon-bpf-rc

Upstream branch: cce3fee
series: https://patchwork.kernel.org/project/netdevbpf/list/?series=977859
version: 12

@kernel-patches-daemon-bpf-rc

Upstream branch: 1230be8
series: https://patchwork.kernel.org/project/netdevbpf/list/?series=977859
version: 12

@kernel-patches-daemon-bpf-rc

Upstream branch: 212ec92
series: https://patchwork.kernel.org/project/netdevbpf/list/?series=977859
version: 12

@kernel-patches-daemon-bpf-rc

Upstream branch: 621af19
series: https://patchwork.kernel.org/project/netdevbpf/list/?series=977859
version: 12

@kernel-patches-daemon-bpf-rc

Upstream branch: 564606f
series: https://patchwork.kernel.org/project/netdevbpf/list/?series=977859
version: 12

@kernel-patches-daemon-bpf-rc

Upstream branch: 38d95be
series: https://patchwork.kernel.org/project/netdevbpf/list/?series=977859
version: 12

jpoimboe and others added 8 commits July 2, 2025 14:09
The 'init_nr' argument has double duty: it's used to initialize both the
number of contexts and the number of stack entries.  That's confusing
and the callers always pass zero anyway.  Hard code the zero.
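
For illustration, a rough before/after of the prototype (a sketch; the exact parameter list in the tree at the time of this series may differ):

  /* Before: 'init_nr' was always passed as zero by every caller. */
  struct perf_callchain_entry *
  get_perf_callchain(struct pt_regs *regs, u32 init_nr, bool kernel,
                     bool user, u32 max_stack, bool crosstask, bool add_mark);

  /* After: the zero is hard coded inside get_perf_callchain(). */
  struct perf_callchain_entry *
  get_perf_callchain(struct pt_regs *regs, bool kernel, bool user,
                     u32 max_stack, bool crosstask, bool add_mark);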

Acked-by: Namhyung Kim <Namhyung@kernel.org>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
… set

get_perf_callchain() doesn't support cross-task unwinding for user space
stacks, so have it return NULL if both the crosstask and user arguments are
set.
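
A minimal sketch of the guard this adds (illustrative; its exact placement within the function may differ):

  /* Cross-task unwinding of user space stacks is not supported. */
  if (crosstask && user)
          return NULL;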

Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
…nt->mm == NULL

To determine whether a task is a kernel thread, it is more reliable to
check (current->flags & (PF_KTHREAD|PF_USER_WORKER)) than to rely on
current->mm being NULL.  That is because some kernel tasks (io_uring
helpers) may have a non-NULL mm.
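
A minimal sketch of the check (the helper name here is hypothetical; the patch open-codes the flag test):

  /* Hypothetical helper: true for kernel threads and user workers. */
  static inline bool task_is_kernel_thread(struct task_struct *task)
  {
          return task->flags & (PF_KTHREAD | PF_USER_WORKER);
  }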

Link: https://lore.kernel.org/linux-trace-kernel/20250424163607.GE18306@noisy.programming.kicks-ass.net/
Link: https://lore.kernel.org/all/20250624130744.602c5b5f@batman.local.home/

Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Simplify the get_perf_callchain() user logic a bit.  task_pt_regs()
should never be NULL.

Acked-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
If the task is not a user thread, there's no user stack to unwind.

Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Use the new unwind_deferred_trace() interface (if available) to defer
unwinds to task context.  This will allow the use of .sframe (when it
becomes available) and also prevent duplicate userspace unwinds.
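
The flow is roughly as follows (a sketch based on this series' description; the callback name is hypothetical, and where the unwind_work is embedded is an assumption):

  /* At event creation: register a callback with the deferred unwinder. */
  unwind_deferred_init(&event->unwind_work, perf_event_deferred_callback);

  /* From interrupt/NMI context: request a deferred user unwind. */
  unwind_deferred_request(&event->unwind_work, &cookie);

  /*
   * Later, in task context on the way back to user space, the
   * registered callback runs and emits the user stack trace.
   */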

Suggested-by: Peter Zijlstra <peterz@infradead.org>
Co-developed-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
The deferred unwinder works fine for task events (events that trace only a
specific task), as it can use a task_work from an interrupt or NMI and
when the task goes back to user space it will call the event's callback to
do the deferred unwinding.

But for per CPU events things are not so simple. When a per CPU event
wants a deferred unwind to occur, it cannot simply use a task_work,
because the relationship is many to many: if the task migrates, another
task may be scheduled in on a CPU whose event also wants that task
unwound, while the migrated task now runs on a CPU whose event wants to
unwind it as well. Each CPU may need unwinding from more than one task,
and each task may have pending requests from many CPUs.

To solve this, when a per CPU event is created with the defer_callchain
attribute set, it does a lookup in a global list (unwind_deferred_list)
for a perf_unwind_deferred descriptor whose id matches the PID of the
current task's group_leader.

If none is found, one is created and added to the global list. This
descriptor contains an array with one element per possible CPU, where
each element is a perf_unwind_cpu descriptor.

The perf_unwind_cpu descriptor has a list of all the per CPU events that
trace the CPU corresponding to its index in the array, where the events
belong to tasks that share the same group_leader. It also has a
processing bit and an rcuwait to handle removal.

For each occupied perf_unwind_cpu descriptor in the array, the
perf_unwind_deferred descriptor increments its nr_cpu_events; when a
perf_unwind_cpu descriptor becomes empty, nr_cpu_events is decremented.
This is used to know when to free the perf_unwind_deferred descriptor:
once the count drops to zero, it is no longer referenced.

Finally, the perf_unwind_deferred descriptor has an id that holds the
PID of the group_leader of the tasks that created the events.

When a second (or later) per CPU event is created and the
perf_unwind_deferred descriptor already exists, it simply adds itself to
the perf_unwind_cpu array of that descriptor and updates the necessary
counter. This is how different per CPU events are mapped to each other
based on their group leader PID.

Each of these perf_unwind_deferred descriptors has an unwind_work that
is registered with the deferred unwind infrastructure via
unwind_deferred_init(), which also registers the perf_event_deferred_cpu()
callback.

Now when a per CPU event requests a deferred unwind, it calls
unwind_deferred_request() with the associated perf_unwind_deferred
descriptor. It is expected that the program using this has events on all
CPUs, as the deferred trace may not be delivered on the CPU event that
requested it. That is, the task may migrate, and its user stack trace
will be recorded on the event of the CPU it exits back to user space on.
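
A rough sketch of the descriptors as described above (field names are reconstructed from this description, not copied from the patch):

  /* One per group_leader PID; kept on the global unwind_deferred_list. */
  struct perf_unwind_deferred {
          struct list_head       list;          /* unwind_deferred_list */
          struct unwind_work     unwind_work;   /* deferred-unwind handle */
          struct perf_unwind_cpu *cpus;         /* one entry per possible CPU */
          int                    nr_cpu_events; /* occupied entries; free at 0 */
          int                    id;            /* PID of the group_leader */
  };

  /* Per CPU bucket of events tracing this CPU for one group leader. */
  struct perf_unwind_cpu {
          struct list_head       events;     /* per CPU events on this CPU */
          unsigned long          processing; /* processing bit */
          struct rcuwait         wait;       /* rcuwait used for removal */
  };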

Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Add a new event type for deferred callchains and a new callback for the
struct perf_tool.  For now it doesn't actually handle the deferred
callchains; it just marks the sample if it has PERF_CONTEXT_USER_DEFERRED
in the callchain array.
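
Detecting the marker in a sample might look like this (a sketch; how the result is recorded on the sample is hypothetical):

  /* Scan the callchain for the deferred-user-callchain marker. */
  for (i = 0; i < sample->callchain->nr; i++) {
          if (sample->callchain->ips[i] == PERF_CONTEXT_USER_DEFERRED) {
                  deferred = true;  /* mark the sample for later handling */
                  break;
          }
  }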

At least, perf report can dump the raw data with this change.  This
actually requires the next commit to enable attr.defer_callchain, but if
you already have a data file, it will show the following result.

  $ perf report -D
  ...
  0x5fe0@perf.data [0x40]: event: 22
  .
  . ... raw event: size 64 bytes
  .  0000:  16 00 00 00 02 00 40 00 02 00 00 00 00 00 00 00  ......@.........
  .  0010:  00 fe ff ff ff ff ff ff 4b d3 3f 25 45 7f 00 00  ........K.?%E...
  .  0020:  21 03 00 00 21 03 00 00 43 02 12 ab 05 00 00 00  !...!...C.......
  .  0030:  00 00 00 00 00 00 00 00 09 00 00 00 00 00 00 00  ................

  0 24344920643 0x5fe0 [0x40]: PERF_RECORD_CALLCHAIN_DEFERRED(IP, 0x2): 801/801: 0
  ... FP chain: nr:2
  .....  0: fffffffffffffe00
  .....  1: 00007f45253fd34b
  : unhandled!

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
@kernel-patches-daemon-bpf-rc

Upstream branch: 38d95be
series: https://patchwork.kernel.org/project/netdevbpf/list/?series=977859
version: 12

namhyung added 3 commits July 2, 2025 14:09
And add the missing feature detection logic to clear the flag on old
kernels.
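
The fallback pattern is roughly the following (a sketch of perf's usual missing-feature probing; the actual handling lives in the evsel open path and may differ in detail):

  /* Sketch: try with the new attribute, clear it and retry on EINVAL. */
  attr.defer_callchain = 1;
  fd = sys_perf_event_open(&attr, pid, cpu, -1, flags);
  if (fd < 0 && errno == EINVAL) {
          pr_debug("switching off deferred callchain support\n");
          attr.defer_callchain = 0;
          fd = sys_perf_event_open(&attr, pid, cpu, -1, flags);
  }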

  $ perf record -g -vv true
  ...
  ------------------------------------------------------------
  perf_event_attr:
    type                             0 (PERF_TYPE_HARDWARE)
    size                             136
    config                           0 (PERF_COUNT_HW_CPU_CYCLES)
    { sample_period, sample_freq }   4000
    sample_type                      IP|TID|TIME|CALLCHAIN|PERIOD
    read_format                      ID|LOST
    disabled                         1
    inherit                          1
    mmap                             1
    comm                             1
    freq                             1
    enable_on_exec                   1
    task                             1
    sample_id_all                    1
    mmap2                            1
    comm_exec                        1
    ksymbol                          1
    bpf_event                        1
    defer_callchain                  1
  ------------------------------------------------------------
  sys_perf_event_open: pid 162755  cpu 0  group_fd -1  flags 0x8
  sys_perf_event_open failed, error -22
  switching off deferred callchain support

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Handle the deferred callchains in the script output.

  $ perf script
  perf     801 [000]    18.031793:          1 cycles:P:
          ffffffff91a14c36 __intel_pmu_enable_all.isra.0+0x56 ([kernel.kallsyms])
          ffffffff91d373e9 perf_ctx_enable+0x39 ([kernel.kallsyms])
          ffffffff91d36af7 event_function+0xd7 ([kernel.kallsyms])
          ffffffff91d34222 remote_function+0x42 ([kernel.kallsyms])
          ffffffff91c1ebe1 generic_exec_single+0x61 ([kernel.kallsyms])
          ffffffff91c1edac smp_call_function_single+0xec ([kernel.kallsyms])
          ffffffff91d37a9d event_function_call+0x10d ([kernel.kallsyms])
          ffffffff91d33557 perf_event_for_each_child+0x37 ([kernel.kallsyms])
          ffffffff91d47324 _perf_ioctl+0x204 ([kernel.kallsyms])
          ffffffff91d47c43 perf_ioctl+0x33 ([kernel.kallsyms])
          ffffffff91e2f216 __x64_sys_ioctl+0x96 ([kernel.kallsyms])
          ffffffff9265f1ae do_syscall_64+0x9e ([kernel.kallsyms])
          ffffffff92800130 entry_SYSCALL_64+0xb0 ([kernel.kallsyms])

  perf     801 [000]    18.031814: DEFERRED CALLCHAIN
              7fb5fc22034b __GI___ioctl+0x3b (/usr/lib/x86_64-linux-gnu/libc.so.6)

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Save samples with deferred callchains in a separate list and deliver
them after merging the user callchains.  If users don't want to merge,
they can set tool->merge_deferred_callchains to false to prevent the
behavior.
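
Conceptually, the merge splices the later-delivered user entries in where the PERF_CONTEXT_USER_DEFERRED marker sat in the original callchain (a sketch with simplified buffers, not the tool's actual code):

  /* Sketch: rebuild the chain, replacing the marker with user entries. */
  for (i = 0; i < sample->callchain->nr; i++) {
          u64 ip = sample->callchain->ips[i];

          if (ip == PERF_CONTEXT_USER_DEFERRED) {
                  memcpy(&merged[n], deferred->ips,
                         deferred->nr * sizeof(u64));
                  n += deferred->nr;
                  continue;
          }
          merged[n++] = ip;
  }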

With the previous result, perf script will now show the merged callchains.

  $ perf script
  perf     801 [000]    18.031793:          1 cycles:P:
          ffffffff91a14c36 __intel_pmu_enable_all.isra.0+0x56 ([kernel.kallsyms])
          ffffffff91d373e9 perf_ctx_enable+0x39 ([kernel.kallsyms])
          ffffffff91d36af7 event_function+0xd7 ([kernel.kallsyms])
          ffffffff91d34222 remote_function+0x42 ([kernel.kallsyms])
          ffffffff91c1ebe1 generic_exec_single+0x61 ([kernel.kallsyms])
          ffffffff91c1edac smp_call_function_single+0xec ([kernel.kallsyms])
          ffffffff91d37a9d event_function_call+0x10d ([kernel.kallsyms])
          ffffffff91d33557 perf_event_for_each_child+0x37 ([kernel.kallsyms])
          ffffffff91d47324 _perf_ioctl+0x204 ([kernel.kallsyms])
          ffffffff91d47c43 perf_ioctl+0x33 ([kernel.kallsyms])
          ffffffff91e2f216 __x64_sys_ioctl+0x96 ([kernel.kallsyms])
          ffffffff9265f1ae do_syscall_64+0x9e ([kernel.kallsyms])
          ffffffff92800130 entry_SYSCALL_64+0xb0 ([kernel.kallsyms])
              7fb5fc22034b __GI___ioctl+0x3b (/usr/lib/x86_64-linux-gnu/libc.so.6)
  ...

The old output can be obtained using the --no-merge-callchain option.
Also, perf report can now show the user callchain entry at the end.

  $ perf report --no-children --percent-limit=0 --stdio -q -S __intel_pmu_enable_all.isra.0
  # symbol: __intel_pmu_enable_all.isra.0
       0.00%  perf     [kernel.kallsyms]
              |
              ---__intel_pmu_enable_all.isra.0
                 perf_ctx_enable
                 event_function
                 remote_function
                 generic_exec_single
                 smp_call_function_single
                 event_function_call
                 perf_event_for_each_child
                 _perf_ioctl
                 perf_ioctl
                 __x64_sys_ioctl
                 do_syscall_64
                 entry_SYSCALL_64
                 __GI___ioctl

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>