Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

profile: exclude CPU idle stacks by default #2166

Merged
merged 1 commit into from
Jan 27, 2019
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
3 changes: 3 additions & 0 deletions man/man8/profile.8
Original file line number Diff line number Diff line change
Expand Up @@ -50,6 +50,9 @@ Show stacks from user space only (no kernel space stacks).
\-K
Show stacks from kernel space only (no user space stacks).
.TP
\-I
Include CPU idle stacks (by default these are excluded).
.TP
\-\-stack-storage-size COUNT
The maximum number of unique stack traces that the kernel will count (default
16384). If the sampled count exceeds this, a warning will be printed.
Expand Down
14 changes: 14 additions & 0 deletions tools/profile.py
Original file line number Diff line number Diff line change
Expand Up @@ -9,6 +9,8 @@
# counting there. Only the unique stacks and counts are passed to user space
# at the end of the profile, greatly reducing the kernel<->user transfer.
#
# By default CPU idle stacks are excluded by simply excluding PID 0.
#
# REQUIRES: Linux 4.9+ (BPF_PROG_TYPE_PERF_EVENT support). Under tools/old is
# a version of this tool that may work on Linux 4.6 - 4.8.
#
Expand All @@ -22,6 +24,7 @@
#
# 15-Jul-2016 Brendan Gregg Created this.
# 20-Oct-2016 " " Switched to use the new 4.9 support.
# 26-Jan-2019 " " Changed to exclude CPU idle by default.

from __future__ import print_function
from bcc import BPF, PerfType, PerfSWConfig
Expand Down Expand Up @@ -93,6 +96,8 @@ def stack_id_err(stack_id):
help="insert delimiter between kernel/user stacks")
parser.add_argument("-a", "--annotations", action="store_true",
help="add _[k] annotations to kernel frames")
parser.add_argument("-I", "--include-idle", action="store_true",
help="include CPU idle stacks")
parser.add_argument("-f", "--folded", action="store_true",
help="output folded format, one line per stack (for flame graphs)")
parser.add_argument("--stack-storage-size", default=16384,
Expand Down Expand Up @@ -141,6 +146,9 @@ def stack_id_err(stack_id):

int do_perf_event(struct bpf_perf_event_data *ctx) {
u32 pid = bpf_get_current_pid_tgid() >> 32;
if (IDLE_FILTER)
return 0;

if (!(THREAD_FILTER))
return 0;

Expand Down Expand Up @@ -184,6 +192,12 @@ def stack_id_err(stack_id):
}
"""

# set idle filter
idle_filter = "pid == 0"
if args.include_idle:
idle_filter = "0"
bpf_text = bpf_text.replace('IDLE_FILTER', idle_filter)

# set thread filter
thread_context = ""
perf_filter = "-a"
Expand Down
91 changes: 30 additions & 61 deletions tools/profile_example.txt
Original file line number Diff line number Diff line change
Expand Up @@ -41,6 +41,27 @@ Sampling at 49 Hertz of all threads by user + kernel stack... Hit Ctrl-C to end.
- func_ab (13549)
5

The output was long; I truncated some lines ("[...]").

This default output prints stack traces, followed by a line to describe the
process (a dash, the process name, and a PID in parenthesis), and then an
integer count of how many times this stack trace was sampled.

The func_ab process is running the func_a() function, called by main(),
called by __libc_start_main(), and called by "[unknown]" with what looks
like a bogus address (1st column). That's evidence of a broken stack trace.
It's common for user-level software that hasn't been compiled with frame
pointers (in this case, libc).

The dd process has called read(), and then enters the kernel via
entry_SYSCALL_64_fastpath(), calling sys_read(), and so on. Yes, I'm now
reading it bottom up. That way follows the code flow.


By default, CPU idle stacks are excluded. They can be included with -I:

# ./profile -I

[...]

native_safe_halt
Expand All @@ -64,32 +85,16 @@ Sampling at 49 Hertz of all threads by user + kernel stack... Hit Ctrl-C to end.
- swapper/1 (0)
75

The output was long; I truncated some lines ("[...]").

This default output prints stack traces, followed by a line to describe the
process (a dash, the process name, and a PID in parenthesis), and then an
integer count of how many times this stack trace was sampled.

The output above shows the most frequent stack was from the "swapper/1"
process (PID 0), running the native_safe_halt() function, which was called
by default_idle(), which was called by arch_cpu_idle(), and so on. This is
the idle thread. Stacks can be read top-down, to follow ancestry: child,
parent, grandparent, etc.

The func_ab process is running the func_a() function, called by main(),
called by __libc_start_main(), and called by "[unknown]" with what looks
like a bogus address (1st column). That's evidence of a broken stack trace.
It's common for user-level software that hasn't been compiled with frame
pointers (in this case, libc).

The dd process has called read(), and then enters the kernel via
entry_SYSCALL_64_fastpath(), calling sys_read(), and so on. Yes, I'm now
reading it bottom up. That way follows the code flow.


The dd process is actually "dd if=/dev/zero of=/dev/null": it's a simple
workload to analyze that just moves bytes from /dev/zero to /dev/null.
Profiling just that process:
The dd process profiled ealrier is actually "dd if=/dev/zero of=/dev/null":
it's a simple workload to analyze that just moves bytes from /dev/zero to
/dev/null. Profiling just that process:

# ./profile -p 25036
Sampling at 49 Hertz of PID 25036 by user + kernel stack... Hit Ctrl-C to end.
Expand Down Expand Up @@ -539,6 +544,8 @@ You can increase or decrease the sample frequency. Eg, sampling at 9 Hertz:
# ./profile -F 9
Sampling at 9 Hertz of all threads by user + kernel stack... Hit Ctrl-C to end.
^C
[...]

func_b
main
__libc_start_main
Expand All @@ -548,27 +555,6 @@ Sampling at 9 Hertz of all threads by user + kernel stack... Hit Ctrl-C to end.

[...]

native_safe_halt
default_idle
arch_cpu_idle
default_idle_call
cpu_startup_entry
start_secondary
- swapper/3 (0)
8

native_safe_halt
default_idle
arch_cpu_idle
default_idle_call
cpu_startup_entry
rest_init
start_kernel
x86_64_start_reservations
x86_64_start_kernel
- swapper/0 (0)
8


You can also restrict profiling to just kernel stacks (-K) or user stacks (-U).
For example, just user stacks:
Expand Down Expand Up @@ -707,24 +693,6 @@ Sampling at 49 Hertz of all threads by user stack... Hit Ctrl-C to end.
- dd (2931)
14

- swapper/7 (0)
46

- swapper/0 (0)
46

- swapper/2 (0)
46

- swapper/1 (0)
46

- swapper/3 (0)
46

- swapper/4 (0)
46


If there are too many unique stack traces for the kernel to save, a warning
will be printed. Eg:
Expand All @@ -739,8 +707,8 @@ Run ./profile -h to see the default.
USAGE message:

# ./profile -h
usage: profile [-h] [-p PID] [-U | -K] [-F FREQUENCY | -c COUNT] [-d] [-a]
[-f] [--stack-storage-size STACK_STORAGE_SIZE]
usage: profile.py [-h] [-p PID] [-U | -K] [-F FREQUENCY | -c COUNT] [-d] [-a]
[-I] [-f] [--stack-storage-size STACK_STORAGE_SIZE] [-C CPU]
[duration]

Profile CPU stack traces at a timed interval
Expand All @@ -763,11 +731,12 @@ optional arguments:
sample period, number of events
-d, --delimited insert delimiter between kernel/user stacks
-a, --annotations add _[k] annotations to kernel frames
-I, --include-idle include CPU idle stacks
-f, --folded output folded format, one line per stack (for flame
graphs)
--stack-storage-size STACK_STORAGE_SIZE
the number of unique stack traces that can be stored
and displayed (default 2048)
and displayed (default 16384)
-C CPU, --cpu CPU cpu number to run profile on

examples:
Expand Down