forked from tmm1/stackprof
Commit
import git://github.com/brendangregg/FlameGraph @ 250bd356837aa62188b3205918b6b05c04d54c93
Showing 2 changed files with 628 additions and 0 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,134 @@ | ||
Flame Graphs visualize profiled code-paths. | ||
|
||
Website: http://www.brendangregg.com/flamegraphs.html | ||
|
||
CPU profiling using DTrace, perf_events, SystemTap, or ktap: http://www.brendangregg.com/FlameGraphs/cpuflamegraphs.html | ||
CPU profiling using XCode Instruments: http://schani.wordpress.com/2012/11/16/flame-graphs-for-instruments/ | ||
CPU profiling using Xperf.exe: http://randomascii.wordpress.com/2013/03/26/summarizing-xperf-cpu-usage-with-flame-graphs/ | ||
Memory profiling: http://www.brendangregg.com/FlameGraphs/memoryflamegraphs.html | ||
|
||
These can be created in three steps: | ||
|
||
1. Capture stacks | ||
2. Fold stacks | ||
3. flamegraph.pl | ||
|
||
|
||
1. Capture stacks | ||
================= | ||
Stack samples can be captured using DTrace, perf_events or SystemTap. | ||
|
||
Using DTrace to capture 60 seconds of kernel stacks at 997 Hertz: | ||
|
||
# dtrace -x stackframes=100 -n 'profile-997 /arg0/ { @[stack()] = count(); } tick-60s { exit(0); }' -o out.kern_stacks | ||
|
||
Using DTrace to capture 60 seconds of user-level stacks for PID 12345 at 97 Hertz: | ||
|
||
# dtrace -x ustackframes=100 -n 'profile-97 /pid == 12345 && arg1/ { @[ustack()] = count(); } tick-60s { exit(0); }' -o out.user_stacks | ||
|
||
Using DTrace to capture 60 seconds of user-level stacks, including while time is spent in the kernel, for PID 12345 at 97 Hertz: | ||
|
||
# dtrace -x ustackframes=100 -n 'profile-97 /pid == 12345/ { @[ustack()] = count(); } tick-60s { exit(0); }' -o out.user_stacks | ||
|
||
Replace ustack() with jstack() if the application has a ustack helper, so that translated frames are included (eg, node.js frames; see: http://dtrace.org/blogs/dap/2012/01/05/where-does-your-node-program-spend-its-time/). The sampling rate for user-level stacks is deliberately lower than for kernel stacks; this matters most with jstack(), which does extra work to translate frames. | ||
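To get a feel for how much data each rate produces, here is a rough upper bound, assuming the profile probe fires once per CPU per tick. The CPU count of 8 is a made-up value for illustration, and the predicates in the one-liners above discard some firings, so real counts will be lower.

```python
def max_samples(rate_hz: int, seconds: int, cpus: int) -> int:
    """Upper bound on profile probe firings: one per CPU per tick."""
    return rate_hz * seconds * cpus

# Hypothetical 8-CPU host, 60-second capture as in the one-liners above.
kernel_budget = max_samples(997, 60, cpus=8)
user_budget = max_samples(97, 60, cpus=8)
print(kernel_budget, user_budget)  # 478560 46560
```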
|
||
2. Fold stacks | ||
============== | ||
Use the stackcollapse programs to fold stack samples into single lines. The programs provided are: | ||
|
||
- stackcollapse.pl: for DTrace stacks | ||
- stackcollapse-perf.pl: for perf_events "perf script" output | ||
- stackcollapse-stap.pl: for SystemTap stacks | ||
- stackcollapse-instruments.pl: for XCode Instruments | ||
|
||
Usage example: | ||
|
||
$ ./stackcollapse.pl out.kern_stacks > out.kern_folded | ||
|
||
The output looks like this: | ||
|
||
unix`_sys_sysenter_post_swapgs 1401 | ||
unix`_sys_sysenter_post_swapgs;genunix`close 5 | ||
unix`_sys_sysenter_post_swapgs;genunix`close;genunix`closeandsetf 85 | ||
unix`_sys_sysenter_post_swapgs;genunix`close;genunix`closeandsetf;c2audit`audit_closef 26 | ||
unix`_sys_sysenter_post_swapgs;genunix`close;genunix`closeandsetf;c2audit`audit_setf 5 | ||
unix`_sys_sysenter_post_swapgs;genunix`close;genunix`closeandsetf;genunix`audit_getstate 6 | ||
unix`_sys_sysenter_post_swapgs;genunix`close;genunix`closeandsetf;genunix`audit_unfalloc 2 | ||
unix`_sys_sysenter_post_swapgs;genunix`close;genunix`closeandsetf;genunix`closef 48 | ||
[...] | ||
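The folding step can be sketched in Python. This is a simplified illustration, not the stackcollapse.pl implementation: it assumes each DTrace stack is printed leaf-first, one frame per line, followed by a line holding only the count. The sample offsets below are invented.

```python
import re
from collections import Counter

def fold_stacks(lines):
    """Fold leaf-first stack blocks into 'root;...;leaf' -> count."""
    folded = Counter()
    frames = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.isdigit():
            # A bare number ends the current stack and carries its count.
            if frames:
                folded[";".join(reversed(frames))] += int(line)
            frames = []
        else:
            # Drop the "+0x..." offset so identical functions merge.
            frames.append(re.sub(r"\+0x[0-9a-f]+$", "", line))
    return folded

# Invented sample input in the assumed format.
sample = """
              genunix`closeandsetf+0x3c
              genunix`close+0x13
              unix`_sys_sysenter_post_swapgs+0x149
               85

              genunix`close+0x13
              unix`_sys_sysenter_post_swapgs+0x149
                5
"""
for stack, count in sorted(fold_stacks(sample.splitlines()).items()):
    print(stack, count)
```

Reversing the frames puts the root first, which matches the ordering in the folded output shown above.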
|
||
3. flamegraph.pl | ||
================ | ||
Use flamegraph.pl to render an SVG. | ||
|
||
$ ./flamegraph.pl out.kern_folded > kernel.svg | ||
|
||
An advantage of having the folded input file (and the reason folding is a separate step from flamegraph.pl) is that you can grep it for functions of interest. Eg: | ||
|
||
$ grep cpuid out.kern_folded | ./flamegraph.pl > cpuid.svg | ||
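The same filtering can be done in a script when grep is too coarse. The sketch below assumes the folded format shown earlier (frames joined by ";", then a space and the count); the function names and the sample lines are hypothetical.

```python
def parse_folded(line):
    """Split 'a;b;c 42' into (['a', 'b', 'c'], 42)."""
    stack, _, count = line.rstrip().rpartition(" ")
    return stack.split(";"), int(count)

def select_stacks(lines, needle):
    """Keep folded lines in which at least one frame contains needle."""
    return [
        line for line in lines
        if line.strip() and any(needle in frame for frame in parse_folded(line)[0])
    ]

# Made-up folded lines for illustration.
folded = [
    "main;init;cpuid_probe 7",
    "main;run;read 12",
]
print(select_stacks(folded, "cpuid"))
```

Matching against frames rather than the whole line avoids accidental hits on the trailing count.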
|
||
|
||
Provided Example | ||
================ | ||
An example output from DTrace is included, both the captured stacks and | ||
the resulting Flame Graph. You can generate it yourself using: | ||
|
||
$ ./stackcollapse.pl example-stacks.txt | ./flamegraph.pl > example.svg | ||
|
||
This was from a particular performance investigation: the Flame Graph | ||
identified that CPU time was spent in the lofs module, and quantified | ||
that time. | ||
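The quantification can also be read straight from the folded file. This sketch computes the share of samples whose stack mentions a given module; the folded lines are invented and are not taken from example-stacks.txt.

```python
def sample_share(lines, needle):
    """Fraction of all samples whose folded stack contains needle."""
    total = matched = 0
    for line in lines:
        if not line.strip():
            continue
        stack, _, count = line.rstrip().rpartition(" ")
        samples = int(count)
        total += samples
        if needle in stack:
            matched += samples
    return matched / total if total else 0.0

# Invented folded lines; "lofs`" is the DTrace module prefix.
folded = [
    "unix`sys_syscall;genunix`open;lofs`lo_lookup 30",
    "unix`sys_syscall;genunix`read 70",
]
print(sample_share(folded, "lofs`"))  # 0.3
```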
|
||
|
||
Options | ||
======= | ||
See the USAGE message (--help) for options: | ||
|
||
USAGE: ./flamegraph.pl [options] infile > outfile.svg | ||
|
||
--titletext # change title text | ||
--width # width of image (default 1200) | ||
--height # height of each frame (default 16) | ||
--minwidth # omit smaller functions (default 0.1 pixels) | ||
--fonttype # font type (default "Verdana") | ||
--fontsize # font size (default 12) | ||
--countname # count type label (default "samples") | ||
--nametype # name type label (default "Function:") | ||
--colors # "hot", "mem", "io" palette (default "hot") | ||
--hash # colors are keyed by function name hash | ||
--cp # use consistent palette (palette.map) | ||
eg, | ||
./flamegraph.pl --titletext="Flame Graph: malloc()" trace.txt > graph.svg | ||
|
||
As suggested in the example, flame graphs can process traces of any event, | ||
such as malloc()s, provided stack traces are gathered. | ||
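When flame graphs are generated from a script, the options can be assembled into an argument list. This is a minimal sketch: flamegraph_argv and render are hypothetical helper names, and the script path is assumed to be ./flamegraph.pl as in the USAGE message.

```python
import subprocess

def flamegraph_argv(infile, script="./flamegraph.pl", **options):
    """Build the argument list, eg ['./flamegraph.pl', '--width=1600', infile]."""
    argv = [script]
    for name, value in options.items():
        if value is True:
            argv.append(f"--{name}")          # flag without a value, eg --hash
        else:
            argv.append(f"--{name}={value}")  # option with a value
    argv.append(infile)
    return argv

def render(infile, outfile, **options):
    """Run flamegraph.pl and write the SVG it prints on stdout to outfile."""
    with open(outfile, "w") as out:
        subprocess.run(flamegraph_argv(infile, **options), stdout=out, check=True)

print(flamegraph_argv("trace.txt", titletext="Flame Graph: malloc()", hash=True))
```

Passing a list to subprocess.run avoids shell quoting problems with titles that contain spaces or parentheses.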
|
||
|
||
Consistent Palette | ||
================== | ||
If you use the --cp option, flamegraph.pl generates the palette randomly | ||
from the --colors selection as usual, and saves it as a palette map. Any | ||
future flamegraphs created using the --cp option will reuse that palette map. | ||
Any new symbols in those flamegraphs get random colors from the --colors selection in use. | ||
|
||
If you don't like the palette, just delete the palette.map file. | ||
|
||
This allows you to change your colorscheme between flamegraphs to make the | ||
differences really stand out. | ||
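The behaviour described above can be sketched as a lookup that only generates a color for symbols it has not seen before. This is an illustration of the idea, not flamegraph.pl's code: the channel ranges are invented, and the palette is kept in memory instead of in a palette.map file.

```python
import random

# Illustrative channel ranges only; not the values flamegraph.pl uses.
SCHEMES = {
    "hot": ((205, 255), (0, 230), (0, 55)),
    "mem": ((0, 0), (190, 240), (0, 210)),
}

def random_color(rng, scheme):
    """Pick a random (r, g, b) within the scheme's channel ranges."""
    return tuple(rng.randint(lo, hi) for lo, hi in SCHEMES[scheme])

def color_for(name, palette, rng, scheme="hot"):
    """Reuse the stored color for name; generate one only for new symbols."""
    if name not in palette:
        palette[name] = random_color(rng, scheme)
    return palette[name]

rng = random.Random(1)
palette = {}
working = color_for("genunix`close", palette, rng)       # first run, "hot"
same = color_for("genunix`close", palette, rng, "mem")   # reused despite "mem"
new = color_for("lofs`lo_lookup", palette, rng, "mem")   # new symbol, "mem" range
```

Symbols already in the palette keep their color whatever scheme is requested later, so only the new ones appear in the contrasting scheme.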
|
||
Example: | ||
|
||
Say we have 2 captures, one with a problem, and one when it was working | ||
(whatever "it" is): | ||
|
||
cat working.folded | ./flamegraph.pl --cp > working.svg | ||
# this generates a palette.map, using the normal randomly generated look. | ||
|
||
cat broken.folded | ./flamegraph.pl --cp --colors mem > broken.svg | ||
# this svg will use the same palette.map for the same events, but a very | ||
# different colorscheme for any new events. | ||
|
||
Take a look at the demo directory for an example: | ||
|
||
palette-example-working.svg | ||
palette-example-broken.svg |