import git://github.com/brendangregg/FlameGraph @ 250bd356837aa62188b3205918b6b05c04d54c93
tmm1 committed Jan 24, 2014
1 parent 453bb7f commit 9f7fd24
Showing 2 changed files with 628 additions and 0 deletions.
134 changes: 134 additions & 0 deletions vendor/FlameGraph/README
@@ -0,0 +1,134 @@
Flame Graphs visualize profiled code-paths.

Website: http://www.brendangregg.com/flamegraphs.html

CPU profiling using DTrace, perf_events, SystemTap, or ktap: http://www.brendangregg.com/FlameGraphs/cpuflamegraphs.html
CPU profiling using XCode Instruments: http://schani.wordpress.com/2012/11/16/flame-graphs-for-instruments/
CPU profiling using Xperf.exe: http://randomascii.wordpress.com/2013/03/26/summarizing-xperf-cpu-usage-with-flame-graphs/
Memory profiling: http://www.brendangregg.com/FlameGraphs/memoryflamegraphs.html

These can be created in three steps:

1. Capture stacks
2. Fold stacks
3. flamegraph.pl


1. Capture stacks
=================
Stack samples can be captured using DTrace, perf_events or SystemTap.

Using DTrace to capture 60 seconds of kernel stacks at 997 Hertz:

# dtrace -x stackframes=100 -n 'profile-997 /arg0/ { @[stack()] = count(); } tick-60s { exit(0); }' -o out.kern_stacks

Using DTrace to capture 60 seconds of user-level stacks for PID 12345 at 97 Hertz:

# dtrace -x ustackframes=100 -n 'profile-97 /pid == 12345 && arg1/ { @[ustack()] = count(); } tick-60s { exit(0); }' -o out.user_stacks

Using DTrace to capture 60 seconds of user-level stacks for PID 12345 at 97 Hertz, including samples taken while the process is executing in the kernel:

# dtrace -x ustackframes=100 -n 'profile-97 /pid == 12345/ { @[ustack()] = count(); } tick-60s { exit(0); }' -o out.user_stacks

Switch ustack() for jstack() if the application has a ustack helper to include translated frames (eg, node.js frames; see: http://dtrace.org/blogs/dap/2012/01/05/where-does-your-node-program-spend-its-time/). The sampling rate for user-level stacks is deliberately lower than for kernel stacks, which is especially important when using jstack() as it performs additional work to translate frames.
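
The user-level one-liners above differ only in the PID, duration and sampling rate, so the D program can be generated from parameters. The following is a minimal sketch, assuming a POSIX shell; the function names dtrace_user_prog and capture_user_stacks are hypothetical and not part of FlameGraph. In the profile provider, arg1 is the user-level program counter and is zero when the sample was taken in the kernel, which is why the predicate tests it.

```shell
# Print a D program that samples the user-level stacks of one process.
# usage: dtrace_user_prog PID [SECONDS] [HERTZ]
# Defaults (60 seconds, 97 Hertz) mirror the one-liner in this README.
dtrace_user_prog() {
    pid=$1
    secs=${2:-60}
    hz=${3:-97}
    printf 'profile-%s /pid == %s && arg1/ { @[ustack()] = count(); } tick-%ss { exit(0); }\n' \
        "$hz" "$pid" "$secs"
}

# Hypothetical wrapper: run the capture and write out.user_stacks.
# Needs root and a system that provides DTrace.
capture_user_stacks() {
    dtrace -x ustackframes=100 -n "$(dtrace_user_prog "$@")" -o out.user_stacks
}
```

For example, "capture_user_stacks 12345 30" would sample PID 12345 for 30 seconds at the default 97 Hertz.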

2. Fold stacks
==============
Use the stackcollapse programs to fold stack samples into single lines. The programs provided are:

- stackcollapse.pl: for DTrace stacks
- stackcollapse-perf.pl: for perf_events "perf script" output
- stackcollapse-stap.pl: for SystemTap stacks
- stackcollapse-instruments.pl: for XCode Instruments

Usage example:

$ ./stackcollapse.pl out.kern_stacks > out.kern_folded

The output looks like this:

unix`_sys_sysenter_post_swapgs 1401
unix`_sys_sysenter_post_swapgs;genunix`close 5
unix`_sys_sysenter_post_swapgs;genunix`close;genunix`closeandsetf 85
unix`_sys_sysenter_post_swapgs;genunix`close;genunix`closeandsetf;c2audit`audit_closef 26
unix`_sys_sysenter_post_swapgs;genunix`close;genunix`closeandsetf;c2audit`audit_setf 5
unix`_sys_sysenter_post_swapgs;genunix`close;genunix`closeandsetf;genunix`audit_getstate 6
unix`_sys_sysenter_post_swapgs;genunix`close;genunix`closeandsetf;genunix`audit_unfalloc 2
unix`_sys_sysenter_post_swapgs;genunix`close;genunix`closeandsetf;genunix`closef 48
[...]
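
Each folded line is a stack, written root first with frames separated by semicolons, followed by a space and the number of samples. That makes the file easy to inspect with standard tools. The following is a rough sketch, assuming a POSIX shell with awk and frame names that contain no spaces (true for the output above); folded_total and folded_top_leaves are hypothetical helper names, not part of FlameGraph.

```shell
# Sum the sample counts of folded stacks read from stdin.
# Lines without a count field (such as "[...]") are skipped.
folded_total() {
    awk 'NF >= 2 { total += $NF } END { print total + 0 }'
}

# Print "count function" for each leaf (last) frame, highest count first.
# Assumes frame names contain no spaces, so the stack is field 1.
folded_top_leaves() {
    awk 'NF >= 2 {
        n = split($1, frames, ";")
        leaf[frames[n]] += $NF
    } END {
        for (f in leaf) print leaf[f], f
    }' | sort -k1,1nr -k2,2
}
```

Usage: "./stackcollapse.pl out.kern_stacks | folded_top_leaves | head" lists the functions that were most often on-CPU.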

3. flamegraph.pl
================
Use flamegraph.pl to render an SVG.

$ ./flamegraph.pl out.kern_folded > kernel.svg

An advantage of having the folded input file (and the reason folding is a separate step from flamegraph.pl) is that you can use grep to select functions of interest. Eg:

$ grep cpuid out.kern_folded | ./flamegraph.pl > cpuid.svg
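
Because grep discards the stacks that do not match, the filtered graph no longer shows how large the selection is relative to the whole profile. The following is a small sketch for computing that share from the unfiltered folded file, assuming a POSIX shell with awk; folded_share is a hypothetical helper name, and the pattern is matched as a fixed substring of the stack.

```shell
# Print the percentage of samples whose stack contains PATTERN.
# usage: folded_share PATTERN < folded-file
folded_share() {
    awk -v pat="$1" '
        NF >= 2 {
            stack = $0
            sub(/ [0-9]+$/, "", stack)   # drop the trailing sample count
            total += $NF
            if (index(stack, pat) > 0) hit += $NF
        }
        END {
            if (total > 0) printf "%.1f\n", 100 * hit / total
            else print "0.0"
        }
    '
}
```

Usage: "folded_share cpuid < out.kern_folded" prints the percentage of samples that the cpuid.svg graph above would cover.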


Provided Example
================
An example output from DTrace is included, both the captured stacks and
the resulting Flame Graph. You can generate it yourself using:

$ ./stackcollapse.pl example-stacks.txt | ./flamegraph.pl > example.svg

This was from a particular performance investigation: the Flame Graph
identified that CPU time was spent in the lofs module, and quantified
that time.


Options
=======
See the USAGE message (--help) for options:

USAGE: ./flamegraph.pl [options] infile > outfile.svg

    --titletext     # change title text
    --width         # width of image (default 1200)
    --height        # height of each frame (default 16)
    --minwidth      # omit smaller functions (default 0.1 pixels)
    --fonttype      # font type (default "Verdana")
    --fontsize      # font size (default 12)
    --countname     # count type label (default "samples")
    --nametype      # name type label (default "Function:")
    --colors        # "hot", "mem", "io" palette (default "hot")
    --hash          # colors are keyed by function name hash
    --cp            # use consistent palette (palette.map)
  eg,
    ./flamegraph.pl --titletext="Flame Graph: malloc()" trace.txt > graph.svg

As suggested in the example, flame graphs can process traces of any event,
such as malloc()s, provided stack traces are gathered.


Consistent Palette
==================
If you use the --cp option, flamegraph.pl randomly generates the palette from
the --colors selection as normal and saves it to palette.map. Any future flame
graphs created using the --cp option will reuse that palette map. Any new
symbols in future flame graphs will have their colors randomly generated from
the --colors selection in effect at that time.

If you don't like the palette, just delete the palette.map file.

This allows you to change the color scheme between flame graphs so that the
differences REALLY stand out.

Example:

Say we have 2 captures, one with a problem, and one when it was working
(whatever "it" is):

cat working.folded | ./flamegraph.pl --cp > working.svg
# this generates a palette.map, with the normal randomly generated look.

cat broken.folded | ./flamegraph.pl --cp --colors mem > broken.svg
# this svg will use the same palette.map for the same events, but a very
# different colorscheme for any new events.

Take a look at the demo directory for an example:

palette-example-working.svg
palette-example-broken.svg
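
With --cp, the frames that stand out in the second graph are the ones that were absent from the first capture, since only those receive freshly generated colors. The following is a rough sketch for listing them before rendering, assuming a POSIX shell with sed, tr, sort, comm and mktemp, and frame names that contain no semicolons; folded_frames and new_frames are hypothetical helper names, not part of FlameGraph.

```shell
# Print each distinct frame name in folded input, one per line, sorted.
folded_frames() {
    sed 's/ [0-9][0-9]*$//' | tr ';' '\n' | sort -u
}

# Print the frames that occur in the second folded file but not the first.
# usage: new_frames working.folded broken.folded
new_frames() {
    old_list=$(mktemp) || return 1
    new_list=$(mktemp) || { rm -f "$old_list"; return 1; }
    folded_frames < "$1" > "$old_list"
    folded_frames < "$2" > "$new_list"
    comm -13 "$old_list" "$new_list"   # keep lines unique to the second list
    rm -f "$old_list" "$new_list"
}
```

Usage: "new_frames working.folded broken.folded" prints the function names that would be colored from the mem palette in broken.svg above.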