This package contains profiling tools for the Julia language. Profilers are mainly used for code optimization, particularly to find bottlenecks.
This package implements two types of profilers, an "instrumenting" profiler and a "sampling" (statistical) profiler.
The instrumenting profiler provides detailed information, including the number of times each line gets executed and the total run time spent on that line. However, the instrumenting profiler requires that your code be (automatically) modified, by encapsulating it in "@iprofile begin...end". It also has a significant performance cost.
The sampling profiler works by grabbing "snapshots" during the execution of any task. Each "snapshot" extracts the currently-running function and line number. When a line occurs frequently in the set of snapshots, one might suspect that this line consumes significant resources.
The weakness of a sampling profiler is that these snapshots do not provide complete line-by-line coverage, because they occur at intervals (by default, 1 ms). However, the sampling profiler also has strengths. First, it requires no code modifications. As an important consequence, it can profile into Julia's core code and even (optionally) into C libraries. Second, by running "infrequently" there is very little performance overhead.
Within Julia, use the package manager:
Pkg.add("Profile")
Unless you want to see Julia's compiler in action, it's a good idea to first run the code you intend to profile at least once. Here's a demo:
require("Profile")
using SProfile # use the sampling profiler
function myfunc()
A = rand(100, 100, 200)
sum(A)
end
myfunc() # run once to force compilation
@sprofile myfunc()
To get the results, we do this:
julia> sprofile_tree(true)
14 julia; ???; line: 0
14 /lib/x86_64-linux-gnu/libc.so.6; __libc_start_main; line: 237
14 julia; ???; line: 0
14 /home/tim/src/julia/usr/bin/../lib/libjulia-release.so; julia_trampoline; line: 131
14 julia; ???; line: 0
14 /home/tim/src/julia/usr/bin/../lib/libjulia-release.so; ???; line: 0
14 ???; ???; line: 0
14 client.jl; _start; line: 344
14 client.jl; run_repl; line: 141
14 /home/tim/src/julia/usr/bin/../lib/libjulia-release.so; ???; line: 0
14 ???; ???; line: 0
14 client.jl; eval_user_input; line: 68
14 /home/tim/src/julia/usr/bin/../lib/libjulia-release.so; ???; line: 0
14 /home/tim/src/julia/usr/bin/../lib/libjulia-release.so; ???; line: 0
14 /home/tim/src/julia-modules/Profile.jl/src/sprofile.jl; anonymous; line: 301
14 /home/tim/src/julia/usr/bin/../lib/libjulia-release.so; ???; line: 0
14 ???; ???; line: 0
11 none; myfunc; line: 2
11 librandom.jl; dsfmt_gv_fill_array_close_open!; line: 128
1 /home/tim/src/julia/usr/bin/../lib/librandom.so; dsfmt_fill_array_close_open; line: 375
9 /home/tim/src/julia/usr/bin/../lib/librandom.so; dsfmt_fill_array_close_open; line: 383
1 /home/tim/src/julia/usr/bin/../lib/librandom.so; dsfmt_fill_array_close_open; line: 391
3 none; myfunc; line: 3
3 abstractarray.jl; sum; line: 1266
The true
as an argument causes it to include C functions in the results; if
you simply call sprofile_tree()
, or sprofile_tree(false)
, it will exclude C
functions.
Here's how to interpret these results: the first "field" indicates the number of
samples taken, the second the filename followed by a semicolon, the third the
function name followed by a semicolon, and the fourth the line number (or, for C
code, the instruction offset). Long paths and function names are
truncated at their beginning, showing "...", and the display is set to fit
inside the width of your terminal (so your results may look slightly different).
Each sub-function call indents an additional space. There are some situations in
which the backtrace line cannot be not resolved to a function name, and these
lines are shown with ???
. In some
cases, you may see things like "+n" (where n is a number) prepended to one or
more lines; that's an indication that it should have indented another n spaces,
but ran out of room.
In this specific case, there were 14 "samples" taken during the running of
myfunc
. The top levels are julia
itself, the system's libc
in __libc_start_main
,
a call to julia_trampoline
, and eventually our first Julia
function, _start
, in client.jl
. Progressing a little further,
eval_user_input
is the function that gets evaluated each time you type
something on the REPL, so this is essentially the "parent" of all activity that
you initiate from the REPL. Each one of these shows 14 samples, meaning that
each snapshot captured a state "inside" these functions.
The first "interesting" part is the line
11 none; myfunc; line: 2
none
refers to the fact that we typed myfunc
in at the REPL, rather than
putting it in a file. Line 2 contains the call to rand
, and there were 11 (out
of 14) snapshots taken here. Below that, you can see a call to
dsfmt_gv_fill_array_close_open!
inside librandom.jl
and finally down again
into a C library. You might be surprised not to see the rand
function listed
explicitly: that's because rand
is inlined, and hence doesn't appear in the
backtraces.
A little further down, you see
3 none; myfunc; line: 3
Line 3 of myfunc
contains the call to sum
, and there were 3 (out of 14)
samples taken here. Below that, you can see the specific place in
base/abstractarray.jl
that implements the sum
function for these inputs.
Some lines contain multiple operations, and in fact the backtraces can
distinguish between these operations. By default, however, the counts are
"collapsed." By saying sprofile_tree(true, false)
you can see separate
counts for each operation. The second argument determines whether the multiple
operations are merged into a single line.
Overall, we can tentatively conclude that random number generation is several times as expensive as the sum operation. To get better statistics, we'd want to run this multiple times.
An alternative way of viewing the results is as a "flat" dump:
julia> sprofile_flat(true)
Count File Function Line/offset
14 /home/tim/src/julia-modules/Profile.jl/src/sprofile.jl anonymous 301
14 /home/tim/src/julia/usr/bin/../lib/libjulia-release.so julia_trampoline 131
1 /home/tim/src/julia/usr/bin/../lib/librandom.so dsfmt_fill_array_close_open 375
9 /home/tim/src/julia/usr/bin/../lib/librandom.so dsfmt_fill_array_close_open 383
1 /home/tim/src/julia/usr/bin/../lib/librandom.so dsfmt_fill_array_close_open 391
14 /lib/x86_64-linux-gnu/libc.so.6 __libc_start_main 237
3 abstractarray.jl sum 1266
14 client.jl _start 344
14 client.jl eval_user_input 68
14 client.jl run_repl 141
11 librandom.jl dsfmt_gv_fill_array_close_open! 128
11 none myfunc 2
3 none myfunc 3
The same flags apply for sprofile_flat
as for sprofile_tree
.
You can accumulate the results of multiple calls, and view the combined results
with these functions. If you want to start from scratch, use
sprofile_clear()
.
The sampling profiler just accumulates snapshots, and the analysis only happens
when you ask for the report with sprofile_tree
or sprofile_flat
. For a
long-running computation, it's entirely possible that the pre-allocated buffer
for storing snapshots will be filled. If that happens, the snapshots stop, but
your computation continues. As a consequence, you may miss some important
profiling data, although you will get a warning when that happens.
You can adjust the behavior by calling
sprofile_init(nsamples, delay)
where both parameters are integers. delay
is expressed in nanoseconds, and is
the amount of time that Julia gets between snapshots to perform the requested
computations. (A very long-running job might not need such frequent snapshots.)
The larger nsamples
, the more snapshots you can take, at the cost of larger
memory requirements.
This starts similarly:
require("Profile")
using IProfile # use the instrumenting profiler
Here you encapsulate your code in a macro call:
@iprofile begin
function f1(x::Int)
z = 0
for j = 1:x
z += j^2
end
return z
end
function f1(x::Float64)
return x+2
end
function f1{T}(x::T)
return x+5
end
f2(x) = 2*x
end # @iprofile begin
Now load the file and execute the code you want to profile, e.g.:
f1(215)
for i = 1:100
f1(3.5)
end
for i = 1:150
f1(uint8(7))
end
for i = 1:125
f2(11)
end
To view the execution times, type @iprofile report
.
Here are the various options you have for controlling profiling:
@iprofile report
: display cumulative profiling results@iprofile clear
: clear all timings accumulated thus far (start from zero)@iprofile off
: turn profiling off (there is no need to remove @iprofile begin ... end blocks)@iprofile on
: turn profiling back on
You should always discard the results of your first run: it may include the overhead needed to JIT-compile some of the subfunctions.
The primary source of variability is the garbage collector---if it runs between two instrumentation lines, its execution time gets added to the time that your own line of code contributes. This can make a fast-running line seem puzzlingly slow. One good way to reduce the variance is to run gc() before profiling. However, if your code tends to accumulate a bunch of temporaries that need to be cleaned up in the middle of the run, then calling gc()
at the beginning can cause the collector to run at the same point in the code each time, a misleading but consistent result. A different approach is to use multiple runs (without an explicit gc()
) and hope that the collector runs at different points in each run. The cost of a given line is probably best reflected in those runs with shortest time for that line.
Instrumenting profiling adds a performance overhead which can be significant. You can prevent a subsection of your code from being profiled by encapsulating it inside a begin ... end
block; in this case, the block as a whole is profiled, but the individual lines inside the block are not separately timed.
The profiler tries to compensate for its overhead in the reported times. This naturally leads to some degree of uncertainty about the execution time of individual lines. More significantly, currently the profiler does not compensate for its own instrumentation in profiled subfunctions. Consequently, it’s recommended that you avoid profiling nested code as a chunk---you probably want to pick out individual functions or groups of functions to profile separately.
Both limitations are addressed by the sampling profiler. Consequently, it's probably best to start with the sampling profiler, and then use the instrumenting profiler when detailed attention to a particular function is warranted.