Instrument GC with memory profiler implementation
This adds C support for a memory profiler within the GC, tracking the
locations of allocations, deallocations, and so on.  It operates in a
similar manner to the time profiler: single large buffers are set up
beforehand through an initialization function, avoiding expensive
allocations while the program being measured is running.
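As a rough standalone sketch of that buffer-first design (illustrative only: `memprof_record_t`, `memprof_init`, and `memprof_track` are hypothetical names, not the runtime's actual API):

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Simplified model of a preallocated profiling buffer: one large
 * allocation up front, then cheap appends while profiling runs. */
typedef struct {
    void    *addr;   /* address of the tracked allocation */
    size_t   size;   /* size in bytes */
    uint16_t tag;    /* domain/allocator tag */
} memprof_record_t;

static memprof_record_t *records = NULL;
static size_t records_len = 0;   /* records written so far */
static size_t records_cap = 0;   /* capacity fixed at init time */

/* One expensive allocation before profiling starts. */
int memprof_init(size_t capacity) {
    records = malloc(capacity * sizeof(memprof_record_t));
    if (!records)
        return -1;
    records_cap = capacity;
    records_len = 0;
    return 0;
}

/* Appending a record is just a store and an index bump -- no
 * allocation, so the measured program is barely perturbed. */
int memprof_track(void *addr, size_t size, uint16_t tag) {
    if (records_len >= records_cap)
        return -1; /* buffer full; caller can warn and stop tracking */
    records[records_len++] = (memprof_record_t){ addr, size, tag };
    return 0;
}
```

When the fixed-size buffer fills, tracking simply stops rather than reallocating, mirroring how the time profiler behaves when its backtrace buffer is exhausted.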

The memory profiler instruments the GC at every location where the GC
statistics themselves are modified (e.g. `gc_num.allocd` and
`gc_num.freed`) by introducing new helper functions
`jl_gc_count_{allocd,freed,reallocd}()`.  Those utility functions call
the `jl_memprofile_track_{de,}alloc()` functions to register an address,
a size, and a tag with the memory profiler.  We also track type
information, as this can be critically helpful when debugging; to do
so without breaking API guarantees, we insert calls that set the type of
a chunk of memory after allocating it, where necessary.
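The pattern can be sketched as follows (a simplified standalone model, not the actual GC code: the real helpers live inside the GC and update `gc_num`, and the profiler bodies here are stubs that only count events):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Tag bits as named elsewhere in this commit. */
#define JL_MEMPROF_TAG_DOMAIN_CPU     0x0001
#define JL_MEMPROF_TAG_ALLOC_STDALLOC 0x0100

/* Stand-ins for gc_num.allocd / gc_num.freed. */
static size_t gc_allocd = 0;
static size_t gc_freed  = 0;
static size_t profiler_events = 0;

/* Stub profiler entry points; the commit's jl_memprofile_track_{de,}alloc
 * record the address, size, and tag into the profiling buffer. */
static void jl_memprofile_track_alloc(void *addr, size_t sz, uint16_t tag) {
    (void)addr; (void)sz; (void)tag;
    profiler_events++;
}
static void jl_memprofile_track_dealloc(void *addr, size_t sz, uint16_t tag) {
    (void)addr; (void)sz; (void)tag;
    profiler_events++;
}

/* Each helper updates the GC statistic and notifies the profiler in one
 * place, so every statistics update is also a profiling event. */
static void jl_gc_count_allocd(void *addr, size_t sz, uint16_t tag) {
    gc_allocd += sz;
    jl_memprofile_track_alloc(addr, sz, tag);
}
static void jl_gc_count_freed(void *addr, size_t sz, uint16_t tag) {
    gc_freed += sz;
    jl_memprofile_track_dealloc(addr, sz, tag);
}
```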

The tagging system allows the memory profiler to disambiguate, at
profile time, between e.g. pooled allocations and the "big" allocator.
It also allows the memory profiler to track multiple "memory
domains": e.g. a GPU support package could manually call
`jl_memprofile_track_alloc()` any time a chunk of memory is allocated on
the GPU, so as to use the same system.  By default, all values are
tracked; however, one can set a `memprof_tag_filter` value to track only
the values you are most interested in (e.g. only CPU-domain big
allocations).
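One plausible shape for the filter check (a sketch under assumptions: only `JL_MEMPROF_TAG_DOMAIN_CPU` and `JL_MEMPROF_TAG_ALLOC_STDALLOC` appear in this commit; the GPU and "big" tag values and the exact matching rule are hypothetical):

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative tag layout: low bits = memory domain, high bits =
 * allocator kind.  The GPU and BIG values below are placeholders. */
#define JL_MEMPROF_TAG_DOMAIN_CPU     0x0001
#define JL_MEMPROF_TAG_DOMAIN_GPU     0x0002  /* hypothetical */
#define JL_MEMPROF_TAG_ALLOC_STDALLOC 0x0100
#define JL_MEMPROF_TAG_ALLOC_BIG      0x0200  /* hypothetical */

/* Default filter: every bit set, so all events are tracked. */
static uint16_t memprof_tag_filter = 0xFFFF;

/* An event is recorded only if all of its tag bits are enabled in the
 * filter (one plausible rule; the real implementation may differ). */
static int memprof_tag_accepts(uint16_t tag) {
    return (tag & memprof_tag_filter) == tag;
}
```

Setting the filter to `JL_MEMPROF_TAG_DOMAIN_CPU | JL_MEMPROF_TAG_ALLOC_BIG` would then drop GPU-domain and pooled events while keeping CPU big allocations.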

To disambiguate the memory and time profilers, we split them out into
separate modules.
staticfloat authored and maleadt committed Oct 4, 2019
1 parent 588795e commit ec85f1d
Showing 21 changed files with 1,875 additions and 994 deletions.
32 changes: 16 additions & 16 deletions doc/src/manual/profile.md
@@ -61,7 +61,7 @@ To see the profiling results, there is a [graphical browser](https://github.com/
available, but here we'll use the text-based display that comes with the standard library:

```julia-repl
-julia> Profile.print()
+julia> Profile.Time.print()
80 ./event.jl:73; (::Base.REPL.##1#2{Base.REPL.REPLBackend})()
80 ./REPL.jl:97; macro expansion
80 ./REPL.jl:66; eval_user_input(::Any, ::Base.REPL.REPLBackend)
@@ -130,7 +130,7 @@ collecting more samples:
```julia-repl
julia> @profile (for i = 1:100; myfunc(); end)
-julia> Profile.print()
+julia> Profile.Time.print()
[....]
3821 ./REPL[1]:2; myfunc()
3511 ./random.jl:431; rand!(::MersenneTwister, ::Array{Float64,3}, ::Int64, ::Type...
@@ -152,7 +152,7 @@ This illustrates the default "tree" dump; an alternative is the "flat" dump, whi
counts independent of their nesting:

```julia-repl
-julia> Profile.print(format=:flat)
+julia> Profile.Time.print(format=:flat)
Count File Line Function
6714 ./<missing> -1 anonymous
6714 ./REPL.jl 66 eval_user_input(::Any, ::Base.REPL.REPLBackend)
@@ -201,12 +201,12 @@ useful way to view the results.
## Accumulation and clearing

Results from [`@profile`](@ref) accumulate in a buffer; if you run multiple pieces of code under
-[`@profile`](@ref), then [`Profile.print()`](@ref) will show you the combined results. This can
-be very useful, but sometimes you want to start fresh; you can do so with [`Profile.clear()`](@ref).
+[`@profile`](@ref), then [`Profile.Time.print()`](@ref) will show you the combined results. This can
+be very useful, but sometimes you want to start fresh; you can do so with [`Profile.Time.clear()`](@ref).

## Options for controlling the display of profile results

-[`Profile.print`](@ref) has more options than we've described so far. Let's see the full declaration:
+[`Profile.Time.print`](@ref) has more options than we've described so far. Let's see the full declaration:

```julia
function print(io::IO = stdout, data = fetch(); kwargs...)
@@ -216,15 +216,15 @@ Let's first discuss the two positional arguments, and later the keyword argument

* `io` -- Allows you to save the results to a buffer, e.g. a file, but the default is to print to `stdout`
(the console).
-* `data` -- Contains the data you want to analyze; by default that is obtained from [`Profile.fetch()`](@ref),
+* `data` -- Contains the data you want to analyze; by default that is obtained from [`Profile.Time.fetch()`](@ref),
which pulls out the backtraces from a pre-allocated buffer. For example, if you want to profile
the profiler, you could say:

```julia
-data = copy(Profile.fetch())
-Profile.clear()
-@profile Profile.print(stdout, data) # Prints the previous results
-Profile.print() # Prints results from Profile.print()
+data = copy(Profile.Time.fetch())
+Profile.Time.clear()
+@profile Profile.Time.print(stdout, data) # Prints the previous results
+Profile.Time.print() # Prints results from Profile.Time.print()
```

The keyword arguments can be any combination of:
@@ -233,7 +233,7 @@ The keyword arguments can be any combination of:
with (default, `:tree`) or without (`:flat`) indentation indicating tree
structure.
* `C` -- If `true`, backtraces from C and Fortran code are shown (normally they are excluded). Try running the introductory
-example with `Profile.print(C = true)`. This can be extremely helpful in deciding whether it's
+example with `Profile.Time.print(C = true)`. This can be extremely helpful in deciding whether it's
Julia code or C code that is causing a bottleneck; setting `C = true` also improves the interpretability
of the nesting, at the cost of longer profile dumps.
* `combine` -- Some lines of code contain multiple operations; for example, `s += A[i]` contains both an array
@@ -256,13 +256,13 @@ to save to a file using a wide `displaysize` in an [`IOContext`](@ref):

```julia
open("/tmp/prof.txt", "w") do s
-Profile.print(IOContext(s, :displaysize => (24, 500)))
+Profile.Time.print(IOContext(s, :displaysize => (24, 500)))
end
```

## Configuration

-[`@profile`](@ref) just accumulates backtraces, and the analysis happens when you call [`Profile.print()`](@ref).
+[`@profile`](@ref) just accumulates backtraces, and the analysis happens when you call [`Profile.Time.print()`](@ref).
For a long-running computation, it's entirely possible that the pre-allocated buffer for storing
backtraces will be filled. If that happens, the backtraces stop but your computation continues.
As a consequence, you may miss some important profiling data (you will get a warning when that
@@ -271,8 +271,8 @@ happens).
You can obtain and configure the relevant parameters this way:

```julia
-Profile.init() # returns the current settings
-Profile.init(n = 10^7, delay = 0.01)
+Profile.Time.init() # returns the current settings
+Profile.Time.init(n = 10^7, delay = 0.01)
```

`n` is the total number of instruction pointers you can store, with a default value of `10^6`.
2 changes: 1 addition & 1 deletion src/Makefile
@@ -45,7 +45,7 @@ RUNTIME_C_SRCS := \
simplevector runtime_intrinsics precompile \
threading partr stackwalk gc gc-debug gc-pages gc-stacks method \
jlapi signal-handling safepoint jloptions timing subtype \
-crc32c
+crc32c memprofile
RUNTIME_SRCS := APInt-C runtime_ccall processor rtutils $(RUNTIME_C_SRCS)
SRCS := $(RUNTIME_SRCS)

12 changes: 10 additions & 2 deletions src/array.c
@@ -98,6 +98,7 @@ static jl_array_t *_new_array_(jl_value_t *atype, uint32_t ndims, size_t *dims,
tsz += tot;
tsz = JL_ARRAY_ALIGN(tsz, JL_SMALL_BYTE_ALIGNMENT); // align whole object
a = (jl_array_t*)jl_gc_alloc(ptls, tsz, atype);
+jl_memprofile_set_typeof(a, atype);
// No allocation or safepoint allowed after this
a->flags.how = 0;
data = (char*)a + doffs;
@@ -107,12 +108,14 @@ static jl_array_t *_new_array_(jl_value_t *atype, uint32_t ndims, size_t *dims,
else {
tsz = JL_ARRAY_ALIGN(tsz, JL_CACHE_BYTE_ALIGNMENT); // align whole object
data = jl_gc_managed_malloc(tot);
+jl_memprofile_set_typeof(data, atype);
// Allocate the Array **after** allocating the data
// to make sure the array is still young
a = (jl_array_t*)jl_gc_alloc(ptls, tsz, atype);
// No allocation or safepoint allowed after this
a->flags.how = 2;
jl_gc_track_malloced_array(ptls, a);
+jl_memprofile_set_typeof(a, atype);
if (!isunboxed || isunion)
// need to zero out isbits union array selector bytes to ensure a valid type index
memset(data, 0, tot);
@@ -334,7 +337,9 @@ JL_DLLEXPORT jl_array_t *jl_ptr_to_array_1d(jl_value_t *atype, void *data,
if (own_buffer) {
a->flags.how = 2;
jl_gc_track_malloced_array(ptls, a);
-jl_gc_count_allocd(nel*elsz + (elsz == 1 ? 1 : 0));
+jl_gc_count_allocd(a, nel*elsz + (elsz == 1 ? 1 : 0),
+                   JL_MEMPROF_TAG_DOMAIN_CPU | JL_MEMPROF_TAG_ALLOC_STDALLOC);
+jl_memprofile_set_typeof(a, atype);
}
else {
a->flags.how = 0;
@@ -401,7 +406,9 @@ JL_DLLEXPORT jl_array_t *jl_ptr_to_array(jl_value_t *atype, void *data,
if (own_buffer) {
a->flags.how = 2;
jl_gc_track_malloced_array(ptls, a);
-jl_gc_count_allocd(nel*elsz + (elsz == 1 ? 1 : 0));
+jl_gc_count_allocd(a, nel*elsz + (elsz == 1 ? 1 : 0),
+                   JL_MEMPROF_TAG_DOMAIN_CPU | JL_MEMPROF_TAG_ALLOC_STDALLOC);
+jl_memprofile_set_typeof(a, atype);
}
else {
a->flags.how = 0;
@@ -669,6 +676,7 @@ static int NOINLINE array_resize_buffer(jl_array_t *a, size_t newlen)
a->flags.how = 1;
jl_gc_wb_buf(a, a->data, nbytes);
}
+jl_memprofile_set_typeof(a->data, jl_typeof(a));
}
if (JL_ARRAY_IMPL_NUL && elsz == 1 && !isbitsunion)
memset((char*)a->data + oldnbytes - 1, 0, nbytes - oldnbytes + 1);