Add endpoints for heap profiling #2517
Conversation
Just to check with @jdetter as someone who ends up doing a lot of profiling - are you happy with the pprof choice here?
to confirm, you did all of these?
I left some questions but this broadly looks good to me. I can't really speak to the details of whether or not we're using the jemalloc-specific stuff correctly, but the fallout seems low if we've gotten something wrong.
Yes, I have done all of those tests.
bfops left a comment
LGTM. I can't really speak to the details of whether or not we're using the jemalloc-specific stuff correctly, but the fallout seems low if we've gotten something wrong.
Description of Changes
This adds endpoints to make heap profiling easier. By default, standalone will have jemalloc heap profiling enabled, but not active, which didn't show any overhead in my benchmarks.
If you activate profiling with `curl "localhost:3000/internal/heap/settings?enabled=true" -X POST`, jemalloc will start sampling allocations. It will keep sampling until you call the endpoint again with `enabled=false` (or restart the process).

You can get a dump of the heap by `GET`ing the `/internal/heap` endpoint. By default, this will be in `pprof` format (see pprof for installing a tool to view it). If you use the `format=flame` query parameter, it will return a flame graph instead, which you can easily view in your browser by going to `localhost:3000/internal/heap?format=flame`.

There is some additional overhead while profiling is activated, because we are collecting samples and using memory to store those samples, but it should be safe to try this out.
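The workflow above can be sketched end to end. The port matches the example in this PR; the output file name is illustrative, and `pprof` here is the separate Go viewer tool from google/pprof:

```shell
# Turn allocation sampling on via the settings endpoint.
curl -X POST "localhost:3000/internal/heap/settings?enabled=true"

# ...run the workload you want to profile, then grab a heap dump.
curl -o heap.pb.gz "localhost:3000/internal/heap"

# Open an interactive web UI for the dump (requires the pprof tool).
pprof -http=:8080 heap.pb.gz

# Alternatively, fetch the flame-graph rendering directly,
# or just open this URL in a browser.
curl "localhost:3000/internal/heap?format=flame"

# When done, stop sampling.
curl -X POST "localhost:3000/internal/heap/settings?enabled=false"
```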
Note that the jemalloc configuration settings are set with the `_rjem_malloc_conf` variable in the main file of standalone, but this can be overridden by setting the `_RJEM_MALLOC_CONF` environment variable. You can learn more about it here.

Extra background
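As a sketch of what that compiled-in default can look like: with the tikv-jemallocator family of crates, jemalloc's symbols are prefixed with `_rjem_`, so the config string is exported under the `_rjem_malloc_conf` symbol. The option string below is illustrative, not necessarily the one standalone ships:

```rust
// Export a compile-time jemalloc config. jemalloc reads this symbol at
// startup as a C string, so it must be NUL-terminated. The environment
// variable _RJEM_MALLOC_CONF can still override it at run time.
#[allow(non_upper_case_globals)]
#[export_name = "_rjem_malloc_conf"]
pub static malloc_conf: &[u8] = b"prof:true,prof_active:false,lg_prof_sample:19\0";

fn main() {
    // Sanity-check the NUL terminator and print the effective default.
    assert_eq!(malloc_conf.last(), Some(&0u8));
    println!(
        "{}",
        String::from_utf8_lossy(&malloc_conf[..malloc_conf.len() - 1])
    );
}
```

`prof:true,prof_active:false` matches the "enabled, but not active" default described above: the profiler is compiled in and ready, but sampling stays off until it is activated.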
Previously, it was possible to set up heap profiling by setting the environment variable `_RJEM_MALLOC_CONF` to something like `prof:true,prof_active:true,lg_prof_sample:12,prof_prefix:/jemalloc-dumps/,lg_prof_interval:20`, which would make the program sample allocations and dump a heap profile to disk periodically. Using these dumps was a little annoying with Docker, because the `pprof` tool needs the original binary to produce readable symbols.

To make this easier, this PR uses the jemalloc-pprof crate, which takes care of attaching symbols/backtraces for us. This means we can parse the output without needing access to the binary. This adds some extra overhead, but we can always remove it if we decide it is too slow.
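For contrast, the previous workflow looked roughly like this (the binary name and dump-file glob are illustrative; jemalloc names its dump files from the configured `prof_prefix`):

```shell
# Old approach: jemalloc itself writes profile dumps to disk periodically.
_RJEM_MALLOC_CONF="prof:true,prof_active:true,lg_prof_sample:12,prof_prefix:/jemalloc-dumps/,lg_prof_interval:20" \
  ./standalone

# Symbolizing those raw dumps required the exact binary that produced them,
# which is the awkward part when the binary lives inside a Docker image.
pprof -http=:8080 ./standalone /jemalloc-dumps/*.heap
```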
API and ABI breaking changes
I added an `/internal` section of the HTTP API (outside of `v1`), with the idea that this is an undocumented feature that we may change or remove at any time.

Expected complexity level and risk
Testing
Tested manually locally, by curling the different endpoints, and also by setting `_RJEM_MALLOC_CONF` to `prof:false` to totally disable profiling.