
Conversation


@jsdt jsdt commented Mar 27, 2025

Description of Changes

This adds endpoints to make heap profiling easier. By default, standalone will have jemalloc heap profiling enabled but not active, which showed no overhead in my benchmarks.

If you activate profiling with curl "localhost:3000/internal/heap/settings?enabled=true" -X POST, jemalloc will start sampling allocations. It will keep sampling until you call the endpoint again with enabled=false (or restart the process).

You can get a dump of the heap by GETing the /internal/heap endpoint. By default, the dump is in pprof format (see the pprof project for a tool to view it). If you pass the format=flame query parameter, you get a flame graph instead, which you can view directly in your browser at localhost:3000/internal/heap?format=flame
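Putting the endpoints together, a typical profiling session from the shell might look like this (assuming standalone is listening on localhost:3000; the output file names here are arbitrary):

```shell
# Activate jemalloc heap profiling (sampling starts now).
curl -X POST "localhost:3000/internal/heap/settings?enabled=true"

# ...exercise the workload you want to profile...

# Dump the heap in pprof format and open it with the pprof tool.
curl -o heap.pb.gz "localhost:3000/internal/heap"
pprof -http=:8080 heap.pb.gz

# Or fetch a flame graph instead (also viewable directly in a browser).
curl -o heap-flame.svg "localhost:3000/internal/heap?format=flame"

# Deactivate sampling when done.
curl -X POST "localhost:3000/internal/heap/settings?enabled=false"
```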

There is some additional overhead while profiling is active, since we are collecting samples and using memory to store them, but it should be safe to try out.

Note that the jemalloc configuration settings are set with the _rjem_malloc_conf variable in the main file of standalone, but they can be overridden by setting the _RJEM_MALLOC_CONF environment variable. You can learn more in the jemalloc documentation.
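For example, a sketch of overriding the compiled-in settings at launch (the binary invocation and option values here are illustrative, not taken from the PR; option names follow jemalloc's opt.* naming):

```shell
# Enable the profiling machinery and sample more often than the default
# (roughly every 2^16 bytes allocated).
_RJEM_MALLOC_CONF="prof:true,lg_prof_sample:16" ./standalone
```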

Extra background

Previously, it was possible to set up heap profiling by setting the environment variable _RJEM_MALLOC_CONF to something like prof:true,prof_active:true,lg_prof_sample:12,prof_prefix:/jemalloc-dumps/,lg_prof_interval:20, which made the program sample allocations and periodically dump a heap profile to disk. Using these dumps was a little annoying with Docker, because the pprof tool needs the original binary to produce readable symbols.
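Concretely, symbolizing one of those on-disk dumps meant handing pprof both the profile and the exact binary that produced it (the paths here are hypothetical):

```shell
# pprof resolves symbols from the unstripped original binary,
# so it must be the same build that wrote the dump.
pprof -http=:8080 ./standalone /jemalloc-dumps/jeprof.12345.0.i0.heap
```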

To make it easier, this uses the jemalloc-pprof crate, which takes care of attaching symbols/backtraces for us. This means we can parse the output without needing access to the binary. This adds some extra overhead, but we can always remove it if we decide it is too slow.
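For reference, a minimal sketch of how jemalloc-pprof is typically wired up, based on that crate's documented API rather than this PR's actual code (it assumes the tikv-jemallocator, jemalloc_pprof, anyhow, and tokio crates):

```rust
// Use jemalloc as the global allocator so its profiling hooks are in play.
#[global_allocator]
static ALLOC: tikv_jemallocator::Jemalloc = tikv_jemallocator::Jemalloc;

// Compile-time defaults: profiling machinery enabled, sampling inactive.
// (Overridable at runtime via the _RJEM_MALLOC_CONF environment variable.)
#[allow(non_upper_case_globals)]
#[export_name = "_rjem_malloc_conf"]
pub static malloc_conf: &[u8] = b"prof:true,prof_active:false,lg_prof_sample:19\0";

// Handler body for something like GET /internal/heap: returns a gzipped
// pprof protobuf with symbols already attached, so readers don't need
// access to the original binary.
async fn dump_heap_pprof() -> anyhow::Result<Vec<u8>> {
    let ctl = jemalloc_pprof::PROF_CTL
        .as_ref()
        .ok_or_else(|| anyhow::anyhow!("heap profiling not available"))?;
    let mut ctl = ctl.lock().await;
    ctl.dump_pprof()
}
```

The symbolization step is the source of the extra overhead mentioned above: the crate walks the sampled backtraces and embeds symbol information in the dump itself.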

API and ABI breaking changes

I added an /internal section of the HTTP API (outside of v1), with the idea that this is an undocumented feature that we may change or remove at any time.

Expected complexity level and risk

  1. The main risk here is around the overhead when using it. There is very little risk of performance issues while profiling is not active.

Testing

Tested manually and locally by curling the different endpoints, and by setting _RJEM_MALLOC_CONF to prof:false to disable profiling entirely.

@bfops bfops self-requested a review March 27, 2025 20:00

bfops commented Mar 28, 2025

Just to check with @jdetter as someone who ends up doing a lot of profiling - are you happy with the pprof choice here?


bfops commented Mar 28, 2025

Tested manually locally, by curling the different endpoints, and also by setting _RJEM_MALLOC_CONF to prof:false to totally disable profiling.

To confirm, you did all of these?

  • Set _RJEM_MALLOC_CONF to prof:false to totally disable profiling
  • Called /internal/heap/settings?enabled=true and then verified that samples were being collected by calling /internal/heap
  • Called /internal/heap/settings?enabled=false and verified that samples were no longer being collected by calling /internal/heap
  • Called /internal/heap/settings and verified that it returned the current profiling enabled/disabled status


bfops commented Mar 28, 2025

I left some questions but this broadly looks good to me. I can't really speak to the details of whether or not we're using the jemalloc-specific stuff correctly, but the fallout seems low if we've gotten something wrong.


jsdt commented Mar 30, 2025

Yes, I have done all of those tests.

@bfops bfops added the release-any To be landed in any release window label Mar 31, 2025
@bfops bfops left a comment


LGTM. I can't really speak to the details of whether or not we're using the jemalloc-specific stuff correctly, but the fallout seems low if we've gotten something wrong.

@jsdt jsdt added this pull request to the merge queue Mar 31, 2025
Merged via the queue into master with commit 64aef29 Mar 31, 2025
12 of 14 checks passed