[DocDB] Add /pprof/heap support for google tcmalloc's sampling profiler #15990

Closed
SrivastavaAnubhav opened this issue Feb 6, 2023 · 0 comments
Labels
area/docdb YugabyteDB core features kind/enhancement This is an enhancement of an existing feature priority/medium Medium priority issue

SrivastavaAnubhav commented Feb 6, 2023

Jira Link: DB-5395

Description

The new tcmalloc (--use-google-tcmalloc) uses a sampling-based profiler (and got rid of the old all-allocation profiler). We should integrate this new profiler with the /pprof/heap endpoint so we can track memory allocations.
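For context, the sketch below shows the general idea behind a sampling heap profiler. It does not record every allocation. Instead, it counts allocated bytes and takes a sample roughly once every `rate` bytes, then scales each sample up so that totals stay unbiased. This is a simplified Python model built on assumptions: exponentially distributed sampling intervals and a `size / (1 - exp(-size / rate))` weight. It is not tcmalloc's actual code, and `HypotheticalSampler` and the stack label are made-up names. The `rate_bytes` parameter plays roughly the role of a sampling-frequency setting such as `sample_freq_bytes`.

```python
import math
import random


class HypotheticalSampler:
    """Simplified byte-interval heap sampler (illustrative, not tcmalloc's code)."""

    def __init__(self, rate_bytes, seed=0):
        self.rate = rate_bytes
        self.rng = random.Random(seed)
        # Bytes left before the next sample. Drawing this from an exponential
        # distribution makes sampling memoryless across allocations.
        self.bytes_until_sample = self.rng.expovariate(1.0 / rate_bytes)
        self.samples = []  # list of (stack, size, estimated_bytes)

    def record_allocation(self, stack, size):
        self.bytes_until_sample -= size
        if self.bytes_until_sample > 0:
            return  # Not sampled: nothing is recorded for this allocation.
        # Under this model an allocation of `size` bytes is sampled with
        # probability 1 - exp(-size / rate); weighting by the inverse of that
        # probability makes the summed estimate unbiased.
        prob = 1.0 - math.exp(-size / self.rate)
        self.samples.append((stack, size, size / prob))
        self.bytes_until_sample = self.rng.expovariate(1.0 / self.rate)

    def estimated_total_bytes(self):
        return sum(weight for _, _, weight in self.samples)


# 100,000 allocations of 1,000 bytes (100 MB in total), sampled about every 100 KB.
sampler = HypotheticalSampler(rate_bytes=100_000, seed=42)
for _ in range(100_000):
    sampler.record_allocation("hypothetical_stack", 1_000)
print(len(sampler.samples), round(sampler.estimated_total_bytes()))
```

Only about one allocation in a hundred is recorded here, yet the estimated total lands close to the true 100 MB. This trade-off is what makes sampling cheap enough to leave enabled.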

This is a blocker for turning on the new tcmalloc by default (#13701).

@SrivastavaAnubhav SrivastavaAnubhav added area/docdb YugabyteDB core features priority/high High Priority labels Feb 6, 2023
@SrivastavaAnubhav SrivastavaAnubhav self-assigned this Feb 6, 2023
@yugabyte-ci yugabyte-ci added the kind/bug This issue is a bug label Feb 6, 2023
@SrivastavaAnubhav SrivastavaAnubhav added kind/new-feature This is a request for a completely new feature and removed kind/bug This issue is a bug labels Feb 6, 2023
@yugabyte-ci yugabyte-ci added kind/enhancement This is an enhancement of an existing feature priority/medium Medium priority issue and removed kind/new-feature This is a request for a completely new feature priority/high High Priority labels Feb 17, 2023
SrivastavaAnubhav added a commit that referenced this issue Mar 13, 2023
Summary:
The new tcmalloc (--use-google-tcmalloc) uses a sampling-based profiler (and got rid of the old all-allocation profiler).

This diff updates /pprof/heap to use the sampling profiler. The output of this endpoint was previously parsed by yb-prof, but since the new format is structured, it's easy enough to output it directly as a table (which also makes testing and symbolization easier).

There are 3 arguments that can be passed to /pprof/heap as URL arguments:
1. `seconds` controls how long to profile for
2. `sample_freq_bytes` controls how often we sample allocations (roughly one sample per this many allocated bytes).
3. `only_growth` controls whether we output only the call stacks of allocations for which we did not observe a corresponding deallocation. Pass `only_growth=true` if you want the equivalent of `yb-prof`'s `in_use_bytes.html`, and omit it if you want the equivalent of `alloc_bytes.html`. NB: `alloc_bytes.html` and `in_use_bytes.html` contained essentially the same data, just sorted differently (both showed the allocated bytes and in-use bytes for each call stack). In the new endpoints, I kept the two views separate (we can change this if we want the combined view back).
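The difference between the two modes can be modeled as follows. This is a hypothetical Python sketch, not the YugabyteDB implementation. `SampledAlloc`, `bytes_by_stack`, and the stack names are invented for illustration. Each sampled allocation has an id, a call stack, and an estimated byte count. With `only_growth` set, a stack is credited only for samples with no matching deallocation in the profiling window, which gives in-use bytes. Without it, every sampled allocation counts, which gives allocated bytes.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class SampledAlloc:
    """One sampled allocation (hypothetical record, for illustration only)."""
    alloc_id: int
    stack: str
    est_bytes: int


def bytes_by_stack(allocs, freed_ids, only_growth):
    """Sum estimated bytes per call stack, largest first.

    only_growth=True skips allocations whose id is in freed_ids, i.e. it
    keeps only memory that was still live at the end of the window.
    """
    totals = defaultdict(int)
    for a in allocs:
        if only_growth and a.alloc_id in freed_ids:
            continue
        totals[a.stack] += a.est_bytes
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)


allocs = [
    SampledAlloc(1, "memtable_insert", 4_000_000),
    SampledAlloc(2, "rpc_read_buffer", 5_000_000),
    SampledAlloc(3, "memtable_insert", 2_000_000),
]
freed = {2}  # the RPC buffer was released before the window ended

print(bytes_by_stack(allocs, freed, only_growth=False))
# [('memtable_insert', 6000000), ('rpc_read_buffer', 5000000)]
print(bytes_by_stack(allocs, freed, only_growth=True))
# [('memtable_insert', 6000000)]
```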

For example, if you want to profile allocations for 60s, sampling roughly every 2 MB (2,000,000 bytes), you would go to (for a tserver):
```
IP:9000/pprof/heap?seconds=60&sample_freq_bytes=2000000
```
If you want the same as above but restricted to allocations that were not deallocated, you would go to:
```
IP:9000/pprof/heap?seconds=60&sample_freq_bytes=2000000&only_growth=true
```
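If you script these requests, you can build the query string with Python's standard `urllib.parse.urlencode`. The helper name `pprof_heap_url` and the sample host are illustrative. The path, the tserver port 9000, and the argument names `seconds`, `sample_freq_bytes`, and `only_growth` come from the description above. As in the first example, `only_growth` is omitted when it is not requested.

```python
from urllib.parse import urlencode


def pprof_heap_url(host, seconds, sample_freq_bytes, only_growth=False, port=9000):
    """Build a /pprof/heap URL (hypothetical helper; 9000 is the tserver port used above)."""
    params = {"seconds": seconds, "sample_freq_bytes": sample_freq_bytes}
    if only_growth:
        params["only_growth"] = "true"
    return f"http://{host}:{port}/pprof/heap?{urlencode(params)}"


print(pprof_heap_url("10.0.0.1", 60, 2_000_000))
# http://10.0.0.1:9000/pprof/heap?seconds=60&sample_freq_bytes=2000000
print(pprof_heap_url("10.0.0.1", 60, 2_000_000, only_growth=True))
# http://10.0.0.1:9000/pprof/heap?seconds=60&sample_freq_bytes=2000000&only_growth=true
```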

Example output for the /pprof/heap endpoint: see Phabricator attachments F34511 and F34512.

This diff also adds a /pprof/heap_snapshot endpoint that gives an instantaneous view of the heap, either at the time of peak heap usage (if the `peak_heap` URL argument is true) or at the current time (if `peak_heap` is false). To use this endpoint, the `enable_process_lifetime_heap_profiling` gflag must have been set to true since the last restart; samples are taken at the interval set by the new `profiler_sample_freq_bytes` gflag.
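The two snapshot modes capture different moments. A peak snapshot needs to remember which sampled allocations were live when total live bytes was highest, while a current snapshot only needs the live set as it stands now. The Python model below is a hypothetical illustration of that bookkeeping, not how tcmalloc or YugabyteDB implements it. `HypotheticalHeapTracker` and the stack names are invented. The model replays sampled allocation and free events and saves a copy of the live set whenever it reaches a new peak.

```python
class HypotheticalHeapTracker:
    """Replays sampled alloc/free events; an illustrative model only."""

    def __init__(self):
        self.live = {}       # alloc_id -> (stack, est_bytes)
        self.live_bytes = 0
        self.peak_bytes = 0
        self.peak_live = {}  # copy of self.live taken at the highest live_bytes seen

    def on_alloc(self, alloc_id, stack, est_bytes):
        self.live[alloc_id] = (stack, est_bytes)
        self.live_bytes += est_bytes
        if self.live_bytes > self.peak_bytes:
            self.peak_bytes = self.live_bytes
            self.peak_live = dict(self.live)

    def on_free(self, alloc_id):
        # Frees of allocations that were never sampled are not in self.live; ignore them.
        entry = self.live.pop(alloc_id, None)
        if entry is not None:
            self.live_bytes -= entry[1]

    def snapshot(self, peak_heap):
        """Mimics peak_heap=true (live set at peak) vs peak_heap=false (current live set)."""
        return dict(self.peak_live if peak_heap else self.live)


tracker = HypotheticalHeapTracker()
tracker.on_alloc(1, "memtable_insert", 4_000_000)
tracker.on_alloc(2, "compaction_buffer", 8_000_000)
tracker.on_free(2)
tracker.on_alloc(3, "memtable_insert", 1_000_000)
print(sorted(tracker.snapshot(peak_heap=True)))   # [1, 2]
print(sorted(tracker.snapshot(peak_heap=False)))  # [1, 3]
```

Copying the live set on every new peak is the simplest correct approach for a sketch. However, each copy costs time proportional to the number of live samples, so a production allocator would likely need something cheaper.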

Test Plan: Manually tested by running `CassandraBatchTimeseries` against a cluster and verifying that for `only_growth = True` we only see stacks that were not deallocated (rocksdb inserts), and with `only_growth = False` we see all stacks.

Reviewers: amitanand, bogdan, esheng

Reviewed By: esheng

Subscribers: ybase

Differential Revision: https://phabricator.dev.yugabyte.com/D22780
SrivastavaAnubhav added a commit that referenced this issue Mar 13, 2023
Summary: Guards the new tcmalloc malloc_extension.h introduced by D22780 with YB_GOOGLE_TCMALLOC.

Test Plan: jenkins:urgent

Reviewers: skedia

Reviewed By: skedia

Subscribers: bogdan, ybase

Differential Revision: https://phabricator.dev.yugabyte.com/D23518