Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[DocDB] Multiple heap profile calls can race when setting sampling frequency #19841

Closed
1 task done
SrivastavaAnubhav opened this issue Nov 3, 2023 · 0 comments
Closed
1 task done
Assignees
Labels
area/docdb YugabyteDB core features kind/bug This issue is a bug priority/low Low priority

Comments

@SrivastavaAnubhav
Copy link
Contributor

SrivastavaAnubhav commented Nov 3, 2023

Jira Link: DB-8778

Description

Since commit 9554847, with Google TCMalloc enabled, the /pprof/heap UI endpoint records an allocation profile for the specified number of seconds at a specified sample frequency.

The sample frequency is reset to the value at the start of the profile once the profile is complete. This can be problematic if two profiles are run concurrently. For example,

  1. Sampling frequency starts at 1 MB
  2. Go to pprof/heap?seconds=10&sample_frequency_bytes=1000 and start profile 1 (sets sampling freq to 1000).
  3. Go to pprof/heap?seconds=1&sample_frequency_bytes=1000 which starts profile 2 (NB: some URL parameter must be different, otherwise it looks like we have some caching that serializes the calls).
  4. Call 1 finishes and resets sampling frequency to 1 MB.
  5. Call 2 finishes and resets sampling frequency to 1000.

Issue Type

kind/bug

Warning: Please confirm that this issue does not contain any sensitive information

  • I confirm this issue does not contain any sensitive information.
@SrivastavaAnubhav SrivastavaAnubhav added area/docdb YugabyteDB core features priority/low Low priority labels Nov 3, 2023
@SrivastavaAnubhav SrivastavaAnubhav self-assigned this Nov 3, 2023
@yugabyte-ci yugabyte-ci added the kind/bug This issue is a bug label Nov 3, 2023
davixiao pushed a commit to davixiao/yugabyte-db that referenced this issue Nov 27, 2023
Summary:
Since commit 9554847, with Google TCMalloc enabled, the /pprof/heap UI endpoint records an allocation profile for the specified number of seconds at a specified sample frequency.

The sample frequency is reset to the value at the start of the profile once the profile is complete. This can be problematic if two profiles are run concurrently. For example,

1. Sampling frequency starts at 1 MB
2. Go to `pprof/heap?seconds=10&sample_frequency_bytes=1000` and start profile 1 (sets sampling freq to 1000).
3. Go to `pprof/heap?seconds=1&sample_frequency_bytes=1000` which starts profile 2 (NB: some URL parameter must be different, otherwise it looks like we have some caching that serializes the calls).
4. Call 1 finishes and resets sampling frequency to 1 MB.
5. Call 2 finishes and resets sampling frequency to 1000.

This diff prevents this by only allowing one heap profile to run at a time.
This diff also removes existing usage of the phrase "allocation profile" in favor of "heap profile", which is used everywhere else in our code / documentation.
Jira: DB-8778

Test Plan: `ybd --cxx-test server_pprof-path-handler_util-test --gtest_filter SamplingProfilerTest.OnlyOneHeapProfile`. Also tested manually by testing the steps listed above and verifying that we get the error message.

Reviewers: mlillibridge

Reviewed By: mlillibridge

Subscribers: ybase, esheng, bogdan

Tags: #jenkins-ready

Differential Revision: https://phorge.dev.yugabyte.com/D29963
SrivastavaAnubhav added a commit that referenced this issue Nov 28, 2023
Summary:
Original commit: 03e91df / D29963
Since commit 9554847, with Google TCMalloc enabled, the /pprof/heap UI endpoint records an allocation profile for the specified number of seconds at a specified sample frequency.

The sample frequency is reset to the value at the start of the profile once the profile is complete. This can be problematic if two profiles are run concurrently. For example,

1. Sampling frequency starts at 1 MB
2. Go to `pprof/heap?seconds=10&sample_frequency_bytes=1000` and start profile 1 (sets sampling freq to 1000).
3. Go to `pprof/heap?seconds=1&sample_frequency_bytes=1000` which starts profile 2 (NB: some URL parameter must be different, otherwise it looks like we have some caching that serializes the calls).
4. Call 1 finishes and resets sampling frequency to 1 MB.
5. Call 2 finishes and resets sampling frequency to 1000.

This diff prevents this by only allowing one heap profile to run at a time.
This diff also removes existing usage of the phrase "allocation profile" in favor of "heap profile", which is used everywhere else in our code / documentation.
Jira: DB-8778

Test Plan: `ybd --cxx-test server_pprof-path-handler_util-test --gtest_filter SamplingProfilerTest.OnlyOneHeapProfile`. Also tested manually by testing the steps listed above and verifying that we get the error message.

Reviewers: mlillibridge

Reviewed By: mlillibridge

Subscribers: ybase, esheng, bogdan

Tags: #jenkins-ready

Differential Revision: https://phorge.dev.yugabyte.com/D30502
SrivastavaAnubhav added a commit that referenced this issue Nov 28, 2023
Summary:
Original commit: 03e91df / D29963
Since commit 9554847, with Google TCMalloc enabled, the /pprof/heap UI endpoint records an allocation profile for the specified number of seconds at a specified sample frequency.

The sample frequency is reset to the value at the start of the profile once the profile is complete. This can be problematic if two profiles are run concurrently. For example,

1. Sampling frequency starts at 1 MB
2. Go to `pprof/heap?seconds=10&sample_frequency_bytes=1000` and start profile 1 (sets sampling freq to 1000).
3. Go to `pprof/heap?seconds=1&sample_frequency_bytes=1000` which starts profile 2 (NB: some URL parameter must be different, otherwise it looks like we have some caching that serializes the calls).
4. Call 1 finishes and resets sampling frequency to 1 MB.
5. Call 2 finishes and resets sampling frequency to 1000.

This diff prevents this by only allowing one heap profile to run at a time.
This diff also removes existing usage of the phrase "allocation profile" in favor of "heap profile", which is used everywhere else in our code / documentation.
Jira: DB-8778

Test Plan: `ybd --cxx-test server_pprof-path-handler_util-test --gtest_filter SamplingProfilerTest.OnlyOneHeapProfile`. Also tested manually by testing the steps listed above and verifying that we get the error message.

Reviewers: mlillibridge

Reviewed By: mlillibridge

Subscribers: bogdan, esheng, ybase

Tags: #jenkins-ready

Differential Revision: https://phorge.dev.yugabyte.com/D30501
jharveysmith pushed a commit that referenced this issue May 24, 2024
Summary:
Original commit: ee4624bb9964dc4202df40e3780e7439a73f6424 / D29963
Since commit 53117be, with Google TCMalloc enabled, the /pprof/heap UI endpoint records an allocation profile for the specified number of seconds at a specified sample frequency.

The sample frequency is reset to the value at the start of the profile once the profile is complete. This can be problematic if two profiles are run concurrently. For example,

1. Sampling frequency starts at 1 MB
2. Go to `pprof/heap?seconds=10&sample_frequency_bytes=1000` and start profile 1 (sets sampling freq to 1000).
3. Go to `pprof/heap?seconds=1&sample_frequency_bytes=1000` which starts profile 2 (NB: some URL parameter must be different, otherwise it looks like we have some caching that serializes the calls).
4. Call 1 finishes and resets sampling frequency to 1 MB.
5. Call 2 finishes and resets sampling frequency to 1000.

This diff prevents this by only allowing one heap profile to run at a time.
This diff also removes existing usage of the phrase "allocation profile" in favor of "heap profile", which is used everywhere else in our code / documentation.
Jira: DB-8778

Test Plan: `ybd --cxx-test server_pprof-path-handler_util-test --gtest_filter SamplingProfilerTest.OnlyOneHeapProfile`. Also tested manually by testing the steps listed above and verifying that we get the error message.

Reviewers: mlillibridge

Reviewed By: mlillibridge

Subscribers: bogdan, esheng, ybase

Tags: #jenkins-ready

Differential Revision: https://phorge.dev.yugabyte.com/D30501
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/docdb YugabyteDB core features kind/bug This issue is a bug priority/low Low priority
Projects
None yet
Development

No branches or pull requests

2 participants