Reduce frequency of two categories of Sev30s #12310
Merged
Description
As part of 7.4 qualification, we noticed two categories of frequent Sev30s:

1. `Type=StorageServerStatusJson` with `MissingAttribute=BytesStored`. This can happen if a storage server sends an incomplete metrics response to the cluster controller as part of status json generation. The fix here is to reduce the frequency of this trace event; currently it is only suppressed for 5 seconds.

2. `Type=SuppressionFromNonNetworkThread` with `Event=TLSPolicySuccess`. This one is interesting because it is a non-suppressable verbose event that gets triggered whenever we try to suppress another event. It is also interesting because the intent seems to be that we do not expect any suppressions from a non-fdbmain thread. That intent makes sense if the non-fdbmain threads are not doing much, because if they are, then by the law of probability we would want to suppress some of their events, as we normally do. The intent has lived in the codebase for multiple years, since before TLS. Post-TLS, the non-fdbmain threads are doing more work, and we should allow suppressions from them. So the fix here is to simply remove the `SuppressionFromNonNetworkThread` event and simplify the surrounding logic a bit (see the sketch after this list).

Both of these changes will be documented in the 7.4 release notes so that anyone relying on these events is aware before they upgrade.
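For readers unfamiliar with the suppression mechanism, here is a minimal, self-contained sketch of the behavior described above. The `TraceSuppressor` class, its method names, and the printed output are illustrative assumptions for this PR description, not FoundationDB's actual `Trace` implementation or the exact diff in this change:

```cpp
#include <chrono>
#include <iostream>
#include <map>
#include <mutex>
#include <string>
#include <thread>

// Simplified stand-in for trace-event suppression: an event is emitted at
// most once per suppression window, regardless of which thread raises it.
class TraceSuppressor {
	std::mutex mu;
	std::map<std::string, std::chrono::steady_clock::time_point> nextAllowed;

public:
	// Returns true if the event should be emitted; otherwise it is suppressed
	// until `window` has elapsed since the last emitted instance.
	bool shouldEmit(const std::string& event, std::chrono::seconds window) {
		std::lock_guard<std::mutex> lock(mu);
		auto now = std::chrono::steady_clock::now();
		auto it = nextAllowed.find(event);
		if (it != nextAllowed.end() && now < it->second)
			return false; // still inside the suppression window
		nextAllowed[event] = now + window;
		return true;
	}
};

int main() {
	TraceSuppressor suppressor;

	// Fix 1 (sketch): widen the suppression window so the event fires less
	// often. It is currently 5 seconds; the new interval chosen by the PR is
	// not shown here.
	if (suppressor.shouldEmit("StorageServerStatusJson", std::chrono::seconds(5)))
		std::cout << "Sev30 StorageServerStatusJson MissingAttribute=BytesStored\n";

	// Fix 2 (sketch): previously, a suppression request from a non-fdbmain
	// thread was not honored and instead logged its own non-suppressable
	// "SuppressionFromNonNetworkThread" event. After the fix, suppression
	// applies from any thread, so a burst of TLSPolicySuccess events raised
	// by TLS threads is rate-limited like any other event.
	std::thread tlsThread([&] {
		if (suppressor.shouldEmit("TLSPolicySuccess", std::chrono::seconds(5)))
			std::cout << "TLSPolicySuccess (first instance emitted)\n";
		if (!suppressor.shouldEmit("TLSPolicySuccess", std::chrono::seconds(5)))
			std::cout << "TLSPolicySuccess (subsequent instance suppressed)\n";
	});
	tlsThread.join();
	return 0;
}
```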
I will also cherry-pick this change to the main branch.
Testing
100K:

```
20250818-214742-praza-trace-bugs-v1-07e54a8-fd58a76001faf269 compressed=True data_size=40158474 duration=5038760 ended=100000 fail=1 fail_fast=10 max_runs=100000 pass=99999 priority=100 remaining=0 runtime=0:57:37 sanity=False started=100000 stopped=20250818-224519 submitted=20250818-214742 timeout=5400 username=praza-trace-bugs-v1-07e54a89702c9aef8946f937030bf8a4acd38e46
```
The 1 failure is in ParallelRestoreOldBackupCorrectnessAtomicOp.toml and is not related to this change.

Code-Reviewer Section
The general pull request guidelines can be found here.
Please check each of the following things and check all boxes before accepting a PR.
For Release-Branches
If this PR is made against a release-branch, please also check the following:
This change has also been applied to the next newer release-branch (or main, if this is the youngest branch).