[backport 7.61.x] [bug fix] address configsync issues to prevent cpu leak in otel-agent #32645
```diff
 cfg, err := fetchConfig(cs.ctx, cs.client, cs.Authtoken.Get(), cs.url.String())
 if err != nil {
 	if cs.connected {
-		cs.Log.Warnf("Failed to fetch config from core agent: %v", err)
+		cs.Log.Warnf("Loosed connectivity to core-agent to fetch config: %v", err)
```
Better grammar:
```diff
-cs.Log.Warnf("Loosed connectivity to core-agent to fetch config: %v", err)
+cs.Log.Warnf("Lost connectivity to core-agent to fetch config: %v", err)
```
Interesting, I made exactly the same comment in the original PR and thought that had been fixed.
Let's fix it in the main branch, and let that get picked up later. I'd rather not change the backported PRs at all.
looks good for agent-apm!
Test changes on VM
Use this command from test-infra-definitions to manually test this PR's changes on a VM:
inv create-vm --pipeline-id=52099698 --os-family=ubuntu
Note: This applies to commit 414ac31
What does this PR do?
This PR backports:
Motivation
Previous RCs showed an issue in which badly resolved secrets caused a CPU leak in the otel-agent. By resolving secrets properly with a new resolution pattern, we can now prevent this behavior.
Describe how you validated your changes
This was validated in gizmo.
It should also be validated by confirming the correct, healthy behavior of an RC with the otel-agent enabled throughout staging, with no noticeable increase in CPU or memory usage over time for the deployed agents.
Possible Drawbacks / Trade-offs
Additional Notes
This does not fix the root cause, which lies in the Datadog exporter and stems from an accumulation of failed transactions. While failed transactions and their retries can legitimately increase CPU usage, we need to make sure any resource impact is bounded.
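The note above asks that the resource impact of failed transactions be bounded. One common way to get that property is to cap the retry buffer and evict the oldest entry when it is full. The sketch below is a minimal illustration of that idea, not the Datadog exporter's actual retry logic. The `transaction` and `boundedRetryQueue` types and the drop-oldest policy are assumptions for illustration only.

```go
package main

import "fmt"

// transaction is a hypothetical placeholder for a payload that failed to send.
type transaction struct {
	id int
}

// boundedRetryQueue is a hypothetical retry buffer capped at maxItems entries.
// When it is full, the oldest transaction is evicted, so memory use and the
// work done per retry pass stay bounded however long the endpoint keeps failing.
type boundedRetryQueue struct {
	maxItems int
	items    []transaction
	dropped  int // number of transactions evicted or rejected
}

// add enqueues t, evicting the oldest entry first if the queue is full.
func (q *boundedRetryQueue) add(t transaction) {
	if q.maxItems <= 0 {
		q.dropped++ // a zero-capacity queue keeps nothing
		return
	}
	if len(q.items) >= q.maxItems {
		// Shift left in place to drop the oldest entry without growing the slice.
		copy(q.items, q.items[1:])
		q.items = q.items[:len(q.items)-1]
		q.dropped++
	}
	q.items = append(q.items, t)
}

// retryAll re-attempts each queued transaction once and keeps only those that fail again.
func (q *boundedRetryQueue) retryAll(send func(transaction) error) {
	remaining := q.items[:0] // filter in place, reusing the backing array
	for _, t := range q.items {
		if err := send(t); err != nil {
			remaining = append(remaining, t)
		}
	}
	q.items = remaining
}

func main() {
	q := &boundedRetryQueue{maxItems: 3}
	for i := 1; i <= 5; i++ {
		q.add(transaction{id: i})
	}
	fmt.Println(len(q.items), q.dropped) // prints "3 2"

	// Pretend even-numbered transactions succeed on retry.
	q.retryAll(func(t transaction) error {
		if t.id%2 == 0 {
			return nil
		}
		return fmt.Errorf("transaction %d failed", t.id)
	})
	fmt.Println(len(q.items)) // prints "2"
}
```

Dropping the oldest entries trades completeness for a hard ceiling: the data that has waited longest is lost first, but a prolonged outage can no longer grow memory or retry work without limit.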