Link healthcheckextension with memory limiter rejecting spans #30168
Comments
This issue has been inactive for 60 days. It will be closed in 60 days if there is no activity. To ping code owners by adding a component label, see Adding Labels via Comments, or if you are unsure of which component this issue relates to, please ping Pinging code owners:
See Adding Labels via Comments if you do not have permissions to add labels yourself. |
👋 hello! At GitHub, we've been running into this issue and are very interested in seeing this change funded 😄 We observe the following:
We expected the readinessProbe to fail when the memory limiter starts causing the collector to drop or refuse spans. Instead, the probe requests continued to succeed, so the affected pods kept receiving traffic, which led to a rise in 5xx responses to requests coming through the otel/http receiver. We’re interested in this behavior:
I know we chatted about this a bit in Slack, but I thought I'd throw a comment here for transparency. Thanks, and do let us know if this is on the radar for fixing anytime soon!
Please take a look at #30673 as it might offer a fix for this issue. We could use help to review and try this out.
I think that the foundations are in place to solve this problem, but the problem is likely not solved as is. #30673 introduces a version of the healthcheck extension based on component status reporting, which is a prerequisite. The next piece would be to update the memory limiter to report error statuses (via component status reporting) when it detects problematic conditions and to clear them when they resolve.
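For readers following along, the report-and-clear flow described in this comment can be pictured with a small toy model. This is only a conceptual sketch, not the collector's actual component status API: `HealthAggregator`, `MemoryLimiter`, `report`, and `check` are hypothetical names invented for illustration, and the real memory limiter's thresholds and checks are more involved.

```python
class HealthAggregator:
    """Toy stand-in for a health check endpoint driven by component status."""

    def __init__(self):
        self.errors = {}  # component name -> current error message

    def report(self, component, error=None):
        # A report with no error clears any earlier error for that component.
        if error is None:
            self.errors.pop(component, None)
        else:
            self.errors[component] = error

    def http_status(self):
        # Unhealthy while at least one component has an uncleared error.
        return 503 if self.errors else 200


class MemoryLimiter:
    """Toy limiter that reports an error while usage is above its limit."""

    def __init__(self, limit_mib, health):
        self.limit_mib = limit_mib
        self.health = health

    def check(self, used_mib):
        if used_mib > self.limit_mib:
            self.health.report("memory_limiter", "memory usage above limit")
        else:
            self.health.report("memory_limiter")  # condition resolved
```

With this shape, a readiness probe pointed at the health endpoint would start failing as soon as the limiter reports an error and would recover once the limiter clears it.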
This issue has been closed as inactive because it has been stale for 120 days with no activity.
Component(s)
extension/healthcheck
Is your feature request related to a problem? Please describe.
The memory limiter processor is rejecting spans. I would like to set up a readiness probe for the pod so that, when the collector starts rejecting spans, the probe fails. This will stop more spans from coming to the pod until it can recover, and should cause clients to send spans to other pods.
Describe the solution you'd like
Have the health check extension fail if spans are being rejected.
Describe alternatives you've considered
A readiness probe that scrapes the self metrics of the otel collector pod to see if there are rejected spans.
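A minimal sketch of that alternative, under several assumptions that should be checked against the collector version in use: the self metrics are served in Prometheus text format at `http://localhost:8888/metrics`, the refused-span counter is named `otelcol_processor_refused_spans` (possibly with a `_total` suffix), and the state file path is a hypothetical placeholder. Because the counter is cumulative, the probe compares the current value with the one seen on the previous run instead of checking for a non-zero value.

```python
import os
import re
import urllib.request

# Assumed defaults; verify the endpoint and metric name for your collector.
METRICS_URL = "http://localhost:8888/metrics"
METRIC_NAME = "otelcol_processor_refused_spans"
STATE_FILE = "/tmp/refused_spans.last"  # hypothetical location


def refused_spans(metrics_text, name=METRIC_NAME):
    """Sum the counter's samples across all label sets."""
    total = 0.0
    for line in metrics_text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        sample_name = re.split(r"[{\s]", line, maxsplit=1)[0]
        if sample_name not in (name, name + "_total"):
            continue
        # The value is the first token after the optional {labels} part.
        rest = line[line.rindex("}") + 1:] if "}" in line else line[len(sample_name):]
        total += float(rest.split()[0])
    return total


def is_ready(current, previous):
    """Ready unless the counter grew since the previous probe run."""
    return previous is None or current <= previous


def probe():
    """Return 0 (ready) or 1 (not ready), suitable as a process exit code."""
    with urllib.request.urlopen(METRICS_URL, timeout=2) as resp:
        current = refused_spans(resp.read().decode("utf-8"))
    previous = None
    if os.path.exists(STATE_FILE):
        with open(STATE_FILE) as f:
            previous = float(f.read().strip() or 0)
    with open(STATE_FILE, "w") as f:
        f.write(str(current))
    return 0 if is_ready(current, previous) else 1
```

An exec-style readiness probe could run a script that ends with `sys.exit(probe())`. Note that this sums refused spans across every processor that reports the counter, and that a probe built this way only recovers once a full probe interval passes with no new refusals.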
Additional context
No response