Make memory_limiter and otlp_receiver return 503 status code instead of 500 on failure #9337

Closed
climberfree opened this issue Jan 22, 2024 · 2 comments · Fixed by #9357
Labels
bug Something isn't working good first issue Good for newcomers receiver/otlp


@climberfree

The problem
When the collector rejects data because of memory_limiter, the agent receives a 500 status code. Agents treat 500 as non-retryable, yet the error sometimes resolves within minutes, especially after autoscaling adds another collector instance.

The solution
We propose making the memory_limiter and otlp_receiver return a 503 status code instead of 500 in situations where the failure is temporary and expected to be resolved shortly.

Benefits:

  • makes the situation retryable, allowing the agent to attempt a new connection after a brief delay.
  • aligns with common HTTP status code semantics, indicating that the service is temporarily unavailable.
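To illustrate why the status code matters to the sender, here is a minimal sketch of a client-side retry loop. The names `isRetryable` and `sendWithRetry` are hypothetical, and the set of retryable codes (429, 502, 503, 504) is an assumed policy modeled on the OTLP/HTTP specification, not code taken from any particular agent.

```go
package main

import (
	"fmt"
	"net/http"
	"time"
)

// isRetryable reports whether a sender should retry after this status.
// Assumed policy: 500 is treated as permanent, 503 as transient.
func isRetryable(status int) bool {
	switch status {
	case http.StatusTooManyRequests,
		http.StatusBadGateway,
		http.StatusServiceUnavailable,
		http.StatusGatewayTimeout:
		return true
	}
	return false
}

// sendWithRetry calls send until it returns a non-retryable status or
// maxAttempts is reached, sleeping for delay between attempts.
func sendWithRetry(send func() int, maxAttempts int, delay time.Duration) (status, attempts int) {
	for attempts = 1; ; attempts++ {
		status = send()
		if !isRetryable(status) || attempts >= maxAttempts {
			return status, attempts
		}
		time.Sleep(delay)
	}
}

func main() {
	// Simulated collector: refuses twice while memory is high, then accepts.
	responses := []int{503, 503, 200}
	i := 0
	send := func() int { s := responses[i]; i++; return s }

	status, attempts := sendWithRetry(send, 5, 10*time.Millisecond)
	fmt.Println(status, attempts) // 200 3
}
```

With a 500 in place of the 503 responses, the same loop would give up after the first attempt, which is the behavior this issue describes.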
@dmitryax
Member

This should already be resolved and available in 0.93.0 now that #8080 is merged.

@TylerHelmuth
Member

@dmitryax I think #8080 actually only fixed the problem when using gRPC. If using HTTP, we still have these lines in our code:

otlpResp, err := logsReceiver.Export(req.Context(), otlpReq)
if err != nil {
	writeError(resp, enc, err, http.StatusInternalServerError)
	return
}

With #8080 the value in err is now properly a Status with code Unavailable, but our HTTP logic still returns 500.

@TylerHelmuth added the bug, help wanted, good first issue, and receiver/otlp labels and removed the help wanted label on Jan 22, 2024
@TylerHelmuth self-assigned this on Jan 23, 2024
mx-psi pushed a commit that referenced this issue Mar 27, 2024
…rrors (#9357)

**Description:**
Updates the receiver's http response to return a proper http status
based on whether or not the pipeline returned a retryable error. Builds
upon the work done in #8080 and #9307

**Link to tracking Issue:**

Closes #9337
Closes #8132
Closes #9636
Closes #6725

**Testing:**

Updated lots of unit tests
3 participants