Ensure that bad HTTP status codes result in the producer retrying#21
Conversation
SemMulder
left a comment
There was a problem hiding this comment.
Awesome! Two small comments.
There was a problem hiding this comment.
I think you might have missed this one? :)
There was a problem hiding this comment.
I have now added an .error_for_status() call to the version endpoint call, but intentionally leave out any retry logic there, as that endpoint is intended to (only) be used by hand. (There is also no parsing of the response result going on).
opsqueue/src/producer/client.rs
Outdated
| // Maybe a different HTTP client library is nicer in this regard? | ||
| Self::HTTPClientError(inner) => { | ||
| inner.is_connect() || inner.is_timeout() || inner.is_decode() | ||
| inner.is_status() || inner.is_connect() || inner.is_timeout() || inner.is_decode() |
There was a problem hiding this comment.
This marks everything from 400 to 599 inclusive as ephemeral.
Do we really want to include the whole 400 range? E.g. what if we post a bad body because of some version mismatch or something? I'm too unfamiliar with the code to judge whether it matters, what do you think?
There was a problem hiding this comment.
You're right, only catching 5xx is the correct approach.
4xx errors can (only?) come up if e.g. a Traefik or other prroxy in the middle is misconfigured, in which case we'd like to see that early.
There was a problem hiding this comment.
One edge-case is 429 Too Many Requests though, in that case you likely want to retry?
There was a problem hiding this comment.
Possibly. Let's see if we end up triggering that case in production.
|
@OpsBotPrime merge and deploy to production |
|
Unknown or invalid command found: |
|
@OpsBotPrime merge and tag |
|
Rebased as 74328de, waiting for CI … |
…etrying Approved-by: Qqwy Priority: Normal Auto-deploy: false
|
CI job 🟡 started. |
|
The build failed ❌. If this is the result of a flaky test, then tag me again with the |
|
@OpsBotPrime retry |
Before, this would result in a non-ephemeral `InternalProducerClientError` as the client would still attempt to decode JSON from the response body even on a non-200 HTTP status.
…etrying Approved-by: Qqwy Priority: Normal Auto-deploy: false
|
Rebased as b4fb822, waiting for CI … |
|
CI job 🟡 started. |
3b8bae3 to
b4fb822
Compare
Before, this would result in a non-ephemeral
InternalProducerClientErroras the client would still attempt to decode JSON from the response body even on a non-200 HTTP status.