-
Notifications
You must be signed in to change notification settings - Fork 178
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Don't retry on internal exceptions #1110
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you explain why this is the right solution?
src/clientlayers/RetryRequest.jl
Outdated
@@ -77,6 +77,9 @@ function retrylayer(handler) | |||
end | |||
|
|||
isrecoverable(ex) = true | |||
isrecoverable(ex::ArgumentError) = false |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is isrecoverable(ex) = true
just the wrong default?
e.g. what if CloudBase stops throwing a ArgumentError
and decides to throw a MalformedTokenError
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you explain why this is the right solution?
This seemed like the least disruptive fix for the problem we've seen. Maybe the right solution is to formalize the contract a layer
needs to fulfill wrt throwing exceptions.
is isrecoverable(ex) = true just the wrong default?
I think Jacob planned a bigger refactor for retries, switching exception retries from opt-out to opt-in would make sense to me.
what if CloudBase stops throwing a ArgumentError and decides to throw a MalformedTokenError?
I understand, but I'm less worried about this possibility since this is the first instance of this issue we've seen and I don't expect the underlying packages to make such changes without us noticing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i think this bug (i.e. we'd retry on exceptions happening inside layers) shows that isrecoverable(ex) = true
is the wrong default. What would be the disruption of flipping that to isrecoverable(ex) = false
?
I don't expect the underlying packages to make such changes without us noticing
packages like CloudBase.jl are users on HTTP.jl (not dependencies), and we don't have much visibility on users, so i am worried about other users running into this same issue (just instead of via an ArgumentError via a MethodError or a AssertionError or a BoundError or ...)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like Blame shows this as the PR that set that as the default: #990
Looks like maybe the PR was trying to change to always retry if the request is retryable (has a retryable body or is idempotent)? But yeah... it does seem like a strange default...
Do you have any memory of why that PR was made, @nickrobinson251?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i think that PR was a fix for some oversight in the preceding rewrite: https://github.com/JuliaWeb/HTTP.jl/pull/974/files
I don't believe that rewrite PR intended to change any behaviour, only increase flexibility. The description (from the previous draft of the same thing #877) was
this PR proposes more configurability for retry logic in the hopes that users won't opt to roll their own incorrect retry logic.
and i notice that before that refactor we had
isrecoverable(e) = false
isrecoverable(e::Union{Base.EOFError, Base.IOError, MbedTLS.MbedException, OpenSSL.OpenSSLError}) = true
isrecoverable(e::ArgumentError) = e.msg == "stream is closed or unusable"
isrecoverable(e::Sockets.DNSError) = true
isrecoverable(e::ConnectError) = true
isrecoverable(e::RequestError) = isrecoverable(e.error)
isrecoverable(e::StatusError) = retryable(e.status)
which makes me think the default of true
was unintended
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Amusingly, there is now a failing test:
@test HTTP.RetryRequest.isrecoverable(ErrorException(""))
so my attempt to make an improvement with as little behavior change as possible was doomed to fail from the beginning.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What would be the disruption of flipping that to isrecoverable(ex) = false?
The fact that we wouldn't retry because we relied on certain exceptions being considered recoverable implicitly. Even this current PR seem to have missed isrecoverable(e::ArgumentError) = e.msg == "stream is closed or unusable"
which is unfortunate.
Here are all mentions of Features an amazing ascii art for docs, where did it go? Would be great for extended help docstrings. |
Here is a short history of isrecoverable(e) = false
isrecoverable(e::Union{Base.EOFError, Base.IOError, MbedTLS.MbedException, OpenSSL.OpenSSLError}) = true
isrecoverable(e::ArgumentError) = e.msg == "stream is closed or unusable"
isrecoverable(e::Sockets.DNSError) = true
isrecoverable(e::ConnectError) = true
isrecoverable(e::RequestError) = isrecoverable(e.error)
isrecoverable(e::StatusError) = retryable(e.status) -> # yes, nothing -- this effectively meant `isrecoverable(ex) = true` -> isrecoverable(ex) = true
isrecoverable(ex::StatusError) = retryable(ex.status) -> isrecoverable(ex) = true
isrecoverable(ex::ConnectError) = ex.error isa Sockets.DNSError && ex.error.code == EAI_AGAIN ? false : true
isrecoverable(ex::StatusError) = retryable(ex.status) -> isrecoverable(ex) = true
isrecoverable(ex::CapturedException) = isrecoverable(ex.ex)
isrecoverable(ex::ConnectError) = ex.error isa Sockets.DNSError && ex.error.code == EAI_AGAIN ? false : true
isrecoverable(ex::StatusError) = retryable(ex.status) -> isrecoverable(ex) = true
isrecoverable(ex::CapturedException) = isrecoverable(ex.ex)
isrecoverable(ex::ConnectError) = isrecoverable(ex.error)
isrecoverable(ex::Sockets.DNSError) = (ex.code == Base.UV_EAI_AGAIN)
isrecoverable(ex::StatusError) = retryable(ex.status) From that it seems to me, that default to true was deliberate -- without it no common retriable exceptions would've been retried. |
So my understanding is that we want to go back full circle, carrying on some of the tweaks we did to the original isrecoverable(e) = false
isrecoverable(e::Union{Base.EOFError, Base.IOError, MbedTLS.MbedException, OpenSSL.OpenSSLError}) = true
isrecoverable(e::ArgumentError) = e.msg == "stream is closed or unusable"
+isrecoverable(ex::CapturedException) = isrecoverable(ex.ex) # forgot this one yesterday
-isrecoverable(e::Sockets.DNSError) = true
+isrecoverable(e::Sockets.DNSError) = (ex.code == Base.UV_EAI_AGAIN)
-isrecoverable(e::ConnectError) = true
+isrecoverable(e::ConnectError) = isrecoverable(e.error)
isrecoverable(e::RequestError) = isrecoverable(e.error)
isrecoverable(e::StatusError) = retryable(e.status) |
Just a quick chime-in that the discussion all sounds great and correct. We used to have default |
Thanks Jacob! Good points, I totally forgot about composite exceptions and ExceptionUnwrapping.
I agree we want to do that eventually. Earlier I suggested that we might want to formalize what the layer can expect to be retried. Maybe via making it the layers job to wrap retryable exception in an |
Codecov Report
@@ Coverage Diff @@
## master #1110 +/- ##
==========================================
+ Coverage 82.34% 82.49% +0.14%
==========================================
Files 32 32
Lines 3042 3044 +2
==========================================
+ Hits 2505 2511 +6
+ Misses 537 533 -4
... and 2 files with indirect coverage changes 📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more |
@@ -92,9 +93,11 @@ end | |||
|
|||
function no_retry_reason(ex, req) | |||
buf = IOBuffer() | |||
unwrapped_ex = unwrap_exception(ex) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we need this unwrap_exception
if we're doing it via dispatch in isrecoverable
; so I think we remove it here and keep it above.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, but this is just for the no_retry_reson
, so maybe we just need this for printing, which is fine.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Exactly yeah, this is for printing. Thanks!
Previously, we'd retry on exceptions happening inside layers, e.g. when cloudsign layer in CloudStore.jl decodes base64 encoded SAS tokens and if those are malformed we could get an
ArgumentException