Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Don't retry on internal exceptions #1110

Merged
merged 3 commits into from
Sep 15, 2023
Merged
Show file tree
Hide file tree
Changes from 1 commit
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
3 changes: 3 additions & 0 deletions src/clientlayers/RetryRequest.jl
Original file line number Diff line number Diff line change
Expand Up @@ -77,6 +77,9 @@ function retrylayer(handler)
end

isrecoverable(ex) = true
isrecoverable(ex::ArgumentError) = false
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is isrecoverable(ex) = true just the wrong default?

e.g. what if CloudBase stops throwing a ArgumentError and decides to throw a MalformedTokenError?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you explain why this is the right solution?

This seemed like the least disruptive fix for the problem we've seen. Maybe the right solution is to formalize the contract a layer needs to fulfill wrt throwing exceptions.

is isrecoverable(ex) = true just the wrong default?

I think Jacob planned a bigger refactor for retries, switching exception retries from opt-out to opt-in would make sense to me.

what if CloudBase stops throwing a ArgumentError and decides to throw a MalformedTokenError?

I understand, but I'm less worried about this possibility since this is the first instance of this issue we've seen and I don't expect the underlying packages to make such changes without us noticing.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i think this bug (i.e. we'd retry on exceptions happening inside layers) shows that isrecoverable(ex) = true is the wrong default. What would be the disruption of flipping that to isrecoverable(ex) = false?

I don't expect the underlying packages to make such changes without us noticing

packages like CloudBase.jl are users on HTTP.jl (not dependencies), and we don't have much visibility on users, so i am worried about other users running into this same issue (just instead of via an ArgumentError via a MethodError or a AssertionError or a BoundError or ...)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If @quinnj is not around, perhaps we can get a third view from @NHDaly (and/or whoever was running into this issue)?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like Blame shows this as the PR that set that as the default: #990

Looks like maybe the PR was trying to change to always retry if the request is retryable (has a retryable body or is idempotent)? But yeah... it does seem like a strange default...

Do you have any memory of why that PR was made, @nickrobinson251?

Copy link
Collaborator

@nickrobinson251 nickrobinson251 Sep 14, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i think that PR was a fix for some oversight in the preceding rewrite: https://github.com/JuliaWeb/HTTP.jl/pull/974/files

I don't believe that rewrite PR intended to change any behaviour, only increase flexibility. The description (from the previous draft of the same thing #877) was

this PR proposes more configurability for retry logic in the hopes that users won't opt to roll their own incorrect retry logic.

and i notice that before that refactor we had

isrecoverable(e) = false 
isrecoverable(e::Union{Base.EOFError, Base.IOError, MbedTLS.MbedException, OpenSSL.OpenSSLError}) = true
isrecoverable(e::ArgumentError) = e.msg == "stream is closed or unusable"
isrecoverable(e::Sockets.DNSError) = true
isrecoverable(e::ConnectError) = true
isrecoverable(e::RequestError) = isrecoverable(e.error)
isrecoverable(e::StatusError) = retryable(e.status)

which makes me think the default of true was unintended

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Amusingly, there is now a failing test:

@test HTTP.RetryRequest.isrecoverable(ErrorException(""))

so my attempt to make an improvement with as little behavior change as possible was doomed to fail from the beginning.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What would be the disruption of flipping that to isrecoverable(ex) = false?

The fact that we wouldn't retry because we relied on certain exceptions being considered recoverable implicitly. Even this current PR seem to have missed isrecoverable(e::ArgumentError) = e.msg == "stream is closed or unusable" which is unfortunate.

isrecoverable(ex::ErrorException) = false
isrecoverable(ex::RequestError) = isrecoverable(ex.error)
isrecoverable(ex::CapturedException) = isrecoverable(ex.ex)
isrecoverable(ex::ConnectError) = isrecoverable(ex.error)
# Treat all DNS errors except `EAI_AGAIN`` as non-recoverable
Expand Down
47 changes: 47 additions & 0 deletions test/client.jl
Original file line number Diff line number Diff line change
Expand Up @@ -645,6 +645,53 @@ end
end
end

@testset "Don't retry on internal exceptions" begin
kws = (retry_delays = [1, 2, 3], retries=3) # ~ 6 secs
max_wait = 3

function test_finish_within(f, secs)
timedout = Ref(false)
t = Timer((t)->(timedout[] = true), secs)
try
f()
finally
close(t)
end
@test !timedout[]
end

expected = ErrorException("request")
test_finish_within(max_wait) do
@test_throws expected ErrorRequest.get("https://$httpbin/ip"; request_exception=expected, kws...)
end
expected = ArgumentError("request")
test_finish_within(max_wait) do
@test_throws expected ErrorRequest.get("https://$httpbin/ip"; request_exception=expected, kws...)
end

test_finish_within(max_wait) do
expected = ErrorException("stream")
e = try
ErrorRequest.get("https://$httpbin/ip"; stream_exception=expected, kws...)
catch e
e
end
@assert e isa HTTP.RequestError
@test e.error == expected
end

test_finish_within(max_wait) do
expected = ArgumentError("stream")
e = try
ErrorRequest.get("https://$httpbin/ip"; stream_exception=expected, kws...)
catch e
e
end
@assert e isa HTTP.RequestError
@test e.error == expected
end
end

@testset "Retry with ConnectError" begin
mktemp() do path, io
redirect_stdout(io) do
Expand Down
22 changes: 22 additions & 0 deletions test/resources/TestRequest.jl
Original file line number Diff line number Diff line change
Expand Up @@ -55,3 +55,25 @@ end
HTTP.@client (first=[testouterrequestlayer], last=[testinnerrequestlayer]) (first=[testouterstreamlayer], last=[testinnerstreamlayer])

end

module ErrorRequest

using HTTP

function throwingrequestlayer(handler)
return function(req; request_exception=nothing, kw...)
!isnothing(request_exception) && throw(request_exception)
return handler(req; kw...)
end
end

function throwingstreamlayer(handler)
return function(stream; stream_exception=nothing, kw...)
!isnothing(stream_exception) && throw(stream_exception)
return handler(stream; kw...)
end
end

HTTP.@client (throwingrequestlayer,) (throwingstreamlayer,)

end
Loading