
graphql, node, rpc: improve HTTP write timeout handling #25457

Merged: 46 commits into ethereum:master on Dec 7, 2022

Conversation

@s1na (Contributor) commented Aug 1, 2022

A work-around for golang/go#47229: trigger the timeout a bit before the stdlib closes the connection, so users see a proper error message. Note that I still kept the WriteTimeout of the http.Server.

I'm not sure what happens if the method handler and the timeout handler both write at the same time.

Fixes #21430

@s1na s1na requested a review from fjl as a code owner August 1, 2022 12:50
@s1na s1na requested a review from gballet as a code owner August 1, 2022 15:18
@s1na s1na requested a review from holiman as a code owner August 8, 2022 17:15
@s1na (Contributor, Author) commented Aug 9, 2022

I implemented a different approach that @fjl and I roughly discussed:

The rpc handler now runs the method in a goroutine. It waits for that to finish OR for the context deadline to expire, in which case it returns an error. The method keeps running in the background until it finishes (a possible leak), so it's important that long-running methods respect the context deadline. I made sure this happens in getLogs, which received most of the tickets for this issue.

The timeout is set in the outer layer (node/rpcstack). Alternatively, we could pass it through to the handler and let it set the timeout itself.

Note: the error type I added sits a bit awkwardly among the official JSON-RPC errors.

node/rpcstack.go Outdated
@@ -198,6 +198,9 @@ func (h *httpServer) ServeHTTP(w http.ResponseWriter, r *http.Request) {
// if http-rpc is enabled, try to serve request
rpc := h.httpHandler.Load().(*rpcHandler)
if rpc != nil {
ctx, cancel := context.WithTimeout(r.Context(), h.timeouts.ReadTimeout-(50*time.Millisecond))
@s1na (Contributor, Author) commented Aug 9, 2022

Also note this line: 50ms is a totally arbitrary number; lower values like 20ms should be fine as well. The important thing is that the timeout hits before the stdlib's HTTP server kills the connection.

rpc/handler.go Outdated (thread resolved)
@holiman (Contributor) commented Aug 30, 2022

This seems to work well. With a test method like this:

diff --git a/internal/ethapi/api.go b/internal/ethapi/api.go
index 90322033b9..b90ac5048d 100644
--- a/internal/ethapi/api.go
+++ b/internal/ethapi/api.go
@@ -2035,3 +2035,10 @@ func toHexSlice(b [][]byte) []string {
 	}
 	return r
 }
+
+func (api *DebugAPI) TimeMeOut(ctx context.Context, seconds uint64) (string, error) {
+	log.Info("TimeMeOut sleeping...", "seconds", seconds)
+	time.Sleep(time.Second * time.Duration(seconds))
+	log.Info("TimeMeOut waking up!")
+	return "Oll korrekt!", nil
+}

The 25s call works fine, but the 35s call times out:

[user@work go-ethereum]$ curl --data '{"method":"debug_timeMeOut","params":[25],"id":1,"jsonrpc":"2.0"}' -H "Content-Type: application/json" -X POST localhost:8545
{"jsonrpc":"2.0","id":1,"result":"Oll korrekt!"}
[user@work go-ethereum]$ curl --data '{"method":"debug_timeMeOut","params":[35],"id":1,"jsonrpc":"2.0"}' -H "Content-Type: application/json" -X POST localhost:8545
{"jsonrpc":"2.0","id":1,"error":{"code":408,"message":"request timed out"}}

And the timeout triggers after 30s:

INFO [08-30|18:47:55.105] TimeMeOut sleeping...                    seconds=35
WARN [08-30|18:48:25.056] Served debug_timeMeOut                   conn=127.0.0.1:44346 reqid=1 duration=29.950591423s err="request timed out"
INFO [08-30|18:48:30.106] TimeMeOut waking up! 

@fjl (Contributor) left a comment

What I don't like about this PR is that the RPC handler will keep running in case of a timeout, because it now executes the RPC call on a background goroutine.

It would be nicer to keep this out and rely on the individual RPC handlers to exit promptly when the context is canceled.

rpc/handler.go Outdated
@@ -334,8 +334,19 @@ func (h *handler) handleCall(cp *callProc, msg *jsonrpcMessage) *jsonrpcMessage
return msg.errorResponse(&invalidParamsError{err.Error()})
}
start := time.Now()
answer := h.runMethod(cp.ctx, msg, callb, args)
answerCh := make(chan *jsonrpcMessage)

This channel needs to be buffered, otherwise the handler goroutine will leak when there is a timeout.

@holiman (Contributor) commented Aug 31, 2022

> What I don't like about this PR is that the RPC handler will keep running in case of a timeout, because it now executes the RPC call on a background goroutine.

I agree with this. If you have a DoS RPC request that you use to crash some remote node (or Infura), then this feature would make it even easier to cause a DoS: you could just fire and forget sequentially and get a multi-threaded impact.

@holiman (Contributor) commented Nov 5, 2022

TODO triage discussion: is this worth doing, or is it dead in the water due to the inherent limitations of this approach?

@fjl (Contributor) commented Nov 15, 2022

Alternative approach:

When handling an RPC call, launch a timer using time.AfterFunc(timeout, callback), where the callback delivers the error response if the response has not been sent yet. This can be done using a 1-buffered channel, for example:

responseSlot := make(chan struct{}, 1)
responseSlot <- struct{}{}

responded := make(chan struct{})

timeoutResponseTimer := time.AfterFunc(timeout, func() {
    select {
    case <-responseSlot:
        // The timeout occurred and the method has not responded yet:
        // send the timeout error response.
        close(responded)
    case <-responded:
        // The method responded in time.
    }
})

response := runMethod()
timeoutResponseTimer.Stop()
select {
case <-responseSlot:
    // Send the response.
    close(responded)
case <-responded:
    // The timeout error response was already sent.
}

@fjl (Contributor) commented Nov 15, 2022

Actually, it can also be done with fewer channels, using sync.Once:

var respondOnce sync.Once
timeoutResponseTimer := time.AfterFunc(timeout, func() {
    respondOnce.Do(func() {
        // send the timeout error
    })
})

response := runMethod()
timeoutResponseTimer.Stop()
respondOnce.Do(func() {
    // send the response
})

@fjl fjl removed the status:triage label Dec 6, 2022
@fjl fjl changed the title node: rpc write timeout work-around graphql, node, rpc: HTTP write timeout handling Dec 6, 2022
@fjl fjl changed the title graphql, node, rpc: HTTP write timeout handling graphql, node, rpc: improve HTTP write timeout handling Dec 6, 2022
@fjl fjl added this to the 1.11.0 milestone Dec 6, 2022
@fjl (Contributor) commented Dec 6, 2022

I have decided to remove changes in eth/filters and eth/tracers from the PR. We can submit them in a separate change.

@fjl (Contributor) commented Dec 6, 2022

RPC method handler changes resubmitted in #26320

@fjl fjl merged commit f20eba4 into ethereum:master Dec 7, 2022
shekhirin pushed a commit to shekhirin/go-ethereum that referenced this pull request Jun 6, 2023
Here we add special handling for sending an error response when the write timeout of the
HTTP server is just about to expire. This is surprisingly difficult to get right, since it
must be ensured that all output is fully flushed in time, which needs support from
multiple levels of the RPC handler stack:

The timeout response can't use chunked transfer-encoding because there is no way to write
the final terminating chunk. net/http writes it when the topmost handler returns, but the
timeout will already be over by the time that happens. We decided to disable chunked
encoding by setting content-length explicitly.

Gzip compression must also be disabled for timeout responses because we don't know the
true content-length before compressing all output, i.e. compression would reintroduce
chunked transfer-encoding.
Successfully merging this pull request may close these issues.

Calls for block/transaction information sometimes don't return