KEP-4460: Enable per-request Read/Write deadline #4462
Conversation
Skipping CI for Draft Pull Request.
/cc @sttts
## Design Details

### How We Manage Request Timeout Today:
How similar is the Kubernetes timeout filter to https://pkg.go.dev/net/http#TimeoutHandler? Are there examples of attempts outside of Kubernetes to address similar concerns?
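For reference, here is a minimal, self-contained sketch (illustrative only, not Kubernetes code; the address and body text are made up) of how `net/http.TimeoutHandler` is typically used: it runs the wrapped handler with a time limit, replies `503 Service Unavailable` with the supplied body once the limit is exceeded, and later writes from the wrapped handler fail with `http.ErrHandlerTimeout`.

```go
package main

import (
	"net/http"
	"time"
)

func main() {
	slow := http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		time.Sleep(2 * time.Second) // simulate a slow handler
		// After the timeout this write fails with http.ErrHandlerTimeout.
		_, _ = w.Write([]byte("done"))
	})

	// Clients get a 503 with the given body once the 1s limit is exceeded,
	// while the slow handler keeps running in its own goroutine.
	http.ListenAndServe(":8080", http.TimeoutHandler(slow, 1*time.Second, "request timed out"))
}
```

As discussed further down in the thread, the Kubernetes timeout filter is similar in shape (the inner handler runs in its own goroutine and its later writes fail), but it replies with `504 GatewayTimeout` instead of 503.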
An API client will no longer receive the `504 GatewayTimeout` HTTP status code when the server times out a request. Instead, the client will receive an `error`
Is handling these errors more expensive than 504 timeouts? At least for HTTP/1, I guess the client will typically `connect()` again (and perform another TLS handshake, etc.)? I'm curious about pathological situations that could be systemically worsened if, instead of receiving a 504 and retrying on an existing socket, clients need to reconnect.
For HTTP/1.x, the connection will be closed, so the client needs to reconnect. For HTTP/2, the stream will be reset and the TCP connection will remain intact.
I will add a test to document this behavior.
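As a sketch of what such a test could look like (hypothetical names and assertions, not taken from the KEP): the handler forces a write deadline that has already passed via `http.ResponseController`, so over HTTP/1.x the client sees a transport-level error instead of a 504 response.

```go
package example

import (
	"io"
	"net/http"
	"net/http/httptest"
	"testing"
	"time"
)

func TestClientSeesErrorInsteadOf504(t *testing.T) {
	ts := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		rc := http.NewResponseController(w)
		// A deadline in the past makes the next flush to the connection fail.
		if err := rc.SetWriteDeadline(time.Now().Add(-time.Second)); err != nil {
			t.Logf("SetWriteDeadline: %v", err)
		}
		_, _ = w.Write([]byte("hello"))
		if err := rc.Flush(); err != nil {
			t.Logf("flush after deadline: %v", err)
		}
	}))
	defer ts.Close()

	resp, err := http.Get(ts.URL)
	if err == nil {
		// If headers already made it out, the failure surfaces while reading the body.
		_, err = io.ReadAll(resp.Body)
		resp.Body.Close()
	}
	if err == nil {
		t.Fatalf("expected a transport error instead of a 504 response")
	}
	// Over HTTP/1.x the connection is torn down, so this is a connection-level error.
	t.Logf("client error: %v", err)
}
```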
### Termination of the Request Handler

Upon timeout, a request handler can self-terminate by returning the error it receives while:
Something to consider: we could wrap `context.Value` of the request's context to `panic` if the request has been canceled. Functions like `RequestInfoFrom` constantly fetch values off the context, so it can serve as a good preemption point in addition to the response writer.
That's a good idea, we could wrap the context with:

```go
type PanicAfterTimeoutContext struct {
	context.Context
}

func (c PanicAfterTimeoutContext) Value(key any) any {
	if c.Context.Err() != nil {
		panic("timeout occurred")
	}
	return c.Context.Value(key)
}
```

I think it should not have any ripple effect, and any new context derived from the wrapped `PanicAfterTimeoutContext` (using `WithValue`, `WithCancel`, etc.) still works as expected. I wrote a quick test to verify this:
```go
package example

import (
	"context"
	"testing"
	"time"
)

type keyType1 int

const key1 keyType1 = iota

type keyType2 int

const key2 keyType2 = iota

type keyType3 int

const key3 keyType3 = iota

// PanicAfterTimeoutContext panics on Value lookups once the wrapped context has expired.
type PanicAfterTimeoutContext struct {
	context.Context
}

func (c PanicAfterTimeoutContext) Value(key any) any {
	if c.Context.Err() != nil {
		panic("timeout occurred")
	}
	return c.Context.Value(key)
}

func TestContext(t *testing.T) {
	parent, cancel1 := context.WithCancel(context.Background())
	defer cancel1()
	parent = context.WithValue(parent, key1, "value1")

	child, cancel2 := context.WithTimeout(parent, time.Hour)
	child = context.WithValue(child, key2, "value2")

	wrapped := &PanicAfterTimeoutContext{Context: child}

	// derive a context from wrapped
	ctx, cancel := context.WithCancel(wrapped)
	defer cancel()
	ctx = context.WithValue(ctx, key3, "value3")

	// values stored at every level of the chain are still reachable through the wrapper
	if got := ctx.Value(key1).(string); got != "value1" {
		t.Errorf("expected value: %q, got: %q", "value1", got)
	}
	if got := ctx.Value(key2).(string); got != "value2" {
		t.Errorf("expected value: %q, got: %q", "value2", got)
	}
	if got := ctx.Value(key3).(string); got != "value3" {
		t.Errorf("expected value: %q, got: %q", "value3", got)
	}

	// once the inner context is canceled, Value panics
	cancel2()
	func() {
		defer func() {
			r := recover()
			if r == nil {
				t.Errorf("expected a panic")
			}
			t.Logf("panic: %v", r)
		}()
		ctx.Value(key1)
	}()
}
```
Unfortunately, when I experimented with this approach in the past, I discovered that the `context` package would call the `Value` method under certain circumstances from other goroutines, so a `panic` in there would kill the whole process. The way I mitigated that was checking that the current caller was in an HTTP handler stack before panicking. There might be other approaches as well.
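A rough sketch of that kind of mitigation (my assumption of how it could look, not the code actually used): before panicking, walk the call stack and only panic when a net/http `ServeHTTP` frame is present, so `Value` lookups made by the `context` package from other goroutines fall through safely.

```go
package example

import (
	"context"
	"runtime"
	"strings"
)

// PanicAfterTimeoutContext is the wrapper sketched in the earlier comment.
type PanicAfterTimeoutContext struct {
	context.Context
}

// insideHandlerStack reports whether the current goroutine's call stack contains a
// net/http ServeHTTP frame, i.e. whether we were called from the request handler path.
func insideHandlerStack() bool {
	pcs := make([]uintptr, 64)
	n := runtime.Callers(2, pcs)
	frames := runtime.CallersFrames(pcs[:n])
	for {
		frame, more := frames.Next()
		if strings.Contains(frame.Function, "net/http") && strings.Contains(frame.Function, "ServeHTTP") {
			return true
		}
		if !more {
			return false
		}
	}
}

func (c PanicAfterTimeoutContext) Value(key any) any {
	// Only panic when the lookup comes from the handler stack; lookups from other
	// goroutines (e.g. inside the context package) fall through to the normal path.
	if c.Context.Err() != nil && insideHandlerStack() {
		panic("request timed out")
	}
	return c.Context.Value(key)
}
```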
On the other hand, if we want to do post-timeout logging/metric reporting, we would need to access the request-scoped attributes using the context, and a panic from `Value` is not desirable there.
I think you could always special-case whatever you need to preserve.
It would probably be best to experiment and measure how much benefit we see in terms of resource usage from preempting earlier via the `Value` method versus just the response writer. Then you can determine whether it is worth the effort to implement.
## Summary

With this proposal, the Kubernetes API server will enable per-request read/write timeouts using the `ResponseController` functionality provided by the golang net/http library.
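To make the quoted summary concrete, here is a minimal sketch of the general shape (assumed for illustration; the function name and error handling are not from the KEP) of a filter that sets per-request read/write deadlines through `http.ResponseController`:

```go
package example

import (
	"net/http"
	"time"
)

// WithPerRequestDeadline is a hypothetical filter that sets read/write deadlines
// for each request before invoking the wrapped handler.
func WithPerRequestDeadline(handler http.Handler, timeout time.Duration) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, req *http.Request) {
		rc := http.NewResponseController(w)
		deadline := time.Now().Add(timeout)

		// Both calls fail if the underlying ResponseWriter does not support deadlines;
		// a real filter would need a deliberate fallback for that case.
		if err := rc.SetReadDeadline(deadline); err != nil {
			http.Error(w, "cannot set read deadline", http.StatusInternalServerError)
			return
		}
		if err := rc.SetWriteDeadline(deadline); err != nil {
			http.Error(w, "cannot set write deadline", http.StatusInternalServerError)
			return
		}

		handler.ServeHTTP(w, req)
	})
}
```

Per the net/http documentation, `SetReadDeadline` bounds reading of the request (including its body) and `SetWriteDeadline` bounds writing of the response, which is what makes the timeout per-request rather than per-connection.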
A lot of my initial reservation on this KEP was because I thought it was going to use net/http.TimeoutHandler, which is strictly worse than our handler. Setting deadlines will close the HTTP/2 stream, which does cover "half" of the resource usage (basically it does not cover the handler logic, which runs in its own goroutine).
Thanks for the clarification, yes, we are not going to use net/http.TimeoutHandler.

> Setting deadlines will close the HTTP/2 stream, which does cover "half" of the resource usage (basically it does not cover the handler logic, which runs in its own goroutine)

Today the other half (the inner handler running in its own goroutine) is not directly managed by the timeout filter, since there is no way to kill a goroutine externally. The timeout handler delegates the `Write` method of the `ResponseWriter` object, so when the inner handler writes to the `ResponseWriter` object after the timeout occurs, the write returns an `http.ErrHandlerTimeout` error; this provides a way for the inner handler to return and allows the goroutine to finish.
The KEP proposes a similar approach: either return an error or panic from the `Write` method after the timeout occurs.
Note: PR kubernetes/kubernetes#124730 documents the request timeout behavior with the timeout handler as it is today.
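A rough illustration of that delegation (simplified; not the actual k8s.io/apiserver timeout handler, which does considerably more): after the timeout fires, writes through the wrapper fail with `http.ErrHandlerTimeout`, giving the inner handler, which is still running in its own goroutine, a reason to return.

```go
package example

import (
	"net/http"
	"sync"
)

// timeoutWriter wraps the real ResponseWriter; after markTimedOut is called,
// Write rejects further output with http.ErrHandlerTimeout.
type timeoutWriter struct {
	http.ResponseWriter
	mu       sync.Mutex
	timedOut bool
}

// markTimedOut is called by the timeout filter once the per-request budget is exhausted.
func (tw *timeoutWriter) markTimedOut() {
	tw.mu.Lock()
	defer tw.mu.Unlock()
	tw.timedOut = true
}

func (tw *timeoutWriter) Write(p []byte) (int, error) {
	tw.mu.Lock()
	defer tw.mu.Unlock()
	if tw.timedOut {
		return 0, http.ErrHandlerTimeout
	}
	return tw.ResponseWriter.Write(p)
}
```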
don't attach any deadline to the `Context` of a WATCH request. An orthogonal effort to allocate a deadline to the `Context` of WATCH request(s) is a prerequisite, and such effort is outside the scope of this proposal.
I am struggling to parse what this is saying.
Today, our timeout handler does not handle long-running requests, including WATCH; it only handles the short (non-long-running) requests. The scope of this proposal is the same: short (non-long-running) requests only.

> An orthogonal effort to allocate a deadline to the `Context` of WATCH request(s) is a prerequisite, and such effort is outside the scope of this proposal.

Today, for any WATCH request, if we inspect the `context` of the request it would not have any deadline. This work-in-progress PR kubernetes/kubernetes#122976 makes an attempt to wire a context with a proper deadline to a WATCH request. This effort is orthogonal and is outside the scope of this KEP.
In this scenario, the client can potentially hang indefinitely, waiting for a response from the server.
It's worth noting that for HTTP/1x, [net/http does not execute the request handler in a new goroutine](https://github.com/golang/go/blob/b8ac61e6e64c92f23d8cf868a92a70d13e20a124/src/net/http/server.go#L2031-L2039).
Is there some way for us to mimic the HTTP/2 behavior in HTTP/1 without using the timeout handler that we have today?
When we don't have the timeout handler, the entire handler chain executes in a single goroutine, and there is no mechanism to preempt a goroutine. I think a reasonable option is to panic from within the `Write` and `Flush` methods of the `ResponseWriter` object.
I will investigate whether there is a way to achieve this without using the timeout handler. I opened an issue to track this: golang/go#65526
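A hypothetical sketch of that option (the type and wiring are mine, not from the KEP): for HTTP/1.x, where the whole filter chain runs on one goroutine, a wrapping `ResponseWriter` can panic from `Write`/`Flush` once the request deadline has passed, unwinding the handler stack. `http.ErrAbortHandler` is a natural panic value because net/http recovers it without logging a stack trace.

```go
package example

import (
	"context"
	"net/http"
)

// deadlineWriter wraps the ResponseWriter and panics on Write/Flush once the request
// context has expired, forcibly unwinding the (single-goroutine) HTTP/1.x handler chain.
type deadlineWriter struct {
	http.ResponseWriter
	ctx context.Context // the per-request context carrying the deadline
}

func (dw *deadlineWriter) Write(p []byte) (int, error) {
	if dw.ctx.Err() != nil {
		// net/http treats ErrAbortHandler specially and aborts the response quietly.
		panic(http.ErrAbortHandler)
	}
	return dw.ResponseWriter.Write(p)
}

func (dw *deadlineWriter) Flush() {
	if dw.ctx.Err() != nil {
		panic(http.ErrAbortHandler)
	}
	if f, ok := dw.ResponseWriter.(http.Flusher); ok {
		f.Flush()
	}
}
```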
The test plan looks thorough, and the expected equivalence looks good. @enj has an outstanding question on it. With a feature gate, I think we can try removing our special-cased timeout handler. I won't have time to personally review the implementation. /approve
[APPROVALNOTIFIER] This PR is APPROVED
This pull-request has been approved by: deads2k, tkashem
/lgtm
For posterity: it merged before my PRR review, but this looks fine for Alpha.
One-line PR description: adding new KEP
Issue link: Enable per-request Read/Write Deadline #4460