
KEP-4460: Enable per-request Read/Write deadline #4462

Merged 1 commit into kubernetes:master from the read-write-timeout branch on Jun 6, 2024

Conversation

@tkashem (Contributor) commented Jan 31, 2024

@k8s-ci-robot (Contributor):

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@k8s-ci-robot added the do-not-merge/work-in-progress and cncf-cla: yes labels on Jan 31, 2024
@k8s-ci-robot added the kind/kep, sig/api-machinery, and size/L labels on Jan 31, 2024
@tkashem force-pushed the read-write-timeout branch from c50db9a to dc18f53 on January 31, 2024 at 15:03
@tkashem force-pushed the read-write-timeout branch from dc18f53 to 0eb2e0b on February 3, 2024 at 02:15
@k8s-ci-robot added the size/XL label and removed the size/L label on Feb 3, 2024
@tkashem force-pushed the read-write-timeout branch 3 times, most recently from b683acb to 703bf6d on February 4, 2024 at 02:25
@tkashem marked this pull request as ready for review on February 4, 2024 at 15:44
@k8s-ci-robot removed the do-not-merge/work-in-progress label on Feb 4, 2024
@tkashem force-pushed the read-write-timeout branch 4 times, most recently from b85485d to 1664f37 on February 4, 2024 at 17:20
@tkashem force-pushed the read-write-timeout branch 4 times, most recently from 4149d1b to a6aa454 on February 5, 2024 at 12:34
@tkashem (Contributor, Author) commented Feb 5, 2024

/cc @sttts

@tkashem force-pushed the read-write-timeout branch 4 times, most recently from 77bfe05 to f0b8690 on April 17, 2024 at 13:50
Two outdated review threads on keps/sig-api-machinery/4460-per-request-deadline/README.md were resolved.

## Design Details

### How We Manage Request Timeout Today:
Contributor:

How similar is the Kubernetes timeout filter to https://pkg.go.dev/net/http#TimeoutHandler? Are there examples of attempts outside of Kubernetes to address similar concerns?
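For reference, a minimal usage sketch of net/http.TimeoutHandler (illustrative only; this is not the Kubernetes timeout filter, and the handler here is made up):

```go
package main

import (
	"io"
	"log"
	"net/http"
	"time"
)

func main() {
	slow := http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		time.Sleep(2 * time.Second) // simulate a slow handler
		// Once the timeout has fired, writes to w return http.ErrHandlerTimeout.
		io.WriteString(w, "done")
	})
	// TimeoutHandler runs the inner handler with a time limit and replies with
	// 503 Service Unavailable and the given body once the limit passes.
	log.Fatal(http.ListenAndServe(":8080",
		http.TimeoutHandler(slow, time.Second, "request timed out")))
}
```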

Comment on lines +1037 to +1122
An API client will no longer receive the `504 GatewayTimeout` HTTP status code
when the server times out a request. Instead, the client will receive an `error`
Contributor:

Is handling these errors more expensive than 504 timeouts? At least for HTTP1, I guess the client will typically connect() again (and perform another TLS handshake, etc.)?

I'm curious about pathological situations that could be systemically worsened if instead of receiving a 504 and retrying on an existing socket, clients instead need to reconnect.

Contributor (Author):

For HTTP/1.x, the connection will be closed, so the client needs to reconnect. For HTTP/2, the stream will be reset and the TCP connection will remain intact.
I will add a test to document this behavior.


### Termination of the Request Handler

Upon timeout, a request handler can self-terminate by returning the error it receives while:
Member:

Something to consider: we could wrap context.Value of the request's context to panic if the request has been canceled. Functions like RequestInfoFrom constantly fetch values off the context, so it can serve as a good preemption point in addition to the response writer.

Contributor (Author):

That's a good idea; we could wrap the context with:

type PanicAfterTimeoutContext struct {
	context.Context
}

func (c PanicAfterTimeoutContext) Value(key any) any {
	if c.Context.Err() != nil {
		panic(...)
	}
	return c.Context.Value(key)
}

I think it should not have any ripple effect; any new context derived from the wrapped PanicAfterTimeoutContext (using WithValue, WithCancel, etc.) still works as expected. I wrote a quick test to verify this:

package main

import (
	"context"
	"testing"
	"time"
)

type keyType1 int

const key1 keyType1 = iota

type keyType2 int

const key2 keyType2 = iota

type keyType3 int

const key3 keyType3 = iota

type PanicAfterTimeoutContext struct {
	context.Context
}

func (c PanicAfterTimeoutContext) Value(key any) any {
	if c.Context.Err() != nil {
		panic("timeout occurred")
	}
	return c.Context.Value(key)
}

func TestContext(t *testing.T) {
	parent, cancel1 := context.WithCancel(context.Background())
	defer cancel1()
	parent = context.WithValue(parent, key1, "value1")

	child, cancel2 := context.WithTimeout(parent, time.Hour)
	child = context.WithValue(child, key2, "value2")

	wrapped := &PanicAfterTimeoutContext{Context: child}
	// derive a context from wrapped
	ctx, cancel := context.WithCancel(wrapped)
	defer cancel()
	ctx = context.WithValue(ctx, key3, "value3")

	if got := ctx.Value(key1).(string); got != "value1" {
		t.Errorf("expected value: %q, got: %q", "value1", got)
	}
	if got := ctx.Value(key2).(string); got != "value2" {
		t.Errorf("expected value: %q, got: %q", "value2", got)
	}
	if got := ctx.Value(key3).(string); got != "value3" {
		t.Errorf("expected value: %q, got: %q", "value3", got)
	}

	cancel2()
	func() {
		defer func() {
			r := recover()
			if r == nil {
				t.Errorf("expected a panic")
			}
			t.Logf("panic: %v", r)
		}()
		ctx.Value(key1)
	}()
}

Member:

Unfortunately, when I experimented with this approach in the past, I discovered that the context package would call the Value method under certain circumstances from other goroutines, so a panic in there would kill the whole process. The way I mitigated that was to check that the current caller was in an HTTP handler stack before panicking. There might be other approaches as well.
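A rough sketch of such a guard (hypothetical; it checks the current goroutine's stack for a ServeHTTP frame before panicking, which is only one possible heuristic):

```go
package sketch

import (
	"bytes"
	"context"
	"runtime"
)

// PanicAfterTimeoutContext panics from Value after the request times out, but
// only when the caller is (heuristically) inside an HTTP handler stack, so
// Value calls made from other goroutines do not crash the process.
type PanicAfterTimeoutContext struct {
	context.Context
}

// inHandlerStack reports whether the current goroutine's stack contains a
// ServeHTTP frame. runtime.Stack is not free, so a real implementation would
// need to measure the cost of this check.
func inHandlerStack() bool {
	buf := make([]byte, 16<<10)
	n := runtime.Stack(buf, false)
	return bytes.Contains(buf[:n], []byte(".ServeHTTP("))
}

func (c PanicAfterTimeoutContext) Value(key any) any {
	if c.Context.Err() != nil && inHandlerStack() {
		panic("timeout occurred")
	}
	return c.Context.Value(key)
}
```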

Contributor (Author):

On the other hand, if we want to do post-timeout logging or metric reporting, we would need to access the request-scoped attributes via the context, and a panic from Value is not desirable there.

Member:

I think you could always special-case whatever you needed to preserve.

Probably it would be best to experiment and measure how much benefit we see in terms of resource usage by preempting earlier via the Value method vs. just the response writer. Then you can determine whether it is worth the effort to implement.

## Summary

With this proposal, the Kubernetes API server will enable per-request read/write
timeouts using the `ResponseController` functionality provided by the golang net/http library.
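Illustrative aside: a minimal sketch, assuming http.NewResponseController from Go 1.20+, of how a per-request deadline filter might look; this is not the KEP's actual implementation, and the function names are made up.

```go
package main

import (
	"io"
	"log"
	"net/http"
	"time"
)

// withPerRequestDeadline is a hypothetical filter that sets read and write
// deadlines on the underlying connection (HTTP/1) or stream (HTTP/2) for each
// request via http.NewResponseController.
func withPerRequestDeadline(h http.Handler, timeout time.Duration) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		rc := http.NewResponseController(w)
		deadline := time.Now().Add(timeout)
		// Errors (e.g. http.ErrNotSupported from a wrapping ResponseWriter)
		// are ignored here for brevity.
		_ = rc.SetReadDeadline(deadline)
		_ = rc.SetWriteDeadline(deadline)
		h.ServeHTTP(w, r)
	})
}

func main() {
	hello := http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		io.WriteString(w, "ok")
	})
	log.Fatal(http.ListenAndServe(":8080", withPerRequestDeadline(hello, 30*time.Second)))
}
```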
Member:

A lot of my initial reservation about this KEP was because I thought it was going to use net/http.TimeoutHandler, which is strictly worse than our handler. Setting deadlines will close the HTTP/2 stream, which does cover "half" of the resource usage (it does not cover the handler logic, which runs in its own goroutine).

Contributor (Author):

Thanks for the clarification, yes, we are not going to use net/http.TimeoutHandler.

> Setting deadlines will close the HTTP/2 stream, which does cover "half" of the resource usage (it does not cover the handler logic, which runs in its own goroutine).

Today, the other half (the inner handler running in its own goroutine) is not directly managed by the timeout filter, since there is no way to kill a goroutine externally. The timeout handler intercepts the Write method of the ResponseWriter object:

https://github.com/kubernetes/kubernetes/blob/57b406a18afc54c84725488e0ca3d4b4cabd61db/staging/src/k8s.io/apiserver/pkg/server/filters/timeout.go#L200-L202

So when the inner handler writes to the ResponseWriter object after the timeout occurs, the write returns an http.ErrHandlerTimeout error; this gives the inner handler a way to return and lets the goroutine finish.
The KEP proposes a similar approach: either return an error or panic from the Write method after the timeout occurs.
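A simplified sketch of that pattern (hypothetical; the real code is in the timeout.go file linked above):

```go
package sketch

import (
	"net/http"
	"sync"
)

// timeoutWriter wraps a ResponseWriter so that, once the request has timed
// out, further writes from the inner handler fail with http.ErrHandlerTimeout,
// giving the handler a way to return and let its goroutine finish.
type timeoutWriter struct {
	http.ResponseWriter
	mu       sync.Mutex
	timedOut bool
}

// markTimedOut is called by the timeout filter when the deadline expires.
func (tw *timeoutWriter) markTimedOut() {
	tw.mu.Lock()
	defer tw.mu.Unlock()
	tw.timedOut = true
}

func (tw *timeoutWriter) Write(p []byte) (int, error) {
	tw.mu.Lock()
	defer tw.mu.Unlock()
	if tw.timedOut {
		return 0, http.ErrHandlerTimeout
	}
	return tw.ResponseWriter.Write(p)
}
```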

Note: This PR kubernetes/kubernetes#124730 documents the request timeout behavior with the timeout handler as it is today.

Comment on lines 104 to 106
don't attach any deadline to the `Context` of a WATCH request. An orthogonal
effort to allocate a deadline to the `Context` of WATCH request(s) is a
prerequisite, and such effort is outside the scope of this proposal.
Member:

I am struggling to parse what this is saying.

Contributor (Author):

Today, our timeout handler does not handle long-running requests, including WATCH; it only handles short (non-long-running) requests. The scope of this proposal is the same: short (non-long-running) requests only.

> An orthogonal effort to allocate a deadline to the Context of WATCH request(s) is a prerequisite, and such effort is outside the scope of this proposal.

Today, if we inspect the context of a WATCH request, it does not have any deadline. This work-in-progress PR kubernetes/kubernetes#122976 attempts to wire a context with a proper deadline to WATCH requests. That effort is orthogonal to and outside the scope of this KEP.

Comment on lines +150 to +156
In this scenario, the client can potentially hang indefinitely, waiting for a
response from the server.
It's worth noting that for HTTP/1x, [net/http does not execute the request handler in a new goroutine](https://github.com/golang/go/blob/b8ac61e6e64c92f23d8cf868a92a70d13e20a124/src/net/http/server.go#L2031-L2039).
Member:

Is there some way for us to mimic the http2 behavior in http1 without using the timeout handler that we have today?

Contributor (Author):

Without the timeout handler, the entire handler chain executes in a single goroutine, and there is no mechanism to preempt a goroutine. I think a reasonable option is to panic from within the Write and Flush methods of the ResponseWriter object.

I will investigate whether there is a way to achieve this without using the timeout handler; I opened an issue to track this: golang/go#65526
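A minimal sketch of that option (hypothetical; the wrapper and names are illustrative, not a final implementation):

```go
package sketch

import (
	"context"
	"net/http"
)

// panicOnTimeoutWriter is a hypothetical ResponseWriter wrapper for the
// single-goroutine (HTTP/1.x) case: once the request context is done, Write
// and Flush panic, unwinding the handler chain at the next write attempt.
type panicOnTimeoutWriter struct {
	http.ResponseWriter
	ctx context.Context
}

func (w *panicOnTimeoutWriter) checkTimeout() {
	if w.ctx.Err() != nil {
		// net/http recovers http.ErrAbortHandler without logging a stack trace.
		panic(http.ErrAbortHandler)
	}
}

func (w *panicOnTimeoutWriter) Write(p []byte) (int, error) {
	w.checkTimeout()
	return w.ResponseWriter.Write(p)
}

func (w *panicOnTimeoutWriter) Flush() {
	w.checkTimeout()
	if f, ok := w.ResponseWriter.(http.Flusher); ok {
		f.Flush()
	}
}
```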

@tkashem force-pushed the read-write-timeout branch 4 times, most recently from 0f21086 to b1d50b7 on May 14, 2024 at 23:25
@tkashem force-pushed the read-write-timeout branch 3 times, most recently from 5e36f0a to c6c1f46 on June 3, 2024 at 14:15
@deads2k (Contributor) commented Jun 3, 2024

The test plan looks thorough, and the expected equivalence looks good. @enj has an outstanding question on it. With a feature gate, I think we can try removing our special-cased timeout handler. I won't have time to personally review the implementation.

/approve

@k8s-ci-robot (Contributor):

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: deads2k, tkashem

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot added the approved label on Jun 3, 2024
@tkashem force-pushed the read-write-timeout branch from c6c1f46 to 6131325 on June 6, 2024 at 16:08
@enj (Member) commented Jun 6, 2024

/lgtm

@k8s-ci-robot added the lgtm label on Jun 6, 2024
@k8s-ci-robot merged commit bc254c2 into kubernetes:master on Jun 6, 2024
4 checks passed
@k8s-ci-robot added this to the v1.31 milestone on Jun 6, 2024
@wojtek-t (Member) commented Jun 7, 2024

For posterity - it merged before my PRR review, but this looks fine for Alpha.

Labels: approved, cncf-cla: yes, kind/kep, lgtm, sig/api-machinery, size/XXL

9 participants