-
Notifications
You must be signed in to change notification settings - Fork 105
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
OCPBUGS-10048: UPSTREAM: 115328: apiserver: annotate early (server not ready) and late (during shutdown) requests #1456
Conversation
@tkashem: the contents of this pull request could not be automatically validated. The following commits could not be validated and must be approved by a top-level approver:
Comment |
At a high-level, why not put this in the same spot as the existing audit log annotation adding filter so they are together. One for "termination has started" and one for "still getting requests really late" side-by-side? it's down in staging/src/k8s.io/apiserver/pkg/server/patch_genericapiserver.go |
@tkashem: the contents of this pull request could not be automatically validated. The following commits could not be validated and must be approved by a top-level approver:
Comment |
@deads2k I would like to push this upstream, so keeping it separate for now. My goal is to move the bits and pieces we need from staging/src/k8s.io/apiserver/pkg/server/patch_genericapiserver.go into here. |
|
||
// Signaled returns a channel that is closed when the underlying event | ||
// has been signaled. Successive calls to Signaled return the same value. | ||
Signaled() <-chan struct{} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can I a name that tell me what is being signaled?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not unless we wrap it and give it another name, we have multiple signals
https://github.com/kubernetes/kubernetes/blob/125e38c0872521bec49351415cc9fb624b046659/staging/src/k8s.io/apiserver/pkg/server/lifecycle_signals.go#L112-L147
Name
actually returns the name of the signal which i am including in the annotation
func (e *namedChannelWrapper) SignaledAt() time.Time { | ||
value := e.signaledAt.Load() | ||
if value == nil { | ||
return time.Time{} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why zero value instead of simply nil?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we need to distinguish between nil and empty value here, but i made it SignaledAt() (time.Time, bool)
return | ||
} | ||
|
||
late := fmt.Sprintf("since=%s elapsed=%s threshold=%s", initiated.Name(), time.Since(initiated.SignaledAt()).String(), delayDuration) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why does delayDuration matter here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is the shutdown-delay-duration
server run option, if we want to know how late compared to the grace period it will be helpful. The audit mining tool does not need to fetch the configuration value from a different source.
staging/src/k8s.io/apiserver/pkg/server/filters/shutdown_annotations.go
Outdated
Show resolved
Hide resolved
return | ||
} | ||
|
||
late := fmt.Sprintf("since=%s elapsed=%s threshold=%s", initiated.Name(), time.Since(initiated.SignaledAt()).String(), delayDuration) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why is name identified by "since" in the message?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it prints the name of the event which should be "ShutdownInitiated"
} | ||
|
||
late := fmt.Sprintf("since=%s elapsed=%s threshold=%s", initiated.Name(), time.Since(initiated.SignaledAt()).String(), delayDuration) | ||
audit.AddAuditAnnotation(req.Context(), "shutdown.apiserver.k8s.io/late", late) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you probably want to have an openshift name to start since this name is unlikely to survive contact with review and you'll have to react either way.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you mean go with an openshift specific name now since the proposed name will change in upstream anyway? So is having an openshift name required here?
@tkashem: the contents of this pull request could not be automatically validated. The following commits could not be validated and must be approved by a top-level approver:
Comment |
@tkashem: the contents of this pull request could not be automatically validated. The following commits could not be validated and must be approved by a top-level approver:
Comment |
@tkashem: the contents of this pull request could not be automatically validated. The following commits could not be validated and must be approved by a top-level approver:
Comment |
@tkashem: the contents of this pull request could not be automatically validated. The following commits could not be validated and must be approved by a top-level approver:
Comment |
@tkashem: the contents of this pull request could not be automatically validated. The following commits could not be validated and must be approved by a top-level approver:
Comment |
@tkashem: the contents of this pull request could not be automatically validated. The following commits could not be validated and must be approved by a top-level approver:
Comment |
The scenario we are trying to cover here is to know if any of the LBs doesn't honor apiserver readiness and this is something that can already be measured with the metrics we have from client-go and the apiserver. We can already gather the rate of requests made to the various load balancers. An example of that can be found here: Also, since Kubernetes 1.26, we have metrics about apiserver healthchecks with https://github.com/kubernetes/enhancements/tree/master/keps/sig-instrumentation/3466-kubernetes-component-health-slis. We can for example get:
So all the information we need should already be available to us and we should just need to correlate them to achieve our goal. Also, one thing to note is that this is a very intrusive carry-patch that is very likely to introduce merge conflicts in the future. So I think we should take our time to evaluate what it brings that we don't have today and assess whether it justifies the potential time-loss during rebase. |
@tkashem: the contents of this pull request could not be automatically validated. The following commits could not be validated and must be approved by a top-level approver:
Comment |
@tkashem: the contents of this pull request could not be automatically validated. The following commits could not be validated and must be approved by a top-level approver:
Comment |
@tkashem: the contents of this pull request could not be automatically validated. The following commits could not be validated and must be approved by a top-level approver:
Comment |
@tkashem: the contents of this pull request could not be automatically validated. The following commits could not be validated and must be approved by a top-level approver:
Comment |
@@ -188,3 +204,11 @@ func (e *namedChannelWrapper) Signaled() <-chan struct{} { | |||
func (e *namedChannelWrapper) Name() string { | |||
return e.name | |||
} | |||
|
|||
func (e *namedChannelWrapper) SignaledAt() *time.Time { | |||
value := e.signaledAt.Load() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be a bit surprising if SignaledAt returns nil if the signal hasn't fire yet but the converse is not also true. If you ran something like:
<-Signaled()
ts := SignaledAt()
then you could end up with ts
being nil, couldn't you?
@tkashem: the contents of this pull request could not be automatically validated. The following commits could not be validated and must be approved by a top-level approver:
Comment |
@tkashem: the contents of this pull request could not be automatically validated. The following commits could not be validated and must be approved by a top-level approver:
Comment |
/lgtm |
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: benluddy, dinhxuanvu, tkashem The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
/cherry-pick release-4.13 |
@tkashem: once the present PR merges, I will cherry-pick it on top of release-4.13 in a new PR and assign it to you. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
/retest-required |
/remove-label backports/unvalidated-commits |
/retest-required |
@tkashem: The following tests failed, say
Full PR test history. Your PR dashboard. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here. |
@tkashem: Jira Issue OCPBUGS-10048: All pull requests linked via external trackers have merged: Jira Issue OCPBUGS-10048 has been moved to the MODIFIED state. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
@tkashem: new pull request created: #1517 In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
What type of PR is this?
/kind feature
What this PR does / why we need it:
Which issue(s) this PR fixes:
Fixes #
Special notes for your reviewer:
Does this PR introduce a user-facing change?
Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.: