Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

OCPBUGS-10048: UPSTREAM: 115328: apiserver: annotate early (server not ready) and late (during shutdown) requests #1456

Merged
merged 2 commits into from
Mar 17, 2023

Conversation

tkashem
Copy link

@tkashem tkashem commented Jan 26, 2023

What type of PR is this?

/kind feature

What this PR does / why we need it:

Which issue(s) this PR fixes:

Fixes #

Special notes for your reviewer:

Does this PR introduce a user-facing change?

NONE

Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:


@openshift-ci-robot openshift-ci-robot added the backports/unvalidated-commits Indicates that not all commits come to merged upstream PRs. label Jan 26, 2023
@openshift-ci openshift-ci bot added the kind/feature Categorizes issue or PR as related to a new feature. label Jan 26, 2023
@openshift-ci-robot
Copy link

@tkashem: the contents of this pull request could not be automatically validated.

The following commits could not be validated and must be approved by a top-level approver:

Comment /validate-backports to re-evaluate validity of the upstream PRs, for example when they are merged upstream.

@tkashem tkashem changed the title apiserver: add audit annotations to incoming requests during shutdown UPSTREAM: 115328: apiserver: add audit annotations to incoming requests during shutdown Jan 26, 2023
@deads2k
Copy link

deads2k commented Jan 26, 2023

At a high-level, why not put this in the same spot as the existing audit log annotation adding filter so they are together. One for "termination has started" and one for "still getting requests really late" side-by-side?

it's down in staging/src/k8s.io/apiserver/pkg/server/patch_genericapiserver.go

@openshift-ci-robot
Copy link

@tkashem: the contents of this pull request could not be automatically validated.

The following commits could not be validated and must be approved by a top-level approver:

Comment /validate-backports to re-evaluate validity of the upstream PRs, for example when they are merged upstream.

@openshift-ci openshift-ci bot added the vendor-update Touching vendor dir or related files label Feb 14, 2023
@tkashem
Copy link
Author

tkashem commented Feb 14, 2023

At a high-level, why not put this in the same spot as the existing audit log annotation adding filter so they are together.

@deads2k I would like to push this upstream, so keeping it separate for now. My goal is to move the bits and pieces we need from staging/src/k8s.io/apiserver/pkg/server/patch_genericapiserver.go into here.


// Signaled returns a channel that is closed when the underlying event
// has been signaled. Successive calls to Signaled return the same value.
Signaled() <-chan struct{}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can I a name that tell me what is being signaled?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not unless we wrap it and give it another name, we have multiple signals
https://github.com/kubernetes/kubernetes/blob/125e38c0872521bec49351415cc9fb624b046659/staging/src/k8s.io/apiserver/pkg/server/lifecycle_signals.go#L112-L147

Name actually returns the name of the signal which i am including in the annotation

func (e *namedChannelWrapper) SignaledAt() time.Time {
value := e.signaledAt.Load()
if value == nil {
return time.Time{}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why zero value instead of simply nil?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we need to distinguish between nil and empty value here, but i made it SignaledAt() (time.Time, bool)

return
}

late := fmt.Sprintf("since=%s elapsed=%s threshold=%s", initiated.Name(), time.Since(initiated.SignaledAt()).String(), delayDuration)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why does delayDuration matter here?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is the shutdown-delay-duration server run option, if we want to know how late compared to the grace period it will be helpful. The audit mining tool does not need to fetch the configuration value from a different source.

return
}

late := fmt.Sprintf("since=%s elapsed=%s threshold=%s", initiated.Name(), time.Since(initiated.SignaledAt()).String(), delayDuration)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why is name identified by "since" in the message?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it prints the name of the event which should be "ShutdownInitiated"

}

late := fmt.Sprintf("since=%s elapsed=%s threshold=%s", initiated.Name(), time.Since(initiated.SignaledAt()).String(), delayDuration)
audit.AddAuditAnnotation(req.Context(), "shutdown.apiserver.k8s.io/late", late)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you probably want to have an openshift name to start since this name is unlikely to survive contact with review and you'll have to react either way.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you mean go with an openshift specific name now since the proposed name will change in upstream anyway? So is having an openshift name required here?

@openshift-ci-robot
Copy link

@tkashem: the contents of this pull request could not be automatically validated.

The following commits could not be validated and must be approved by a top-level approver:

Comment /validate-backports to re-evaluate validity of the upstream PRs, for example when they are merged upstream.

@openshift-ci-robot
Copy link

@tkashem: the contents of this pull request could not be automatically validated.

The following commits could not be validated and must be approved by a top-level approver:

Comment /validate-backports to re-evaluate validity of the upstream PRs, for example when they are merged upstream.

@openshift-ci-robot
Copy link

@tkashem: the contents of this pull request could not be automatically validated.

The following commits could not be validated and must be approved by a top-level approver:

Comment /validate-backports to re-evaluate validity of the upstream PRs, for example when they are merged upstream.

@openshift-ci-robot
Copy link

@tkashem: the contents of this pull request could not be automatically validated.

The following commits could not be validated and must be approved by a top-level approver:

Comment /validate-backports to re-evaluate validity of the upstream PRs, for example when they are merged upstream.

@openshift-ci-robot
Copy link

@tkashem: the contents of this pull request could not be automatically validated.

The following commits could not be validated and must be approved by a top-level approver:

Comment /validate-backports to re-evaluate validity of the upstream PRs, for example when they are merged upstream.

@openshift-ci-robot
Copy link

@tkashem: the contents of this pull request could not be automatically validated.

The following commits could not be validated and must be approved by a top-level approver:

Comment /validate-backports to re-evaluate validity of the upstream PRs, for example when they are merged upstream.

@dgrisonnet
Copy link
Member

dgrisonnet commented Feb 16, 2023

The scenario we are trying to cover here is to know if any of the LBs doesn't honor apiserver readiness and this is something that can already be measured with the metrics we have from client-go and the apiserver.

We can already gather the rate of requests made to the various load balancers. An example of that can be found here:
https://github.com/openshift/cluster-kube-apiserver-operator/pull/1272/files#diff-8d2c027b0440a540f5ffa6a2db4ec95d9a5f6f3ce36874ea3242224f47df7575R13-R30

Also, since Kubernetes 1.26, we have metrics about apiserver healthchecks with https://github.com/kubernetes/enhancements/tree/master/keps/sig-instrumentation/3466-kubernetes-component-health-slis.

We can for example get:

kubernetes_healthcheck{name="shutdown",type="readyz"} 1
kubernetes_healthchecks_total{name="shutdown",status="success",type="readyz"} 15

So all the information we need should already be available to us and we should just need to correlate them to achieve our goal.

Also, one thing to note is that this is a very intrusive carry-patch that is very likely to introduce merge conflicts in the future. So I think we should take our time to evaluate what it brings that we don't have today and assess whether it justifies the potential time-loss during rebase.

@openshift-ci-robot
Copy link

@tkashem: the contents of this pull request could not be automatically validated.

The following commits could not be validated and must be approved by a top-level approver:

Comment /validate-backports to re-evaluate validity of the upstream PRs, for example when they are merged upstream.

@openshift-merge-robot openshift-merge-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Feb 17, 2023
@openshift-ci-robot
Copy link

@tkashem: the contents of this pull request could not be automatically validated.

The following commits could not be validated and must be approved by a top-level approver:

Comment /validate-backports to re-evaluate validity of the upstream PRs, for example when they are merged upstream.

@openshift-ci-robot
Copy link

@tkashem: the contents of this pull request could not be automatically validated.

The following commits could not be validated and must be approved by a top-level approver:

Comment /validate-backports to re-evaluate validity of the upstream PRs, for example when they are merged upstream.

@openshift-ci-robot
Copy link

@tkashem: the contents of this pull request could not be automatically validated.

The following commits could not be validated and must be approved by a top-level approver:

Comment /validate-backports to re-evaluate validity of the upstream PRs, for example when they are merged upstream.

@@ -188,3 +204,11 @@ func (e *namedChannelWrapper) Signaled() <-chan struct{} {
func (e *namedChannelWrapper) Name() string {
return e.name
}

func (e *namedChannelWrapper) SignaledAt() *time.Time {
value := e.signaledAt.Load()

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be a bit surprising if SignaledAt returns nil if the signal hasn't fire yet but the converse is not also true. If you ran something like:

<-Signaled()
ts := SignaledAt()

then you could end up with ts being nil, couldn't you?

@openshift-ci-robot
Copy link

@tkashem: the contents of this pull request could not be automatically validated.

The following commits could not be validated and must be approved by a top-level approver:

Comment /validate-backports to re-evaluate validity of the upstream PRs, for example when they are merged upstream.

@openshift-ci-robot
Copy link

@tkashem: the contents of this pull request could not be automatically validated.

The following commits could not be validated and must be approved by a top-level approver:

Comment /validate-backports to re-evaluate validity of the upstream PRs, for example when they are merged upstream.

@benluddy
Copy link

/lgtm

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Mar 17, 2023
@openshift-ci
Copy link

openshift-ci bot commented Mar 17, 2023

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: benluddy, dinhxuanvu, tkashem

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@tkashem
Copy link
Author

tkashem commented Mar 17, 2023

/cherry-pick release-4.13

@openshift-cherrypick-robot

@tkashem: once the present PR merges, I will cherry-pick it on top of release-4.13 in a new PR and assign it to you.

In response to this:

/cherry-pick release-4.13

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@tkashem
Copy link
Author

tkashem commented Mar 17, 2023

/retest-required

@tkashem
Copy link
Author

tkashem commented Mar 17, 2023

/remove-label backports/unvalidated-commits
/label backports/validated-commits

@openshift-ci openshift-ci bot added backports/validated-commits Indicates that all commits come to merged upstream PRs. and removed backports/unvalidated-commits Indicates that not all commits come to merged upstream PRs. labels Mar 17, 2023
@tkashem
Copy link
Author

tkashem commented Mar 17, 2023

/retest-required

@openshift-ci
Copy link

openshift-ci bot commented Mar 17, 2023

@tkashem: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
ci/prow/4.11-upgrade-from-stable-4.10-e2e-aws-ovn-upgrade 20caad9 link false /test 4.11-upgrade-from-stable-4.10-e2e-aws-ovn-upgrade
ci/prow/e2e-agnostic-ovn-cmd 20caad9 link false /test e2e-agnostic-ovn-cmd

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@openshift-merge-robot openshift-merge-robot merged commit 54b5520 into openshift:master Mar 17, 2023
@openshift-ci-robot
Copy link

@tkashem: Jira Issue OCPBUGS-10048: All pull requests linked via external trackers have merged:

Jira Issue OCPBUGS-10048 has been moved to the MODIFIED state.

In response to this:

What type of PR is this?

/kind feature

What this PR does / why we need it:

Which issue(s) this PR fixes:

Fixes #

Special notes for your reviewer:

Does this PR introduce a user-facing change?

NONE

Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:


Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@openshift-cherrypick-robot

@tkashem: new pull request created: #1517

In response to this:

/cherry-pick release-4.13

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. backports/validated-commits Indicates that all commits come to merged upstream PRs. bugzilla/valid-bug Indicates that a referenced Bugzilla bug is valid for the branch this PR is targeting. jira/valid-bug Indicates that a referenced Jira bug is valid for the branch this PR is targeting. jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. kind/feature Categorizes issue or PR as related to a new feature. lgtm Indicates that a PR is ready to be merged. vendor-update Touching vendor dir or related files
Projects
None yet
Development

Successfully merging this pull request may close these issues.

9 participants