[release-1.2] Reduce the period and failure threshold for activator readiness #12618
Conversation
The default drain timeout is 45 seconds, which was much shorter than the time it takes for the activator to be recognized as not ready (2 minutes). This was resulting in 503s, since the activator was receiving traffic when it was not expecting it.
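For context, Kubernetes only marks a pod not-ready after roughly PeriodSeconds × FailureThreshold worth of failed probes, so those two knobs are what have to shrink until detection finishes inside the 45-second drain window. The sketch below just illustrates that arithmetic with k8s.io/api types; the numeric values are assumptions, not the exact settings this PR applies.

```go
package main

import (
	"fmt"

	corev1 "k8s.io/api/core/v1"
)

func main() {
	// Illustrative probe settings (handler elided). Worst-case detection of a
	// not-ready activator is roughly PeriodSeconds * FailureThreshold, and it
	// has to finish well inside the 45-second drain timeout so traffic stops
	// being routed to an activator that is already draining.
	probe := corev1.Probe{
		PeriodSeconds:    5, // assumed value, for illustration only
		FailureThreshold: 1, // assumed value, for illustration only
	}
	detection := probe.PeriodSeconds * probe.FailureThreshold
	fmt.Printf("worst-case not-ready detection: ~%ds (drain timeout: 45s)\n", detection)
}
```

Tuning this is a balance: probe too aggressively and transient hiccups (such as the activator's websocket to the autoscaler dropping) cause spurious not-ready flaps, which is exactly what the follow-up knative#12621 listed further down addresses.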
/lgtm
[APPROVALNOTIFIER] This PR is APPROVED. This pull request has been approved by: dprotaso, knative-prow-robot. The full list of commands accepted by this bot can be found here. The pull request process is described here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing `/approve` in a comment.
Codecov Report
@@              Coverage Diff              @@
##           release-1.2    #12618   +/-   ##
=============================================
  Coverage        87.48%    87.48%
=============================================
  Files              195       195
  Lines             9718      9718
=============================================
  Hits              8502      8502
  Misses             931       931
  Partials           285       285

Continue to review the full report at Codecov.
/override pull-knative-serving-istio-latest-no-mesh
@dprotaso: Overrode contexts on behalf of dprotaso: pull-knative-serving-istio-latest-no-mesh. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
Reduce the period and failure threshold for activator readiness (knative#12618): The default drain timeout is 45 seconds, which was much shorter than the time it takes for the activator to be recognized as not ready (2 minutes). This was resulting in 503s, since the activator was receiving traffic when it was not expecting it. Co-authored-by: dprotaso <dprotaso@gmail.com>
* Pin to 1.23 S-O branch
* Add 0-kourier.yaml and 1-config-network.yaml to kourier.yaml (#1122)
  * Rename kourier.yaml to 0-kourier.yaml
  * Concatenate the files
* fix csv logic (#1125)
* Reduce the period and failure threshold for activator readiness (knative#12618): The default drain timeout is 45 seconds, which was much shorter than the time it takes for the activator to be recognized as not ready (2 minutes). This was resulting in 503s, since the activator was receiving traffic when it was not expecting it. Co-authored-by: dprotaso <dprotaso@gmail.com>
* Address 503s when the autoscaler is being rolled (knative#12621): The activator's readiness depends on the status of its web socket connection to the autoscaler. When the connection is down, the activator reports ready=false. This can occur while the autoscaler deployment is updating. PR knative#12614 made the activator's readiness probe fail aggressively after a single failure. This didn't seem to impact Istio, but with Contour it started returning 503s since the activator began reporting ready=false immediately. This PR does two things to mitigate 503s (see the rollout-strategy sketch after this list):
  * Bump the readiness threshold to give the autoscaler more time to roll out and start up. This still remains lower than the drain duration.
  * Update the autoscaler rollout strategy so a new instance is spun up prior to bringing down the old one. This is done using maxUnavailable=0.
  Co-authored-by: dprotaso <dprotaso@gmail.com>
* [release-1.2] Drop MaxDurationSeconds from the RevisionSpec (knative#12640)
  * Drop MaxDurationSeconds from the RevisionSpec (knative#12635): We added MaxDurationSeconds (knative#12322) because the behaviour of RevisionSpec.Timeout changed from total duration to time to first byte. In hindsight, changing the behaviour of Timeout was a mistake, since it goes against the original specification. Thus we're going to create a path for migration, and the first part is to remove MaxDurationSeconds from the RevisionSpec.
  * fix conformance test
* [release-1.2] fix ytt package name (knative#12657)
  * fix ytt package name
  * use correct path
  Co-authored-by: dprotaso <dprotaso@gmail.com>
* Remove an unnecessary start delay when resolving tags to digests (knative#12669) Co-authored-by: dprotaso <dprotaso@gmail.com>
* Drop collecting performance data in the release branch (knative#12673) Co-authored-by: dprotaso <dprotaso@gmail.com>
* bump ggcr, which includes auth config lookup fixes for k8s (knative#12656). Includes the fixes:
  * google/go-containerregistry#1299
  * google/go-containerregistry#1300
* Fixes an activator panic when the throttler encounters a cache.DeletedFinalStateUnknown (knative#12680) (see the delete-handler sketch after this list) Co-authored-by: dprotaso <dprotaso@gmail.com>
* upgrade to latest dependencies (knative#12674): bumping knative.dev/pkg 77555ea...083dd97:
  * 083dd97 Wait for reconciler/controllers to return prior to exiting the process (# 2438)
  * df430fa dizzy: we must use `flags` instead of `pflags`, since this is not working. It seems like pflag.* adds the var to its own flag set, not the one package flag uses, and it doesn't expose the internal flag.Var externally, hence this fix. (# 2415)
  Signed-off-by: Knative Automation <automation@knative.team>
* [release-1.2] fix tag to digest resolution (ggcr bump) (knative#12834)
  * pin k8s dep
  * Fix tag to digest resolution with K8s secrets: I forgot to bump ggcr's sub-package github.com/google/go-containerregistry/pkg/authn/k8schain in the prior release.
  * bump ggcr, which fixes tag-to-digest resolution for Azure & GitLab (knative#12857)

Co-authored-by: Stavros Kontopoulos <st.kontopoulos@gmail.com>
Co-authored-by: Knative Prow Robot <knative-prow-robot@google.com>
Co-authored-by: dprotaso <dprotaso@gmail.com>
Co-authored-by: knative-automation <automation@knative.team>
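Regarding the maxUnavailable=0 mitigation from knative#12621 above: the point is that the Deployment's rolling update brings up a replacement autoscaler pod before terminating the old one, so the activator's websocket connection (and therefore its readiness) always has an endpoint to reconnect to. Below is a minimal sketch with k8s.io/api types, assuming a single-replica autoscaler; it is not the actual manifest change from the PR.

```go
package main

import (
	"fmt"

	appsv1 "k8s.io/api/apps/v1"
	"k8s.io/apimachinery/pkg/util/intstr"
)

// zeroDowntimeStrategy sketches a rolling update that never removes the old
// pod before its replacement is ready (maxUnavailable=0, maxSurge=1).
func zeroDowntimeStrategy() appsv1.DeploymentStrategy {
	zero := intstr.FromInt(0)
	one := intstr.FromInt(1)
	return appsv1.DeploymentStrategy{
		Type: appsv1.RollingUpdateDeploymentStrategyType,
		RollingUpdate: &appsv1.RollingUpdateDeployment{
			MaxUnavailable: &zero, // keep the existing autoscaler serving until the new one is up
			MaxSurge:       &one,  // allow one extra pod during the rollout
		},
	}
}

func main() {
	s := zeroDowntimeStrategy()
	fmt.Println("maxUnavailable:", s.RollingUpdate.MaxUnavailable.String())
}
```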
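On the knative#12680 panic fix listed above: when an informer misses a delete event, client-go delivers a cache.DeletedFinalStateUnknown tombstone instead of the concrete object, and a delete handler that assumes the concrete type breaks. The sketch below shows the generic defensive unwrap pattern for a DeleteFunc; it is not the actual throttler code, and the Endpoints type is just a stand-in.

```go
package main

import (
	"fmt"

	corev1 "k8s.io/api/core/v1"
	"k8s.io/client-go/tools/cache"
)

// handleDelete is a generic DeleteFunc that tolerates tombstones. Asserting
// obj directly to *corev1.Endpoints fails (or panics, with a one-value
// assertion) whenever a cache.DeletedFinalStateUnknown is delivered instead.
func handleDelete(obj interface{}) {
	ep, ok := obj.(*corev1.Endpoints)
	if !ok {
		tombstone, ok := obj.(cache.DeletedFinalStateUnknown)
		if !ok {
			fmt.Printf("ignoring unexpected object of type %T\n", obj)
			return
		}
		ep, ok = tombstone.Obj.(*corev1.Endpoints)
		if !ok {
			fmt.Printf("ignoring tombstone with unexpected object of type %T\n", tombstone.Obj)
			return
		}
	}
	fmt.Println("endpoints deleted:", ep.Name)
}

func main() {
	// A tombstone delivery that a naive handler would mishandle.
	handleDelete(cache.DeletedFinalStateUnknown{
		Key: "default/activator-service",
		Obj: &corev1.Endpoints{},
	})
}
```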
This is an automated cherry-pick of #12614