From a133403ac0eb90d40a0b768b2ce2b8adf2dafe5c Mon Sep 17 00:00:00 2001
From: Chao Dai
Date: Wed, 3 Mar 2021 15:36:19 -0800
Subject: [PATCH] KEP-2539: Addressing comments from #2540

---
 .../README.md | 64 +++++++++++--------
 1 file changed, 39 insertions(+), 25 deletions(-)

diff --git a/keps/sig-testing/2539-continuously-deploy-k8s-prow/README.md b/keps/sig-testing/2539-continuously-deploy-k8s-prow/README.md
index a7400de94f2d..65dc062730f0 100644
--- a/keps/sig-testing/2539-continuously-deploy-k8s-prow/README.md
+++ b/keps/sig-testing/2539-continuously-deploy-k8s-prow/README.md
@@ -1,24 +1,25 @@
 # KEP-2539: Continuously Deploy K8s Prow

-- [Release Signoff Checklist](#release-signoff-checklist)
-- [Summary](#summary)
-- [Motivation](#motivation)
-  - [Goals](#goals)
-    - [Prow Users](#prow-users)
-    - [Prow Oncall](#prow-oncall)
-  - [Non-Goals](#non-goals)
-- [Proposal](#proposal)
-  - [Notes/Constraints/Caveats (Optional)](#notesconstraintscaveats-optional)
-    - [Breaking Changes in Prow](#breaking-changes-in-prow)
-- [Design Details](#design-details)
-  - [Automated Merging of Prow Autobump PRs](#automated-merging-of-prow-autobump-prs)
-  - [Roll Back Process](#roll-back-process)
-- [Implementation History](#implementation-history)
-- [Alternatives](#alternatives)
-    - [A new tool merges autobump PRs](#a-new-tool-merges-autobump-prs)
-      - [Pros:](#pros)
-      - [Cons:](#cons)
+- [KEP-2539: Continuously Deploy K8s Prow](#kep-2539-continuously-deploy-k8s-prow)
+  - [Release Signoff Checklist](#release-signoff-checklist)
+  - [Summary](#summary)
+  - [Motivation](#motivation)
+    - [Goals](#goals)
+      - [Prow Users](#prow-users)
+      - [Prow Oncall](#prow-oncall)
+    - [Non-Goals](#non-goals)
+  - [Proposal](#proposal)
+    - [Notes/Constraints/Caveats (Optional)](#notesconstraintscaveats-optional)
+      - [Breaking Changes in Prow](#breaking-changes-in-prow)
+  - [Design Details](#design-details)
+    - [Automated Merging of Prow Autobump PRs](#automated-merging-of-prow-autobump-prs)
+    - [Roll Back Process](#roll-back-process)
+  - [Implementation History](#implementation-history)
+  - [Alternatives](#alternatives)
+      - [A new tool merges autobump PRs](#a-new-tool-merges-autobump-prs)
+        - [Pros:](#pros)
+        - [Cons:](#cons)

 ## Release Signoff Checklist

@@ -82,7 +83,7 @@ Shouldn’t see any change, prow breakage should be discovered by prow monitorin
 - What’s Not Changed
   - React to prow alerts and take actions.
 - What’s Changed
-  - No more manual inspecting prow healthiness.
+  - Decouple prow logs inspection from prow bump.
   - No more manual lgtm/approve/retest autobump PRs.
   - No more manual Slack posting.

@@ -94,7 +95,7 @@ Change how prow is released.

 ## Proposal

-Prow autobump PRs are automatically merged every hour, only on working hours of working days.
+Prow autobump PRs are automatically merged every 3 hours, only during working hours on working days.

 ### Notes/Constraints/Caveats (Optional)

@@ -125,7 +126,10 @@ Suggestion: how to keep slack reports on each automated bump.
 When prow stopped functioning after a bump, prow oncall should:
 - Stop auto-deploying by commenting `/hold` on latest autobump PR.
 - Manually create rollback PR for rolling back to known good version.
-- Manually apply the changes from rollback PR.
+  - Prow is not very actively developed at the moment, so there are normally
+    few changes between bumps and it should be easy to identify the culprit.
+  - As a general rule of thumb, we can assume the previous bump was good.
+- Manually apply the changes from rollback PR by running [`prow/deploy.sh`](https://github.com/kubernetes/test-infra/blob/master/prow/deploy.sh)

 ```
 <<[UNRESOLVED]>>
@@ -138,12 +142,22 @@ Which version to roll back. This is generally not a problem due to low release v

 ## Alternatives
-
 #### A new tool merges autobump PRs
-This method is independent of tide, which makes sure it works on every prow instance.
+
+Instead of letting tide merge the PR, an alternative is to create a dedicated
+continuous deploy job that takes full control:
+- Merge the autobump PR on a fixed schedule

 ##### Pros:
-Not relying on tide, works really well with prow instances that don't have tide.
+- This method is independent of tide, which makes sure it works on every prow instance.

 ##### Cons:
-Probably have significantly divergent code paths for finding and approving PRs on Gerrit vs PRs on GitHub.
+- The tool would be very similar to tide, which means a lot of logic would be
+  duplicated from tide.
+
+The biggest pro of this approach is that it works better with prow instances
+that don't have tide support yet, for example prow instances that work with gerrit.
+However, there are two reasons for not going down this path:
+- The current design is targeting k8s prow, which does have tide.
+- Tide will eventually come to gerrit, and we can evaluate later which
+  should be done first: tide for gerrit, or continuous deploy prow with gerrit.