KEP-2539: Addressing comments from #2540
chaodaiG committed Mar 3, 2021
1 parent fd30cf6 commit a133403
Showing 1 changed file with 39 additions and 25 deletions.
64 changes: 39 additions & 25 deletions keps/sig-testing/2539-continuously-deploy-k8s-prow/README.md
@@ -1,24 +1,25 @@
# KEP-2539: Continuously Deploy K8s Prow

<!-- toc -->
- [Release Signoff Checklist](#release-signoff-checklist)
- [Summary](#summary)
- [Motivation](#motivation)
- [Goals](#goals)
- [Prow Users](#prow-users)
- [Prow Oncall](#prow-oncall)
- [Non-Goals](#non-goals)
- [Proposal](#proposal)
- [Notes/Constraints/Caveats (Optional)](#notesconstraintscaveats-optional)
- [Breaking Changes in Prow](#breaking-changes-in-prow)
- [Design Details](#design-details)
- [Automated Merging of Prow Autobump PRs](#automated-merging-of-prow-autobump-prs)
- [Roll Back Process](#roll-back-process)
- [Implementation History](#implementation-history)
- [Alternatives](#alternatives)
- [A new tool merges autobump PRs](#a-new-tool-merges-autobump-prs)
- [Pros:](#pros)
- [Cons:](#cons)
- [KEP-2539: Continuously Deploy K8s Prow](#kep-2539-continuously-deploy-k8s-prow)
- [Release Signoff Checklist](#release-signoff-checklist)
- [Summary](#summary)
- [Motivation](#motivation)
- [Goals](#goals)
- [Prow Users](#prow-users)
- [Prow Oncall](#prow-oncall)
- [Non-Goals](#non-goals)
- [Proposal](#proposal)
- [Notes/Constraints/Caveats (Optional)](#notesconstraintscaveats-optional)
- [Breaking Changes in Prow](#breaking-changes-in-prow)
- [Design Details](#design-details)
- [Automated Merging of Prow Autobump PRs](#automated-merging-of-prow-autobump-prs)
- [Roll Back Process](#roll-back-process)
- [Implementation History](#implementation-history)
- [Alternatives](#alternatives)
- [A new tool merges autobump PRs](#a-new-tool-merges-autobump-prs)
- [Pros:](#pros)
- [Cons:](#cons)
<!-- /toc -->

## Release Signoff Checklist
@@ -82,7 +83,7 @@ Shouldn’t see any change, prow breakage should be discovered by prow monitorin
- What’s Not Changed
- React to prow alerts and take actions.
- What’s Changed
- No more manual inspecting prow healthiness.
- Decouple prow logs inspection from prow bump.
- No more manual lgtm/approve/retest autobump PRs.
- No more manual Slack posting.

@@ -94,7 +95,7 @@ Change how prow is released.

## Proposal

Prow autobump PRs are automatically merged every hour, only during working hours on working days.
Prow autobump PRs are automatically merged every 3 hours, only during working hours on working days.
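
As an illustration of the proposed cadence, here is a minimal sketch of a check for whether a given time is a merge slot. The 09:00 to 17:00 window, the Monday to Friday definition of a working day, and the names `merge_hours` and `is_merge_tick` are assumptions made for this example; the KEP itself only specifies "every 3 hours" during working hours on working days, and does not address time zones or holidays.

```python
from datetime import datetime
from typing import List

# Assumed values for illustration only; the KEP does not define them.
WORK_START_HOUR = 9       # hypothetical start of the working day
WORK_END_HOUR = 17        # hypothetical end of the working day (exclusive)
MERGE_INTERVAL_HOURS = 3  # "every 3 hours", taken from the proposal


def merge_hours() -> List[int]:
    """Hours of the day at which an autobump PR would be merged."""
    return list(range(WORK_START_HOUR, WORK_END_HOUR, MERGE_INTERVAL_HOURS))


def is_merge_tick(now: datetime) -> bool:
    """Return True if `now` falls exactly on a merge slot of a working day."""
    if now.weekday() >= 5:  # 5 = Saturday, 6 = Sunday
        return False
    return now.minute == 0 and now.hour in merge_hours()
```

Under these assumptions the merge slots are 09:00, 12:00, and 15:00 from Monday to Friday. A real deployment would more likely express the same schedule as a cron entry on a periodic job, and would need to pick a time zone explicitly.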

### Notes/Constraints/Caveats (Optional)

@@ -125,7 +126,10 @@ Suggestion: how to keep slack reports on each automated bump.
If prow stops functioning after a bump, prow oncall should:
- Stop auto-deploying by commenting `/hold` on the latest autobump PR.
- Manually create a rollback PR that rolls back to a known good version.
- Manually apply the changes from rollback PR.
- Prow is not very actively developed at the moment, so there are normally few
changes between bumps and the culprit should be easy to identify.
- As a general rule of thumb, we can assume the last bump was good.
- Manually apply the changes from the rollback PR by running [`prow/deploy.sh`](https://github.com/kubernetes/test-infra/blob/master/prow/deploy.sh).

```
<<[UNRESOLVED]>>
@@ -138,12 +142,22 @@ Which version to roll back. This is generally not a problem due to low release v

## Alternatives


#### A new tool merges autobump PRs
This method is independent of tide, which makes sure it works on every prow instance.

Instead of letting tide merge the PR, an alternative is to create a dedicated
continuous deploy job that takes full control:
- Merge the autobump PR on a fixed schedule

##### Pros:
Not relying on tide, works really well with prow instances that don't have tide.
- This method is independent of tide, which makes sure it works on every prow instance.

##### Cons:
Probably have significantly divergent code paths for finding and approving PRs on Gerrit vs PRs on GitHub.
- The tool is very similar to tide, which means a lot of its logic would
duplicate tide's.

The biggest advantage of this approach is that it works better with prow
instances that don't have tide support yet, for example prow instances that
work with Gerrit. However, there are two reasons for not going down this path:
- The current design targets k8s prow, which does have tide.
- Tide will eventually come to Gerrit, and we can evaluate later which
should be done first: tide for Gerrit, or continuous deployment of prow with Gerrit.
