Lazy upgrade of the Spilo image #859

sdudoladov · 2020-03-09T15:02:46Z

In certain situations (think node rotation) we know that pods restart for reasons independent of the operator. When doing planned updates of the Spilo image, we piggyback on this observation and skip the rolling update after the image update in the statefulset. That saves us some downtime.

sdudoladov · 2020-03-10T12:08:48Z

I welcome suggestions on how forcing the rolling upgrade should behave. The PR allows an administrator to force an update for the entire k8s cluster by changing the operator's configuration and restarting the operator pod. To support such behavior, the operator checks at every Sync that all pods run the image specified in their stateful set. When that is not the case, the operator triggers the rolling update. That solution comes at the cost of extra k8s API calls and unneeded switchovers for clusters that have at least one "old" replica pod.

Currently the operator triggers the rolling update based on the result of comparing the actual and desired statefulsets' Specs. So it will not automatically catch pods running old images when the lazy update is disabled, because that action alone leaves the statefulset Spec intact.

Things to consider:

We can implement a syncPods method that would, as part of syncStatefulSet(), simply restart replicas with the old image and only do switchovers when the master also runs the old image.
Do we need a toggle to force a rolling update for individual PG clusters (i.e. a field in a PG manifest)?
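
For illustration, the syncPods idea from the first point could be sketched roughly as follows. This is a minimal sketch and not code from this PR: the `pod` type, the `planPodSync` function and the image names are hypothetical, and a real implementation would work with Kubernetes pod objects and the operator's own switchover logic.

```go
package main

import "fmt"

// pod is a simplified, hypothetical stand-in for a Kubernetes pod.
// It carries only the fields this sketch needs.
type pod struct {
	Name     string
	Image    string
	IsMaster bool
}

// planPodSync decides what to do with pods that still run an old image:
// outdated replicas are simply restarted, and a switchover is requested
// only when the master itself runs an outdated image.
func planPodSync(desiredImage string, pods []pod) (restartReplicas []string, switchover bool) {
	for _, p := range pods {
		if p.Image == desiredImage {
			continue // pod is already up to date
		}
		if p.IsMaster {
			switchover = true
			continue
		}
		restartReplicas = append(restartReplicas, p.Name)
	}
	return restartReplicas, switchover
}

func main() {
	pods := []pod{
		{Name: "pg-0", Image: "spilo:new", IsMaster: true},
		{Name: "pg-1", Image: "spilo:old"},
		{Name: "pg-2", Image: "spilo:new"},
	}
	restart, switchover := planPodSync("spilo:new", pods)
	fmt.Println(restart, switchover) // prints: [pg-1] false
}
```

In this example only the outdated replica is restarted and the master is left alone, which is the case where the proposal would avoid an unneeded switchover.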

manifests/configmap.yaml

FxKu · 2020-03-25T16:02:23Z

pkg/cluster/sync.go

 			}
 		}
+
+		if !podsRollingUpdateRequired {


Suggested change

-		if !podsRollingUpdateRequired {
+		if c.OpConfig.EnableLazySpiloUpgrade && !podsRollingUpdateRequired {

I would move the check for lazy update here instead of doing it within mustUpdatePodsAfterLazyUpdate

I wonder if this part here is really necessary? When lazy update is active it would create another API call for every StatefulSet. Wouldn't the next sync fix outdated images?

sdudoladov · 2020-03-31

> I would move the check for lazy update here instead of doing it within mustUpdatePodsAfterLazyUpdate

Changed. Note the condition is !c.OpConfig.EnableLazySpiloUpgrade, that is, we only need to do it when lazy update is disabled.

> Wouldn't the next sync fix outdated images?

No. Sync() relies on the stateful set spec to figure out whether a rolling update is needed. With the lazy upgrade, the stateful set itself will be up to date (the pod template will contain the most recent image), but some of the pods may still be running old images simply because they have not been restarted yet. The operator will look at the stateful set spec, see the most recent image, and happily skip the rolling upgrade for this cluster even though some pods run outdated images.
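
The point made in this reply can be illustrated with a small sketch. It is not the operator's code: `specsDiffer`, `podsWithStaleImage` and the image names are hypothetical, and the real operator compares full statefulset Specs rather than a single image string.

```go
package main

import (
	"fmt"
	"sort"
)

// specsDiffer mimics a spec-based decision: it only looks at the image in
// the current and the desired pod template, never at the running pods.
func specsDiffer(currentTemplateImage, desiredTemplateImage string) bool {
	return currentTemplateImage != desiredTemplateImage
}

// podsWithStaleImage returns the names of pods whose running image differs
// from the image in the statefulset's pod template. podImages maps a pod
// name to the image that pod currently runs.
func podsWithStaleImage(templateImage string, podImages map[string]string) []string {
	var stale []string
	for name, image := range podImages {
		if image != templateImage {
			stale = append(stale, name)
		}
	}
	sort.Strings(stale) // map iteration order is random, so sort for stable output
	return stale
}

func main() {
	// After a lazy upgrade the template already carries the new image,
	// but pg-1 has not been restarted yet.
	templateImage := "spilo:new"
	running := map[string]string{"pg-0": "spilo:new", "pg-1": "spilo:old"}

	fmt.Println(specsDiffer(templateImage, "spilo:new"))    // prints: false
	fmt.Println(podsWithStaleImage(templateImage, running)) // prints: [pg-1]
}
```

The spec comparison reports no difference, so only a per-pod check reveals that pg-1 still runs the old image.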

sdudoladov · 2020-04-29T07:41:19Z

👍

FxKu · 2020-04-29T08:01:44Z

👍

Jan-M · 2020-06-09T12:10:15Z

pkg/cluster/cluster.go

+	// until they are re-created for other reasons, for example node rotation
+	if c.OpConfig.EnableLazySpiloUpgrade && !reflect.DeepEqual(c.Statefulset.Spec.Template.Spec.Containers[0].Image, statefulSet.Spec.Template.Spec.Containers[0].Image) {
+		needsReplace = true
+		needsRollUpdate = false


needsRollUpdate = !c.OpConfig.EnableLazySpiloUpgrade would also have been enough here.

I am not sure the two code parts that exist just for the error message are great. But done is done.
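
One possible reading of the first remark, sketched under assumptions: the flag moves out of the condition and into the assignment. The `decision` type and both functions are hypothetical, and the `else if` branch models an assumption that is not visible in the diff above, namely that with the flag off an image change leads to a replace plus a rolling update.

```go
package main

import "fmt"

// decision holds the two flags set in the diff above.
type decision struct {
	needsReplace    bool
	needsRollUpdate bool
}

// withFlagInCondition mirrors the shape of the merged code: the lazy branch
// fires only when the flag is on.
// ASSUMPTION: with the flag off, an image change is handled elsewhere and
// ends in a rolling update; the else-if branch models that.
func withFlagInCondition(lazy, imageChanged bool) decision {
	var d decision
	if lazy && imageChanged {
		d.needsReplace = true
		d.needsRollUpdate = false
	} else if imageChanged {
		d.needsReplace = true
		d.needsRollUpdate = true
	}
	return d
}

// withFlagInAssignment is the suggested shape: one branch on the image
// change, with the flag deciding whether a rolling update is needed.
func withFlagInAssignment(lazy, imageChanged bool) decision {
	var d decision
	if imageChanged {
		d.needsReplace = true
		d.needsRollUpdate = !lazy
	}
	return d
}

func main() {
	for _, lazy := range []bool{false, true} {
		for _, changed := range []bool{false, true} {
			a := withFlagInCondition(lazy, changed)
			b := withFlagInAssignment(lazy, changed)
			fmt.Printf("lazy=%t changed=%t same=%t\n", lazy, changed, a == b)
		}
	}
}
```

Under that assumption both shapes agree for all four input combinations.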

initial implementation

e9486f8

sdudoladov added the zalando label Mar 9, 2020

sdudoladov added this to the 1.5 milestone Mar 9, 2020

sdudoladov requested review from FxKu, Jan-M and erthalion March 9, 2020 15:02

sdudoladov requested review from CyberDem0n, RafiaSabih and avaczi as code owners March 9, 2020 15:02

sdudoladov self-assigned this Mar 9, 2020

Sergey Dudoladov added 2 commits March 10, 2020 12:17

merge commit

dd10127

describe forcing the rolling upgrade

507c793

make parameter name more descriptive

f782181

FxKu reviewed Mar 16, 2020

manifests/configmap.yaml (outdated, resolved)

FxKu and others added 2 commits March 17, 2020 11:43

add missing pieces

9d8c199

address review

d1830f3

sdudoladov changed the title ~~[WIP] Lazy upgrade of the Spilo image~~ Lazy upgrade of the Spilo image Mar 20, 2020

FxKu reviewed Mar 25, 2020

Merge branch 'master' into lazy-update

8f88fe7

FxKu reviewed Mar 27, 2020

e2e/tests/test_e2e.py (outdated, resolved)

FxKu reviewed Mar 27, 2020

e2e/tests/test_e2e.py (outdated, resolved)

FxKu reviewed Mar 27, 2020

e2e/tests/test_e2e.py (outdated, resolved)

FxKu reviewed Mar 27, 2020

e2e/tests/test_e2e.py (resolved)

Sergey Dudoladov added 2 commits March 31, 2020 14:38

Merge branch 'master' into lazy-update

901901f

address review

617c2a2

sdudoladov added the needs review label Mar 31, 2020

Sergey Dudoladov and others added 3 commits April 8, 2020 06:22

Merge branch 'master' into lazy-update

19043a4

fix bug in e2e tests

aa12d6c

fix cluster name label in e2e test

c48128c

FxKu and others added 23 commits April 17, 2020 09:23

raise test timeout

5b628e7

load spilo test image

3548297

use available spilo image

2ae6d34

delete replica pod for lazy update test

7a1f16f

fix e2e

eb8e8c3

fix e2e with a vengeance

356bc5f

lets wait for another 30m

5b18d23

print pod name in error msg

77211e7

print pod name in error msg 2

487ebbb

raise timeout, comment other tests

2c194e7

subsequent updates of config

04a6764

add comma

632d50c

Merge branch 'master' into lazy-update

f6804fd

fix e2e test

0aee1e3

resolve conflicts

9d64f6d

Merge branch 'master' into lazy-update

d965854

run unit tests before e2e

496a5bf

remove conflicting dependency

65fc090

merge with master and update image in e2e test

13022fd

Revert "remove conflicting dependency"

93ee2ce

This reverts commit 65fc090.

improve cdp build

e2a8fa1

dont run unit before e2e tests

ac56e74

Revert "improve cdp build"

57deeef

This reverts commit e2a8fa1.

sdudoladov added enhancement and removed needs review labels Apr 29, 2020

FxKu merged commit cc635a0 into master Apr 29, 2020

Jan-M reviewed Jun 9, 2020
