Skip to content

Conversation

@sdudoladov
Copy link
Member

@sdudoladov sdudoladov commented Mar 9, 2020

In certain situation (think node rotation) we know pods re-start for reason independent of the operator. When doing planned updates of the Spilo image, we piggyback on this observation and skip the rolling update after the image update in the statefulset. That save us some downtime.

@sdudoladov sdudoladov added this to the 1.5 milestone Mar 9, 2020
@sdudoladov sdudoladov requested review from FxKu, Jan-M and erthalion March 9, 2020 15:02
@sdudoladov sdudoladov self-assigned this Mar 9, 2020
@sdudoladov
Copy link
Member Author

sdudoladov commented Mar 10, 2020

I welcome suggestions on how forcing the rolling upgrade should behave. The PR allows an administrator to force an update for the entire k8s cluster by changing the operator's configuration and restarting the operator pod. To support such behavior, the operator checks at every Sync that all pods run the image specified in their stateful set. When it is not the case, the operator triggers the rolling update. That solution comes with a cost of extra k8s API calls and unneeded switchovers for clusters that have at least one "old" replica pod.

Currently the operator triggers the rolling update based on the results of comparison the actual and desired statefulssets' Specs. So it will not automatically catch pods running old images when the lazy update is disable because that action alone leaves the statefulset Spec intact.

Things to consider:

  1. We can implement syncPods method that would - as part of syncStatefulSet() - simply restart replicas with the old image and only do switchovers when the master also runs the old image.
  2. Do we need a toggle to force a rolling update for individual PG clusters (i.e. a field in a PG manifest) ?

@sdudoladov sdudoladov changed the title [WIP] Lazy upgrade of the Spilo image Lazy upgrade of the Spilo image Mar 20, 2020
}
}

if !podsRollingUpdateRequired {
Copy link
Member

@FxKu FxKu Mar 25, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if !podsRollingUpdateRequired {
if c.OpConfig.EnableLazySpiloUpgrade && !podsRollingUpdateRequired {

I would move the check for lazy update here instead of doing it within mustUpdatePodsAfterLazyUpdate

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if this part here is really necessary? When lazy update is active it would create another API call for every StatefulSet. Wouldn't the next sync fix outdated images?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would move the check for lazy update here instead of doing it within mustUpdatePodsAfterLazyUpdate

Changed. Note the condition is !c.OpConfig.EnableLazySpiloUpgrade , that is, we only need to do it when lazy update is disabled.

Wouldn't the next sync fix outdated images?

No. Sync() relies on stateful set spec to figure out if rolling update is needed. With the lazy upgrade, stateful set itself will be up-to-date (pod template will contain the most recent image), but some of the pods may still be running old images just because they were not yet restarted. Operator will look in the stateful set spec, see the most recent image and happily skip the rolling upgrade for this cluster even though some pods run outdated images.

@sdudoladov
Copy link
Member Author

👍

1 similar comment
@FxKu
Copy link
Member

FxKu commented Apr 29, 2020

👍

@FxKu FxKu merged commit cc635a0 into master Apr 29, 2020
// until they are re-created for other reasons, for example node rotation
if c.OpConfig.EnableLazySpiloUpgrade && !reflect.DeepEqual(c.Statefulset.Spec.Template.Spec.Containers[0].Image, statefulSet.Spec.Template.Spec.Containers[0].Image) {
needsReplace = true
needsRollUpdate = false
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

needsRollUpdate = !c.OpConfig.EnableLazySpiloUpgrade would also have been enough here.

The two codes parts just for the error message I am not sure are great. But done is done.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants