[stable/redis] Sentinel helm upgrades cause CrashLoopBackOff for last slave #14766
Description
Describe the bug
When we upgrade anything in our sentinel-enabled deployment of this chart using `helm upgrade`, the last slave's pod becomes unhealthy. In the repro below, I do an initial deployment and then upgrade it by adding resource requests/limits to the sentinel container. This causes the following error: `Back-off 5m0s restarting failed container=sentinel pod=michele-test-redis-slave-2_default(d9383410-8d49-11e9-b7f3-42010ab404ba): CrashLoopBackOff`.
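For context, a minimal way to inspect the crashing container might look like the sketch below; the pod and container names come from the error above, while the `release` label selector is an assumption about how the chart labels its pods:

```bash
# Assumes the release is named "michele-test" and that the chart labels its pods with release=michele-test
kubectl get pods -l release=michele-test

# The sentinel container keeps restarting; --previous shows the logs from the crashed run
kubectl logs michele-test-redis-slave-2 -c sentinel --previous
```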
Version of Helm and Kubernetes:
```console
$ helm version
Client: &version.Version{SemVer:"v2.11.0", GitCommit:"2e55dbe1fdb5fdb96b75ff144a339489417b146b", GitTreeState:"clean"}
Server: &version.Version{SemVer:"v2.11.0", GitCommit:"2e55dbe1fdb5fdb96b75ff144a339489417b146b", GitTreeState:"clean"}
```
K8s cluster version: 1.12.6-gke.11
Which chart:
stable/redis
What happened:
The last slave's sentinel container becomes unhealthy, causing the pod to enter `CrashLoopBackOff` instead of `Running`.
What you expected to happen:
Upgrading the chart via `helm upgrade` should be seamless, with all pods returning to `Running`.
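As a rough sketch of what "seamless" would mean in practice, assuming the chart creates StatefulSets named `michele-test-redis-master` and `michele-test-redis-slave` for this release (inferred from the pod name above), both rollouts should complete cleanly after the upgrade:

```bash
# Both rollouts should finish with all replicas healthy after a seamless upgrade
kubectl rollout status statefulset/michele-test-redis-master
kubectl rollout status statefulset/michele-test-redis-slave
```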
How to reproduce it (as minimally and precisely as possible):
- Copy the following into `chart.yaml`:
```yaml
cluster:
  slaveCount: 3
password: test
metrics:
  enabled: false
image:
  tag: 5.0.5-debian-9-r14
sentinel:
  enabled: true
  masterSet: "michele-test-redis-master"
  image:
    tag: 5.0.5-debian-9-r14
  # resources:
  #   requests:
  #     memory: 128Mi
  #     cpu: 100m
  #   limits:
  #     memory: 128Mi
master:
  persistence:
    enabled: false
  nodeSelector:
    service: "ads-shared-tools"
  tolerations:
    - key: "service"
      operator: "Equal"
      value: "ads-shared-tools"
      effect: "NoSchedule"
slave:
  persistence:
    enabled: false
  nodeSelector:
    service: "ads-shared-tools"
  tolerations:
    - key: "service"
      operator: "Equal"
      value: "ads-shared-tools"
      effect: "NoSchedule"
```
- Run `helm install --name michele-test -f chart.yaml stable/redis`
- See that everything works fine: all pods are running, etc.
- Edit `chart.yaml` with the following, noting that the only difference between the two values files is the addition of the sentinel resource requests/limits (previously commented out):
```yaml
cluster:
  slaveCount: 3
password: test
metrics:
  enabled: false
image:
  tag: 5.0.5-debian-9-r14
sentinel:
  enabled: true
  masterSet: "michele-test-redis-master"
  image:
    tag: 5.0.5-debian-9-r14
  resources:
    requests:
      memory: 128Mi
      cpu: 100m
    limits:
      memory: 128Mi
master:
  persistence:
    enabled: false
  nodeSelector:
    service: "ads-shared-tools"
  tolerations:
    - key: "service"
      operator: "Equal"
      value: "ads-shared-tools"
      effect: "NoSchedule"
slave:
  persistence:
    enabled: false
  nodeSelector:
    service: "ads-shared-tools"
  tolerations:
    - key: "service"
      operator: "Equal"
      value: "ads-shared-tools"
      effect: "NoSchedule"
```
- Upgrade the chart with `helm upgrade michele-test -f chart.yaml stable/redis` to break things
- See that the last slave's sentinel container is in `CrashLoopBackOff` (the pics below show the affected service and pod); a command sketch for observing this follows this list
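As a rough sketch of how to observe the failure, assuming the release name `michele-test`, a `release` pod label applied by the chart, and the pod name shown in the error above:

```bash
# Watch the rolling update triggered by the upgrade; the last slave never settles back into Running
kubectl get pods -l release=michele-test -w

# Once the pod is in CrashLoopBackOff, its events show the Back-off message quoted above
kubectl describe pod michele-test-redis-slave-2
```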
Anything else we need to know: