This repository has been archived by the owner on Feb 22, 2022. It is now read-only.

[stable/redis] Sentinel helm upgrades cause CrashLoopBackOff for last slave #14766

Closed
@mdeggies

Description

Describe the bug

When we upgrade anything in our Sentinel-enabled release using helm upgrade, the last slave's pod becomes unhealthy. In the repro below, I deploy the chart and then upgrade the deployment by adding resource requests and limits to the sentinel container. This causes the following error: Back-off 5m0s restarting failed container=sentinel pod=michele-test-redis-slave-2_default(d9383410-8d49-11e9-b7f3-42010ab404ba): CrashLoopBackOff.
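A minimal sketch of kubectl commands that can be used to inspect the failing container (the pod and container names come from the error above; the label selector is an assumption based on the stable/redis chart's default labels, and the default namespace is assumed):

# List the release's pods and their states
kubectl get pods -l app=redis,release=michele-test

# Show events and restart counts for the affected pod
kubectl describe pod michele-test-redis-slave-2

# Fetch the logs of the previous (crashed) run of the sentinel container
kubectl logs michele-test-redis-slave-2 -c sentinel --previous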

Version of Helm and Kubernetes:

$ helm version
Client: &version.Version{SemVer:"v2.11.0", GitCommit:"2e55dbe1fdb5fdb96b75ff144a339489417b146b", GitTreeState:"clean"}
Server: &version.Version{SemVer:"v2.11.0", GitCommit:"2e55dbe1fdb5fdb96b75ff144a339489417b146b", GitTreeState:"clean"}

K8s cluster version: 1.12.6-gke.11

Which chart:

stable/redis

What happened:

The last slave's sentinel container becomes unhealthy, causing the pod to enter CrashLoopBackOff instead of staying Running.

What you expected to happen:

Upgrading the chart via helm upgrade should apply the change seamlessly, leaving all pods Running.

How to reproduce it (as minimally and precisely as possible):

  • Copy the following values into chart.yaml:
cluster:
  slaveCount: 3

password: test

metrics:
  enabled: false

image:
  tag: 5.0.5-debian-9-r14

sentinel:
  enabled: true
  masterSet: "michele-test-redis-master"
  image:
    tag: 5.0.5-debian-9-r14
  # resources:
  #   requests:
  #     memory: 128Mi
  #     cpu: 100m
  #   limits:
  #     memory: 128Mi

master:
  persistence:
    enabled: false
  nodeSelector:
    service: "ads-shared-tools"
  tolerations:
    - key: "service"
      operator: "Equal"
      value: "ads-shared-tools"
      effect: "NoSchedule"

slave:
  persistence:
    enabled: false
  nodeSelector:
    service: "ads-shared-tools"
  tolerations:
    - key: "service"
      operator: "Equal"
      value: "ads-shared-tools"
      effect: "NoSchedule"
  • Run helm install --name michele-test -f chart.yaml stable/redis

  • See that everything works fine: all pods are Running, etc.

  • Edit chart.yaml with the following, noting that the only difference from the previous values is that the sentinel resource requests and limits (commented out above) are now set:

cluster:
  slaveCount: 3

password: test

metrics:
  enabled: false

image:
  tag: 5.0.5-debian-9-r14

sentinel:
  enabled: true
  masterSet: "michele-test-redis-master"
  image:
    tag: 5.0.5-debian-9-r14
  resources:
    requests:
      memory: 128Mi
      cpu: 100m
    limits:
      memory: 128Mi

master:
  persistence:
    enabled: false
  nodeSelector:
    service: "ads-shared-tools"
  tolerations:
    - key: "service"
      operator: "Equal"
      value: "ads-shared-tools"
      effect: "NoSchedule"

slave:
  persistence:
    enabled: false
  nodeSelector:
    service: "ads-shared-tools"
  tolerations:
    - key: "service"
      operator: "Equal"
      value: "ads-shared-tools"
      effect: "NoSchedule"
  • Upgrade the chart with helm upgrade michele-test -f chart.yaml stable/redis to break things

  • See that the last slave's sentinel container is in CrashLoopBackOff (screenshots below show the affected pod and service); the sketch after this list shows how to confirm it from the command line.

[Screenshots: affected Pod and Service]
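A minimal sketch of commands to watch the upgrade and confirm what changed (the statefulset name is inferred from the failing pod's name, the label selector is assumed from the chart's default labels, and the --revision flags assume Helm 2 with this being the release's second revision):

# Watch the rolling update of the slave statefulset; it gets stuck on the last replica
kubectl rollout status statefulset/michele-test-redis-slave

# List the release's pods; the last slave reports CrashLoopBackOff for its sentinel container
kubectl get pods -l app=redis,release=michele-test

# Compare the values deployed in the two revisions; only the sentinel resources block should differ
helm get values michele-test --revision 1
helm get values michele-test --revision 2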

Anything else we need to know:
