Skip to content

CatalogSource pods crashing when under Istio #2343

Open
@sathieu

Description

@sathieu

Bug Report

What did you do?

I had to disable Istio injection to ensure it works:

apiVersion: operators.coreos.com/v1alpha1
kind: CatalogSource
metadata:
  name: operatorhubio-catalog
  namespace: operator-lifecycle-manager
  annotations:
    sidecar.istio.io/inject: 'false' # <===== HERE
spec:
  sourceType: grpc
  image: quay.io/operatorhubio/catalog:latest
  displayName: Community Operators
  publisher: OperatorHub.io
  updateStrategy:
    registryPoll:
      interval: 60m

Otherwise, pods are always crashing:

NAME                               READY   STATUS        RESTARTS   AGE
catalog-operator-699bcff75-chl9c   2/2     Running       0          30h
olm-operator-6fb9698bbc-k4gv7      2/2     Running       0          30h
operatorhubio-catalog-8lv2h        0/2     Terminating   0          4s
operatorhubio-catalog-mt2j7        0/2     Pending       0          <invalid>
operatorhubio-catalog-ps8pt        0/2     Terminating   0          0s
packageserver-76bfd988c7-dvphf     2/2     Running       0          30h
packageserver-76bfd988c7-n9mrj     2/2     Running       0          30h

What did you expect to see?

No crash, even with Istio injection.

What did you see instead? Under which circumstances?

Pods crashing.

Here is a crashing Pod spec:
apiVersion: v1
kind: Pod
metadata:
  annotations:
    cni.projectcalico.org/podIP: 10.233.81.103/32
    cni.projectcalico.org/podIPs: 10.233.81.103/32
    k8s.v1.cni.cncf.io/networks: istio-cni
    kubectl.kubernetes.io/default-container: registry-server
    kubectl.kubernetes.io/default-logs-container: registry-server
    kubectl.kubernetes.io/last-applied-configuration: |
      {"apiVersion":"operators.coreos.com/v1alpha1","kind":"CatalogSource","metadata":{"annotations":{},"labels":{"argocd.argoproj.io/instance":"operator-lifecycle-manager"},"name":"operatorhubio-catalog","namespace":"operator-lifecycle-manager"},"spec":{"displayName":"Community Operators","image":"quay.io/operatorhubio/catalog:latest","publisher":"OperatorHub.io","sourceType":"grpc","updateStrategy":{"registryPoll":{"interval":"60m"}}}}
    kubernetes.io/psp: privileged
    sidecar.istio.io/interceptionMode: REDIRECT
    sidecar.istio.io/status: '{"initContainers":["istio-validation"],"containers":["istio-proxy"],"volumes":["istio-envoy","istio-data","istio-podinfo","istio-token","istiod-ca-cert"],"imagePullSecrets":null,"revision":"default"}'
    traffic.sidecar.istio.io/excludeInboundPorts: "15020"
    traffic.sidecar.istio.io/includeInboundPorts: '*'
    traffic.sidecar.istio.io/includeOutboundIPRanges: '*'
  creationTimestamp: "2021-09-08T14:12:35Z"
  deletionGracePeriodSeconds: 1
  deletionTimestamp: "2021-09-08T14:12:37Z"
  generateName: operatorhubio-catalog-
  labels:
    olm.catalogSource: operatorhubio-catalog
    olm.pod-spec-hash: 655cfc9c56
    security.istio.io/tlsMode: istio
    service.istio.io/canonical-name: ""
    service.istio.io/canonical-revision: latest
  managedFields:
  - apiVersion: v1
    fieldsType: FieldsV1
    fieldsV1:
      f:metadata:
        f:annotations:
          .: {}
          f:kubectl.kubernetes.io/last-applied-configuration: {}
        f:generateName: {}
        f:labels:
          .: {}
          f:olm.catalogSource: {}
          f:olm.pod-spec-hash: {}
        f:ownerReferences:
          .: {}
          k:{"uid":"89f01719-008a-4197-af27-ac8b9f2d580e"}:
            .: {}
            f:apiVersion: {}
            f:blockOwnerDeletion: {}
            f:controller: {}
            f:kind: {}
            f:name: {}
            f:uid: {}
      f:spec:
        f:containers:
          k:{"name":"registry-server"}:
            .: {}
            f:image: {}
            f:imagePullPolicy: {}
            f:livenessProbe:
              .: {}
              f:exec:
                .: {}
                f:command: {}
              f:failureThreshold: {}
              f:initialDelaySeconds: {}
              f:periodSeconds: {}
              f:successThreshold: {}
              f:timeoutSeconds: {}
            f:name: {}
            f:ports:
              .: {}
              k:{"containerPort":50051,"protocol":"TCP"}:
                .: {}
                f:containerPort: {}
                f:name: {}
                f:protocol: {}
            f:readinessProbe:
              .: {}
              f:exec:
                .: {}
                f:command: {}
              f:failureThreshold: {}
              f:initialDelaySeconds: {}
              f:periodSeconds: {}
              f:successThreshold: {}
              f:timeoutSeconds: {}
            f:resources:
              .: {}
              f:requests:
                .: {}
                f:cpu: {}
                f:memory: {}
            f:securityContext:
              .: {}
              f:readOnlyRootFilesystem: {}
            f:terminationMessagePath: {}
            f:terminationMessagePolicy: {}
        f:dnsPolicy: {}
        f:enableServiceLinks: {}
        f:nodeSelector:
          .: {}
          f:kubernetes.io/os: {}
        f:restartPolicy: {}
        f:schedulerName: {}
        f:securityContext:
          .: {}
          f:fsGroup: {}
        f:serviceAccount: {}
        f:serviceAccountName: {}
        f:terminationGracePeriodSeconds: {}
    manager: catalog
    operation: Update
    time: "2021-09-08T14:12:35Z"
  - apiVersion: v1
    fieldsType: FieldsV1
    fieldsV1:
      f:metadata:
        f:annotations:
          f:cni.projectcalico.org/podIP: {}
          f:cni.projectcalico.org/podIPs: {}
    manager: calico
    operation: Update
    time: "2021-09-08T14:12:36Z"
  - apiVersion: v1
    fieldsType: FieldsV1
    fieldsV1:
      f:status:
        f:conditions:
          k:{"type":"ContainersReady"}:
            .: {}
            f:lastProbeTime: {}
            f:lastTransitionTime: {}
            f:message: {}
            f:reason: {}
            f:status: {}
            f:type: {}
          k:{"type":"Initialized"}:
            .: {}
            f:lastProbeTime: {}
            f:lastTransitionTime: {}
            f:message: {}
            f:reason: {}
            f:status: {}
            f:type: {}
          k:{"type":"Ready"}:
            .: {}
            f:lastProbeTime: {}
            f:lastTransitionTime: {}
            f:message: {}
            f:reason: {}
            f:status: {}
            f:type: {}
        f:containerStatuses: {}
        f:hostIP: {}
        f:initContainerStatuses: {}
        f:startTime: {}
    manager: kubelet
    operation: Update
    time: "2021-09-08T14:12:36Z"
  name: operatorhubio-catalog-jhtb2
  namespace: operator-lifecycle-manager
  ownerReferences:
  - apiVersion: operators.coreos.com/v1alpha1
    blockOwnerDeletion: false
    controller: false
    kind: CatalogSource
    name: operatorhubio-catalog
    uid: 89f01719-008a-4197-af27-ac8b9f2d580e
  resourceVersion: "4035852"
  uid: 18a1db35-c497-4778-810c-0ef5bf2a31d9
spec:
  containers:
  - args:
    - proxy
    - sidecar
    - --domain
    - $(POD_NAMESPACE).svc.cluster.local
    - --proxyLogLevel=warning
    - --proxyComponentLogLevel=misc:error
    - --log_output_level=default:info
    - --concurrency
    - "2"
    env:
    - name: JWT_POLICY
      value: third-party-jwt
    - name: PILOT_CERT_PROVIDER
      value: istiod
    - name: CA_ADDR
      value: istiod.istio-system.svc:15012
    - name: POD_NAME
      valueFrom:
        fieldRef:
          apiVersion: v1
          fieldPath: metadata.name
    - name: POD_NAMESPACE
      valueFrom:
        fieldRef:
          apiVersion: v1
          fieldPath: metadata.namespace
    - name: INSTANCE_IP
      valueFrom:
        fieldRef:
          apiVersion: v1
          fieldPath: status.podIP
    - name: SERVICE_ACCOUNT
      valueFrom:
        fieldRef:
          apiVersion: v1
          fieldPath: spec.serviceAccountName
    - name: HOST_IP
      valueFrom:
        fieldRef:
          apiVersion: v1
          fieldPath: status.hostIP
    - name: PROXY_CONFIG
      value: |
        {"holdApplicationUntilProxyStarts":true}
    - name: ISTIO_META_POD_PORTS
      value: |-
        [
            {"name":"grpc","containerPort":50051,"protocol":"TCP"}
        ]
    - name: ISTIO_META_APP_CONTAINERS
      value: registry-server
    - name: ISTIO_META_CLUSTER_ID
      value: Kubernetes
    - name: ISTIO_META_INTERCEPTION_MODE
      value: REDIRECT
    - name: ISTIO_META_MESH_ID
      value: cluster.local
    - name: TRUST_DOMAIN
      value: cluster.local
    image: docker.io/istio/proxyv2:1.11.2
    imagePullPolicy: IfNotPresent
    lifecycle:
      postStart:
        exec:
          command:
          - pilot-agent
          - wait
    name: istio-proxy
    ports:
    - containerPort: 15090
      name: http-envoy-prom
      protocol: TCP
    readinessProbe:
      failureThreshold: 30
      httpGet:
        path: /healthz/ready
        port: 15021
        scheme: HTTP
      initialDelaySeconds: 1
      periodSeconds: 2
      successThreshold: 1
      timeoutSeconds: 3
    resources:
      limits:
        cpu: "2"
        memory: 1Gi
      requests:
        cpu: 100m
        memory: 128Mi
    securityContext:
      allowPrivilegeEscalation: false
      capabilities:
        drop:
        - ALL
      privileged: false
      readOnlyRootFilesystem: true
      runAsGroup: 1337
      runAsNonRoot: true
      runAsUser: 1337
    terminationMessagePath: /dev/termination-log
    terminationMessagePolicy: File
    volumeMounts:
    - mountPath: /var/run/secrets/istio
      name: istiod-ca-cert
    - mountPath: /var/lib/istio/data
      name: istio-data
    - mountPath: /etc/istio/proxy
      name: istio-envoy
    - mountPath: /var/run/secrets/tokens
      name: istio-token
    - mountPath: /etc/istio/pod
      name: istio-podinfo
    - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
      name: operatorhubio-catalog-token-pvq4s
      readOnly: true
  - image: quay.io/operatorhubio/catalog:latest
    imagePullPolicy: Always
    livenessProbe:
      exec:
        command:
        - grpc_health_probe
        - -addr=:50051
      failureThreshold: 3
      initialDelaySeconds: 10
      periodSeconds: 10
      successThreshold: 1
      timeoutSeconds: 5
    name: registry-server
    ports:
    - containerPort: 50051
      name: grpc
      protocol: TCP
    readinessProbe:
      exec:
        command:
        - grpc_health_probe
        - -addr=:50051
      failureThreshold: 3
      initialDelaySeconds: 5
      periodSeconds: 10
      successThreshold: 1
      timeoutSeconds: 5
    resources:
      requests:
        cpu: 10m
        memory: 50Mi
    securityContext:
      readOnlyRootFilesystem: false
    terminationMessagePath: /dev/termination-log
    terminationMessagePolicy: FallbackToLogsOnError
    volumeMounts:
    - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
      name: operatorhubio-catalog-token-pvq4s
      readOnly: true
  dnsPolicy: ClusterFirst
  enableServiceLinks: true
  initContainers:
  - args:
    - istio-iptables
    - -p
    - "15001"
    - -z
    - "15006"
    - -u
    - "1337"
    - -m
    - REDIRECT
    - -i
    - '*'
    - -x
    - ""
    - -b
    - '*'
    - -d
    - 15090,15021,15020
    - --run-validation
    - --skip-rule-apply
    image: docker.io/istio/proxyv2:1.11.2
    imagePullPolicy: IfNotPresent
    name: istio-validation
    resources:
      limits:
        cpu: "2"
        memory: 1Gi
      requests:
        cpu: 100m
        memory: 128Mi
    securityContext:
      allowPrivilegeEscalation: false
      capabilities:
        drop:
        - ALL
      privileged: false
      readOnlyRootFilesystem: true
      runAsGroup: 1337
      runAsNonRoot: true
      runAsUser: 1337
    terminationMessagePath: /dev/termination-log
    terminationMessagePolicy: File
    volumeMounts:
    - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
      name: operatorhubio-catalog-token-pvq4s
      readOnly: true
  nodeName: REDACTED
  nodeSelector:
    kubernetes.io/os: linux
  preemptionPolicy: PreemptLowerPriority
  priority: 0
  restartPolicy: Always
  schedulerName: default-scheduler
  securityContext:
    fsGroup: 1337
  serviceAccount: operatorhubio-catalog
  serviceAccountName: operatorhubio-catalog
  terminationGracePeriodSeconds: 30
  tolerations:
  - effect: NoExecute
    key: node.kubernetes.io/not-ready
    operator: Exists
    tolerationSeconds: 300
  - effect: NoExecute
    key: node.kubernetes.io/unreachable
    operator: Exists
    tolerationSeconds: 300
  volumes:
  - emptyDir:
      medium: Memory
    name: istio-envoy
  - emptyDir: {}
    name: istio-data
  - downwardAPI:
      defaultMode: 420
      items:
      - fieldRef:
          apiVersion: v1
          fieldPath: metadata.labels
        path: labels
      - fieldRef:
          apiVersion: v1
          fieldPath: metadata.annotations
        path: annotations
    name: istio-podinfo
  - name: istio-token
    projected:
      defaultMode: 420
      sources:
      - serviceAccountToken:
          audience: istio-ca
          expirationSeconds: 43200
          path: istio-token
  - configMap:
      defaultMode: 420
      name: istio-ca-root-cert
    name: istiod-ca-cert
  - name: operatorhubio-catalog-token-pvq4s
    secret:
      defaultMode: 420
      secretName: operatorhubio-catalog-token-pvq4s
status:
  conditions:
  - lastProbeTime: null
    lastTransitionTime: "2021-09-08T14:12:27Z"
    message: 'containers with incomplete status: [istio-validation]'
    reason: ContainersNotInitialized
    status: "False"
    type: Initialized
  - lastProbeTime: null
    lastTransitionTime: "2021-09-08T14:12:27Z"
    message: 'containers with unready status: [istio-proxy registry-server]'
    reason: ContainersNotReady
    status: "False"
    type: Ready
  - lastProbeTime: null
    lastTransitionTime: "2021-09-08T14:12:27Z"
    message: 'containers with unready status: [istio-proxy registry-server]'
    reason: ContainersNotReady
    status: "False"
    type: ContainersReady
  - lastProbeTime: null
    lastTransitionTime: "2021-09-08T14:12:30Z"
    status: "True"
    type: PodScheduled
  containerStatuses:
  - image: docker.io/istio/proxyv2:1.11.2
    imageID: ""
    lastState: {}
    name: istio-proxy
    ready: false
    restartCount: 0
    started: false
    state:
      waiting:
        reason: PodInitializing
  - image: quay.io/operatorhubio/catalog:latest
    imageID: ""
    lastState: {}
    name: registry-server
    ready: false
    restartCount: 0
    started: false
    state:
      waiting:
        reason: PodInitializing
  hostIP: REDACTED
  initContainerStatuses:
  - image: docker.io/istio/proxyv2:1.11.2
    imageID: ""
    lastState: {}
    name: istio-validation
    ready: false
    restartCount: 0
    state:
      waiting:
        reason: PodInitializing
  phase: Pending
  qosClass: Burstable
  startTime: "2021-09-08T14:12:27Z"

Environment

  • operator-lifecycle-manager version: v0.18.3

  • Kubernetes version information:

$ kubectl version
Client Version: version.Info{Major:"1", Minor:"20", GitVersion:"v1.20.7", GitCommit:"132a687512d7fb058d0f5890f07d4121b3f0a2e2", GitTreeState:"clean", BuildDate:"2021-05-12T12:40:09Z", GoVersion:"go1.15.12", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"20", GitVersion:"v1.20.7", GitCommit:"132a687512d7fb058d0f5890f07d4121b3f0a2e2", GitTreeState:"clean", BuildDate:"2021-05-12T12:32:49Z", GoVersion:"go1.15.12", Compiler:"gc", Platform:"linux/amd64"}
  • Kubernetes cluster kind:

Possible Solution

Additional context

I tried to debug this without luck. The pod is probably created from pkg/controller/registry/reconciler/grpc.go, but why is it crashing?

Metadata

Metadata

Assignees

No one assigned

    Labels

    kind/featureCategorizes issue or PR as related to a new feature.triagedIssue has been considered by a member of the OLM community

    Type

    No type

    Projects

    No projects

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions