show pods in dashboard for easier debugging of reconciliation errors #4068
Comments
@schdief I'm not sure you want to see all the pods by default: if you have a lot of replicas, that's a lot of screen real estate for a lot of the same thing. But it does feel like we could do a better job of exposing errors; we'll discuss this and see what we can do.
I agree that for many pods this is a bad idea. Maybe you could add a button to see all pods of a deployment, while the default view only shows the number and maybe the failed ones. But if I really want to, I would still like to see all of them, even if there are 100 :) Thanks for looking into it!
Hi @schdief
Ah, you do not see the pods in the graph view? Is this from a kustomization or a helmrelease?
Gotcha! So there is a bug here where we don't show the pods in the graph if the namespace differs from the kustomization. To the other point of showing the pods in the table, we have all the data available; we just have to figure out a design.
Problem
When a reconciliation fails, e.g. because a pod goes into ImagePullBackOff due to a wrong image tag or a missing pull secret, the dashboard is quite useless, as it only shows the deployment and not the pods (old and new).
Solution
The dashboard should also show all the pods, not just the deployment. For each pod it should show the whole YAML, so that all the events can be seen.
Additional context
For a test deployment I have deliberately broken the image tag reference to get an ImagePullBackOff. As a result, reconciliation fails and the old pod stays active.
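For illustration, here is a minimal sketch of what the broken image reference amounts to. The actual workload is produced by a Helm chart (release-name-nodebrady), so the manifest below is only indicative, but the image peter:pan matches the pull error shown further down:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: nodebrady
spec:
  replicas: 1
  selector:
    matchLabels:
      app: nodebrady
  template:
    metadata:
      labels:
        app: nodebrady
    spec:
      containers:
        - name: nodebrady
          # deliberately broken image reference -> kubelet reports ImagePullBackOff
          image: peter:pan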
Unfortunately Weave GitOps doesn’t tell me that story. It only tells me that reconciliation is in progress and something fails the health check, but in order to see the problem I need to connect to the cluster and use kubectl:
NAME READY STATUS RESTARTS AGE
pod/release-name-nodebrady-5978488bb8-m62gd 1/1 Running 0 11m
pod/release-name-nodebrady-c9897f486-6rmgn 0/1 ImagePullBackOff 0 5m4s
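The listing above is the kind of output a direct query against the cluster returns, roughly like this (the namespace is a placeholder; kubectl get all additionally lists the deployment and replicasets):

# list workloads in the release namespace; both the old and the new (failing) pod show up here
kubectl get all -n <namespace>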
The graph view should also show the pods, because then I would see the old pod still running and the new pod failing to start due to ImagePullBackOff.
Clicking on the failing pod, I would then also see the reason for the ImagePullBackOff:
Warning Failed 17m (x4 over 18m) kubelet Failed to pull image "peter:pan": rpc error: code = Unknown desc = failed to pull and unpack image "docker.io/library/peter:pan": failed to resolve reference "docker.io/library/peter:pan": failed to do request: Head https://registry-1.docker.io/v2/library/peter/manifests/pan: x509: certificate signed by unknown authority
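Until the dashboard surfaces this, the event has to be fetched manually; a sketch, with the pod name taken from the listing above and the namespace as a placeholder:

# describe shows the pod's conditions and recent events, including the image pull failure
kubectl describe pod release-name-nodebrady-c9897f486-6rmgn -n <namespace>

# the full pod manifest the dashboard could render
kubectl get pod release-name-nodebrady-c9897f486-6rmgn -n <namespace> -o yaml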