show pods in dashboard for easier debugging of reconciliation errors #4068
Comments
@schdief I'm not sure you want to see all the pods by default: if you have a lot of replicas, that's a lot of screen real estate for a lot of the same thing. But it does feel like we could do a better job of exposing errors; we'll discuss this and see what we can do.
I agree that for many pods this is a bad idea. Maybe you could add a button to see all pods of a deployment, while the default view only shows the number and maybe the failed ones. But if I really want to, I would still like to see all of them, even if there are 100 :) Thanks for looking into it!
Hi @schdief
Ah, you do not see the pods in the graph view? Is this from a kustomization or a helmrelease?
Gotcha! So there is a bug here where we don't show the pods in the graph if the namespace differs from the kustomization. To the other point of showing the pods in the table, we have all the data available; we just have to figure out a design.
Problem
When a reconciliation fails, e.g. because a pod goes into ImagePullBackOff due to a wrong image tag or a missing pull secret, the dashboard is quite useless, as it only shows the deployment and not the pods (old and new).
Solution
The dashboard should also show all the pods, not just the deployment. For each pod it should show the whole YAML, so that all the events can be seen.
Additional context
For a test deployment I have deliberately broken the image tag reference to get an ImagePullBackOff. As a result, reconciliation fails and the old pod stays active.
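For illustration, here is a minimal sketch of what the broken image reference amounts to. The actual workload is produced by a Helm chart (release-name-nodebrady), so the manifest below is only indicative, but the image peter:pan matches the pull error shown further down:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: nodebrady
spec:
  replicas: 1
  selector:
    matchLabels:
      app: nodebrady
  template:
    metadata:
      labels:
        app: nodebrady
    spec:
      containers:
        - name: nodebrady
          # deliberately broken image reference -> kubelet reports ImagePullBackOff
          image: peter:pan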
Unfortunately Weave GitOps doesn’t tell me that story. It only tells me that reconciliation is in progress and something fails the health check, but in order to see the problem I need to connect to the cluster and use kubectl:
NAME READY STATUS RESTARTS AGE
pod/release-name-nodebrady-5978488bb8-m62gd 1/1 Running 0 11m
pod/release-name-nodebrady-c9897f486-6rmgn 0/1 ImagePullBackOff 0 5m4s
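The listing above is the kind of output a direct query against the cluster returns, roughly like this (the namespace is a placeholder; kubectl get all additionally lists the deployment and replicasets):

# list workloads in the release namespace; both the old and the new (failing) pod show up here
kubectl get all -n <namespace>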
The graph view should also show the pods, because then I would see the old pod still running and the new pod failing to start due to ImagePullBackOff.
Clicking on the failing pod, I would then also see the reason for the ImagePullBackOff:
Warning Failed 17m (x4 over 18m) kubelet Failed to pull image "peter:pan": rpc error: code = Unknown desc = failed to pull and unpack image "docker.io/library/peter:pan": failed to resolve reference "docker.io/library/peter:pan": failed to do request: Head https://registry-1.docker.io/v2/library/peter/manifests/pan: x509: certificate signed by unknown authority
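Until the dashboard surfaces this, the event has to be fetched manually; a sketch, with the pod name taken from the listing above and the namespace as a placeholder:

# describe shows the pod's conditions and recent events, including the image pull failure
kubectl describe pod release-name-nodebrady-c9897f486-6rmgn -n <namespace>

# the full pod manifest the dashboard could render
kubectl get pod release-name-nodebrady-c9897f486-6rmgn -n <namespace> -o yaml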