Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix/clusterPodStatuses: only process when conditional if specified #1088

Merged
merged 6 commits into from
Apr 6, 2023

Conversation

diamonwiggins
Copy link
Member

Description, Motivation and Context

Currently the when conditional for the clusterPodStatuses analyzer is expected for all outcome types and if not specified the following behavior is seen using the following spec:

apiVersion: troubleshoot.sh/v1beta2
kind: Preflight
metadata:
  name: kurl-builtin-oncluster
spec:
  collectors:
    - clusterResources: {}
    - clusterInfo: {}
  analyzers:
    - clusterPodStatuses:
        checkName: "Pod(s) healthy"
        outcomes:
          - warn:
              when: "!= Healthy" # Catch all unhealthy pods. A pod is considered healthy if it has a status of Completed, or Running and all of its containers are ready.
              message: "A Pod, {{ .Name }}, is unhealthy with a status of: {{ .Status.Reason }}. Restarting the pod may fix the issue."
          - pass:
              message: "All Pods are OK."

Screenshot 2023-03-31 at 10 36 27 AM

Because when isn't set for pass you see the above errors. With the changes in this PR, when is optional and the expected pattern of having a default outcome at the end of your outcome list is supported.

Screenshot 2023-03-31 at 10 36 02 AM

Fixes: #1087

Checklist

  • New and existing tests pass locally with introduced changes.
  • Tests for the changes have been added (for bug fixes / features)
  • The commit message(s) are informative and highlight any breaking changes
  • Any documentation required has been added/updated. For changes to https://troubleshoot.sh/ create a PR here

Does this PR introduce a breaking change?

  • Yes
  • No

@diamonwiggins diamonwiggins added type::bug Something isn't working bug::normal labels Mar 31, 2023
@diamonwiggins diamonwiggins requested a review from a team as a code owner March 31, 2023 14:55
Copy link
Contributor

@camilamacedo86 camilamacedo86 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@diamonwiggins that is great. Could we try to return only one result when pods are Healthy instead of a list with all Healthy pods on the cluster? Could we just return the list of those that are unhealthy?

@diamonwiggins
Copy link
Member Author

@diamonwiggins that is great. Could we try to return only one result when pods are Healthy instead of a list with all Healthy pods on the cluster? Could we just return the list of those that are unhealthy?

I think that's a good idea. Do you think we need to cover that here on this PR?

camilamacedo86
camilamacedo86 previously approved these changes Mar 31, 2023
Copy link
Contributor

@camilamacedo86 camilamacedo86 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Return many times the same success result seems a bug out of scope of this PR. See: #1089

I think would be nice it has tests but I am so happy with the fix already.
Thank you 👍

camilamacedo86
camilamacedo86 previously approved these changes Apr 1, 2023
Copy link
Contributor

@camilamacedo86 camilamacedo86 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Terrific work 🥇 I added only one nit.
IF possible would be great have the mocks under a directory called testdata to follow the best practices with Golang to keep those. Otherwise, it has my LGTM

@@ -0,0 +1,263 @@
{
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we add it inside of a directory called testdata?
Why Golang has a particular meaning for those and for we add mock data that is the best approach.

Copy link
Member

@banjoh banjoh Apr 3, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1

Specifically https://github.com/replicatedhq/troubleshoot/tree/main/testdata

[evans] $ go help test | grep -A1 testdata
The go tool will ignore a directory named "testdata", making it available
to hold ancillary data needed by the tests.

But it can be done in a separate PR. There are many other test data files in the current directory used

@banjoh
Copy link
Member

banjoh commented Apr 3, 2023

@diamonwiggins that is great. Could we try to return only one result when pods are Healthy instead of a list with all Healthy pods on the cluster? Could we just return the list of those that are unhealthy?

Looking at other status analysers (Deployment, StatefulSet...), it looks like the intention is to show the analysis of each resource. I however see your concern here @camilamacedo86. We can improve how these analysis results are reported back by adding a title to each outcome like so. There is a bug (#1093) that needs to be fixed first before this is possible

  analyzers:
    - clusterPodStatuses:
        checkName: "Pod(s) health status(es)"
        outcomes:
          - fail:
              title: "Pod {{ .Name }} is unable to pull images"
              when: "== ImagePullBackOff"
              message: "A Pod, {{ .Name }}, is unable to pull its image. Status is: {{ .Status.Reason }}"
          - warn:
              title: "Pod {{ .Name }} is unhealthy"
              when: "!= Healthy"
              message: "A Pod, {{ .Name }}, is unhealthy with a status of: {{ .Status.Reason }}. Restarting the pod may fix the issue."
          - pass:
              title: "Pod {{ .Name }} is healthy"
              message: "Pod {{ .Name }} is healthy"

This would render results like below

Screenshot 2023-04-03 at 11 53 46

@banjoh
Copy link
Member

banjoh commented Apr 3, 2023

Could we just return the list of those that are unhealthy?

@camilamacedo86 I think if we ignore the pass condition, all the healthy ones are not shown in the analysis outcomes

banjoh
banjoh previously approved these changes Apr 3, 2023
@@ -0,0 +1,266 @@
package analyzer

import (
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good stuff with the tests!

Nit pick: Could you add/modify tests to use !==, ==, === and = conditionals for completeness? e.g === CrashLoopBackOff

if when != "" {
parts := strings.Split(strings.TrimSpace(when), " ")
if len(parts) < 2 {
println(fmt.Sprintf("invalid 'when' format: %s\n", when)) // don't stop
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Whilst here, shall we use klog.Errorf instead? There is another println somewhere in the file

@banjoh banjoh merged commit 9a457f7 into main Apr 6, 2023
@banjoh banjoh deleted the diamonwiggins/fix-cluster-pod-analyzer branch April 6, 2023 08:17
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug::normal type::bug Something isn't working
Projects
None yet
3 participants