[Draft] K8s otel overview dashboard #10910

tetianakravchenko
Contributor

@tetianakravchenko tetianakravchenko commented Aug 28, 2024

Proposed commit message

Checklist

  • I have reviewed tips for building integrations and this pull request is aligned with them.
  • I have verified that all data streams collect metrics or logs.
  • I have added an entry to my package's changelog.yml file.
  • I have verified that Kibana version constraints are current according to guidelines.

Author's Checklist

  • [ ]

How to test this PR locally

Related issues

Screenshots

Current state:
[Image: Screenshot 2024-08-29 at 18.59.45]

TODO:

  • add total counts: nodes, namespaces, workloads (? maybe for later)
    • workloads are missing for now
    • add a container visualisation:
      • number of restarts
      • unhealthy containers (? maybe)
  • cluster-wide: keep colors, but use the defaults
  • test with multiple nodes
  • add the same panels we have with the ad-hoc thing for deploy/ds/sts/pods
  • add warning and error events for workloads and nodes
  • add grouping
  • add a filtering panel
  • add a tag: OpenTelemetry (or something similar)
  • document the final dashboard somewhere
  • use the API command to import it and test it

decisions:

  • do not change manifests: pods allocatable and node conditions
  • cronjob/jobs are not included in the workload (for now)

Signed-off-by: Tetiana Kravchenko <tetiana.kravchenko@elastic.co>
@andrewkroh added the dashboard label (Relates to a Kibana dashboard bug, enhancement, or modification.) on Aug 30, 2024
@elasticmachine

💔 Build Failed

Member

@ChrsMark ChrsMark left a comment

A couple of comments for alignment with the spec.

  1. Could we try to be consistent with the terminology, i.e. usage vs utilization vs pct? Our panel titles etc. should be aligned with the semantic convention definitions. Hence we should only use usage or utilization where each applies (no need for pct, since utilization is a ratio anyway).
  2. Could we consider leveraging the limits' utilization metrics as well? There was strong push-back from the community when we introduced the *.node.utilization metrics, so relying only on these might be risky in case the community decides to deprecate them. From my perspective these are nice-to-haves, but we can rely on the *.cpu.usage ones and the limit-based utilizations.
  3. Also, in the panels it would be nice if we explicitly mentioned which utilization we display (ratio against the node's capacity vs ratio against the limits).
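
To make the distinction in the three points above concrete, here is a minimal Python sketch of how usage relates to the two kinds of utilization. The function names and the sample numbers are hypothetical and only illustrate the arithmetic; they are not taken from the dashboard or from any collector code.

```python
def node_utilization(usage_cores: float, node_capacity_cores: float) -> float:
    """Ratio of CPU used to the node's total CPU capacity (hypothetical helper)."""
    if node_capacity_cores <= 0:
        raise ValueError("node capacity must be positive")
    return usage_cores / node_capacity_cores


def limit_utilization(usage_cores: float, limit_cores: float) -> float:
    """Ratio of CPU used to the configured CPU limit (hypothetical helper)."""
    if limit_cores <= 0:
        raise ValueError("limit must be positive")
    return usage_cores / limit_cores


# Assumed sample values: a pod using half a core on a 4-core node,
# with a CPU limit of 1 core.
usage = 0.5  # usage: an absolute amount, in cores
print(node_utilization(usage, 4.0))   # ratio against the node's capacity
print(limit_utilization(usage, 1.0))  # ratio against the pod's limit
```

The same usage value yields two different ratios depending on the denominator, which is why a panel title would ideally say which one it shows.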

"query": "\"metrics.k8s.node.cpu.usage\": *"
},
"isBucketed": false,
"label": "CPU usage Pct",
Member

The k8s.node.cpu.usage metric represents the number of cores used in a time window, so the term Pct is not accurate here. Or is there a specific reason for mentioning it?

"query": "\"metrics.k8s.pod.cpu.node.utilization\": *"
},
"isBucketed": false,
"label": "Pod CPU Usage ",
Member

Isn't this CPU utilization rather than usage? Also, maybe you could explicitly mention that this utilization is against the node's capacity, and perhaps add another graph for the limit-based utilizations?
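
The two label comments above could be checked mechanically. The following is a rough Python sketch, assuming the convention that a panel label should use the same term (usage or utilization) as the last segment of the metric name and should not contain "pct". The helper names `expected_term` and `label_matches` are hypothetical and not part of any existing tooling.

```python
def expected_term(metric: str) -> str:
    """Return 'usage' or 'utilization' based on the metric name's last segment."""
    last = metric.rsplit(".", 1)[-1]
    if last in ("usage", "utilization"):
        return last
    raise ValueError(f"metric {metric!r} does not end in usage or utilization")


def label_matches(metric: str, label: str) -> bool:
    """Check that the label uses the metric's own term, not the other one or 'pct'."""
    term = expected_term(metric)
    other = "utilization" if term == "usage" else "usage"
    lowered = label.lower()
    return term in lowered and other not in lowered and "pct" not in lowered.split()


# The two labels flagged in the review, plus one possible corrected label.
print(label_matches("metrics.k8s.node.cpu.usage", "CPU usage Pct"))
print(label_matches("metrics.k8s.pod.cpu.node.utilization", "Pod CPU Usage "))
print(label_matches("metrics.k8s.pod.cpu.node.utilization",
                    "Pod CPU utilization (node capacity)"))
```

Both flagged labels fail this check, while a label that names the utilization and its reference (here, node capacity) passes.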

@tetianakravchenko
Contributor Author

All comments were addressed in #11310, closing this PR

Labels
dashboard: Relates to a Kibana dashboard bug, enhancement, or modification.
Integration:kubernetes: Kubernetes
4 participants