-
Notifications
You must be signed in to change notification settings - Fork 301
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Dvue/k8s otel alert policies #2558
Dvue/k8s otel alert policies #2558
Conversation
Marking as Ready, but pending additional changes. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@douavue I have added some comments and Could you please ensure consistency in the title "Kubernetes (OpenTelemetry)" throughout the quickstart.
Kubernetes is an open-source container-orchestration system for automating | ||
computer application deployment, scaling, and management. | ||
url: >- | ||
https://docs.newrelic.com/docs/integrations/host-integrations/host-integrations-list/kubernetes-monitoring-integration |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
https://docs.newrelic.com/docs/integrations/host-integrations/host-integrations-list/kubernetes-monitoring-integration | |
https://docs.newrelic.com/docs/kubernetes-pixie/kubernetes-integration/installation/k8s-otel/#install |
I noticed that we have a Kubernetes OpenTelemetry-specific document.
url: >- | ||
https://docs.newrelic.com/docs/integrations/host-integrations/host-integrations-list/kubernetes-monitoring-integration | ||
dataSourceIds: | ||
- kubernetes |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- kubernetes | |
- kubernetes-opentelemetry |
Since this quickstart is different from Kubernetes, you need to create a new datasource: kubernetes-opentelemetry
. Here is the example: https://github.com/newrelic/newrelic-quickstarts/tree/release/data-sources/apigee-api
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @sjyothi54. Do the dataSourceId
and dashboards
properties require a value or can I remove them from the config?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey @douavue, dashboards are optional, but it is best practice to provide them in order to show the customer what are the metrics they get in dashboard. Since this is a new quickstart, datasourceIds is required. Please provide datasourceIds for the Kubernets opentelemetry, it is used to create install steps in production.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@sjyothi54 - I pushed an update here. We are not including a dashboard for now. Can you please review again? Thanks.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@sjyothi54 we don't provide a dashboard, because the K8s OTel instrumentation enables all the regular K8s UIs so no additional dashboards are needed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@sjyothi54 we don't provide a dashboard, because the K8s OTel instrumentation enables all the regular K8s UIs so no additional dashboards are needed.
@mangulonr @douavue Thanks for the clarification
dashboards: | ||
- kubernetes |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
dashboards: | |
- kubernetes | |
dashboards: | |
- kubernetes |
Since you attached the existing Kubernetes dashboard, will this dashboard include the metrics that we receive in Kubernetes OpenTelemetry?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@douavue Looks good.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @douavue, please add the icon
attribute in the file quickstarts/kubernetes-opentelemetry/config.yml
and make sure the logo image is related to Kubernetes.
primary: | ||
link: | ||
url: https://docs.newrelic.com/docs/kubernetes-pixie/kubernetes-integration/installation/k8s-otel/#install | ||
icon: logo.png |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
icon: logo.png | |
icon: logo.svg |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@RamanaReddy8801 - Updated here
@sjyothi54 - Updated the logo here. What's required to for this step? Thanks for all your help. |
# Operator used to compare against the threshold. | ||
operator: ABOVE | ||
# Value that triggers a violation | ||
threshold: 90 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The description says Alert when container is being throttled > 25%
but the threshold value is 90. I think it should be 25?
|
||
# NRQL query | ||
nrql: | ||
query: "from Metric max(container_memory_working_set_bytes) / filter(max(kube_pod_container_resource_limits), where resource = 'memory') where k8s.cluster.name in ('YOUR_CLUSTER_NAME') and k8s.namespace.name in ('YOUR_NAMESPACE_NAME') facet k8s.container.name, k8s.pod.name, k8s.namespace.name, k8s.cluster.name" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is missing select
. It should be from Metric select max(container_memory_working_set_bytes)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can't get this one to work because I can't find Metrics where they contain container_memory_working_set_bytes
and k8s.container.name
at the same time. It could be a skill issue tho.
|
||
# NRQL query | ||
nrql: | ||
query: "from Metric select latest(k8s.node.cpu.utilization) where k8s.cluster.name in ('YOUR_CLUSTER_NAME') facet k8s.node.name, k8s.cluster.name" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe latest(k8s.node.cpu.utilization)
would get you the last CPU utilization for the 5 minute window, which could be a brief spike where the other 4 minutes were relatively low. You might want average(k8s.node.cpu.utilization)
based on the description.
|
||
# NRQL query | ||
nrql: | ||
query: "from Metric select latest(k8s.node.memory.working_set) / filter(latest(kube_node_status_allocatable), WHERE resource = 'memory') where k8s.cluster.name in ('YOUR_CLUSTER_NAME') facet k8s.node.name, k8s.cluster.name" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same as CPU. Do you want average(k8s.node.memory.working_set)
?
# Operator used to compare against the threshold. | ||
operator: ABOVE | ||
# Value that triggers a violation | ||
threshold: 90 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The query returns a value 0 to 1. We would need to multiply the select
value by 100 or divide this by 100.
|
||
# NRQL query | ||
nrql: | ||
query: "from Metric select max(k8s.node.filesystem.usage) / max(k8s.node.filesystem.capacity) where k8s.cluster.name in ('YOUR_CLUSTER_NAME') facet k8s.node.name, k8s.cluster.name" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The description says average
but the query uses max
.
|
||
# NRQL query | ||
nrql: | ||
query: "from Metric if(latest(condition) = 'Ready', 0, 1) where k8s.cluster.name in ('YOUR_CLUSTER_NAME') facet k8s.node.name, k8s.cluster.name" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is missing a select
in from Metric select if(latest(condition)
|
||
# NRQL query | ||
nrql: | ||
query: "FROM Metric select filter(uniqueCount(k8s.pod.name), where phase = 'Running' AND (metricName = 'kube_pod_status_phase' AND kube_pod_status_phase ['latest'] = 1) and created_by_kind != 'Job' ) / filter(latest(kube_node_status_allocatable), WHERE resource = 'cpu' ) * 100 as 'Pod Capacity %' where k8s.node.name != '' and k8s.node.name is not null and k8s.cluster.name in ('YOUR_CLUSTER_NAME') facet k8s.node.name, k8s.cluster.name" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe WHERE resource = 'cpu'
should be WHERE resource = 'pods'
# Operator used to compare against the threshold. | ||
operator: ABOVE | ||
# Value that triggers a violation | ||
threshold: 0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The description says Alert when more than 5 pods
so I think this should be 5.
|
||
# NRQL query | ||
nrql: | ||
query: "from Metric latest(kube_statefulset_replicas) - latest(kube_statefulset_status_replicas_ready) where k8s.cluster.name in ('YOUR_CLUSTER_NAME') and k8s.namespace.name in ('YOUR_NAMESPACE_NAME') facet k8s.statefulset.name, k8s.namespace.name, k8s.cluster.name" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Missing select
in from Metric latest(kube_statefulset_replicas)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
Summary
A concise description of the changes being introduced. Please review the pre-merge checklist section to validate this pull request is ready for review and merge. If it is not ready, please mark the pull request as a draft.
The owners of this repo are not experts in the subject matter of the quickstarts. We review for the quickstart to be functional and for security risks. If you are seeking feedback on the content of the quickstart, please seek out a subject matter expert. If you are not an internal NR contributor, we can do our best to connect you with a content reviewer.
Pre merge checklist
Dashboards
Alerts