Add kubevirt_handler integration #18283
Conversation
d271f46 to 07ebb27
looks good for docs!
self.base_tags = [
    "pod_name:{}".format(self.pod_name),
    "kube_namespace:{}".format(self.kube_namespace),
]
Why would you specifically need these?
@steveny91 I'm not sure if the agent will attach these tags automatically since it's running on K8s, but the agent status
output only shows these host tags:
host tags:
cluster_name:XXXXX
kube_cluster_name:XXXXX
kube_node:XXXXX
And having these tags provides better tracking for the kubevirt resources, especially the pod name since you could have multiple pods behind the same kubevirt service.
These wouldn't show in the host-level tags, but they will show in the metric/instance-level tags, e.g.:
agent configcheck
...
=== istio check ===
Configuration provider: file
Configuration source: file:/etc/datadog-agent/conf.d/istio.d/auto_conf.yaml
Config for instance ID: istio:60f581df689cb765
istio_mesh_endpoint: http://10.108.0.9:15020/stats/prometheus
send_histograms_buckets: true
tag_by_endpoint: false
tags:
- image_id:********@sha256:57621adeb78e67c52e34ec1676d1ae898b252134838d60298c7446d0964551cc
- image_name:docker.io/istio/proxyv2
- image_tag:1.22.1
- kube_container_name:istio-proxy
- kube_deployment:reviews-v3
- kube_namespace:default
- kube_ownerref_kind:replicaset
- kube_qos:Burstable
- kube_replica_set:reviews-v3-7d99fd7978
- kube_service:reviews
- pod_phase:running
- short_image:proxyv2
...
These just represent the tags that get added automatically by the agent; they describe the pod the agent scrapes the metrics from.
Pod name would indeed be beneficial. The agent is also able to attach it, but you will need to set a cardinality parameter in the agent config:
https://docs.datadoghq.com/containers/kubernetes/tag/?tab=datadogoperator#:~:text=pod_name,Pod%20metadata
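As a sketch of that setting: per the linked docs, `pod_name` is attached at orchestrator-level cardinality, which can be enabled agent-wide in `datadog.yaml` (or via the `DD_CHECKS_TAG_CARDINALITY` environment variable). The comment below is illustrative, not from this PR:

```yaml
# datadog.yaml — raise check tag cardinality so the agent itself
# attaches orchestrator-level tags such as pod_name to check metrics
checks_tag_cardinality: orchestrator
```

Note this applies to all checks the agent runs, which is the "all or nothing" concern discussed below.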
I see, very interesting! Do we have a common approach to kube based integrations on this matter?
Any downsides on adding the pod_name tag on the integration level?
Any downsides on adding the pod_name tag on the integration level?
I don't really believe so. If you need the tag for the integration to make more sense, that's understandable, and it's better than enabling it agent-wide so that all integrations pull in this tag, since right now it's all or nothing. The only concern is if the pods are short-lived, creating a new context every time one is recreated.
Do we have a common approach to kube based integrations on this matter?
Right now, no. But I've encountered some concerns about the kube_* and pod_name tags collected natively by the agent. Essentially, these are tagged from where the metrics are exposed (e.g. an exporter), but sometimes the metrics themselves in Prometheus carry a label such as namespace or pod name indicating where the exporter collected that specific metric from. Labels like this can be remapped to their respective pod_name and kube_namespace for better alignment between the Kubernetes integration/metrics and the ones reported by the exporter. But I'm not sure if this suits your case here.
For example:
kubevirt_vmi_cpu_system_usage_seconds_total{kubernetes_vmi_label_kubevirt_io_domain="testvm",kubernetes_vmi_label_kubevirt_io_nodeName="ip-192-168-58-28.eu-west-3.compute.internal",kubernetes_vmi_label_kubevirt_io_size="small",name="aws-kubevirt-vm-1",namespace="default",node="ip-192-168-58-28.eu-west-3.compute.internal"} 110.09
The origin of this metric, I think, is a pod in the default namespace. If the agent is scraping this metric from a service exposing it in namespace:foo, for example, then all these metrics will come in with kube_namespace:foo, namespace:default. If you want to correlate this with kubernetes.* metrics, you can't align them using the kube_namespace tag.
Generic labels such as name should also be handled with a remapper in case they clash with host tags provided by GCP, for example.
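If the check is built on the OpenMetrics V2 base (as recent Datadog integrations typically are), the remapping described above can be expressed in the instance config via `rename_labels`. The endpoint and the new label names below are illustrative assumptions, not values from this PR:

```yaml
instances:
  - openmetrics_endpoint: http://%%host%%:8443/metrics  # illustrative endpoint
    rename_labels:
      # keep exporter-side labels from clashing with agent/host tags
      namespace: vm_namespace
      name: vm_name
```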
metric_name,metric_type,interval,unit_name,per_unit_name,description,orientation,integration,short_name,curated_metric,sample_tags
kubevirt_handler.can_connect,gauge,,,,Whether the check can connect to the KubeVirt Handler or not.,0,kubevirt_handler,handler connect,,
I would add "1 if able, 0 if not" or something along those lines so the user knows what the value represents.
I'll look into that in a follow-up PR where I will align all the kubevirt integrations.
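As a sketch of the suggestion above, the 0/1 semantics could be isolated in a small helper so the gauge's value always matches its description. The helper name and shape here are hypothetical, not taken from the PR:

```python
def check_can_connect(fetch):
    """Return 1 if fetch() succeeds, 0 if it raises.

    `fetch` stands in for the HTTP request the check makes against the
    KubeVirt Handler endpoint; any exception maps to 0 so the metric
    reads as "1 if able to connect, 0 if not".
    """
    try:
        fetch()
        return 1
    except Exception:
        return 0
```

The check would then submit the result with something like `self.gauge("kubevirt_handler.can_connect", check_can_connect(...), tags=self.base_tags)`.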
kubevirt_handler/metadata.csv
Outdated
metric_name,metric_type,interval,unit_name,per_unit_name,description,orientation,integration,short_name,curated_metric,sample_tags
kubevirt_handler.can_connect,gauge,,,,Whether the check can connect to the KubeVirt Handler or not.,0,kubevirt_handler,handler connect,,
kubevirt_handler.info,gauge,,,,"KubeVirt Version information",0,kubevirt_handler,short_name,,
Would this be better served as a metadata entry instead of a metric?
Totally agree with @steveny91
@steveny91 For some reason it's not working for me; do we have any documentation about it?
The kubevirt info metric looks like:
# HELP kubevirt_info Version information
# TYPE kubevirt_info gauge
kubevirt_info{goversion="go1.21.8 X:nocoverageredesign",kubeversion="v1.2.2"} 1
It's fixed; I had to use a metrics transformer.
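The core of such a transformer is pulling the version out of the info metric's labels and handing it to the agent as metadata rather than a metric. A minimal sketch of the label-parsing half, based on the sample output above (the helper name is an assumption):

```python
def extract_kubevirt_version(labels):
    """Pull the KubeVirt version from kubevirt_info labels.

    e.g. {"goversion": "go1.21.8 ...", "kubeversion": "v1.2.2"} -> "1.2.2"
    Returns None when the label is missing so the caller can skip
    submitting metadata instead of sending an empty version.
    """
    raw = labels.get("kubeversion", "")
    version = raw.lstrip("v")
    return version or None
```

Inside the check this would feed `self.set_metadata("version", ...)`, which is how AgentCheck subclasses report version metadata to the agent.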
def test_emits_can_connect_zero_when_service_is_down(dd_run_check, aggregator, instance): |
Should we test here that no other metrics are emitted?
We could test that in a separate test
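A self-contained sketch of what that separate test could assert. The real version would use the `aggregator` fixture from datadog_checks and its `assert_all_metrics_covered()` helper; `MiniAggregator` below is a stand-in that mimics just those two assertions to show the pattern:

```python
class MiniAggregator:
    """Toy stand-in for the datadog_checks aggregator fixture."""

    def __init__(self):
        self._metrics = {}
        self._asserted = set()

    def gauge(self, name, value):
        self._metrics[name] = value

    def assert_metric(self, name, value=None):
        assert name in self._metrics, "{} was not emitted".format(name)
        if value is not None:
            assert self._metrics[name] == value
        self._asserted.add(name)

    def assert_all_metrics_covered(self):
        # Fails if any metric other than the asserted ones was emitted
        extra = set(self._metrics) - self._asserted
        assert not extra, "unexpected metrics: {}".format(extra)


def test_only_can_connect_when_service_is_down():
    agg = MiniAggregator()
    # Simulate the check running against a dead endpoint: it should
    # emit can_connect=0 and nothing else.
    agg.gauge("kubevirt_handler.can_connect", 0)
    agg.assert_metric("kubevirt_handler.can_connect", value=0)
    agg.assert_all_metrics_covered()
```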
* initial scaffolding
* report health check metric
* report kubevirt_handler metrics
* update spec.yaml and config/models
* ddev validate
* rename Kubevirt to KubeVirt
* Add E2E tests
* fix CI and e2e tests
* fix manifest.json
* delete dashboard file
* ddev validate ci
* apply suggestions from other PRs
* remove time.sleep from dd_environment env setup
* ddev validate ci
* add DEFAULT_METRIC_LIMIT = 0
* ddev validate labeler --sync
* ddev validate ci
* add basic units to metadata.csv
* add more units in metadata.csv
* send the kubevirt version as metadata
* update README integration link
* fix manifest.json using deleted .info metric
* update README
* fix manifest metrics entries
* update manifest.json

e6db76d
What does this PR do?
Motivation
https://datadoghq.atlassian.net/browse/PLINT-458
Additional Notes
Review checklist (to be filled by reviewers)
Add the qa/skip-qa label if the PR doesn't need to be tested during QA.
Add a backport/<branch-name> label to the PR and it will automatically open a backport PR once this one is merged.