Add kubevirt_handler integration #18283

Merged: 31 commits merged from noueman/add-kubevirt-handler-integration into master on Sep 27, 2024

Conversation

NouemanKHAL
Member

@NouemanKHAL NouemanKHAL commented Aug 9, 2024

What does this PR do?

Motivation

https://datadoghq.atlassian.net/browse/PLINT-458

Additional Notes

Review checklist (to be filled by reviewers)

  • Feature or bugfix MUST have appropriate tests (unit, integration, e2e)
  • Changelog entries must be created for modifications to shipped code
  • Add the qa/skip-qa label if the PR doesn't need to be tested during QA.
  • If you need to backport this PR to another branch, you can add the backport/<branch-name> label to the PR and it will automatically open a backport PR once this one is merged

@datadog-agent-integrations-bot bot added the documentation, release, and qa/skip-qa (Automatically skip this PR for the next QA) labels on Aug 9, 2024

github-actions bot commented Aug 9, 2024

The validations job has failed; please review the Files changed tab for possible suggestions to resolve.


codecov bot commented Aug 9, 2024

Codecov Report

Attention: Patch coverage is 91.32653% with 17 lines in your changes missing coverage. Please review.

Project coverage is 89.97%. Comparing base (0eac087) to head (79da6d7).
Report is 3 commits behind head on master.

Additional details and impacted files

Flag               Coverage Δ
activemq           ?
appgate_sdp        ?
cassandra          ?
hive               ?
ignite             ?
jboss_wildfly      ?
kafka              ?
kubevirt_handler   91.32% <91.32%> (?)
presto             ?
solr               ?

Flags with carried forward coverage won't be shown.

@NouemanKHAL NouemanKHAL force-pushed the noueman/add-kubevirt-handler-integration branch from d271f46 to 07ebb27 on August 12, 2024 at 18:44

cswatt
cswatt previously approved these changes Aug 12, 2024
Contributor

@cswatt cswatt left a comment


looks good for docs!


Comment on lines +69 to +72
self.base_tags = [
"pod_name:{}".format(self.pod_name),
"kube_namespace:{}".format(self.kube_namespace),
]
Contributor

Why would you specifically need these?

Member Author

@steveny91 I'm not sure if the agent will attach these tags automatically since it's running on K8s, but the agent status output only shows these host tags:

host tags:
    cluster_name:XXXXX
    kube_cluster_name:XXXXX
    kube_node:XXXXX

Having these tags provides better tracking for the KubeVirt resources, especially the pod name, since you could have multiple pods behind the same KubeVirt service.

Contributor

These wouldn't show in the host-level tags, but they will show in the metric/instance-level tags, i.e.:

agent configcheck

...
=== istio check ===
Configuration provider: file
Configuration source: file:/etc/datadog-agent/conf.d/istio.d/auto_conf.yaml
Config for instance ID: istio:60f581df689cb765
istio_mesh_endpoint: http://10.108.0.9:15020/stats/prometheus
send_histograms_buckets: true
tag_by_endpoint: false
tags:
- image_id:********@sha256:57621adeb78e67c52e34ec1676d1ae898b252134838d60298c7446d0964551cc
- image_name:docker.io/istio/proxyv2
- image_tag:1.22.1
- kube_container_name:istio-proxy
- kube_deployment:reviews-v3
- kube_namespace:default
- kube_ownerref_kind:replicaset
- kube_qos:Burstable
- kube_replica_set:reviews-v3-7d99fd7978
- kube_service:reviews
- pod_phase:running
- short_image:proxyv2
...

These just represent the tags that the agent adds automatically, and they describe the endpoint the agent scrapes the metrics from.

Pod name would indeed be beneficial. The agent can attach it as well, but you would need to set a cardinality parameter in the agent config:
https://docs.datadoghq.com/containers/kubernetes/tag/?tab=datadogoperator#:~:text=pod_name,Pod%20metadata

Member Author

I see, very interesting! Do we have a common approach to kube-based integrations on this matter?
Any downsides to adding the pod_name tag at the integration level?

Contributor

@steveny91 steveny91 Sep 25, 2024

Any downsides to adding the pod_name tag at the integration level?

I don't really believe so. If you need the tag for the integration to make more sense, that's understandable, and it's better than enabling it agent-wide so that every integration pulls in this tag, since it's either all or nothing right now. The only concern is if the pods are short-lived and create a new context every time they are recreated.

Do we have a common approach to kube-based integrations on this matter?

Right now, no. But I've encountered some concerns about these kube_* and pod_name tags collected by the agent natively. Essentially, they describe where the metrics are exposed (e.g. an exporter), but sometimes the metrics themselves in Prometheus have a label called namespace or pod name indicating where that specific metric was collected from by the exporter. Labels like this can be remapped to their respective pod_name and kube_namespace tags for better alignment of the Kubernetes integration/metrics with the ones reported by the exporter. But I'm not sure if this suits your case here.

For example:

kubevirt_vmi_cpu_system_usage_seconds_total{kubernetes_vmi_label_kubevirt_io_domain="testvm",kubernetes_vmi_label_kubevirt_io_nodeName="ip-192-168-58-28.eu-west-3.compute.internal",kubernetes_vmi_label_kubevirt_io_size="small",name="aws-kubevirt-vm-1",namespace="default",node="ip-192-168-58-28.eu-west-3.compute.internal"} 110.09

The origin of this metric, I think, is a pod in the default namespace. If the service exposing it is in namespace foo, for example, then all these metrics will come in with kube_namespace:foo, namespace:default. If you want to correlate this with kubernetes.* metrics, you can't align them using the kube_namespace tag.

Generic labels such as name should also be handled using a remapper in case they clash with host tags provided by GCP, for example.

https://github.com/DataDog/integrations-core/tree/master/argocd#:~:text=The%20Argo%20CD%20integration,have%20a%20name%20tag
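For context, here is a minimal sketch of how such a label remapping could look for an OpenMetrics V2 based check, written as a Python instance dict of the kind used in the integration's tests. The endpoint option name and the chosen mappings are assumptions for illustration, not what this PR ships:

# Hypothetical instance configuration for the kubevirt_handler check.
# rename_labels is the OpenMetrics V2 base check option for mapping
# exposed label names to different tag names before they are submitted.
instance = {
    "kubevirt_handler_metrics_endpoint": "https://10.244.0.38:8443/metrics",  # assumed option name
    "rename_labels": {
        # Keep the exporter's own namespace/name labels from clashing with
        # the kube_namespace/pod_name tags the agent attaches natively.
        "namespace": "vmi_namespace",
        "name": "vmi_name",
    },
}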

@@ -0,0 +1,47 @@
metric_name,metric_type,interval,unit_name,per_unit_name,description,orientation,integration,short_name,curated_metric,sample_tags
kubevirt_handler.can_connect,gauge,,,,Whether the check can connect to the KubeVirt Handler or not.,0,kubevirt_handler,handler connect,,
Contributor

@steveny91 steveny91 Sep 11, 2024

I would add "1 if able to connect, 0 if not" or something along those lines to the description so the user knows what the value represents.

Member Author

I'll look into that in a follow-up PR where I will align all the KubeVirt integrations.
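For reference, a minimal sketch of the "1 if able to connect, 0 if not" semantics being discussed; the class name, option name, and error handling below are illustrative assumptions, not the PR's exact code:

from datadog_checks.base import AgentCheck


class KubeVirtHandlerCheck(AgentCheck):  # class name assumed for illustration
    __NAMESPACE__ = "kubevirt_handler"

    def check(self, _):
        url = self.instance.get("kubevirt_handler_healthz_endpoint")  # assumed option name
        try:
            response = self.http.get(url)
            response.raise_for_status()
            # 1: the check was able to connect to the KubeVirt Handler.
            self.gauge("can_connect", 1)
        except Exception:
            # 0: the check could not connect.
            self.gauge("can_connect", 0)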

@@ -0,0 +1,47 @@
metric_name,metric_type,interval,unit_name,per_unit_name,description,orientation,integration,short_name,curated_metric,sample_tags
kubevirt_handler.can_connect,gauge,,,,Whether the check can connect to the KubeVirt Handler or not.,0,kubevirt_handler,handler connect,,
kubevirt_handler.info,gauge,,,,"KubeVirt Version information",0,kubevirt_handler,short_name,,
Contributor

Totally agree with @steveny91

Member Author

@NouemanKHAL NouemanKHAL Sep 14, 2024

@steveny91 For some reason, it's not working for me. Do we have any documentation about it?
The kubevirt_info metric looks like:

# HELP kubevirt_info Version information
# TYPE kubevirt_info gauge
kubevirt_info{goversion="go1.21.8 X:nocoverageredesign",kubeversion="v1.2.2"} 1

Member Author

It's fixed; I had to use a metrics transformer.
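For context, the general idea of such a transformer is to read the version out of the info metric's labels and submit it as agent metadata (optionally keeping an .info-style gauge). The sketch below only illustrates that idea; the function name, its signature, and how it is wired into the OpenMetrics scraper are assumptions, not the PR's exact implementation:

from datadog_checks.base import AgentCheck


def handle_kubevirt_info(check: AgentCheck, labels: dict) -> None:
    # labels is assumed to be the parsed label set of a kubevirt_info sample, e.g.
    # {"goversion": "go1.21.8 X:nocoverageredesign", "kubeversion": "v1.2.2"}.
    kubeversion = labels.get("kubeversion")
    if kubeversion:
        # Report the KubeVirt version as agent metadata.
        check.set_metadata("version", kubeversion.lstrip("v"))
    # Keep a constant gauge carrying the labels as tags.
    tags = ["{}:{}".format(k, v) for k, v in labels.items()]
    check.gauge("kubevirt_handler.info", 1, tags=tags)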

)


def test_emits_can_connect_zero_when_service_is_down(dd_run_check, aggregator, instance):
Contributor

Should we test here that no other metrics are emitted?

Member Author

We could test that in a separate test.
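A possible shape for such a follow-up test, using the standard dd_run_check and aggregator fixtures; the import path and check class name are assumptions for illustration:

from datadog_checks.kubevirt_handler import KubeVirtHandlerCheck  # import path assumed


def test_emits_no_other_metrics_when_service_is_down(dd_run_check, aggregator, instance):
    # With nothing listening on the configured endpoints, the check should
    # only report that it could not connect.
    check = KubeVirtHandlerCheck("kubevirt_handler", {}, [instance])
    dd_run_check(check)

    aggregator.assert_metric("kubevirt_handler.can_connect", value=0)
    # Fails if any metric other than the ones asserted above was submitted.
    aggregator.assert_all_metrics_covered()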

@NouemanKHAL NouemanKHAL merged commit e6db76d into master Sep 27, 2024
47 checks passed
@NouemanKHAL NouemanKHAL deleted the noueman/add-kubevirt-handler-integration branch September 27, 2024 15:46
github-actions bot pushed a commit that referenced this pull request Sep 27, 2024
* initial scaffolding

* report health check metric

* report kubevirt_handler metrics

* update spec.yaml and config/models

* ddev validate

* rename Kubevirt to KubeVirt

* Add E2E tests

* fix CI and e2e tests

* fix manifest.json

* delete dashboard file

* ddev validate ci

* apply suggestions from other PRs

* remove time.sleep from dd_environment env setup

* ddev validate ci

* add DEFAULT_METRIC_LIMIT = 0

* ddev validate labeler --sync

* ddev validate ci

* add basic units to metadata.csv

* add more units in metadata.csv

* send the kubevirt version as metadata

* update README integration link

* fix manifest.json using deleted .info metric

* update README

* fix manifest metrics entries

* update manifest.json (e6db76d)
@NouemanKHAL NouemanKHAL mentioned this pull request Oct 1, 2024