
Kubernetes pod-level information missing? #225

Closed
autolyticus opened this issue Oct 17, 2022 · 7 comments
Labels
bug Something isn't working

Comments

@autolyticus

autolyticus commented Oct 17, 2022

Bug description

I have deployed Scaphandre using its Helm chart (v0.1.0), and looking at the documentation for the Prometheus exporter, I can see that I am supposed to get additional information and metadata about the pods.

However, I noticed that the scaph_process_power_consumption_microwatts metrics (as exposed by the scaphandre:8080 service on the /metrics path) do not contain this information. Specifically, the container_scheduler label is detected and set correctly to kubernetes, but the kubernetes_pod_name and kubernetes_pod_namespace labels are missing!

scaph_process_power_consumption_microwatts{container_id="cri-containerd-3c5e5e3ab20f1aaa9cdb2f7b1c9d910aeb776f07a03fd22b3fce470ac7191739",exe="tini",cmdline="/usr/bin/tini-w-e143--/opt/kafka/bin/kafka-server-start.sh/tmp/strimzi.properties",container_scheduler="kubernetes",pid="15523"} 0

As you can see above, all it contains is the cmdline for the process and the corresponding containerd container ID, which is quite obscure and difficult to use.

This makes it very difficult to cross-correlate and understand which pod and which namespace is consuming more energy.
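
With those labels present, I would expect to be able to aggregate energy per pod and namespace with a query along these lines (a PromQL sketch, using the label names from the exporter documentation):

    sum by (kubernetes_pod_namespace, kubernetes_pod_name) (
      scaph_process_power_consumption_microwatts
    )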

Am I doing something wrong in my deployment, or is there a gap in my expectation? Is this a bug on Scaphandre's side?

To Reproduce

  1. Deploy the latest Helm chart for Scaphandre into the monitoring namespace (the commands are sketched after this list)
  2. kubectl port-forward -n monitoring svc/scaphandre 3000:8080
  3. Open http://localhost:3000/metrics in a web browser
  4. Notice that the scaph_process_power_consumption_microwatts metrics have the container_scheduler label set to kubernetes but are missing the kubernetes_pod_name label (like the example given above)
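
For reference, steps 1 to 3 correspond roughly to the following commands (a sketch; the release name scaphandre and a local checkout of the chart under helm/scaphandre are assumptions):

    # deploy the chart into the monitoring namespace
    helm install scaphandre helm/scaphandre --namespace monitoring --create-namespace
    # forward the service port locally
    kubectl port-forward -n monitoring svc/scaphandre 3000:8080
    # fetch the metrics (equivalent to opening the URL in a browser)
    curl -s http://localhost:3000/metrics | grep scaph_process_power_consumption_microwatts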

Expected behavior

Expected to see the pod metadata (name, namespace) along with the other information, like cmdline, pid, etc.

Environment

  • Linux distribution: Debian Bullseye
  • Kubernetes: K3S 1.24.6
  • Kernel version (output of uname -r): 5.10.0-18-amd64

Additional context

There are absolutely no error or warning logs on the Scaphandre DaemonSet pods.

@autolyticus added the bug label Oct 17, 2022
@mmadoo
Contributor

mmadoo commented Oct 26, 2022

I do not have this issue on k8s v1.21.8, using the latest Helm chart from the main branch.

@autolyticus
Author

@mmadoo Thanks for getting back regarding this issue. Could you please show me an example of the full metric line for scaph_process_power_consumption_microwatts that should be available on Scaphandre's /metrics endpoint?

@mmadoo
Contributor

mmadoo commented Oct 26, 2022

For instance, I have a line with:
scaph_process_power_consumption_microwatts{container_scheduler="kubernetes",container_id="2579a8513029f0fb26891985e49cf61802e26833d6b04ebaa2ca6191c6fba18a",kubernetes_node_name="workerdcbraindev04",kubernetes_pod_name="kubecost-grafana-6744d99888-4zhmd",pid="2767836",kubernetes_pod_namespace="kubecost",exe="grafana-server",cmdline="grafana-server--homepath=/usr/share/grafana--config=/etc/grafana/grafana.ini--packaging=dockercfg:default.log.mode=consolecfg:default.paths.data=/var/lib/grafanacfg:default.paths.logs=/var/log/grafanacfg:default.paths.plugins=/var/lib/grafana/pluginscfg:default.paths.provisioning=/etc/grafana/provisioning"} 92542

There are also some metrics without kubernetes_pod_namespace, but this is expected, as they do not correspond to Kubernetes pods:
scaph_process_power_consumption_microwatts{container_runtime="containerd",pid="2767337",exe="containerd-shim",cmdline="/usr/bin/containerd-shim-runc-v2-namespacemoby-idddf2449e37f15f388eb68045a0851c0f49d0b554b2683113e951c6f32c9ac4e9-address/run/containerd/containerd.sock"} 0

@autolyticus
Author

@mmadoo Thank you! I have an idea of what the issue might be (it may be related to how K3S works). I am going to try re-deploying K3S with the distro-provided containerd instead of its default embedded containerd.

I will try this and get back.

@autolyticus
Author

@mmadoo It seems to be working when launching K3S with the --docker flag. Thanks a lot for your help!

It looks like Scaphandre (obviously) has some expectations with respect to the underlying Kubernetes distribution. I'd imagine there's no way for Scaphandre to detect K3S's embedded containerd socket to retrieve pod metadata.

So this is definitely not a bug on Scaphandre's side, and is a result of my Kubernetes distribution.

@mickours

I just faced the same issue.

With k3s's defaults, the kubernetes_pod_name, kubernetes_node_name, and kubernetes_pod_namespace labels are not exported.

Also, the Helm chart does not work on the current default k3s version (v1.25), since PodSecurityPolicy has been removed there. See #246

So, for others who want to set up Scaphandre on k3s (an example install command is sketched below):

  • use version 1.24 (or lower)
  • use the --docker flag (thanks @reisub0!)
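
A minimal sketch of such an install, assuming the standard k3s install script (the exact version string is only an example):

    # install k3s v1.24 using Docker instead of the embedded containerd
    curl -sfL https://get.k3s.io | INSTALL_K3S_VERSION="v1.24.6+k3s1" sh -s - --docker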

Maybe this should be mentioned in the docs somewhere, but I'm not sure where...

@rossf7
Contributor

rossf7 commented Jan 3, 2023

@mickours @reisub0 If the metrics are present but the pod name and namespace labels are missing, this is likely a problem mapping the PID to its container ID.

This is done using the process's cgroup file, e.g. /proc/1234/cgroup. Unfortunately, the format varies depending on the container runtime and host OS, which will be why the --docker flag helps.

In my testing, I found there were problems with cgroups v2, as the paths in the file are now relative. I noticed this because Ubuntu 20.04 works but 22.04 does not.
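
For illustration, the per-process cgroup file looks roughly like this in each case (a sketch with placeholder pod UIDs and container IDs; the exact layout also depends on the cgroup driver):

    # cgroups v1 (e.g. Ubuntu 20.04): one line per controller,
    # with the container ID at the end of the path
    $ cat /proc/1234/cgroup
    12:pids:/kubepods/besteffort/pod<pod-uid>/<container-id>
    ...

    # cgroups v2 (e.g. Ubuntu 22.04): a single unified entry
    $ cat /proc/1234/cgroup
    0::/kubepods.slice/kubepods-besteffort.slice/kubepods-besteffort-pod<pod-uid>.slice/cri-containerd-<container-id>.scope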

@bpetit I tried adjusting the regular expression, but I couldn't get it to work. I'm not certain, but maybe the procfs crate also needs to be upgraded?

#250 has a fix for the problem with k8s 1.25.
