
minikube newer than 1.24 fails to generate the metric container_cpu_system_seconds_total as a counter #13656

Closed
gaellm opened this issue Feb 22, 2022 · 7 comments · Fixed by #13802
Labels
kind/bug Categorizes issue or PR as related to a bug. kind/support Categorizes issue or PR as a support question. priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release.

Comments

@gaellm

gaellm commented Feb 22, 2022

What Happened?

When starting minikube 1.25.0 or 1.25.1 (with the docker driver, or virtualbox), my CPU metrics graphs break. I've observed that some Prometheus counter values stay constant:
(Screenshot: 2022-02-22 at 00:17:42)

When I downgrade to minikube 1.24.0, all works fine:
(Screenshot: 2022-02-22 at 09:30:08)

My Prometheus scrape URL is: /api/v1/nodes/minikube/proxy/metrics/cadvisor

To reproduce:

brew install --cask virtualbox
brew install minikube
minikube start
git clone https://github.com/gaellm/minikube-1-25-issue-demo.git
cd minikube-1-25-issue-demo
./deploy.sh

Then, with a port-forward of 9090 on the Prometheus pod, you can observe the container_cpu_system_seconds_total metric.

(Tested on two computers, an ARM Mac and an AMD64 Mac, with the same conclusions.)
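A minimal sketch (hypothetical counter values, plain shell arithmetic) of why stale counters flatten the graphs: Prometheus's rate() divides the counter delta between scrapes by the elapsed time, so if cadvisor only refreshes the counter every 5 minutes, consecutive 30-second scrapes see the same value and the computed rate is 0:

```shell
# Hypothetical container_cpu_system_seconds_total samples, scraped 30s apart.
scrape_interval=30

# With housekeeping-interval=5m the counter is refreshed rarely, so two
# consecutive scrapes often see the same stale value: the delta, and hence
# the rate, is 0 and the CPU graph flatlines.
prev=1000; curr=1000
stale_rate=$(( (curr - prev) / scrape_interval ))
echo "rate with stale samples: ${stale_rate} cores"

# With housekeeping-interval=10s the counter advances between scrapes,
# e.g. by 15 CPU-seconds here, giving a non-zero rate (0.50 cores).
curr=1015
fresh_rate_x100=$(( (curr - prev) * 100 / scrape_interval ))
echo "rate with fresh samples: 0.${fresh_rate_x100} cores"
```

This is only an illustration of the sampling arithmetic, not minikube's actual code path.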

Attach the log file

log.txt

Operating System

macOS (Default)

Driver

VirtualBox

@spowelljr
Member

Hi @gaellm, thanks for reporting your issue with minikube!

If you do minikube start --extra-config=kubelet.housekeeping-interval=10s it should report metrics correctly. The housekeeping interval was increased to alleviate some CPU usage, but you can set it back to the default.

@spowelljr spowelljr added the kind/support Categorizes issue or PR as a support question. label Feb 22, 2022
@gaellm
Author

gaellm commented Feb 23, 2022

Hi, thanks @spowelljr, indeed, this parameter solves the problem.

@gaellm gaellm closed this as completed Feb 23, 2022
@asaintsever

asaintsever commented Mar 4, 2022

Hi @spowelljr

I noticed the same kind of issue today as I was playing with the metrics server on minikube 1.25.2. I was puzzled at first, as I was sure I got metrics some months ago on previous minikube releases. So I searched here for known related issues and, luckily, stumbled on this one.
It happens that providing your extra param fixes the issue. I can see that the new default at cluster creation is kubelet.housekeeping-interval=5m. The problem is that the metrics server/metrics are totally broken by default now, and we have no way to find the root cause, as:

  • kubectl top node is working
  • kubectl top pod <pod> reports "Error from server (NotFound): podmetrics.metrics.k8s.io "<pod name>" not found". Same with kubectl top pod -A.
  • logs from metrics server pod do not display any errors

Waiting past the housekeeping interval (more than 5m) does not change this behavior.

So to me this should be considered a bug. I saw there's also a new --disable-optimizations flag, but my point is that metrics do not work properly by default now, and this is not an expected optimization. WDYT?

(I am using the podman driver on minikube 1.25.2 with the metrics server installed from the official chart v3.8.2, by the way. I got the same issues with the metrics-server addon.)

@medyagh medyagh reopened this Mar 4, 2022
@medyagh
Member

medyagh commented Mar 4, 2022

thank you @asaintsever for taking the time, this is an interesting discussion. The question: would most users care more about having lower CPU usage, or about having the metrics at shorter intervals?

The graphs shared in this issue show that the metrics are not broken; they are just exported at a longer interval.

I am open in this discussion and don't have a set mind. Would you mind sharing examples of why the metrics server helps your case?

Or is there a way that minikube could detect that you are using the metrics server and suggest providing the option automatically?

@medyagh
Member

medyagh commented Mar 4, 2022

Alternatively, we could provide a new flag for these kinds of optimizations so users apply them optionally.

@medyagh medyagh added kind/bug Categorizes issue or PR as related to a bug. priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. labels Mar 4, 2022
@medyagh medyagh added this to the 1.26.0 milestone Mar 4, 2022
@asaintsever

asaintsever commented Mar 5, 2022

Thanks @medyagh

My experience is a bit different compared to the original issue and the graphs. As I said, I still get errors and no results from kubectl top pod even after waiting past the housekeeping interval. I did other tests with workloads running for more than a day and, here again: no metrics, just the "Error from server (NotFound): podmetrics.metrics.k8s.io "<pod name>" not found" message. I would expect to have some figures for those workloads, even if the refresh interval has been increased.

To give you a simple use case: try to use the Kubernetes HPA following your change. The HPA requires the metrics server to automatically scale a workload in/out. As a test, just follow the example provided here: https://kubernetes.io/docs/tasks/run-application/horizontal-pod-autoscale-walkthrough/
You'll quickly see that the HPA mechanism does not work (no scale-out) because of the metrics issue.
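For reference, the linked walkthrough boils down to a few commands (resource names taken from that doc). This is a sketch requiring a live cluster with the metrics server installed, not something runnable standalone; when pod metrics are missing, the HPA's CPU target reads as unknown and no scale-out ever triggers:

```shell
# From the Kubernetes HPA walkthrough (needs a running cluster + metrics server).
kubectl apply -f https://k8s.io/examples/application/php-apache.yaml
kubectl autoscale deployment php-apache --cpu-percent=50 --min=1 --max=10

# When pod metrics are missing, the TARGETS column shows <unknown>/50% and
# the replica count never increases, even under load.
kubectl get hpa php-apache --watch
```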

@spowelljr spowelljr self-assigned this Mar 16, 2022
@medyagh
Member

medyagh commented Mar 16, 2022

Given this is affecting some of the standard uses of Kubernetes, I propose we revert this optimization and make it optional.
