
[prometheus] Metrics backwards compatibility after helm chart upgrade to v15 #3708

Open
ioannatheo opened this issue Aug 22, 2023 · 0 comments
Labels: bug

Describe the bug

We are currently using the prometheus Helm chart as a Helm dependency in our stack and deploying it with ArgoCD.

After upgrading the prometheus Helm chart from v14.12.0 to v15.18.0, we noticed the changes introduced in df8add6.
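
For context, judging from the before/after label sets shown later in this issue (the actual commit may differ in detail), the change amounts to renaming the target labels written by the default relabel_configs, roughly:

kubernetes_namespace -> namespace
kubernetes_name      -> service
kubernetes_node      -> node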

If we upgrade to the new chart version (v15.18.0), the relabel_configs change breaks backwards compatibility with the already-scraped data.
Even if we modify our current queries to target the new labels, it becomes quite complex to show data from before and after the upgrade in the same graph/dashboard over a range of time.

Another way around this is to define the whole prometheus.serverFiles.prometheus.yml.scrape_configs block in our values.yaml and revert the changes mentioned above there, so the scraped data keeps the same labels after the upgrade (see the sketch below). However, that puts us in the position of having to maintain this part of the configuration ourselves.
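
A minimal sketch of that override, assuming the chart is consumed as a dependency keyed prometheus in our values.yaml. The job definition here is illustrative, not the chart's exact default; in practice the entire default scrape_configs list has to be copied in, since setting this key replaces the chart's default wholesale:

prometheus:
  serverFiles:
    prometheus.yml:
      scrape_configs:
        - job_name: kubernetes-service-endpoints
          kubernetes_sd_configs:
            - role: endpoints
          relabel_configs:
            # pin the pre-v15 label names instead of the new ones
            - source_labels: [__meta_kubernetes_namespace]
              action: replace
              target_label: kubernetes_namespace
            - source_labels: [__meta_kubernetes_service_name]
              action: replace
              target_label: kubernetes_name
            - source_labels: [__meta_kubernetes_pod_node_name]
              action: replace
              target_label: kubernetes_node
        # ... remaining default jobs elided ...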

Is there a simpler way that allows us to upgrade while maintaining the usability of existing data afterwards?

What's your helm version?

"HelmVersion": "v3.11.2+g912ebc1"

What's your kubectl version?

"KubectlVersion": "v0.24.2"

Which chart?

prometheus

What's the chart version?

15.18.0

What happened?

Here is an example of how the node_cpu_seconds_total metric changes across the upgrade:

What is returned before the upgrade:

node_cpu_seconds_total{app="prometheus", app_kubernetes_io_instance="sb-dev-admin.prometheus", chart="prometheus-14.12.0",
...
job="kubernetes-service-endpoints", kubernetes_name="prometheus-node-exporter", kubernetes_namespace="sb-admin", kubernetes_node="*****", mode="iowait", release="prometheus"}

What is returned after the upgrade:

node_cpu_seconds_total{app="prometheus", app_kubernetes_io_instance="sb-dev-admin.prometheus", chart="prometheus-15.18.0",
...
job="kubernetes-service-endpoints", service="prometheus-node-exporter", namespace="sb-admin", 
node="****", mode="iowait", release="prometheus"}

When plotting CPU per node over time using this metric, we can no longer do so with a single query: we need one query for the data from before the chart upgrade (filtering on the kubernetes_node label) and another for the data from after it (filtering on the renamed node label), as illustrated below. The same applies to other metrics and other renamed labels.
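
For illustration, a rough sketch of the split in PromQL (the mode filter and rate window are just examples):

# before the upgrade: node identity lives in kubernetes_node
sum by (kubernetes_node) (rate(node_cpu_seconds_total{mode!="idle"}[5m]))

# after the upgrade: node identity lives in node
sum by (node) (rate(node_cpu_seconds_total{mode!="idle"}[5m]))

# one possible query-side workaround: copy the old label into the new
# name with label_replace, then aggregate on the unified label; the
# regex only matches series that still carry kubernetes_node, so
# post-upgrade series pass through unchanged
sum by (node) (
  label_replace(
    rate(node_cpu_seconds_total{mode!="idle"}[5m]),
    "node", "$1", "kubernetes_node", "(.+)"
  )
)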

What did you expect to happen?

We expected a way to control whether these relabelling changes are included in our Prometheus server ConfigMap, without having to hardcode the whole prometheus.serverFiles.prometheus.yml.scrape_configs configuration in our codebase just to revert them.

How to reproduce it?

Upgrading from v14.12.0 to v15.18.0 and querying an affected metric (e.g. node_cpu_seconds_total) should be sufficient to observe the issue.

Enter the changed values of values.yaml?

No response

Enter the command that you executed that is failing/malfunctioning.

Not needed; these are expected changes. The question is mostly about having a way to handle them after the upgrade.

Anything else we need to know?

No response
