
[prometheus] Metrics backwards compatibility after helm chart upgrade to v15 #3708

Open
ioannatheo opened this issue Aug 22, 2023 · 0 comments
Labels: bug

Describe the bug

We are currently using the prometheus Helm chart as a Helm dependency in our stack and deploying it with ArgoCD.

After upgrading the prometheus Helm chart from v14.12.0 to v15.18.0, we noticed the changes introduced in df8add6.
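
For context, judging from the before/after label sets shown later in this issue (the actual commit may differ in detail), the change amounts to renaming the target labels written by the default relabel_configs, roughly:

kubernetes_namespace -> namespace
kubernetes_name      -> service
kubernetes_node      -> node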

If we upgrade to the new chart version (v15.18.0), the relabel_configs change breaks backwards compatibility with the already-scraped data.
Even if we modify our current queries to target the new labels, it becomes quite complex to show data from before and after the upgrade in the same graph/dashboard over a range of time.

Another way around this is to define the whole prometheus.serverFiles.prometheus.yml.scrape_configs block in our values.yaml and revert the changes mentioned above there, so the scraped data keeps the same labels after the upgrade (see the sketch below). However, that puts us in the position of having to maintain this part of the configuration ourselves.
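
A minimal sketch of that override, assuming the chart is consumed as a dependency keyed prometheus in our values.yaml. The job definition here is illustrative, not the chart's exact default; in practice the entire default scrape_configs list has to be copied in, since setting this key replaces the chart's default wholesale:

prometheus:
  serverFiles:
    prometheus.yml:
      scrape_configs:
        - job_name: kubernetes-service-endpoints
          kubernetes_sd_configs:
            - role: endpoints
          relabel_configs:
            # pin the pre-v15 label names instead of the new ones
            - source_labels: [__meta_kubernetes_namespace]
              action: replace
              target_label: kubernetes_namespace
            - source_labels: [__meta_kubernetes_service_name]
              action: replace
              target_label: kubernetes_name
            - source_labels: [__meta_kubernetes_pod_node_name]
              action: replace
              target_label: kubernetes_node
        # ... remaining default jobs elided ...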

Is there a simpler way that allows us to upgrade while maintaining the usability of existing data afterwards?

What's your helm version?

"HelmVersion": "v3.11.2+g912ebc1"

What's your kubectl version?

"KubectlVersion": "v0.24.2"

Which chart?

prometheus

What's the chart version?

15.18.0

What happened?

Here is an example of how the node_cpu_seconds_total metric changes across the upgrade:

What is returned before the upgrade:

node_cpu_seconds_total{app="prometheus", app_kubernetes_io_instance="sb-dev-admin.prometheus", chart="prometheus-14.12.0",
...
job="kubernetes-service-endpoints", kubernetes_name="prometheus-node-exporter", kubernetes_namespace="sb-admin", kubernetes_node="*****", mode="iowait", release="prometheus"}

What is returned after the upgrade:

node_cpu_seconds_total{app="prometheus", app_kubernetes_io_instance="sb-dev-admin.prometheus", chart="prometheus-15.18.0",
...
job="kubernetes-service-endpoints", service="prometheus-node-exporter", namespace="sb-admin", 
node="****", mode="iowait", release="prometheus"}

When plotting CPU per node over time using this metric, we can no longer do so with a single query: we need one query for the data from before the chart upgrade (filtering on the kubernetes_node label) and another for the data from after it (filtering on the renamed node label), as illustrated below. The same applies to other metrics and other renamed labels.
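
For illustration, a rough sketch of the split in PromQL (the mode filter and rate window are just examples):

# before the upgrade: node identity lives in kubernetes_node
sum by (kubernetes_node) (rate(node_cpu_seconds_total{mode!="idle"}[5m]))

# after the upgrade: node identity lives in node
sum by (node) (rate(node_cpu_seconds_total{mode!="idle"}[5m]))

# one possible query-side workaround: copy the old label into the new
# name with label_replace, then aggregate on the unified label; the
# regex only matches series that still carry kubernetes_node, so
# post-upgrade series pass through unchanged
sum by (node) (
  label_replace(
    rate(node_cpu_seconds_total{mode!="idle"}[5m]),
    "node", "$1", "kubernetes_node", "(.+)"
  )
)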

What did you expect to happen?

We expected a way to control whether these relabelling changes are included in our Prometheus server ConfigMap, without having to hardcode the whole prometheus.serverFiles.prometheus.yml.scrape_configs configuration in our codebase just to revert them.

How to reproduce it?

Upgrading from v14.12.0 to v15.18.0 and querying an affected metric (e.g. node_cpu_seconds_total) should be sufficient to observe the issue.

Enter the changed values of values.yaml?

No response

Enter the command that you executed that is failing/malfunctioning.

Not needed; these are expected changes. The question is mostly about having a way to handle them after the upgrade.

Anything else we need to know?

No response
