Missing OpenMetrics/Prometheus metrics after upgrade to 7.68.0 #21054

@Nevon

Description

Version 7.68.0 of the Datadog agent introduces, likely unintentionally, stricter parsing of OpenMetrics, which can cause metric collection to fail where previous versions would succeed. As far as I can tell, the stricter parsing is correct per the spec, so I'm not opening this issue to somehow re-introduce the laxer parsing, but rather to save other people from spending the amount of time I did figuring out why metrics are missing post-upgrade.

Version 37.13.0 of datadog_checks_base upgraded the bundled Python prometheus client to version 0.22.0. In that release of the prometheus client, the metric parsing code was largely rewritten to support UTF-8 metric and label names, and the new implementation follows the specification far more closely than the previous one. So far I have found two cases of metrics that were previously accepted and now cause a parsing error:

  • Duplicate labels. For example purchase_count{product="foo",product="bar"} 1.0. This is obviously incorrect, but in the previous version this was accepted, though I'm not sure if the first or last label won.
  • Dashes in label names. For example purchase_count{product-id="123"}. As far as I can tell, dashes were never allowed in label names according to the spec, but the previous parser ate it up anyway.
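For reference, classic (pre-UTF-8) Prometheus/OpenMetrics label names must match [a-zA-Z_][a-zA-Z0-9_]*, and a label set may not repeat a name. The following is a minimal, stdlib-only sketch of those two rules (illustrative only; it is not the agent's or prometheus_client's actual validation code):

```python
import re

# Classic Prometheus/OpenMetrics label-name grammar (pre-UTF-8 names).
LABEL_NAME_RE = re.compile(r"^[a-zA-Z_][a-zA-Z0-9_]*$")

def check_labels(labels):
    """Return a list of problems for a label set.

    `labels` is a list of (name, value) pairs, used here instead of a
    dict so that duplicate names survive long enough to be detected.
    """
    problems = []
    seen = set()
    for name, _value in labels:
        if not LABEL_NAME_RE.match(name):
            problems.append(f"invalid label name: {name!r}")
        if name in seen:
            problems.append(f"duplicate label: {name!r}")
        seen.add(name)
    return problems

# The two cases described above:
print(check_labels([("product", "foo"), ("product", "bar")]))
# → ["duplicate label: 'product'"]
print(check_labels([("product-id", "123")]))
# → ["invalid label name: 'product-id'"]
```

The old parser let both cases through; the rewritten parser in prometheus_client 0.22.0 rejects them with a ValueError.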

If you are affected, you will see a log message that looks something like:

2025-08-13 12:48:01 UTC | CORE | ERROR | (pkg/collector/worker/check_logger.go:71 in Error) | check:openmetrics | Error running check: [{"message":"Invalid labels: product-id="abc",","traceback":"Traceback (most recent call last):\n File "/opt/datadog-agent/embedded/lib/python3.12/site-packages/prometheus_client/parser.py", line 82, in parse_labels\n raise ValueError("unquoted UTF-8 metric name")\nValueError: unquoted UTF-8 metric name\n\nDuring handling of the above exception, another exception occurred:\n\nTraceback (most recent call last):\n File "/opt/datadog-agent/embedded/lib/python3.12/site-packages/datadog_checks/base/checks/base.py", line 1317, in run\n self.check(instance)\n File "/opt/datadog-agent/embedded/lib/python3.12/site-packages/datadog_checks/base/checks/openmetrics/base_check.py", line 141, in check\n self.process(scraper_config)\n File "/opt/datadog-agent/embedded/lib/python3.12/site-packages/datadog_checks/base/checks/openmetrics/mixins.py", line 574, in process\n for metric in self.scrape_metrics(scraper_config):\n ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n File "/opt/datadog-agent/embedded/lib/python3.12/site-packages/datadog_checks/base/checks/openmetrics/mixins.py", line 535, in scrape_metrics\n for metric in self.parse_metric_family(response, scraper_config):\n ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n File "/opt/datadog-agent/embedded/lib/python3.12/site-packages/datadog_checks/base/checks/openmetrics/mixins.py", line 459, in parse_metric_family\n for metric in text_fd_to_metric_families(input_gen):\n ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n File "/opt/datadog-agent/embedded/lib/python3.12/site-packages/datadog_checks/base/checks/libs/prometheus.py", line 16, in text_fd_to_metric_families\n for raw_line, metric_family in zip(raw_lines, parsed_lines):\n ^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n File "/opt/datadog-agent/embedded/lib/python3.12/site-packages/datadog_checks/base/checks/libs/prometheus.py", line 90, in _parse_payload\n sample = _parse_sample(line)\n ^^^^^^^^^^^^^^^^^^^\n File "/opt/datadog-agent/embedded/lib/python3.12/site-packages/prometheus_client/parser.py", line 257, in _parse_sample\n labels = parse_labels(text[label_start + 1:label_end], False)\n ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n File "/opt/datadog-agent/embedded/lib/python3.12/site-packages/prometheus_client/parser.py", line 112, in parse_labels\n raise ValueError("Invalid labels: " + labels_string)\nValueError: Invalid labels: product-id="abc",\n"}]

It's not just the invalid metric that fails to be sent to Datadog: all metrics contained in that response are dropped. The consequence is that you suddenly have no metrics at all from that endpoint after upgrading.

Best of luck trying to figure out:

  1. Why it's invalid, as the original error is swallowed.
  2. If you are running the agent in a multi-tenant environment, what is emitting the invalid metric, as neither the full offending metric nor where the response came from is included in the error log.
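One way to track down the offending series is to fetch the raw exposition text from each scraped endpoint and scan it yourself. Below is a rough, hypothetical helper (stdlib only, names are mine); it only handles simple single-line samples and does not implement full OpenMetrics escaping, so treat it as a triage tool, not a validator:

```python
import re

SAMPLE_RE = re.compile(r"^(?P<name>[^\s{#]+)\{(?P<labels>[^}]*)\}")
LABEL_RE = re.compile(r'([^=,\s]+)="')
VALID_LABEL = re.compile(r"^[a-zA-Z_][a-zA-Z0-9_]*$")

def find_bad_lines(exposition_text):
    """Yield (line_number, line, reason) for samples whose label names
    the stricter parser would reject, or which repeat a label name."""
    for lineno, line in enumerate(exposition_text.splitlines(), 1):
        if line.startswith("#") or "{" not in line:
            continue  # skip comments/HELP/TYPE and unlabeled samples
        m = SAMPLE_RE.match(line)
        if not m:
            continue
        names = LABEL_RE.findall(m.group("labels"))
        for name in names:
            if not VALID_LABEL.match(name):
                yield lineno, line, f"invalid label name {name!r}"
        dupes = {n for n in names if names.count(n) > 1}
        for name in sorted(dupes):
            yield lineno, line, f"duplicate label {name!r}"

# Example payload reproducing both failure modes from this issue:
payload = """\
# HELP purchase_count Purchases
purchase_count{product="foo",product="bar"} 1.0
purchase_count{product-id="123"} 1.0
ok_metric{product="foo"} 2.0
"""
for lineno, line, reason in find_bad_lines(payload):
    print(lineno, reason)
# → 2 duplicate label 'product'
# → 3 invalid label name 'product-id'
```

Running something like this against curl output from each target in a multi-tenant cluster at least tells you which endpoint and which line to chase, which the agent's error log currently does not.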
