Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

response_timeout deprecated option for Prometheus input still broken #15076

Closed
nick-kentik opened this issue Mar 28, 2024 · 3 comments · Fixed by #15078
Closed

response_timeout deprecated option for Prometheus input still broken #15076

nick-kentik opened this issue Mar 28, 2024 · 3 comments · Fixed by #15078
Labels
bug unexpected problem or unintended behavior

Comments

@nick-kentik
Copy link
Contributor

nick-kentik commented Mar 28, 2024

Relevant telegraf.conf

[inputs.prometheus]
urls = ["http://localhost:8192"]
response_timeout = "5s"

Logs from Telegraf

$ telegraf --version
Telegraf 1.30.0 (git: HEAD@3c03ddcf)
$ telegraf --config test-config.conf --test --debug
2024-03-28T11:26:00Z I! Loading config: test-config.conf
2024-03-28T11:26:00Z E! error loading config file test-config.conf: error parsing prometheus, line 3: field corresponding to `response_timeout' in prometheus.Prometheus cannot be set through TOML

System info

Telegraf 1.30.0, Linux AMD64 (Debian Bookworm)

Docker

N/a

Steps to reproduce

  1. Try to run Telegraf against a prometheus input containing the deprecated (but not removed) response_timeout option
  2. Observe failure
  3. ???
    ...

Expected behavior

The field is deprecated in favour of timeout, but not removed, so should continue to function.

Actual behavior

Telegraf fails to start

Additional info

Follow-up to #12859

I'm currently attempting an upgrade of Telegraf from the 1.21 series to latest across a fleet of hundreds of servers. Until this bug is fixed, there is no telegraf config that both supports a timeout for this input, and works across both versions, making the upgrade difficult to manage.

@nick-kentik nick-kentik added the bug unexpected problem or unintended behavior label Mar 28, 2024
@nick-kentik
Copy link
Contributor Author

nick-kentik commented Mar 28, 2024

OK @powersj, I think I've tracked down the problem here: the Prometheus type has two fields labeled with response_timeout. The toplevel ResponseTimeout, and the one in the embedded HTTPClientConfig -> httpconfig.HTTPClientConfig.ResponseTimeout

I guess the solution is to remove the former and use the latter? I'll see how it looks in a PR. Although then we'd lose the deprecation, so perhaps we instead have to stop embedding the latter and declare all its items at the toplevel?

powersj added a commit to powersj/telegraf that referenced this issue Mar 28, 2024
In influxdata#14153, the HTTP client config struct gained a response timeout
config option with TOML tags. This meant that there were two defined for
both Prometheus plugin and the HTTP client config struct. This removes
the one in Prometheus, which was used to set the one in the HTTP client
anyway.

fixes: influxdata#15076
@powersj
Copy link
Contributor

powersj commented Mar 28, 2024

Looks like v1.28.2 was the last working release. After that release #14153 landed which updated the HTTP Client config struct with a field also called response_timeout. Because the Prometheus plugin includes the HTTP client config struct there seems to be a conflict.

While I have put up #15078 with a potential solution, I need to chat with the team over the implications.

Thanks

@nick-kentik
Copy link
Contributor Author

No worries, thanks for looking into it. If 1.28.2 works, I can take us that far and update the config after to remove the deprecated stuff, so that's enough to unblock me personally.

Hazards of falling so far behind!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug unexpected problem or unintended behavior
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants