[outputs.syslog] Not available endpoint prevents Telegraf to start (restart loop) #15770

Closed
1tft opened this issue Aug 23, 2024 · 4 comments · Fixed by #15787
Assignees
Labels
bug unexpected problem or unintended behavior

Comments


1tft commented Aug 23, 2024

Relevant telegraf.conf

[[outputs.syslog]]
  address = "tcp://doesenot.exist:9999" # use an invalid address that cannot be reached

Logs from Telegraf

2024-08-16T07:58:23Z E! [agent] Failed to connect to [outputs.syslog], retrying in 15s, error was "dial tcp4: lookup doesenot.exist:9999 on 127.0.0.1:53: no such host"
2024-08-16T07:58:38Z E! [telegraf] Error running agent: connecting output outputs.syslog: error connecting to output "outputs.syslog": dial tcp4: lookup doesenot.exist:9999 on 127.0.0.1:53: no such host
2024-08-16T07:58:39Z I! Starting Telegraf 1.30.3 brought to you by InfluxData the makers of InfluxDB
2024-08-16T07:58:39Z I! Available plugins: 233 inputs, 9 aggregators, 31 processors, 24 parsers, 60 outputs, 6 secret-stores
2024-08-16T07:58:39Z I! Loaded inputs: internal (2x) kernel kernel_vmstat mem mock
2024-08-16T07:58:39Z I! Loaded aggregators: minmax
2024-08-16T07:58:39Z I! Loaded processors: clone converter (3x) date defaults regex (4x) rename (3x) topk
2024-08-16T07:58:39Z I! Loaded secretstores:
2024-08-16T07:58:39Z I! Loaded outputs: influxdb (5x) syslog
2024-08-16T07:58:39Z I! [agent] Config: Interval:30s, Quiet:false, Hostname:"XXX", Flush Interval:25s
2024-08-16T07:58:39Z I! [processors.regex] fields: Using explicit mode...
2024-08-16T07:58:39Z I! [processors.regex] fields: Using explicit mode...
2024-08-16T07:58:39Z I! [processors.regex] fields: Using explicit mode...
2024-08-16T07:58:39Z I! [processors.regex] fields: Using explicit mode...
2024-08-16T07:58:39Z I! [processors.regex] fields: Using explicit mode...
2024-08-16T07:58:39Z I! [processors.regex] fields: Using explicit mode...
2024-08-16T07:58:39Z I! [processors.regex] fields: Using explicit mode...
2024-08-16T07:58:39Z I! [processors.regex] tags: Using explicit mode...
2024-08-16T07:58:39Z I! [processors.regex] tags: Using explicit mode...
2024-08-16T07:58:39Z I! [processors.regex] fields: Using explicit mode...
2024-08-16T07:58:39Z I! [processors.regex] fields: Using explicit mode...
2024-08-16T07:58:39Z I! [processors.regex] fields: Using explicit mode...
2024-08-16T07:58:39Z I! [processors.regex] fields: Using explicit mode...
2024-08-16T07:58:39Z I! [processors.regex] fields: Using explicit mode...
2024-08-16T07:58:39Z I! [processors.regex] fields: Using explicit mode...
2024-08-16T07:58:39Z I! [processors.regex] fields: Using explicit mode...
2024-08-16T07:58:39Z I! [processors.regex] fields: Using explicit mode...
2024-08-16T07:58:39Z I! [processors.regex] tags: Using explicit mode...
2024-08-16T07:58:39Z I! [processors.regex] tags: Using explicit mode...
2024-08-16T07:58:39Z I! [processors.regex] fields: Using explicit mode...
2024-08-16T07:58:39Z E! [agent] Failed to connect to [outputs.syslog], retrying in 15s, error was "dial tcp4: lookup doesenot.exist:9999 on 127.0.0.1:53: no such host"

System info

Telegraf 1.30.3

Docker

No response

Steps to reproduce

Add [[outputs.syslog]] to your Telegraf config and set its address to any invalid address that cannot be reached. Telegraf will not start successfully (restart loop).
FYI: If the syslog endpoint goes down while Telegraf is already running, Telegraf keeps working fine.

Expected behavior

Start normally and log this error.

Actual behavior

Does not start.

Additional info

We think it's the same issue that we found in 2022 with [[outputs.graylog]]. That output plugin caused the same restart loop when the endpoint (server) was not available during Telegraf startup.
That issue was fixed via #11950.

@1tft 1tft added the bug unexpected problem or unintended behavior label Aug 23, 2024
Member

srebhan commented Aug 28, 2024

@1tft please test the binary in PR #15787, available as soon as CI has finished the tests, and let me know if this fixes the issue. You need to set e.g. startup_error_behavior = "ignore" to ignore the error for the plugin instance...
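
For illustration, a minimal sketch of how the setting mentioned above might look in telegraf.conf. It assumes a build that includes PR #15787 and reuses the unreachable placeholder address from the report; the comments describe the behavior suggested in this thread rather than verified output.

```toml
[[outputs.syslog]]
  # Placeholder address from the report; it is intentionally unreachable.
  address = "tcp://doesenot.exist:9999"

  # Assumption: with a build containing PR #15787, a failed connection at
  # startup is ignored for this plugin instance instead of stopping Telegraf.
  startup_error_behavior = "ignore"
```

The option is set per plugin instance, so other outputs in the same config keep their default startup behavior.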

Author

1tft commented Aug 28, 2024

@srebhan Thank you very much for this PR! A short test was successful. I will test a little more tomorrow.
If you don't hear anything from me by the end of tomorrow, everything is OK.

I only noticed a difference in self-logging behaviour between outputs (with debug = false): at startup, the TCP-based [[outputs.graylog]] logs "[outputs.graylog] Connected!", while the TCP-based [[outputs.syslog]] logs no such "connected" message. But this causes no issues on our side...

Member

srebhan commented Aug 29, 2024

Do you want me to add a "Connected" message?

@srebhan srebhan self-assigned this Aug 29, 2024
Author

1tft commented Aug 29, 2024

We don't need such an OK message; we filter/alert on errors.
