Generic handling to ignore errors on startup #14803

Closed
powersj opened this issue Feb 13, 2024 · 5 comments · Fixed by #14884 or #15145
Labels
feature request Requests for new plugin and for new features to existing plugins

Comments

@powersj
Contributor

powersj commented Feb 13, 2024

Use Case

In numerous feature requests, users have asked for a way to ignore errors on startup. It would be nice to have a generic, high-level mechanism that adds this behavior to input plugins.

Expected behavior

An agent-level setting that controls this behavior for all input plugins.

Actual behavior

Currently, Telegraf exits with an error during startup if an input plugin cannot start. There are a few exceptions where a configuration option exists to avoid this:

  ## Behavior when we fail to connect to the endpoint on initialization. Valid options are:
  ##     "error": throw an error and exit Telegraf
  ##     "ignore": ignore this plugin if errors are encountered
  # connect_fail_behavior = "error"

Additional info

No response

powersj added the feature request label Feb 13, 2024
@zak-pawel
Collaborator

I'm not sure the failure necessarily has to be a connection error.
Sometimes it is a local file (regular file or socket) exposed by an application or service that hasn't started yet. This can happen, for example, right after boot, when Telegraf starts before the other application or service.

@srebhan
Member

srebhan commented Feb 22, 2024

We agreed on:

  ## Behavior when starting the plugin fails e.g. due to connectivity issues. Valid options are:
  ##     "error": exit Telegraf with an error (current behavior)
  ##     "ignore": remove this plugin from further processing if errors are encountered
  ##     "retry": retry starting up the plugin in each gather or flush cycle
  # startup_error_behavior = "error"
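To make the three agreed values concrete, here is a minimal Go sketch under a simplified plugin model. The `Starter` interface, the `managed` wrapper, its `tryStart` method, and the `flaky` test plugin are all hypothetical illustrations, not Telegraf's actual types or code; the sketch only assumes that the agent calls a start step once per gather or flush cycle until it succeeds.

```go
package main

import (
	"errors"
	"fmt"
)

// Starter is a hypothetical stand-in for a plugin with a start step.
// It is NOT Telegraf's real interface.
type Starter interface {
	Start() error
}

// managed (hypothetical) wraps a plugin together with its configured
// startup_error_behavior value: "error", "ignore" or "retry".
type managed struct {
	plugin   Starter
	behavior string
	started  bool
	removed  bool
}

// tryStart is called once per gather/flush cycle. It returns a non-nil
// error only when the agent should exit.
func (m *managed) tryStart() error {
	if m.started || m.removed {
		return nil // nothing left to do for this plugin
	}
	err := m.plugin.Start()
	if err == nil {
		m.started = true
		return nil
	}
	switch m.behavior {
	case "ignore":
		m.removed = true // drop the plugin from further processing
		return nil
	case "retry":
		return nil // stay unstarted; try again on the next cycle
	default: // "error", the current behavior
		return fmt.Errorf("starting plugin failed: %w", err)
	}
}

// flaky is a hypothetical plugin whose first `failures` starts fail.
type flaky struct{ failures int }

func (f *flaky) Start() error {
	if f.failures > 0 {
		f.failures--
		return errors.New("connection refused")
	}
	return nil
}

func main() {
	for _, b := range []string{"error", "ignore", "retry"} {
		m := &managed{plugin: &flaky{failures: 2}, behavior: b}
		for cycle := 1; cycle <= 3; cycle++ {
			if err := m.tryStart(); err != nil {
				fmt.Printf("%s: exit at cycle %d: %v\n", b, cycle, err)
				break
			}
		}
		fmt.Printf("%s: started=%v removed=%v\n", b, m.started, m.removed)
	}
}
```

With a plugin that fails its first two starts, `error` exits on the first cycle, `ignore` leaves the plugin removed and never retries it, and `retry` starts it on the third cycle.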

@miken32

miken32 commented Mar 20, 2024

Is it safe to assume the linked PR #14884 will apply to input plugins as well as output plugins? I'm thinking specifically of #12959 here, where we're in the absurd situation of losing reporting for dozens of hosts if one of them goes offline.

This issue specifically mentions input plugins, but the PR only says output plugins, which is why I'm looking for clarification.

@srebhan
Member

srebhan commented Mar 20, 2024

@miken32, the goal is to provide this functionality for both inputs and outputs. The first PR tackles the output side (more specifically, the framework itself plus one example plugin), but additional PRs will eventually implement the startup-retry framework for input plugins as well.

Regarding #12959: the startup-retry framework alone will not solve this, because the authentication information seems to be gathered only once, in Init(). That is the wrong place; it has to happen in Start() to benefit from the framework. In any case, your hosts will still be missing for as long as the authentication server is unreachable!
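The Init()/Start() distinction can be illustrated with a minimal Go sketch. Everything here is hypothetical (`authServer`, `plugin`, the token value, and the retry loop in `main` are illustrations, not Telegraf code); it only assumes, as described above, that Init() runs once while a startup-retry framework may call Start() again on later cycles.

```go
package main

import (
	"errors"
	"fmt"
)

// authServer (hypothetical) is unreachable for its first `downFor` calls.
type authServer struct{ downFor int }

func (a *authServer) token() (string, error) {
	if a.downFor > 0 {
		a.downFor--
		return "", errors.New("auth server unreachable")
	}
	return "secret-token", nil
}

// plugin (hypothetical) needs a token before it can gather metrics.
type plugin struct {
	auth  *authServer
	token string
}

// Init only validates static configuration. It runs once, so it must not
// depend on remote services being available.
func (p *plugin) Init() error {
	if p.auth == nil {
		return errors.New("no auth server configured")
	}
	return nil
}

// Start fetches the credentials. Because a retry framework can call Start
// again later, an unreachable auth server only delays the plugin.
func (p *plugin) Start() error {
	tok, err := p.auth.token()
	if err != nil {
		return fmt.Errorf("fetching token: %w", err)
	}
	p.token = tok
	return nil
}

func main() {
	p := &plugin{auth: &authServer{downFor: 2}}
	if err := p.Init(); err != nil {
		fmt.Println("init failed:", err)
		return
	}
	// Simulate the framework retrying Start once per cycle.
	for cycle := 1; cycle <= 3; cycle++ {
		if err := p.Start(); err != nil {
			fmt.Printf("cycle %d: %v (will retry)\n", cycle, err)
			continue
		}
		fmt.Printf("cycle %d: started with token %q\n", cycle, p.token)
		break
	}
}
```

Had the token been fetched in Init(), the first failure would have been final; with the fetch in Start(), the plugin comes up on the third cycle. No data is collected for the cycles before that, which matches the caveat about hosts missing while the authentication server is down.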

@srebhan
Member

srebhan commented Mar 27, 2024

Reopening, as we still need to implement the input side.

4 participants