Description
Is your feature request related to a problem? Please describe.
When a worker process dies for some reason, Fluentd always tries
to restart the worker as soon as possible.
This "always-restart-immediately" policy does not always work well.
For example, consider the following cases:
- The worker process was killed because the host OS was almost running
  out of available memory at that moment.
- The worker process was killed because the host OS tried to perform
  an operation on a resource that Fluentd had locked (e.g. file locks).
Describe the solution you'd like
systemd provides the RestartSec option, which makes it wait for a few
seconds before restarting a service, to address the same issue.
https://www.freedesktop.org/software/systemd/man/systemd.service.html#RestartSec=
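For reference, a minimal unit fragment using this option might look like the following. The 5 second delay and the Restart=on-failure setting are only illustrative values, not a recommendation:

```ini
[Service]
# Restart the service when it exits abnormally ...
Restart=on-failure
# ... but wait 5 seconds before doing so (systemd's default is 100ms).
RestartSec=5
```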
It would be better if Fluentd's supervisor could provide a similar option too.
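As a rough sketch of the requested behavior (this is not Fluentd's or serverengine's actual code; `RestartDelay` and `restart_sec` are hypothetical names used only for illustration), the supervisor could record when a worker died and restart it only after the configured delay has elapsed:

```ruby
# Hypothetical sketch: none of these names exist in Fluentd or serverengine.
class RestartDelay
  # restart_sec: seconds to wait after a worker dies before restarting it.
  # The default of 0 keeps the current "restart immediately" behavior.
  def initialize(restart_sec: 0)
    @restart_sec = restart_sec
    @died_at = nil
  end

  # Called by the supervisor when it notices that the worker has died.
  # `now` is a timestamp in seconds (e.g. from a monotonic clock).
  def worker_died(now)
    @died_at = now
  end

  # True once a dead worker has waited at least restart_sec seconds.
  # Always false while no worker death is pending.
  def restart_due?(now)
    return false if @died_at.nil?
    now - @died_at >= @restart_sec
  end

  # Called after the worker has been started again.
  def worker_restarted
    @died_at = nil
  end
end
```

The supervisor's periodic check would call `restart_due?` with the current time and start the worker only when it returns true.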
Describe alternatives you've considered
NA
Additional context
The underlying serverengine has a feature called "delayed_start_worker".
The option described above can be implemented on top of it (though we
would need to tweak serverengine as well).