Add a new system option "delay_start_worker" #3749

Closed
@fujimotos

Is your feature request related to a problem? Please describe.

When a worker process dies for some reason, Fluentd always tries
to restart the worker as soon as possible.

This "always-restart-immediately" policy does not always work well.
For example, consider the following cases:

  • The worker process was killed because the host OS was almost
    out of available memory at that moment.

  • The worker process was killed because the host OS was trying to perform
    an operation on a resource that Fluentd had locked (e.g. a file lock).

Describe the solution you'd like

Systemd solves the same issue with its RestartSec option, which makes it
wait for a few seconds before restarting a service.

https://www.freedesktop.org/software/systemd/man/systemd.service.html#RestartSec=
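
For reference, a minimal sketch of how this looks in a systemd unit. The 5-second value and the choice of `Restart=always` are only illustrative assumptions, not a recommendation:

```ini
[Service]
# Restart the service whenever it exits ...
Restart=always
# ... but sleep 5 seconds before each restart (systemd's default is 100ms).
RestartSec=5
```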

It would be better if Fluentd's supervisor could provide a similar option too.
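
To make the requested behavior concrete, here is a rough Ruby sketch of the bookkeeping a supervisor would need. It is not Fluentd's or serverengine's actual code: the class name `DelayedRestartPolicy` and its methods are hypothetical, and timestamps are passed in explicitly so the logic is easy to follow.

```ruby
# Hypothetical sketch, not Fluentd's or serverengine's implementation.
# Tracks when each worker died and reports whether enough time has
# passed for the supervisor to restart it.
class DelayedRestartPolicy
  # restart_delay: seconds to wait after a worker dies (assumed option).
  def initialize(restart_delay)
    @restart_delay = restart_delay
    @died_at = {}
  end

  # Record the time (in seconds) at which a worker was found dead.
  def worker_died(worker_id, now)
    @died_at[worker_id] = now
  end

  # True if the worker has no recorded death, or if at least
  # restart_delay seconds have elapsed since it died.
  def restartable?(worker_id, now)
    died_at = @died_at[worker_id]
    return true if died_at.nil?

    now - died_at >= @restart_delay
  end
end
```

A supervisor loop would call `worker_died` when it notices a dead worker and skip the restart on each keepalive tick until `restartable?` returns true. With a delay of 0 this degenerates to the current restart-immediately behavior.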

Describe alternatives you've considered

NA

Additional context

The underlying serverengine has a feature called "delayed_start_worker".
The option described above can be implemented on top of it (we need to tweak
serverengine as well, though).

https://github.com/treasure-data/serverengine/blob/master/lib/serverengine/multi_worker_server.rb#L132-L144
