Skip to content

[Design] plugins shutdown sequence refactor proposal #3210

Closed

Description

We need to refactor the shutdown sequence involving the input plugins and establish a clear shutdown semantic.

Current Design

Currently the base plugin class exposes the following methods related to shutdown:

  • shutdown
    • only the lumberjack input plugin (of all input/filter/output/codec) call this method
    • never overridden
    • does nothing
  • finished
    • called from many plugins
    • never overridden
    • does nothing
  • teardown
    • called from many plugins
    • overridden by many plugins for the plugin shutdown bookkeeping
  • finished?
    • only the rackspace input plugin uses it
  • running?
    • only the sqs input uses it
  • terminating?
    • a hand full of plugins uses it

Current input plugin shutdown sequence description:
There are basically 3 situations:

1- all input plugins have exited normally, they have somehow reached the end of their input:

  • the plugin will exit its run method, the plugin thread will exit
  • the pipeline thread will call the plugin teardown method

2- a SIGINT or SIGTERM is received and the pipeline needs to terminate the input plugins and the pipeline:

  • a) SIG* signal is trapped in the pipeline thread
  • b) Pipeline#shutdown is called from the signal handler:
    • all input plugins threads are injected an exception + wakeup like this:

      thread.raise(LogStash::ShutdownSignal)
      thread.wakeup

    • all input plugins teardown methods are called in sequence

    • (b) normally the plugins exit their run method and threads at which point the pipeline thread will again call the input teardown methods

3- an uncatched exception in the input plugin

  • the plugin teardown method is called from the pipeline thread
  • the plugin is “restarted” by calling the plugin run method again in a begin/retry

Problems

  • current semantic is not clear and shutdown handling is inconsistent throughout the plugins
  • injecting a rogue exception by doing a raise in the live thread is a very bad idea and can lead to many weird problems. see this post discussing the related problems
  • per the previous reason, a LogStash::ShutdownSignal exception can happen virtually anywhere, event in the teardown method
  • teardown is called twice on the input plugins in a SIGINT/SIGTERM situation
  • thread safety issues are not clearly defined/understood in the current design

Suggested Refactor

Here's my suggestion for a first iteration cleanup. Once cleaned up, we will be able to more easily build upon it and start thinking about decoupling further the plugin execution/lifecycle from the pipeline in further iterations.

We basically need 2 things for the input plugins:

1- signal the plugin it need to stop
2- ask the plugin to do its bookkeeping once it is stopped

For this I suggest that the plugins exposes only 3 methods: stop, stop? and close.

  • stop is called from outside the plugin thread (from the pipeline thread) and is to ask the plugin to stop. this method must be thread-safe bacause it will be called from outside the plugin thread.
  • stop? returns true if the stop method was called and can be used within the plugin to verify if a stop was requested.
  • close is called once when the plugin is stopped to perform any final bookkeeping.

The way to know a plugin is stopped is simply by waiting the return of the plugin run method. When the run method exits and the plugin thread exits, this is only when the close method should be called, once.

The same semantic can be applied to the filter and output plugins which would support 2 shutdown modes:

1- the current mode of inserting a SHUTDOWN event in the queue so that all inflight events will be processed before the filter and output plugins are closed
2- a new mode using the stop method for an immediate shutdown in which case the inflight events will be lost when not using queue persistence, in other words, with queue persistence, we will be able to do an immediate shutdown without loosing inflight events.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Metadata

Assignees

No one assigned

    Labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions