[Design] plugins shutdown sequence refactor proposal

We need to refactor the shutdown sequence involving the input plugins and establish a clear shutdown semantic. 
### Current Design

Currently the base plugin class exposes the following methods related to shutdown:
- `shutdown`
  - only the lumberjack input plugin (of all input/filter/output/codec) call this method
  - never overridden
  - does _nothing_
- `finished`
  - called from many plugins
  - never overridden
  - does _nothing_
- `teardown`
  - called from many plugins
  - overridden by many plugins for the plugin shutdown bookkeeping
- `finished?`
  - only the rackspace input plugin uses it
- `running?`
  - only the sqs input uses it
- `terminating?`
  - a hand full of plugins uses it

Current input plugin shutdown sequence description:
There are  basically 3 situations:

1- all input plugins have exited normally, they have somehow reached the end of their input:
- the plugin will exit its `run` method, the plugin thread will exit
- the pipeline thread will call the plugin `teardown` method

2- a `SIGINT` or `SIGTERM` is received and the pipeline needs to terminate the input plugins and the pipeline:
- a) SIG\* signal is trapped in the pipeline thread
- b) `Pipeline#shutdown` is called from the signal handler:
  - all input plugins threads are injected an exception + wakeup like this:
    
    `thread.raise(LogStash::ShutdownSignal)`
    `thread.wakeup`
  - all input plugins `teardown` methods are called in sequence 
  - (b) normally the plugins exit their `run` method and threads at which point the pipeline thread will **again** call the input `teardown` methods

3- an uncatched exception in the input plugin
- the plugin `teardown` method is called from the pipeline thread
- the plugin is “restarted” by calling the plugin `run` method again in a `begin/retry`
### Problems
- current semantic is not clear and shutdown handling is inconsistent throughout the plugins
- injecting a rogue exception by doing a raise in the live thread is a very bad idea and can lead to many weird problems. see [this post discussing the related problems](http://blog.headius.com/2008/02/ruby-threadraise-threadkill-timeoutrb.html)
- per the previous reason, a `LogStash::ShutdownSignal` exception can happen virtually anywhere, event in the `teardown` method
- `teardown` is called twice on the input plugins in a `SIGINT`/`SIGTERM` situation
- thread safety issues are not clearly defined/understood in the current design
### Suggested Refactor

Here's my suggestion for a first iteration cleanup. Once cleaned up, we will be able to more easily build upon it and start thinking about decoupling further the plugin execution/lifecycle from the pipeline in further iterations.

We basically need 2 things for the input plugins:

1- signal the plugin it need to stop
2- ask the plugin to do its bookkeeping once it is stopped

For this I suggest that the plugins exposes only 3 methods: `stop`, `stop?` and `close`. 
- `stop` is called from  outside the plugin thread (from the pipeline thread) and is to ask the plugin to stop. this method must be thread-safe bacause it will be called from outside the plugin thread.
- `stop?` returns true if the stop method was called and can be used within the plugin to verify if a stop was requested.
- `close` is called once when the plugin is stopped to perform any final bookkeeping.

The way to know a plugin is stopped is simply by waiting the return of the plugin `run` method. When the run method exits and the plugin thread exits, this is only when the `close` method should be called, once. 

The same semantic can be applied to the filter and output plugins which would support 2 shutdown modes:

1- the current mode of inserting a `SHUTDOWN` event in the queue so that all inflight events will be processed before the filter and output plugins are closed
2- a new mode using the `stop` method for an immediate shutdown in which case the inflight events will be lost **when not using queue persistence**, in other words, with queue persistence, we will be able to do an immediate shutdown without loosing inflight events.


Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

[Design] plugins shutdown sequence refactor proposal #3210

Current Design

Problems

Suggested Refactor

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

[Design] plugins shutdown sequence refactor proposal #3210

Description

Current Design

Problems

Suggested Refactor

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions