-
Notifications
You must be signed in to change notification settings - Fork 3.5k
[Design] plugins shutdown sequence refactor proposal #3210
Description
We need to refactor the shutdown sequence involving the input plugins and establish a clear shutdown semantic.
Current Design
Currently the base plugin class exposes the following methods related to shutdown:
shutdown- only the lumberjack input plugin (of all input/filter/output/codec) call this method
- never overridden
- does nothing
finished- called from many plugins
- never overridden
- does nothing
teardown- called from many plugins
- overridden by many plugins for the plugin shutdown bookkeeping
finished?- only the rackspace input plugin uses it
running?- only the sqs input uses it
terminating?- a hand full of plugins uses it
Current input plugin shutdown sequence description:
There are basically 3 situations:
1- all input plugins have exited normally, they have somehow reached the end of their input:
- the plugin will exit its
runmethod, the plugin thread will exit - the pipeline thread will call the plugin
teardownmethod
2- a SIGINT or SIGTERM is received and the pipeline needs to terminate the input plugins and the pipeline:
- a) SIG* signal is trapped in the pipeline thread
- b)
Pipeline#shutdownis called from the signal handler:-
all input plugins threads are injected an exception + wakeup like this:
thread.raise(LogStash::ShutdownSignal)
thread.wakeup -
all input plugins
teardownmethods are called in sequence -
(b) normally the plugins exit their
runmethod and threads at which point the pipeline thread will again call the inputteardownmethods
-
3- an uncatched exception in the input plugin
- the plugin
teardownmethod is called from the pipeline thread - the plugin is “restarted” by calling the plugin
runmethod again in abegin/retry
Problems
- current semantic is not clear and shutdown handling is inconsistent throughout the plugins
- injecting a rogue exception by doing a raise in the live thread is a very bad idea and can lead to many weird problems. see this post discussing the related problems
- per the previous reason, a
LogStash::ShutdownSignalexception can happen virtually anywhere, event in theteardownmethod teardownis called twice on the input plugins in aSIGINT/SIGTERMsituation- thread safety issues are not clearly defined/understood in the current design
Suggested Refactor
Here's my suggestion for a first iteration cleanup. Once cleaned up, we will be able to more easily build upon it and start thinking about decoupling further the plugin execution/lifecycle from the pipeline in further iterations.
We basically need 2 things for the input plugins:
1- signal the plugin it need to stop
2- ask the plugin to do its bookkeeping once it is stopped
For this I suggest that the plugins exposes only 3 methods: stop, stop? and close.
stopis called from outside the plugin thread (from the pipeline thread) and is to ask the plugin to stop. this method must be thread-safe bacause it will be called from outside the plugin thread.stop?returns true if the stop method was called and can be used within the plugin to verify if a stop was requested.closeis called once when the plugin is stopped to perform any final bookkeeping.
The way to know a plugin is stopped is simply by waiting the return of the plugin run method. When the run method exits and the plugin thread exits, this is only when the close method should be called, once.
The same semantic can be applied to the filter and output plugins which would support 2 shutdown modes:
1- the current mode of inserting a SHUTDOWN event in the queue so that all inflight events will be processed before the filter and output plugins are closed
2- a new mode using the stop method for an immediate shutdown in which case the inflight events will be lost when not using queue persistence, in other words, with queue persistence, we will be able to do an immediate shutdown without loosing inflight events.