Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

stop inputs upon a worker error before terminating the pipeline #12307

Merged
merged 1 commit into from
Oct 13, 2020

Conversation

colinsurprenant
Copy link
Contributor

@colinsurprenant colinsurprenant commented Oct 5, 2020

What does this PR do?

This addresses an incomplete fix in #12019 starting in 7.8.1 where upon catching a worker exception (to avoid crashing the whole logstash per #12306) the input plugin(s) are not terminated prior to closing the pipeline leading to the input plugin(s) continuing execution and failing with IllegalStateException & Tried to write to a closed queue since closing the pipeline also correctly closes the queue.

Why is it important/What is the impact to the user?

This is a critical bug affecting users if logstash experiences a worker error.

Checklist

  • Add tests
  • Still be an issue when using memory queue where Queue.push does not return upon closing the input

How to test this PR locally

To reproduce the problem prior to this PR:

  • setup 2 pipelines, one that works and another that uses a local output or filter that raises an exception from its filter or receive method
  • run logstash; prior to this PR, IllegalStateException & Tried to write to a closed queue errors will be produced, with this PR, the original exception is reported and the pipeline is correctly terminated.

Related issues

@colinsurprenant
Copy link
Contributor Author

colinsurprenant commented Oct 6, 2020

Still be an issue when using memory queue where Queue.push does not return upon closing the input

I figured the problem: this is specifically when using a single worker (or if all workers die); if that single worker dies then if the input is in a Queue.push call and the queue is a memory queue then Queue.push will never return because there is no consumer on the other end of this ArrayBlockingQueue.

IMO there are two solutions to this:

  1. We use ArrayBlockingQueue.offer which can take a timeout argument and we add logic the have a spinning push that check if the plugin is being closed and keeps on offering until it is taken or the plugin is closed.
  2. In the monitor_inputs_and_workers method upon closing the input plugins we first check if there are any workers remaining to drain the queue and if there are no workers left we drain the queue by dropping events until all input plugins are closed.

Solution 2 seems better since it gives a chance for remaining workers to drain the queue and IIF there are no workers left we drop the event each input plugin was trying to push.

@colinsurprenant
Copy link
Contributor Author

Chose solution 2.

@colinsurprenant colinsurprenant marked this pull request as ready for review October 7, 2020 00:56
@andsel
Copy link
Contributor

andsel commented Oct 7, 2020

@colinsurprenant I've tried with http filter to invad url and ruby filter to raise an Exception to scatter the error at pipeline level, but those filters catches the error and tag the vent. Please, do you have an example of a possible breaking filter to share?

@colinsurprenant
Copy link
Contributor Author

@andsel you can point any plugin to a local copy by editing Gemfile and adding the :path option like this for example

gem "logstash-output-stdout", :path => "/path/to/local/repo/logstash-plugins/logstash-output-stdout"

and then re-bundling with

bin/ruby -S bundle

and then you can inject a raise in that plugin.

I have a few dummy plugins around for doing such tests.

@andsel andsel self-requested a review October 7, 2020 15:42
Copy link
Contributor

@andsel andsel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tested locally with bad behaving filter, and it's fixed.
LGTM

@colinsurprenant
Copy link
Contributor Author

Since last LGTM I added specs and exposed the @input_threads collection in the pipeline. I think this is worth a seconds review before merging.

@elasticsearch-bot elasticsearch-bot self-assigned this Oct 8, 2020
@colinsurprenant
Copy link
Contributor Author

For the record, the added specs were first created against master to make sure they were not passing.

@andsel andsel self-requested a review October 8, 2020 17:00
@andsel andsel assigned andsel and unassigned elasticsearch-bot Oct 8, 2020
Copy link
Member

@yaauie yaauie left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't count the cause of this issue as a regression, but rather an incomplete fix. Prior to the referenced code-change a crashing worker always crashed the whole Logstash process, and after it only sometimes caused its pipeline to crash.

That said, the code changes here look good to me, and catch the new edge-case. Thank you for following up with specs.

@colinsurprenant
Copy link
Contributor Author

@yaauie agreed on incomplete fix vs regression; I rephrased the description.

Copy link
Contributor

@andsel andsel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@colinsurprenant
Copy link
Contributor Author

Thanks @andsel @yaauie - will squash & merge and backport in 7.9 and 7.10.

This addresses an incomplete fix in elastic#12019 starting in 7.8.1 where upon catching a worker exception (to avoid crashing the whole logstash per elastic#12306) the input plugin(s) are not terminated prior to closing the pipeline leading to the input plugin(s) continuing execution and failing with IllegalStateException & Tried to write to a closed queue since closing the pipeline also correctly closes the queue.
@colinsurprenant
Copy link
Contributor Author

merged in master and backported for 7.9.3, 7.10.0, 7.11.0.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Tried to write to a closed queue error
4 participants