Add support for pipeline.ordered setting for java execution #11524
Conversation
Some Ruby specs are not passing because of the added warning logs - will fix that.
Two review comments on logstash-core/src/main/java/org/logstash/execution/WorkerLoop.java (outdated, resolved).
Thanks @yaauie for the review. Per our inline discussion, my intent with making
Re: defaulting to

When I proposed a separate opt-in parameter (as opposed to relying on circumstances, e.g. one pipeline worker), my reasoning was that maintaining strict ordering comes with a significant penalty and should only ever be used in cases where a pipeline absolutely needs to rely on strictly-ordered execution. While this implementation allows someone to opt out with
@yaauie I hear what you say and agree on the principle; that's why I added the warning logs. I guess we could also flip the messaging, and if there is a single worker and

One of the problems I see is that by not fixing the default behaviour we will have to change the documentation of the aggregate filter, for example.
And this is what we've been telling users: to just use
What if we simply introduce a default behaviour that, if

[edit] this would actually reduce the need for warnings. It does the right thing by default, and if the user overrides this setting then it's quite obvious what the consequence is (ordered events vs speed).
@jsvd note that
Allowing someone to set

What if we had a third setting value called "auto" (default), and it would work like this:
@jsvd yeah, I like that. I would suggest that in v7 it defaults to
@jsvd furthermore, with your suggestion this option becomes generic for both the Java and Ruby executions; no need to specify that this option is only for the Java execution. Obviously if
The one caveat I'd add on @jsvd's "default:auto" solution is that it is possible for
@yaauie we could log a warning in this case?
If our setting is defined like:

```ruby
Setting::String.new('pipeline.ordered', 'auto', true, %w(true false auto))
```

Then we could do something like:

```ruby
def effective_pipeline_ordered_value
  case settings.value('pipeline.ordered')
  when 'true' then true
  when 'false' then false
  when 'auto'
    settings.set?('pipeline.workers') && settings.value('pipeline.workers') == 1
  else
    fail 'illegal state'
  end
end
```
@yaauie yeah, I think it makes sense to apply the
I have reorganized the validation as discussed. I also added the default option in
/cc @karenzone for docs.
Nice work on the docs! I made a few suggestions, but will leave them up to your discretion. Otherwise LGTM
Added java execution specs.
Also added a Ruby pipeline spec. At this point I think this is code-complete (pending a green build) and ready for final review.
I've done some performance tests.

Config 1: a relatively simple config with a few conditional branches.
Config 2: a slightly more taxing config with regex conditionals and a grok filter.

(Test setups and results tables for the Java and Ruby executions not reproduced here.)
A conclusion from the above performance tests is that the Ruby execution (which always preserves order) is actually faster than preserving order with the Java execution under this implementation. I would still move forward with this as-is because it is now "doing it right", and follow up to improve performance, since the Ruby engine will be deprecated in the future.
So when the Java pipeline compiler compiles the pipeline, we could artificially put an aggregator "filter" as a join point between the last user filter and the output section, which waits for all dripped events before moving on.
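(For illustration only: a very rough sketch of that join-point idea, with hypothetical class and method names, not actual pipeline-compiler code.)

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: a join point inserted between the last user filter and
// the output section. It buffers events that were dripped one-by-one through
// the filters and releases them to the outputs as a single batch, in arrival
// order, once the whole original batch has passed through.
final class OrderingJoinPoint {
    private final List<Object> buffer = new ArrayList<>();
    private final int expectedBatchSize;

    OrderingJoinPoint(int expectedBatchSize) {
        this.expectedBatchSize = expectedBatchSize;
    }

    // Returns the re-assembled batch once all dripped events have arrived, otherwise null.
    List<Object> offer(Object event) {
        buffer.add(event);
        if (buffer.size() < expectedBatchSize) {
            return null;
        }
        final List<Object> batch = new ArrayList<>(buffer);
        buffer.clear();
        return batch;
    }
}
```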
@andsel That's an idea. I wonder if it would not be better to look into specializing the

In any case I think we should probably move this optimization into another issue; I suggest we merge this PR knowing it might not be optimal but is correct in behaviour, and then follow up with potential optimizations?
+1 to merging as-is and deferring follow-up optimization to a separate issue.
This PR will be merged for v8 and v7.7. For 7.6 we have decided not to backport, at least not until we find a solution for the batch re-aggregation after the filters and before the outputs (issue in #11550). Note that in the meantime the workaround is to use the Ruby execution.
Commits:
- reuse rubyArray for single element batches
- rename preserveBatchOrder to preserveEventOrder
- allow boolean and string values for the pipeline.ordered setting, reorg validation
- update docs
- yml typo
- Update docs/static/running-logstash-command-line.asciidoc (Co-Authored-By: Karen Metts <35154725+karenzone@users.noreply.github.com>)
- Update docs/static/running-logstash-command-line.asciidoc (Co-Authored-By: Karen Metts <35154725+karenzone@users.noreply.github.com>)
- java execution specs and spec support
- docs corrections per review
- typo
- close not shutdown
- Ruby pipeline spec

Force-pushed from ee81f7a to ceab96f.
Rebatching before outputs was solved in #11710 and will be available in 7.7.0.
Fixes #10938
Followup from abandoned PRs #11020 and #11099
TODO:
This PR adds the `pipeline.ordered` option that, when used under the Java execution and only when `pipeline.workers` is set to 1, will preserve the batched events' order throughout the filters+output execution.

Under this PR both the example in the #11099 description and the aggregate filter scenario in #11099 (comment) are working correctly.
The Problem
The Java execution model is created from a graph representation of the configuration, and under that model each conditional evaluation splits the batch's events in two: the events evaluating on the `true` side and those on the `false` side of the condition, and then moves down the computation path on each side in turn. This means that all events matching one side of the condition will be computed before the ones on the other side.
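To make the reordering concrete, here is a toy illustration in plain Java (not actual pipeline-compiler output; all names are made up) of branch-at-a-time evaluation:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// Toy model of branch-at-a-time evaluation: all events matching the condition
// are processed first, then all events that do not, so downstream stages see
// "true-branch events, then false-branch events" instead of the arrival order.
final class ConditionalSplitDemo {
    static List<String> evaluate(List<String> batch, Predicate<String> condition) {
        final List<String> downstream = new ArrayList<>();
        for (String event : batch) {                 // true side first...
            if (condition.test(event)) downstream.add(event);
        }
        for (String event : batch) {                 // ...then the false side
            if (!condition.test(event)) downstream.add(event);
        }
        return downstream;
    }

    public static void main(String[] args) {
        // Original batch order: e1, e2, e3; only e2 fails the condition.
        // Prints [e1, e3, e2]: e3 overtakes e2 on its way to the outputs.
        System.out.println(evaluate(List.of("e1", "e2", "e3"), e -> !e.equals("e2")));
    }
}
```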
The Solution
Under that model, the only solution is to drip events from a given batch one-by-one down the filter+output computation so that the event ordering is always preserved.
This solution is obviously suboptimal and not very different from simply setting a single worker and a batch size of 1. The difference with this implementation is that a bigger batch size can still be specified and will be honoured at the input and PQ stages.
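For illustration only, a minimal sketch of this dripping strategy, assuming a batch-oriented `compute` entry point; the class and method names here are hypothetical, not the actual WorkerLoop code:

```java
import java.util.Collections;
import java.util.List;

// Hypothetical sketch of the "drip" strategy: when event order must be preserved,
// feed the compiled filter+output chain one event at a time instead of handing it
// the whole batch, so events can never overtake each other across branches.
final class OrderedBatchRunner {
    interface CompiledExecution {
        void compute(List<Object> events); // assumed batch entry point of the compiled pipeline
    }

    static void runBatch(List<Object> batch, CompiledExecution execution, boolean preserveOrder) {
        if (preserveOrder) {
            for (Object event : batch) {
                // Single-element batches keep the original order end-to-end,
                // at the cost of per-event invocation overhead.
                execution.compute(Collections.singletonList(event));
            }
        } else {
            // Default path: process the whole batch at once (faster, but order is
            // not guaranteed across conditional branches).
            execution.compute(batch);
        }
    }
}
```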
Possible Enhancement
The `compute` method implementation/generation could be specialized to either accept an array of events as it does now or also accept a single event, to avoid the overhead of dealing with arrays/multiple elements when we know there is just one.

In both cases I think these are optimizations/improvements that could be made as follow-ups.
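As a rough illustration of that idea (hypothetical interface and method shapes, not the actual compiled-pipeline API), such a specialization could look like:

```java
import java.util.Collection;
import java.util.Collections;

// Hypothetical sketch: alongside the existing batch-oriented compute, a
// single-event overload would let the ordered path skip per-batch collection
// handling entirely; generated code could then target whichever overload fits.
interface CompiledPipelineExecution {
    void compute(Collection<Object> batch); // assumed existing batch entry point

    // Possible follow-up optimization: a specialized single-event entry point.
    default void compute(Object event) {
        compute(Collections.singletonList(event));
    }
}
```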