Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Rebatch after filters when using pipeline.ordered #11710

Merged
merged 1 commit into from
Mar 27, 2020

Conversation

colinsurprenant
Copy link
Contributor

@colinsurprenant colinsurprenant commented Mar 20, 2020

Fixes #11550
Relates to #11175

WIP TODOs

  • major cleanup
  • refactor & simplify the filtersExecution + lastFilter logic
  • re-enable and fix relevant java tests
  • benchmark

Description

When using pipeline.ordered the batch events are sent 1 by 1 down the filters and output. This PR rebatches after the filters so that full batches can be sent to the outputs. It does that by separating the filters and output execution with the introduction of a separator vertex in the pipeline graph between the filters and outputs and compiling both sections separately and in the worker_loop calling both compute separately which provides the possibility to rebatch after the filters.

@colinsurprenant colinsurprenant changed the title [POC][WIP] rebatch after filters when using pipeline.ordered [WIP] rebatch after filters when using pipeline.ordered Mar 23, 2020
@colinsurprenant colinsurprenant changed the title [WIP] rebatch after filters when using pipeline.ordered Rebatch after filters when using pipeline.ordered Mar 23, 2020
@colinsurprenant
Copy link
Contributor Author

I performed the same benchmarks as in #11524 (comment) (but on different hardware, but it's not really important since relative numbers are interesting here). Also, I noticed variations of ~5-10% between runs with the same parameters which I attribute to CPU throttling.

Config1

Java Execution

pipeline.ordered TPS master TPS PR
false 158k/s 156k/s
true 80k/s 155k/s

Ruby Execution

pipeline.ordered TPS master TPS PR
false/true 152k/s 157k/s

Config2

Java Execution

pipeline.ordered TPS master TPS PR
false 70k/s 75k/s
true 38k/s 56k/s

Ruby Execution

pipeline.ordered TPS master TPS PR
false/true 60k/s 63k/s

@colinsurprenant colinsurprenant requested a review from yaauie March 23, 2020 22:32
Copy link
Member

@yaauie yaauie left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

First pass review: looks like we are headed in the right direction. The intermediate vertex is a clever solution that allows us to keep the graph validation logic, which wouldn't be possible if we ended up with two graphs.

I've left a number of comments and nitpicks, more discussion points than hard requests. The refactors you've brought in give us some opportunity to go a small step further in a few places to provide additional clarity and clearer abstractions.

Copy link
Contributor

@andsel andsel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice job @colinsurprenant I've added some nits (probably almost all on personal taste) and a note on a possible recursion (IDEA doesn't show me as it, and I've not tried locally, but I suspect it)

Do you plan to unit ad tests, or the existing give us good confidence?

@colinsurprenant
Copy link
Contributor Author

Thanks a lot for the reviews @yaauie @andsel for the reviews! I pushed some suggested modifications, will be working on others today. I'd also like your thoughts about the SeparatorVertex comments (keep as-is with an explicit ID or refactor into one more abstract class). LMKWYT.

@colinsurprenant
Copy link
Contributor Author

@andsel @yaauie I think we are ready for a another round of review - as per my comments

  • I suggest we leave the SeparatorVertex implementation as-is here and followup with a broader Vertex refactor which will include the QueueVertex.
  • Also defer a AckedReadBatch refactor. There is some cleanups and simplifications to be done there but is out of scope here.

Copy link
Member

@yaauie yaauie left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is looking good, and concentrates the complexity in appropriate places.

I've added a couple nitpicks, praises, and one suggestion to squash a build warning that fails the build.

@andsel
Copy link
Contributor

andsel commented Mar 27, 2020

Only a really minor comment, LGTM

Copy link
Member

@yaauie yaauie left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM with the deferrals noted above.

@colinsurprenant
Copy link
Contributor Author

Thanks a lot @yaauie @andsel for the excellent reviews!

@colinsurprenant
Copy link
Contributor Author

Merged in master (8.0), 7.x #11726 and 7.7 #11730

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

improve pipeline.ordered setting to rebatch before outputs
3 participants