Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[POC][DISCUSS] preserve batched events order #11099

Closed

Conversation

colinsurprenant
Copy link
Contributor

@colinsurprenant colinsurprenant commented Aug 29, 2019

NOT FOR REVIEW, FOR DISCUSSION.

This is a quick and incomplete POC to demonstrate the idea of events ordering within batches as a possible solution for #10938 and #11062

I will not go over the problem description, see the above issues, but the idea of this solution here is:

  • First to assign a sequence number to the event as read from the queue. Currently this is done by adding an order field in the event with that sequence number, obviously this is just a POC and a proper sequence attribute should be implemented.
  • Then the OutputDelegator multiReceive method sorts the events based on that sequence number order field before passing them to the output plugin.

This implementation is incomplete and has a few caveats; only the memory queue assigns the sequence number, it does not take into account new events that are added by the filters.

I don't think solving the case of the new events would be too difficult.

With the following example:

input {
  generator {
    lines => [
      "line 1",
      "line a",
      "line 2",
      "line b",
      "line 3",
      "line c"
    ]
    count => 1
  }
}

filter {
  if [message] =~ "\d" {
    mutate { add_tag => "digit" }
  } else {
    mutate { add_tag => "letter" }
  }
}

output { stdout { codec => line } }

It preserves order correctly with this patch:

$ bin/logstash -w 1  -f tmp/ordering.conf
...
2019-08-29T21:03:44.603Z mbp15r.local line 1
2019-08-29T21:03:44.624Z mbp15r.local line a
2019-08-29T21:03:44.625Z mbp15r.local line 2
2019-08-29T21:03:44.625Z mbp15r.local line b
2019-08-29T21:03:44.626Z mbp15r.local line 3
2019-08-29T21:03:44.626Z mbp15r.local line c

It does not preserve order without the patch:

$ bin/logstash -w 1  -f tmp/ordering.conf
...
2019-08-29T21:06:44.554Z mbp15r.local line 1
2019-08-29T21:06:44.577Z mbp15r.local line 2
2019-08-29T21:06:44.578Z mbp15r.local line 3
2019-08-29T21:06:44.577Z mbp15r.local line a
2019-08-29T21:06:44.578Z mbp15r.local line b
2019-08-29T21:06:44.578Z mbp15r.local line c

WDYT?

@colinsurprenant
Copy link
Contributor Author

From a discussion with @danhermann he pointed out that this solution does not solve this aggregate filter scenario raised by @yaauie in https://gist.github.com/yaauie/82e199687c490a2e04fe1f0c9d9a7fee

What I understand in the above scenario, the order of each batch split created by the conditionals is not really important but this is more about the order in which each filter is called which should respect the configuration syntactic order and that order is probably not preserved by the java execution engine. This is a completely different problem.

@andsel
Copy link
Contributor

andsel commented Aug 30, 2019

The idea is nice. In case an event is splitted in more events, we could use a hierarchical ordering instead of natural (like the version number in software, so version of part of event 1 are versioned as 1.1 and 1.2 and so on. In this case I don't know if all the code changes could be isolated into the Java Execution's code.

The case of aggregation could be thought as "keep the minimum order id of all the events that concur to create the aggregated event". I suspect in this case that the code change interest the plugin, and this dirty the design because code related to the execution goes in the plugin.

@colinsurprenant
Copy link
Contributor Author

closing in favour of #11524

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants