fix: significant rework of Event<->IngestDocument marshalling #51
Conversation
LGTM!
@roaksoax unfortunately this opens up a can of worms, because ingest pipelines do intentionally set metadata fields that are used downstream for indexing (see failing integration specs), and we currently have no way to propagate them to our Elasticsearch output plugin, which must supply them separately from the event payload itself. I have considered adding an example usage (which would work IFF all five magic fields are set).
While we theoretically could pre-populate all five with values when they are not present, we have no way to tell Elasticsearch to do its "default" id generation or routing.
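A hypothetical sketch of what such usage might look like, using the standard `elasticsearch` output's real `index`, `document_id`, `routing`, `version`, and `version_type` options; the event field names under `[@metadata][_ingest_document]` are illustrative assumptions, not the plugin's documented behavior:

```
output {
  elasticsearch {
    # all five "magic" fields would have to be present on every event,
    # otherwise the sprintf references below remain unresolved
    index        => "%{[@metadata][_ingest_document][index]}"
    document_id  => "%{[@metadata][_ingest_document][id]}"
    routing      => "%{[@metadata][_ingest_document][routing]}"
    version      => "%{[@metadata][_ingest_document][version]}"
    version_type => "%{[@metadata][_ingest_document][version_type]}"
  }
}
```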
src/main/java/co/elastic/logstash/filters/elasticintegration/IngestDuplexMarshaller.java
The ES-ingest reserved Metadata fields by definition _cannot_ be top-level fields of the events we send to Elasticsearch and therefore should not be included in their reserved place in the Event that we generate from an IngestDocument. We already safely extract the `_version` from the IngestDocument's metadata and safely map it to our LS-reserved `@version`, but there is reason to believe that _any_ of these fields could be directly useful to a Logstash pipeline that will eventually persist the events to Elasticsearch. Here, we make all of these fields, sans their underscore prefix, available in the metadata-key `[@metadata][_ingest_document]`, which prevents them from clashing or being included in output accidentally but makes them available to the downstream LS pipeline if the author wishes to access them.
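The mapping described above means a downstream pipeline author can opt in to the ingest-supplied values explicitly. A minimal, hedged sketch of such a downstream pipeline (the field name `[@metadata][_ingest_document][index]` follows the "underscore prefix stripped" convention stated above and is an assumption here):

```
output {
  # use the index chosen by the ingest pipeline when one was provided,
  # otherwise fall back to the output's default index
  if [@metadata][_ingest_document][index] {
    elasticsearch {
      index => "%{[@metadata][_ingest_document][index]}"
    }
  } else {
    elasticsearch { }
  }
}
```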
9e048ec to 48d5a6b
Except the doc `N/A`-to-`_index` map change, the rest of them LGTM!
docs/index.asciidoc
@@ -178,9 +178,27 @@ Changes to the IngestDocument will be reflected in the resulting Logstash Event,
| `tags` | `tags` | when ingest processing produces a top-level `tags` field that is not a collection of strings, it will be made available via the event's `_tags` field. |
| _N/A_ | `_index`, `
| _N/A_ | `_index`, ` |
I got confused by this change. Can you please explain what it is? Basically, how can a non-available field be mapped to `_index`?
This was accidental and will be removed. Oops.
Fundamentally changes the model for marshalling `Event`s to `IngestDocument`s and back again to solve three inter-related issues:

Resolves: #47
-------------
The `IngestDocument` metadata fields are no longer added to the top-level of the `Event`, and are now separately routed to `[@metadata][_ingest_document]`, fixing an issue where the presence of Elasticsearch-reserved fields such as the top-level `_version` would cause a downstream Elasticsearch output to be unable to index the event.

Resolves: #54
-------------
The top-level `@timestamp` and `@version` fields are no longer excluded from the `IngestDocument`, as required by some existing integration pipelines.

Partially-Resolves: #65
-------------
Field-type conversions have been improved by adding a two-way mapping between the Logstash-internal `Timestamp`-type object and the equivalent `ZonedDateTime` object used in several Ingest Common processors.
I greatly appreciate this work @yaauie, thanks so much!
This change resolves deep core issues of plugin `Event`-to-`IngestDocument` transfer safety.
I have tested with some integrations that previously had major issues, such as Apache and Nginx, and LGTM!
We have to land this change ASAP and release the new 0.0.2 version.
Merging now, will land a new release of the plugin on Monday.
This PR fundamentally changes the model for marshalling `Event`s to `IngestDocument`s and back again to solve three inter-related issues:

Resolves: #47

The `IngestDocument` metadata fields are no longer added to the top-level of the `Event`, and are now separately routed to `[@metadata][_ingest_document]`, fixing an issue where the presence of Elasticsearch-reserved fields such as the top-level `_version` would cause a downstream Elasticsearch output to be unable to index the event.

Resolves: #54

The top-level `@timestamp` and `@version` fields are no longer excluded from the `IngestDocument`, as required by some existing integration pipelines.

Partially-Resolves: #65

Field-type conversions have been improved by adding a two-way mapping between the Logstash-internal `Timestamp`-type object and the equivalent `ZonedDateTime` object used in several Ingest Common processors.

Review Note: unfortunately, these three issues became pretty tangled, so this code is best reviewed by looking at the result of the changes to the IngestDuplexMarshaller and not the diff.
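The core of the `Timestamp`-to-`ZonedDateTime` mapping mentioned above can be sketched with plain `java.time`. This is a minimal illustration, assuming the Logstash-internal `Timestamp` can be reduced to epoch milliseconds; the class name `TimestampMapping` and its methods are hypothetical, not the plugin's actual API:

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.ZonedDateTime;

public class TimestampMapping {
    // Map an epoch-millis value (stand-in for Logstash's Timestamp) to the
    // ZonedDateTime shape used by Ingest Common processors, normalized to UTC.
    static ZonedDateTime toZonedDateTime(long epochMillis) {
        return ZonedDateTime.ofInstant(Instant.ofEpochMilli(epochMillis), ZoneOffset.UTC);
    }

    // Map back; the zone is discarded, but the instant itself is preserved.
    static long fromZonedDateTime(ZonedDateTime zdt) {
        return zdt.toInstant().toEpochMilli();
    }

    public static void main(String[] args) {
        long millis = 1_680_000_000_000L;
        ZonedDateTime zdt = toZonedDateTime(millis);
        // the round-trip must be lossless at millisecond precision
        if (fromZonedDateTime(zdt) != millis) {
            throw new AssertionError("round-trip failed");
        }
        System.out.println(zdt);
    }
}
```

A two-way mapping like this only has to agree on a single unambiguous instant; pinning the zone to UTC on the way in avoids depending on the host's default time zone.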