fix: significant rework of Event<->IngestDocument marshalling #51

Merged: 4 commits, May 5, 2023
12 changes: 10 additions & 2 deletions CHANGELOG.md
@@ -1,6 +1,14 @@
## 0.0.2 (UNRELEASED)
- Adds proactive reloaders for both datastream-to-pipeline-name mappings and pipeline definitions to ensure upstream changes are made available without impacting processing [#48](https://github.com/elastic/logstash-filter-elastic_integration/pull/48)
- Presents helpful guidance when run on an unsupported version of Java [#43](https://github.com/elastic/logstash-filter-elastic_integration/pull/43)
- Fixes several related issues with how fields are mapped from the Logstash Event to the IngestDocument and back again [#51](https://github.com/elastic/logstash-filter-elastic_integration/pull/51)
  - `IngestDocument` metadata fields are now separately routed to `[@metadata][_ingest_document]` on the resulting `Event`, fixing an issue where the presence of Elasticsearch-reserved fields such as the top-level `_version` would cause a downstream Elasticsearch output to be unable to index the event [#47][]
  - Top-level `@timestamp` and `@version` fields are no longer excluded from the `IngestDocument`, as required by some existing integration pipelines [#54][]
  - Field-type conversions have been improved by adding a two-way mapping between the Logstash-internal `Timestamp` object and the equivalent `ZonedDateTime` object used in several Ingest Common processors [#65][]
- Adds proactive reloaders for both datastream-to-pipeline-name mappings and pipeline definitions to ensure upstream changes are made available without impacting processing [#48](https://github.com/elastic/logstash-filter-elastic_integration/pull/48)
- Presents helpful guidance when run on an unsupported version of Java [#43](https://github.com/elastic/logstash-filter-elastic_integration/pull/43)

[#47]: https://github.com/elastic/logstash-filter-elastic_integration/issues/47
[#54]: https://github.com/elastic/logstash-filter-elastic_integration/issues/54
[#65]: https://github.com/elastic/logstash-filter-elastic_integration/issues/65

## 0.0.1
- Empty Bootstrap of Logstash filter plugin [#1](https://github.com/logstash-plugins/logstash-filter-elastic_integration/pull/1)
2 changes: 2 additions & 0 deletions build.gradle
@@ -354,6 +354,8 @@ tasks.withType(Test) {
"--add-exports=jdk.compiler/com.sun.tools.javac.parser=ALL-UNNAMED",
"--add-exports=jdk.compiler/com.sun.tools.javac.tree=ALL-UNNAMED",
"--add-exports=jdk.compiler/com.sun.tools.javac.util=ALL-UNNAMED",
"--add-opens=java.base/sun.nio.ch=ALL-UNNAMED",
"--add-opens=java.base/java.io=ALL-UNNAMED",
"--add-opens=java.base/java.lang=ALL-UNNAMED",
"--add-opens=java.base/java.util=ALL-UNNAMED"
]
50 changes: 41 additions & 9 deletions docs/index.asciidoc
@@ -162,24 +162,56 @@ h| GeoIp
[id="plugins-{type}s-{plugin}-field_mappings"]
===== Field Mappings

During execution, the Ingest pipeline works with a temporary mutable _view_ of the Logstash event that re-shapes some {ls}-reserved fields into their expected IngestDocument field names and object-types.
Changes to the IngestDocument will be reflected in the resulting Logstash Event, including safely mapping these reserved fields _back_ from the IngestDocument reserved field to the {ls} reserved field counterpart.
:esid: {es} Ingest Document

[cols="<,<,<",options="header"]
During execution the Ingest pipeline works with a temporary mutable _view_ of the Logstash event called an ingest document.
This view contains all of the as-structured fields from the event with minimal type conversions.

It also contains additional metadata fields as required by ingest pipeline processors:

* `_version`: a `long` integer equivalent to the event's `@version`, or a sensible default value of `1`.
* `_ingest.timestamp`: a `ZonedDateTime` equivalent to the event's `@timestamp` field.

After execution completes the event is sanitized to ensure that Logstash-reserved fields have the expected shape, providing sensible defaults for any missing required fields.
When an ingest pipeline has set a reserved field to a value that cannot be coerced, the value is made available in an alternate location on the event as described below.

[cols="<1,<1,<5",options="header"]
|=======================================================================
| {ls} field | type | value

| {ls} Field | IngestDocument Field | Conflict Handling
| `@timestamp` | `Timestamp` |
First coercible value of the ingest document's `@timestamp`, `event.created`, `_ingest.timestamp`, or `_now` fields; or the current timestamp.
When the ingest document has a value for `@timestamp` that cannot be coerced, it will be available in the event's `_@timestamp` field.

| `@timestamp` | `_ingest.timestamp` |when ingest processing _also_ sets a top-level `@timestamp` field, it will be made available via the Event's `_@timestamp` field
| `@version` | String-encoded integer |
First coercible value of the ingest document's `@version` or `_version` fields; or a sensible default value.
When the ingest document has a value for `@version` that cannot be coerced, it will be available in the event's `_@version` field.

| `@version` | `_version` | when ingest processing _also_ sets a top-level `@version` field in the source, it will be made available via the Event's `_@version` field
| `@metadata` | key/value map |
The ingest document's `@metadata`; or an empty map.
When the ingest document has a value for `@metadata` that cannot be coerced, it will be available in the event's `_@metadata` field.

| `@metadata` | `@metadata` | when ingest processing replaces the top-level `@metadata` map with an object that is not a string-object map, it will be made available via the Event's `_@metadata` field.
| `tags` | a String or a list of Strings |
The ingest document's `tags`.
When the ingest document has a value for `tags` that cannot be coerced, it will be available in the event's `_tags` field.
|=======================================================================
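
The "first coercible value" rule for `@timestamp` in the table above can be illustrated with a minimal Ruby sketch. It is not the plugin's implementation: it models the ingest document as a plain `Hash`, treats only `Time` objects and ISO8601 strings as coercible, and the helper names `coerce_timestamp`, `dig_hash`, and `resolve_timestamp` are hypothetical.

```ruby
require "time"

# Hypothetical helper: returns a Time when the value can be coerced, else nil.
# Assumption: only Time objects and ISO8601 strings count as coercible here.
def coerce_timestamp(value)
  case value
  when Time then value
  when String then Time.iso8601(value)
  end
rescue ArgumentError
  nil
end

# Hypothetical helper: walks nested hashes, yielding nil on any non-hash node.
def dig_hash(doc, *keys)
  keys.reduce(doc) { |node, key| node.is_a?(Hash) ? node[key] : nil }
end

# Sketch of the fallback order from the table: @timestamp, event.created,
# _ingest.timestamp, _now, and finally the current time.
def resolve_timestamp(doc, now: Time.now.utc)
  candidates = [
    doc["@timestamp"],
    dig_hash(doc, "event", "created"),
    dig_hash(doc, "_ingest", "timestamp"),
    doc["_now"],
  ]
  resolved = candidates.map { |value| coerce_timestamp(value) }.compact.first || now

  result = { "@timestamp" => resolved }
  # A present-but-uncoercible @timestamp is preserved under _@timestamp.
  original = doc["@timestamp"]
  result["_@timestamp"] = original if !original.nil? && coerce_timestamp(original).nil?
  result
end
```

With an uncoercible `@timestamp` and a valid `event.created`, the sketch returns the `event.created` value as `@timestamp` and keeps the original value under `_@timestamp`, which mirrors the conflict handling described in the table.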

Additionally, the following Elasticsearch IngestDocument Metadata fields are made available on the resulting event _if-and-only-if_ they were set during pipeline execution:

| `tags` | `tags` | when ingest processing produces a top-level `tags` field that is not a collection of strings, it will be made available via the event's `_tags` field.
:mcc-prefix: [@metadata][_ingest_document]

| _everything else_ | _in-place, as-structured_ | only minimal type conversions are done
[cols="<1,<5",options="header"]
|=======================================================================
| {es} document metadata | {ls} field

| `_id` | `{mcc-prefix}[id]`
| `_index` | `{mcc-prefix}[index]`
| `_routing` | `{mcc-prefix}[routing]`
| `_version` | `{mcc-prefix}[version]`
| `_version_type` | `{mcc-prefix}[version_type]`
| `_ingest.timestamp` | `{mcc-prefix}[timestamp]`
|=======================================================================
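
The routing in the table above can be sketched in Ruby as follows. This is an illustration under stated assumptions, not the plugin's code: the ingest document is modeled as a plain `Hash`, "set during pipeline execution" is simplified to "present and non-nil", the metadata keys are removed from the top level of the result, and `METADATA_FIELDS` and `route_ingest_metadata` are hypothetical names.

```ruby
# Mapping taken from the table above: Elasticsearch document metadata field
# to its key under [@metadata][_ingest_document].
METADATA_FIELDS = {
  "_id" => "id",
  "_index" => "index",
  "_routing" => "routing",
  "_version" => "version",
  "_version_type" => "version_type",
}.freeze

# Hypothetical sketch: returns an event-like Hash with the metadata fields
# moved under @metadata._ingest_document instead of the top level.
def route_ingest_metadata(ingest_doc)
  # Assumption: metadata keys and the _ingest object do not stay top-level.
  event = ingest_doc.reject { |key, _| METADATA_FIELDS.key?(key) || key == "_ingest" }

  routed = {}
  METADATA_FIELDS.each do |es_field, ls_key|
    value = ingest_doc[es_field]
    routed[ls_key] = value unless value.nil?
  end
  ingest = ingest_doc["_ingest"]
  timestamp = ingest.is_a?(Hash) ? ingest["timestamp"] : nil
  routed["timestamp"] = timestamp unless timestamp.nil?

  # Only attach the container when at least one metadata field was set.
  unless routed.empty?
    metadata = event["@metadata"].is_a?(Hash) ? event["@metadata"].dup : {}
    metadata["_ingest_document"] = routed
    event["@metadata"] = metadata
  end
  event
end
```

In this sketch a document carrying `_index` yields an event whose `[@metadata][_ingest_document][index]` holds that value, while the top-level `_index` key is gone. That is the shape the updated spec assertions check for.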


[id="plugins-{type}s-{plugin}-resolving"]
==== Resolving Pipeline Definitions
4 changes: 2 additions & 2 deletions spec/integration/elastic_integration_spec.rb
@@ -276,7 +276,7 @@
"data_stream" => data_stream)]

subject.multi_filter(events).each do |event|
expect(event.get("_index")).to include("<monthly-index-{2023-03-08")
expect(event.get("[@metadata][_ingest_document][index]")).to start_with("<monthly-index-{2023-03-08")
expect(event.get("[@metadata][target_ingest_pipeline]")).to eql '_none'
end
end
@@ -726,7 +726,7 @@
"data_stream" => data_stream)]

subject.multi_filter(events).each do |event|
expect(event.get("_index")).to eql "uz-catalog"
expect(event.get("[@metadata][_ingest_document][index]")).to eql "uz-catalog"
expect(event.get("[@metadata][target_ingest_pipeline]")).to eql '_none'
end
end