Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[exporter/elasticsearch] deprecate/remove dedup config #33773

Closed
axw opened this issue Jun 26, 2024 · 1 comment · Fixed by #33776
Closed

[exporter/elasticsearch] deprecate/remove dedup config #33773

axw opened this issue Jun 26, 2024 · 1 comment · Fixed by #33776
Labels

Comments

@axw
Copy link
Contributor

axw commented Jun 26, 2024

Component(s)

exporter/elasticsearch

Describe the issue you're reporting

The dedup config was introduced in the initial implementation of the Elasticsearch exporter for de-duplicating colliding attributes. This is particularly relevant when using the "Raw" or "ECS encoding modes, where attributes are not nested in unique namespaces. e.g. if one were to add a trace.id attribute to a log record, while also setting the top-level TraceID field, then in ECS mode this would lead to duplicate trace.id attributes being set in the Elasticsearch document; enabling the dedup configuration prevents this.

For better or worse, Elasticsearch rejects documents with objects that have duplicate keys: elastic/elasticsearch#19614. This is not configurable.

The only conceivable reason for making deduplication configurable in the exporter is when you know for absolutely sure that duplicates cannot occur, so you can save a few CPU cycles. This would be the exception rather than the norm, and I think the value does not justify the complexity the configuration introduces.

I propose we:

  • deprecate and then remove the dedup config
  • always deduplicate object keys

I don't think there's any harm in doing these out of order, essentially making the config a no-op.

@axw axw added the needs triage New item requiring triage label Jun 26, 2024
Copy link
Contributor

Pinging code owners:

See Adding Labels via Comments if you do not have permissions to add labels yourself.

andrzej-stencel pushed a commit that referenced this issue Jun 28, 2024
**Description:**

Deprecate `exporter/elasticsearch`'s "dedup" configuration. In a future
release we will remove this configuration, and always deduplicate
wherever necessary.

**Link to tracking Issue:**


#33773

**Testing:**

N/A

**Documentation:**

Updated the README.
andrzej-stencel added a commit that referenced this issue Jul 16, 2024
**Description:**

Remove the `dedup` configuration setting, and always de-duplicate.
Elasticsearch does not permit duplicate keys in JSON objects, and this
configuration is adding more complexity to the code than it's worth.

I've simplified the `internal/objmodel` API slightly, unexporting the
`Sort` methods, which are internally called by the now unconditional
call to `Dedup`.

**Link to tracking Issue:**

Closes
#33773

**Testing:**

Ran the unit tests, which cover deduplication. None of the tests in
package elasticsearchexporter covered `dedup: false`.

**Documentation:** <Describe the documentation added.>

---------

Co-authored-by: Carson Ip <carsonip@users.noreply.github.com>
Co-authored-by: Andrzej Stencel <andrzej.stencel@elastic.co>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging a pull request may close this issue.

1 participant