Duplicated logs when fingerprint_size is set to larger value #22936

kasia-kujawa · 2023-05-30T08:55:07Z

Component(s)

receiver/filelog

What happened?

Description

The filelog receiver repeatedly re-reads the same log file when fingerprint_size is set to a value larger than the scanner's default buffer size and the log file is smaller than fingerprint_size but larger than the scanner buffer.

Steps to Reproduce

Run otelcontribcol with the attached configuration and read the example log file.
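
The attached example log file is not included in this text. As a rough sketch, a file that satisfies the size condition from the description (larger than the 16kb default buffer mentioned later in this thread, smaller than the configured fingerprint_size of 17408) could be generated as follows. The helper name `buildLog` and the line contents are assumptions made for illustration, and whether this particular file triggers the duplication on a given build has not been verified here.

```go
package main

import (
	"bytes"
	"fmt"
	"os"
)

const (
	// Default scanner buffer size (16kb) as stated in this thread.
	scannerBufferSize = 16 * 1024
	// fingerprint_size taken from the configuration in this report.
	fingerprintSize = 17408
)

// buildLog returns newline-terminated lines whose total size is strictly
// greater than min bytes and strictly less than max bytes.
func buildLog(min, max int) []byte {
	var buf bytes.Buffer
	for i := 0; buf.Len() <= min; i++ {
		fmt.Fprintf(&buf, "%06d example log line used to pad the file\n", i)
	}
	if buf.Len() >= max {
		panic("lines are too long to fit the requested size window")
	}
	return buf.Bytes()
}

func main() {
	data := buildLog(scannerBufferSize, fingerprintSize)
	// The file name matches the include entry in the configuration below.
	if err := os.WriteFile("generated_log.txt", data, 0o644); err != nil {
		panic(err)
	}
	fmt.Printf("wrote %d bytes to generated_log.txt\n", len(data))
}
```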

Expected Result

Logs are not duplicated

Actual Result

Logs are duplicated: the same logs are read over and over.

Collector version

commit: ca50a98fdda4c362d8782a29f6fc5cc27977f37b

Environment information

Environment

Darwin Kernel Version 22.3.0
go version go1.19.6 darwin/amd64

OpenTelemetry Collector configuration

exporters:
  file:
    path: out.log
receivers:
  filelog:
    fingerprint_size: 17408
    start_at: beginning
    include:
    - generated_log.txt
service:
  pipelines:
    logs:
      exporters:
      - file
      receivers:
      - filelog

Log output

No response

Additional context

No response

github-actions · 2023-05-30T08:55:27Z

Pinging code owners:

receiver/filelog: @djaglowski

See Adding Labels via Comments if you do not have permissions to add labels yourself.

kasia-kujawa · 2023-05-30T09:02:20Z

The issue is not observed with fix in this draft pull request: #22937

djaglowski · 2023-06-01T14:12:58Z

@kkujawa-sumo, thanks for the detailed example. I was able to reproduce the issue.

djaglowski · 2023-06-27T16:34:26Z

This issue was tentatively resolved by #23183, but unfortunately that PR had to be reverted because it introduced a minor issue with rotation. I can't make it an immediate priority to reboot the PR, but it is on my list to circle back to when possible.

kasia-kujawa · 2023-07-07T10:29:03Z

Maybe it is worth making the buffer size configurable 🤔 That would give a workaround to people who need a higher fingerprint size and hit this issue.
What do you think?

djaglowski · 2023-07-17T14:47:17Z

> Maybe it is worth making the buffer size configurable 🤔 That would give a workaround to people who need a higher fingerprint size and hit this issue. What do you think?

I'm hesitant to add this until we are sure it is necessary. My top priority right now is refactoring the fileconsumer package to make it more testable. Once we have solid test coverage on these nuanced issues, I think we can either solve this one or prove that it is necessary to make this a user-facing option. That said, if you need a workaround urgently, I think it would be reasonable to add this behind a feature gate, and we can remove it later if possible.

This prevents open-telemetry/opentelemetry-collector-contrib#22936, which is caused by an incorrect update of the fingerprint when it is read in multiple iterations. The issue is not observed when fingerprint_size is not higher than the default buffer size (16kb).

… settings for fingerprint_size on k8s >=1.24. When fingerprint_size is set to 1kb (the default value), the issue with duplicated logs is not observed. ref: open-telemetry/opentelemetry-collector-contrib#22936

github-actions · 2023-09-18T03:30:07Z

This issue has been inactive for 60 days. It will be closed in 60 days if there is no activity. To ping code owners by adding a component label, see Adding Labels via Comments, or if you are unsure of which component this issue relates to, please ping @open-telemetry/collector-contrib-triagers. If this issue is still relevant, please ping the code owners or leave a comment explaining why it is still relevant. Otherwise, please close it.

Pinging code owners:

receiver/filelog: @djaglowski

See Adding Labels via Comments if you do not have permissions to add labels yourself.

djaglowski · 2023-09-19T16:59:47Z

Still on my radar. I'm on a mission to refactor fileconsumer into smaller, more testable packages. Much progress has been made but still a few steps away from circling back to this.

github-actions · 2024-02-02T19:26:20Z

Pinging code owners for pkg/stanza: @djaglowski. See Adding Labels via Comments if you do not have permissions to add labels yourself.

Depends on #31298

Fixes #22936

This PR changes the way readers update their fingerprints.

Currently, when `reader.ReadToEnd` is called, it creates a scanner and passes itself (the reader) in as the `io.Reader` so that a custom implementation of `Read` will be used by the scanner. Each time the scanner calls `Read`, we try to perform appropriate reasoning about whether the data we've read should be appended to the fingerprint. The problem is that the correct positioning of the bytes buffer is in some rare cases different than the file's "offset", as we track it. See example [here](#22937 (comment)).

There appear to be two ways to solve this.

A "simple" solution is to independently determine the file handle's current offset with a clever use of `Seek` ([suggested here](https://stackoverflow.com/a/10901436/3511338)). Although this does appear to work, it leaves open the possibility that the fingerprint is corrupted because, _if the file was truncated_, we may be updating the fingerprint with incorrect information.

The other solution, proposed in this PR, simply has the custom `Read` function set a flag to indicate that the fingerprint _should_ be updated. Then, just before returning from `ReadToEnd`, we create an entirely new fingerprint. This has the advantage of not having to manage any kind of append operations, and it also allows the opportunity to independently check that the fingerprint has not been altered by truncation.

Benchmarks appear to show all three solutions are close in performance.

kasia-kujawa added the bug and needs triage labels May 30, 2023

github-actions bot added the receiver/filelog label May 30, 2023

kasia-kujawa mentioned this issue May 30, 2023

[fix] [pkg/stanza/fileconsumer] do not update fingerprint size when less data has been read #22937

Closed

djaglowski removed the needs triage label Jun 1, 2023

kasia-kujawa mentioned this issue Jun 7, 2023

[pkg/stanza/fileconsumer] Fix issue where buffer size could cause incorrect fingerprint update #23183

Merged

kasia-kujawa mentioned this issue Jul 19, 2023

fix(otellogs): set fingerprint_size to 16kb to avoid of log duplication SumoLogic/sumologic-kubernetes-collection#3158

Closed

kasia-kujawa mentioned this issue Aug 3, 2023

fix(otellogs): fix configuration for filelog/container to use default settings for fingerprint_size on k8s >=1.24 SumoLogic/sumologic-kubernetes-collection#3185

Merged

github-actions bot added the Stale label Sep 18, 2023

djaglowski removed the Stale label Sep 19, 2023

github-actions bot added the Stale label Nov 20, 2023

djaglowski removed the Stale label Nov 28, 2023

github-actions bot added the Stale label Jan 29, 2024

djaglowski removed the Stale label Jan 29, 2024

djaglowski added the priority:p2 and pkg/stanza labels Feb 2, 2024

djaglowski added the release:required-for-ga label Feb 12, 2024

djaglowski mentioned this issue Feb 16, 2024

[pkg/stanza] Simplify fingerprint updating #31251

Merged

djaglowski closed this as completed in #31251 Feb 27, 2024
