-
Notifications
You must be signed in to change notification settings - Fork 2.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Duplicated logs when fingerprint_size is set to larger value #22936
Comments
Pinging code owners:
See Adding Labels via Comments if you do not have permissions to add labels yourself. |
The issue is not observed with fix in this draft pull request: #22937 |
@kkujawa-sumo, thanks for the detailed example. I was able to reproduce the issue. |
This issue was tentatively resolved by #23183, but unfortunately that PR had to be reverted because it introduced some minor issue with rotation. I can't make it an immediate priority to reboot the PR but it is on my list to circle back to when possible. |
Maybe it is worthy to make buffer size configurable 🤔 There will be a workaround for people who need higher fingerprint size and have this issue. |
I'm hesitant to add this until we are sure it is necessary. My top priority right now it refactoring the |
This prevents of open-telemetry/opentelemetry-collector-contrib#22936 which is caused by incorrect update of fingerprint when it is read in mutliple iterations. The issue is not observed when fingerprint_size is not higher than default buffer size (16kb).
This prevents of open-telemetry/opentelemetry-collector-contrib#22936 which is caused by incorrect update of fingerprint when it is read in mutliple iterations. The issue is not observed when fingerprint_size is not higher than default buffer size (16kb).
This prevents of open-telemetry/opentelemetry-collector-contrib#22936 which is caused by incorrect update of fingerprint when it is read in multiple iterations. The issue is not observed when fingerprint_size is not higher than default buffer size (16kb).
This prevents of open-telemetry/opentelemetry-collector-contrib#22936 which is caused by incorrect update of fingerprint when it is read in multiple iterations. The issue is not observed when fingerprint_size is not higher than default buffer size (16kb).
… settings for fingerprint_size on k8s >=1.24 When fingerprint_size is set to 1kb (default value) the issue with duplicated logs is not observed ref: open-telemetry/opentelemetry-collector-contrib#22936
… settings for fingerprint_size on k8s >=1.24 When fingerprint_size is set to 1kb (default value) the issue with duplicated logs is not observed ref: open-telemetry/opentelemetry-collector-contrib#22936
This issue has been inactive for 60 days. It will be closed in 60 days if there is no activity. To ping code owners by adding a component label, see Adding Labels via Comments, or if you are unsure of which component this issue relates to, please ping Pinging code owners:
See Adding Labels via Comments if you do not have permissions to add labels yourself. |
Still on my radar. I'm on a mission to refactor |
This issue has been inactive for 60 days. It will be closed in 60 days if there is no activity. To ping code owners by adding a component label, see Adding Labels via Comments, or if you are unsure of which component this issue relates to, please ping Pinging code owners:
See Adding Labels via Comments if you do not have permissions to add labels yourself. |
This issue has been inactive for 60 days. It will be closed in 60 days if there is no activity. To ping code owners by adding a component label, see Adding Labels via Comments, or if you are unsure of which component this issue relates to, please ping Pinging code owners:
See Adding Labels via Comments if you do not have permissions to add labels yourself. |
Pinging code owners for pkg/stanza: @djaglowski. See Adding Labels via Comments if you do not have permissions to add labels yourself. |
Depends on #31298 Fixes #22936 This PR changes the way readers update their fingerprints. Currently, when `reader.ReadToEnd` is called, it creates a scanner and passes itself (the reader) in as the `io.Reader` so that a custom implementation of `Read` will be used by the scanner. Each time the scanner calls `Read`, we try to perform appropriate reasoning about whether the data we've read should be appended to the fingerprint. The problem is that the correct positioning of the bytes buffer is in some rare cases different than the file's "offset", as we track it. See example [here](#22937 (comment)). There appear to be two ways to solve this. A "simple" solution is to independently determine the file handle's current offset with a clever use of `Seek`, ([suggested here](https://stackoverflow.com/a/10901436/3511338). Although this does appear to work, it leaves open the possibility that the fingerprint is corrupted because _if the file was truncated_, we may be updating the fingerprint with incorrect information. The other solution, proposed in this PR, simply has the custom `Read` function set a flag to indicate that the fingerprint _should_ be updated. Then, just before returning from `ReadToEnd`, we create an entirely new fingerprint. This has the advantage of not having to manage any kind of append operations, but also allows the the opportunity to independently check that the fingerprint has not been altered by truncation. Benchmarks appear to show all three solutions are close in performance.
Depends on open-telemetry#31298 Fixes open-telemetry#22936 This PR changes the way readers update their fingerprints. Currently, when `reader.ReadToEnd` is called, it creates a scanner and passes itself (the reader) in as the `io.Reader` so that a custom implementation of `Read` will be used by the scanner. Each time the scanner calls `Read`, we try to perform appropriate reasoning about whether the data we've read should be appended to the fingerprint. The problem is that the correct positioning of the bytes buffer is in some rare cases different than the file's "offset", as we track it. See example [here](open-telemetry#22937 (comment)). There appear to be two ways to solve this. A "simple" solution is to independently determine the file handle's current offset with a clever use of `Seek`, ([suggested here](https://stackoverflow.com/a/10901436/3511338). Although this does appear to work, it leaves open the possibility that the fingerprint is corrupted because _if the file was truncated_, we may be updating the fingerprint with incorrect information. The other solution, proposed in this PR, simply has the custom `Read` function set a flag to indicate that the fingerprint _should_ be updated. Then, just before returning from `ReadToEnd`, we create an entirely new fingerprint. This has the advantage of not having to manage any kind of append operations, but also allows the the opportunity to independently check that the fingerprint has not been altered by truncation. Benchmarks appear to show all three solutions are close in performance.
Depends on open-telemetry#31298 Fixes open-telemetry#22936 This PR changes the way readers update their fingerprints. Currently, when `reader.ReadToEnd` is called, it creates a scanner and passes itself (the reader) in as the `io.Reader` so that a custom implementation of `Read` will be used by the scanner. Each time the scanner calls `Read`, we try to perform appropriate reasoning about whether the data we've read should be appended to the fingerprint. The problem is that the correct positioning of the bytes buffer is in some rare cases different than the file's "offset", as we track it. See example [here](open-telemetry#22937 (comment)). There appear to be two ways to solve this. A "simple" solution is to independently determine the file handle's current offset with a clever use of `Seek`, ([suggested here](https://stackoverflow.com/a/10901436/3511338). Although this does appear to work, it leaves open the possibility that the fingerprint is corrupted because _if the file was truncated_, we may be updating the fingerprint with incorrect information. The other solution, proposed in this PR, simply has the custom `Read` function set a flag to indicate that the fingerprint _should_ be updated. Then, just before returning from `ReadToEnd`, we create an entirely new fingerprint. This has the advantage of not having to manage any kind of append operations, but also allows the the opportunity to independently check that the fingerprint has not been altered by truncation. Benchmarks appear to show all three solutions are close in performance.
Component(s)
receiver/filelog
What happened?
Description
Filelog receiver constantly reads the same log file when
fingerprint_size
is set to value larger than size of scanner default buffer and size of the log file is lower thanfingerprint_size
but larger than scanner buffer.Steps to Reproduce
Expected Result
Logs are not duplicated
Actual Result
Logs are duplicated, the same logs are constantly read
Collector version
commit:
ca50a98fdda4c362d8782a29f6fc5cc27977f37b
Environment information
Environment
Darwin Kernel Version 22.3.0
go version go1.19.6 darwin/amd64
OpenTelemetry Collector configuration
Log output
No response
Additional context
No response
The text was updated successfully, but these errors were encountered: