Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(promtail): Handle docker logs when a log is split in multiple frames #12374

Merged
merged 9 commits into from
Apr 26, 2024

Conversation

jonaslb
Copy link
Contributor

@jonaslb jonaslb commented Mar 27, 2024

What this PR does / why we need it:

Docker splits log lines into 16 kiB frames if a line is longer than that.
When requesting timestamps, every frame is prefixed with the timestamp.
Previously, Promtail would concatenate the frames naively, and only strip the timestamp at the start of a line, and not at the start of every frame.
The result is that logs which were larger than 16 kiB would get corrupted, especially an issue in the case of structured logs that would be parsed downstream somewhere.
This PR changes the docker logs handling to be based around frames in channels instead of just writing into an io.Write.

Which issue(s) this PR fixes:

Special notes for your reviewer:

  • The code which is modified from github.com/docker/docker/pkg/stdcopy has been put into pkg/framedstdcopy and marked as apache2. This is because the bug is (arguably) also present in e.g. the docker CLI, and possibly also in many other places, and I think it is most appropriate then to keep the license for that the same as stdcopy already is.
  • I think the new approach of sending slices into a channel is probably more allocation-heavy than the previous approach (with io.Write). I don't think it's an issue, but maybe there are strong opinions about it?
  • I put a soft limit on the buffering size when combining frames of 16Mi. Maybe there are opinions about that too? Changed to follow max_line_size setting or a max of 256k following review comments

Checklist

  • Reviewed the CONTRIBUTING.md guide (required)
  • Documentation added
  • Tests updated
  • CHANGELOG.md updated
    • If the change is worth mentioning in the release notes, add add-to-release-notes label
  • Changes that require user attention or interaction to upgrade are documented in docs/sources/setup/upgrade/_index.md
  • For Helm chart changes bump the Helm chart version in production/helm/loki/Chart.yaml and update production/helm/loki/CHANGELOG.md and production/helm/loki/README.md. Example PR
  • If the change is deprecating or removing a configuration option, update the deprecated-config.yaml and deleted-config.yaml files respectively in the tools/deprecated-config-checker directory. Example PR

@jonaslb jonaslb requested a review from a team as a code owner March 27, 2024 11:18
@CLAassistant
Copy link

CLAassistant commented Mar 27, 2024

CLA assistant check
All committers have signed the CLA.

@jonaslb jonaslb changed the title Promtail: Fix handling of docker logs when a line is split in multiple frames fix(promtail): Handle docker logs when a log is split in multiple frames Apr 3, 2024
Copy link
Contributor

@cstyan cstyan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can do a more in depth review later, but I think it would also be a good idea to include copies (modified where needed) of the tests from the docker codes stdcopy_test.go

CHANGELOG.md Outdated Show resolved Hide resolved
clients/pkg/promtail/targets/docker/target.go Outdated Show resolved Hide resolved
clients/pkg/promtail/targets/docker/target.go Outdated Show resolved Hide resolved
@jonaslb
Copy link
Contributor Author

jonaslb commented Apr 9, 2024

Thanks for the comments - I have responded to those review comments and will make adjustments (separate tests corresponding to stdcopy_test, removal of changelog entry) sometime this week.

@jonaslb
Copy link
Contributor Author

jonaslb commented Apr 10, 2024

Requested tests were added and the manually added changelog entry was removed.

@jonaslb jonaslb requested a review from cstyan April 19, 2024 13:05
Copy link
Contributor

@cstyan cstyan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 good progress, I think we can merge soon

clients/pkg/promtail/targets/docker/target.go Outdated Show resolved Hide resolved
clients/pkg/promtail/targets/docker/target.go Outdated Show resolved Hide resolved
clients/pkg/promtail/targets/docker/target.go Outdated Show resolved Hide resolved
Comment on lines 213 to 215
// We don't output trailing newlines
payload = strings.TrimSuffix(payload, "\n")
payload = strings.TrimSuffix(payload, "\r")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should we be removing these?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I moved this into the process function which now has a TrimRight call for these. The reason is that process keeps track of the length of the buffer, to see if it's over the new limit we set, but this length should exclude the trailing newline (because we don't output the newline). And I thought it was just easier to take away the trailing newline earlier in the process for this purpose (otherwise we'd have to keep track of the buffer length in a more complicated manner than just calling payloadAcc.Len()).

@cstyan
Copy link
Contributor

cstyan commented Apr 22, 2024

@jonaslb as a promtail user, do you have any thoughts on the best place to document the behaviour between max_line_size and the docker target? Internally we haven't agreed on exactly what to do here. IMO we probably need a note in both sections here and here noting that the docker SD code respects the max_line_size config prior to other portion of promtail like pipeline stages.

@github-actions github-actions bot added the type/docs Issues related to technical documentation; the Docs Squad uses this label across many repositories label Apr 23, 2024
@jonaslb
Copy link
Contributor Author

jonaslb commented Apr 23, 2024

Yeah, now that I've integrated max_line_size support, I agree that we need to document it.

I don't think it needs to be explicitly said that the target also respects the limit, because the data flowing from a target to the client (where lines are truncated/discarded according to the setting) is not user observable anyway. So the main thing to document, I think, is the default limit due to memory and then noting that it can be changed with max_line_size.

I have pushed a commit with a proposal for how it could be documented.

@cstyan
Copy link
Contributor

cstyan commented Apr 23, 2024

Just one more minor change to the docs and then we can merge imo

Copy link
Contributor

@cstyan cstyan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

great 👍 thanks for your patience @jonaslb

@cstyan cstyan merged commit c0113db into grafana:main Apr 26, 2024
60 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
size/XL type/docs Issues related to technical documentation; the Docs Squad uses this label across many repositories
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Consistently corrupt log data for long (>16kB) log lines Reconstruct split docker log lines in Promtail
3 participants