Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: fix bug in query result marshaling for invalid utf8 characters #14585

Open
wants to merge 3 commits into
base: main
Choose a base branch
from

Conversation

cstyan
Copy link
Contributor

@cstyan cstyan commented Oct 23, 2024

Potentially we want to fix this by actually sanitizing data for OTLP ingestion/structured metadata, but I think we should include a fix to allow users with data that is already stored in Loki to query it out.

The bug is present for streams which have structured metadata which contains invalid UTF-8 characters. When you write a query like {label="x"} there's no problem, but as soon as you do label="x"} |= "some filter" the bug kicks in because we pull in the structured metadata.

When we then go to marshal the labelset on the query result we run into an error here because the underlying call to NewLabelSet calls Prometheus' ParseMetric which enforces "any valid Unicode character" in the label value. This fails if there's an invalid character. I'm not entirely sure what the invalid character is or where it's coming from, but we have precedent for checking for it already as RuneError (here) in other places. For this PR I've simply copied what we do for the JSONParser from here.

…d utf8 characters are present

Signed-off-by: Callum Styan <callumstyan@gmail.com>
@cstyan cstyan requested a review from a team as a code owner October 23, 2024 00:14
@cstyan cstyan changed the title [bug] fix bug in query result marshaling for invalid utf8 characters fix: fix bug in query result marshaling for invalid utf8 characters Oct 23, 2024
Signed-off-by: Callum Styan <callumstyan@gmail.com>
Copy link
Contributor

@vlad-diachenko vlad-diachenko left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

approving with small improvements to apply

}
return r
}
stream.Labels = string(bytes.Map(removeInvalidUtf, []byte(stream.Labels)))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it's better to use bytes.Map() only if []byte(stream.Labels) contains utf8.RuneError .
more details here

@@ -77,6 +79,14 @@ func NewStreams(s logqlmodel.Streams) (loghttp.Streams, error) {
ret := make([]loghttp.Stream, len(s))

for i, stream := range s {
// The rune error replacement is rejected by Prometheus hence replacing them with space.
removeInvalidUtf := func(r rune) rune {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can be moved to global vars to initialize once

Labels: "{asdf=\"�\"}",
},
})
require.NoError(t, err)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it would be useful to assert the result as well

Signed-off-by: Callum Styan <callumstyan@gmail.com>
@pull-request-size pull-request-size bot added size/M and removed size/S labels Oct 25, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants