Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Proposal Amendment: UTF-8 migration #30
Proposal Amendment: UTF-8 migration #30
Changes from 8 commits
dbf898e
d05fb21
32a0a92
3839413
a25ad06
808c7ff
b7480de
31d334f
c6b62e5
902040a
File filter
Filter by extension
Conversations
Jump to
There are no files selected for viewing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This sounds like it includes the case where an old Prometheus has ingested from new producers (and names might include escaped names).
Or is this referring to old Prometheus and old producers, so even escaping in names can be ruled out?
Could you clarify?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
After having read through all the below, I would say what's meant here is "if at all, we will have escaped names, so we would never query for UTF-8 names, but only try out the specified escaping schemas".
Still, I think the difference to the mixed data case should be made clearer.
In principle, there might be a scenario where the user knows for sure that they won't have any escaping at all, for example if they had a pure Prometheus stack so far (no OTel etc.), but they would like to use UTF-8 names, so they still have a period of mixed versions deployed (new and old producers, new and old ingesters). I guess it's fine to not implement a specific optimization for that use case (which would be that we don't need a broad search for the old blocks at all), but it would be good if that case is described and that we disregard it as an informed decision.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not sure how this is useful. What query will do if it sees old block? When UTF-8 is queried? I think we need to still do this "migration broad search" for those, no? As we don't know what escape method clients used.
Is it as an optimization to immediately say no result on those blocks for non escaped UTF-8 lookup?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes I suppose the only real benefit is removing the UTF-8 query from the list of possibilities. The other escapings may all be possible. Do you think that maybe we don't need a version number bump at all and can just use the flag/date-based logic?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yea, I am fine with both.
It feels like small change to bump version and might unlock some optimizations, so I am fine keeping this up, just wanted to clarify.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So yeah, I guess the only optimization is that we never have to run the original UTF-8 query on blocks with an older version. Not sure if that is worth the effort of versioning the blocks. But maybe, as @bwplotka said, the effort is actually quite low, and we might see additional uses of the version number later.