Exemplar label length limit: consider bytes rather than UTF-8 characters

I'm sure I commented on this at some point while reviewing the spec. But I never filed an issue here because it seemed like one of the nittier-grittier aspects. Looking at discussions like [this one](https://github.com/prometheus/prometheus/pull/8781#pullrequestreview-651165580), however, I get the feeling that the practical impact might be severe enough to reconsider the current wording in the spec:

"The combined length of the label names and values of an Exemplar's LabelSet MUST NOT exceed 128 UTF-8 characters."

The problem is that a UTF-8 character has a variable byte length, from one to four bytes. So the current wording caps the byte count needed for the label names and values at 128 × 4 = 512 bytes. It is therefore easy to be permissive and just allow a total length of 512 bytes. However, if a permissive system (e.g. Prometheus) ingests a non-compliant exemplar and then wants to propagate it (e.g. via remote-write) to a less permissive system, the propagation of that exemplar will fail. So it's generally better to have a strict check, but a strict check requires scanning the label names and values to count the UTF-8 characters they contain, which is a performance problem in high-ingestion systems.

IIUC the intention of the length limit is to limit the space exemplars will take. My understanding is that limiting the byte length of the UTF-8 strings rather than the character count is better in every relevant aspect.
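
To illustrate the difference between the two checks, here is a minimal Go sketch. The `Label` type and the helper names `combinedBytes` and `combinedRunes` are made up for this example and are not the Prometheus types or API. It also assumes that "UTF-8 characters" in the spec means Unicode code points, which is my reading rather than something the spec states explicitly.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// Label is a hypothetical name/value pair used only for this sketch.
type Label struct {
	Name, Value string
}

// combinedBytes sums the byte lengths of all names and values.
// len() on a Go string is constant time, so no scanning is needed.
func combinedBytes(ls []Label) int {
	n := 0
	for _, l := range ls {
		n += len(l.Name) + len(l.Value)
	}
	return n
}

// combinedRunes sums the code point counts of all names and values.
// utf8.RuneCountInString has to walk every byte of each string.
func combinedRunes(ls []Label) int {
	n := 0
	for _, l := range ls {
		n += utf8.RuneCountInString(l.Name) + utf8.RuneCountInString(l.Value)
	}
	return n
}

func main() {
	// "trace_id" is 8 bytes and 8 code points;
	// "日本語" is 9 bytes but only 3 code points.
	ls := []Label{{Name: "trace_id", Value: "日本語"}}
	fmt.Println(combinedBytes(ls), combinedRunes(ls)) // prints "17 11"
}
```

With a byte-based limit, the check only needs `combinedBytes`, and a permissive and a strict implementation would agree on what is valid.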

Exemplar label length limit: consider bytes rather than UTF-8 characters #198