Exemplar label length limit: consider bytes rather than UTF-8 characters #198

@beorn7

I'm sure I commented on this at some point while reviewing the spec, but I never filed an issue here because it seemed like one of the nittier-grittier aspects. Looking at discussions like this one, however, I get the feeling that the practical impact might be severe enough to reconsider the current wording in the spec:

"The combined length of the label names and values of an Exemplar's LabelSet MUST NOT exceed 128 UTF-8 characters."

The problem is that a UTF-8 character has a variable byte length, from one to four bytes. So the 128-character limit caps the byte count needed for the label names and values at 512 bytes. Therefore, it is easy to be permissive and just allow a total length of 512 bytes. However, if a permissive system (e.g. Prometheus) ingests a non-compliant exemplar and then wants to propagate it (e.g. via remote-write) to a less permissive system, the exemplar will fail to propagate. Therefore, it's generally better to have a strict check, but the strict check requires scanning the label names and values to find out how many UTF-8 characters they contain, which is a performance problem in high-ingestion systems.
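To make the difference between the two checks concrete, here is a minimal Go sketch. It assumes that "UTF-8 character" in the spec means an encoded code point (a Go rune), and the `Label` type and the function names are made up for illustration; they are not taken from Prometheus or any other library.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// Label is a hypothetical name/value pair used only for this illustration.
type Label struct {
	Name, Value string
}

// runeLen returns the combined length in UTF-8 encoded code points.
// utf8.RuneCountInString has to walk every byte of each string.
func runeLen(labels []Label) int {
	n := 0
	for _, l := range labels {
		n += utf8.RuneCountInString(l.Name) + utf8.RuneCountInString(l.Value)
	}
	return n
}

// byteLen returns the combined length in bytes.
// len on a Go string reads the stored length and does not scan the data.
func byteLen(labels []Label) int {
	n := 0
	for _, l := range labels {
		n += len(l.Name) + len(l.Value)
	}
	return n
}

func main() {
	// Each letter in "żółć" takes two bytes in UTF-8.
	labels := []Label{{Name: "trace_id", Value: "żółć"}}
	fmt.Println(runeLen(labels), byteLen(labels)) // prints "12 16"
}
```

A strict check against the current wording would compare `runeLen` to 128, while the permissive variant would compare `byteLen` to 128 × 4 = 512. Only the latter avoids scanning the string contents.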

IIUC the intention of the length limit is to limit the space exemplars will take. My understanding is that limiting the byte length of the UTF-8 strings rather than the character count is better in every relevant aspect.
