Describe the bug
The following truncation logic can produce invalid UTF-8. OTLP requires valid UTF-8. We have seen this issue in otel-go as well: open-telemetry/opentelemetry-go#3021
  /**
   * Apply the {@code lengthLimit} to the attribute {@code value}. Strings and strings in lists
   * which exceed the length limit are truncated.
   */
  public static Object applyAttributeLengthLimit(Object value, int lengthLimit) {
    if (lengthLimit == Integer.MAX_VALUE) {
      return value;
    }
    if (value instanceof List) {
      List<?> values = (List<?>) value;
      List<Object> response = new ArrayList<>(values.size());
      for (Object entry : values) {
        response.add(applyAttributeLengthLimit(entry, lengthLimit));
      }
      return response;
    }
    if (value instanceof String) {
      String str = (String) value;
      return str.length() < lengthLimit ? value : str.substring(0, lengthLimit);
    }
    return value;
  }
Steps to reproduce
Use an attribute with multi-byte characters and apply an attribute size limit so that truncation occurs. Then export using a protobuf library that validates UTF-8 for its string fields (not all libraries do). The Go protobuf library validates UTF-8, which makes it impossible for the OTel collector to receive OTLP data with invalid UTF-8. Where the SDK has control over this matter, the SDK should avoid creating invalid UTF-8.
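A minimal sketch of these steps in Java, assuming an attribute value that contains a supplementary character (U+1F600) and a limit of 2. The class and method names are hypothetical, and `truncate` only mirrors the string branch of the snippet above. It shows that the truncated string ends in a lone high surrogate. With `String.getBytes(UTF_8)` that surrogate is replaced by `?`. How other encoders, including protobuf serializers, handle it is not verified here.

```java
import java.nio.charset.StandardCharsets;

final class TruncationRepro {
  // Hypothetical helper mirroring the String branch of applyAttributeLengthLimit.
  public static String truncate(String str, int lengthLimit) {
    return str.length() < lengthLimit ? str : str.substring(0, lengthLimit);
  }

  public static void main(String[] args) {
    // "a" followed by U+1F600, which Java stores as two chars (a surrogate pair).
    String value = "a\uD83D\uDE00";
    String truncated = truncate(value, 2);

    // The cut lands between the two halves of the pair.
    System.out.println(Character.isHighSurrogate(truncated.charAt(1))); // true

    // String.getBytes substitutes '?' for the lone surrogate, so the bytes are
    // valid UTF-8 but no longer represent the original text.
    byte[] bytes = truncated.getBytes(StandardCharsets.UTF_8);
    System.out.println(bytes.length); // 2
  }
}
```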
What did you expect to see?
UTF-8-aware truncation logic.
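One possible shape for such logic, as a sketch only: keep the limit in UTF-16 `char` units, as the existing code does, but back off by one `char` when the cut would separate a surrogate pair. The class and method names are hypothetical and this is not the SDK's implementation. It assumes a non-negative limit, and it neither repairs lone surrogates already present in the input nor accounts for grapheme clusters.

```java
final class SafeTruncation {
  /**
   * Hypothetical helper: returns at most lengthLimit chars of str and never ends
   * the result on the high half of a surrogate pair that the cut would split.
   * Assumes lengthLimit >= 0.
   */
  public static String truncateAtCodePointBoundary(String str, int lengthLimit) {
    if (str.length() <= lengthLimit) {
      return str;
    }
    int end = lengthLimit;
    // str.length() > end here, so charAt(end) is in bounds.
    if (end > 0
        && Character.isHighSurrogate(str.charAt(end - 1))
        && Character.isLowSurrogate(str.charAt(end))) {
      end--; // drop the high surrogate instead of splitting the pair
    }
    return str.substring(0, end);
  }
}
```

With this sketch, truncating `"a\uD83D\uDE00"` to a limit of 2 yields `"a"` rather than a string ending in an unpaired surrogate, so the result can be one `char` shorter than the limit.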
What did you see instead?
The code snippet above.
What version and what artifacts are you using?
This is a hypothetical bug report based on reviewing source.
Java strings are sequences of UTF-16 `char` values rather than bytes, and `substring` works on those `char` indices, not on bytes. I don't believe this is an actual bug.
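To make the units concrete, here is a small sketch (class and method names are hypothetical) comparing what `length()` and `substring` count with code points and encoded UTF-8 bytes. A two-byte character such as `é` is a single `char`, so `substring` cannot cut through it, while a character outside the Basic Multilingual Plane occupies two `char` values.

```java
import java.nio.charset.StandardCharsets;

final class StringUnits {
  /** UTF-16 code units: the unit used by length() and substring indices. */
  public static int utf16Units(String s) {
    return s.length();
  }

  /** Unicode code points in the string. */
  public static int codePoints(String s) {
    return s.codePointCount(0, s.length());
  }

  /** Bytes produced when the string is encoded as UTF-8. */
  public static int utf8Bytes(String s) {
    return s.getBytes(StandardCharsets.UTF_8).length;
  }

  public static void main(String[] args) {
    String accented = "\u00E9";    // é
    String emoji = "\uD83D\uDE00"; // U+1F600
    System.out.println(utf16Units(accented) + " " + codePoints(accented) + " " + utf8Bytes(accented)); // 1 1 2
    System.out.println(utf16Units(emoji) + " " + codePoints(emoji) + " " + utf8Bytes(emoji)); // 2 1 4
  }
}
```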
Additional context
See the bigger question: open-telemetry/opentelemetry-specification#3421, open-telemetry/opentelemetry-specification#504