Bug
The KV-pair IR serializer delegates to the text IR CLP-encoding functions when serializing string values that contain spaces. These text IR functions enforce an INT32_MAX (~2 GiB) limit on logtype and dictionary variable strings. Since virtually all log messages contain spaces, this limit effectively applies to all log event message strings serialized into KV-pair IR.
How the limit propagates
- serialize_value_string() checks if the string contains a space:
  - No space → calls serialize_string(), which supports up to UINT32_MAX (~4 GiB) and is not affected.
  - Has space → calls serialize_clp_string(), which calls four_byte_encoding::serialize_message() or eight_byte_encoding::serialize_message().
- serialize_logtype() fails if the logtype exceeds INT32_MAX:
    } else if (length <= INT32_MAX) {
        ir_buf.push_back(cProtocol::Payload::LogtypeStrLenInt);
        serialize_int(static_cast<int32_t>(length), ir_buf);
    } else {
        // Logtype is too long for encoding
        return false;
    }
- DictionaryVariableHandler::operator() similarly fails if a single dictionary variable string exceeds INT32_MAX.
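The tiered length check in the excerpt above can be modeled in isolation. This is a minimal sketch, not the CLP implementation: `LengthTag` and `pick_length_tag` are hypothetical names, and the two narrower tiers (8-bit and 16-bit) are an assumption, since the excerpt only shows the signed 32-bit branch and the failure branch.

```cpp
#include <cstdint>
#include <limits>
#include <optional>

// Hypothetical stand-ins for the length-tag bytes. Only the signed 32-bit
// tier (LogtypeStrLenInt) is confirmed by the excerpt; the others are assumed.
enum class LengthTag { UByte, UShort, Int };

constexpr std::uint64_t cUByteMax = std::numeric_limits<std::uint8_t>::max();
constexpr std::uint64_t cUShortMax = std::numeric_limits<std::uint16_t>::max();
constexpr std::uint64_t cIntMax = std::numeric_limits<std::int32_t>::max();

// Returns the narrowest tier that can hold `length`, or std::nullopt when no
// tier fits (the `return false` case in the excerpt).
constexpr std::optional<LengthTag> pick_length_tag(std::uint64_t length) {
    if (length <= cUByteMax) {
        return LengthTag::UByte;  // assumed tier
    }
    if (length <= cUShortMax) {
        return LengthTag::UShort;  // assumed tier
    }
    if (length <= cIntMax) {
        return LengthTag::Int;  // corresponds to LogtypeStrLenInt
    }
    return std::nullopt;  // too long for any tier
}
```

Because the widest tier is a signed 32-bit integer, any length of 2^31 bytes or more falls through to the failure case.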
The KV-pair IR protocol constants (protocol_constants.hpp) define LogtypeStrLenInt (0x23) using a signed 32-bit integer for the length field, which is the root cause of the ~2 GiB ceiling. In contrast, the KV-pair IR native string path uses StrLenUInt (0x43) with an unsigned 32-bit integer (protocol_constants.hpp:61), supporting ~4 GiB.
Additionally, the IR stream preamble metadata (serialize_metadata()) is capped at UINT16_MAX (65,535 bytes), shared by both text IR and KV-pair IR.
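For reference, the approximate sizes quoted above follow directly from the integer widths. The compile-time checks below are a small sketch of that arithmetic; the constant names are illustrative and do not come from the CLP code base.

```cpp
#include <cstdint>
#include <limits>

constexpr std::uint64_t cOneGiB = 1ULL << 30;

// Largest length representable by each length-field type mentioned above.
constexpr std::uint64_t cSigned32Limit = std::numeric_limits<std::int32_t>::max();
constexpr std::uint64_t cUnsigned32Limit = std::numeric_limits<std::uint32_t>::max();
constexpr std::uint64_t cUnsigned16Limit = std::numeric_limits<std::uint16_t>::max();

// Signed 32-bit length field: one byte short of 2 GiB.
static_assert(cSigned32Limit == 2 * cOneGiB - 1, "INT32_MAX is 2 GiB - 1");
// Unsigned 32-bit length field: one byte short of 4 GiB.
static_assert(cUnsigned32Limit == 4 * cOneGiB - 1, "UINT32_MAX is 4 GiB - 1");
// Unsigned 16-bit length field: 65,535 bytes.
static_assert(cUnsigned16Limit == 65535, "UINT16_MAX is 65,535");
```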
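Size limits
- Logtype string: ~2 GiB (INT32_MAX), encoding_methods.cpp:84-89
- Dictionary variable string: ~2 GiB (INT32_MAX), encoding_methods.cpp:61-65
- Native string (serialize_string()): ~4 GiB (UINT32_MAX), utils.cpp:45-48
- IR stream preamble metadata: 65,535 bytes (UINT16_MAX), utils.cpp:25-30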
Practical impact today
These limits are not the binding constraint today. The log-converter has a 64 MiB buffer limit per log event (LogConverter.hpp:41) which is hit first, and JSON ingestion defaults to 512 MiB per record (--max-document-size). The ~2 GiB IR limit would become the bottleneck only if those upstream limits are raised.
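The reasoning above is a comparison of limits along one ingestion path: whichever limit is smallest is the one an oversized event hits first. A rough sketch follows, with hypothetical names and with the simplifying assumption that the IR limit applies to the whole event string (in practice it applies to the logtype and to each dictionary variable separately).

```cpp
#include <algorithm>
#include <cstdint>

constexpr std::uint64_t cOneMiB = 1ULL << 20;

// Values taken from the text above; the names are illustrative only.
constexpr std::uint64_t cLogConverterBufferLimit = 64 * cOneMiB;
constexpr std::uint64_t cJsonMaxDocumentSizeDefault = 512 * cOneMiB;
constexpr std::uint64_t cIrClpStringLimit = 2147483647ULL;  // INT32_MAX

// Largest event that survives both the upstream limit and the IR limit.
constexpr std::uint64_t effective_limit(std::uint64_t upstream_limit) {
    return std::min(upstream_limit, cIrClpStringLimit);
}

// True when raising the upstream limit any further would no longer help,
// because the IR limit is reached first.
constexpr bool ir_limit_binds(std::uint64_t upstream_limit) {
    return upstream_limit > cIrClpStringLimit;
}
```

With the current 64 MiB and 512 MiB defaults, `ir_limit_binds` is false for both; it becomes true only once an upstream limit is raised to 2 GiB or more.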
Future impact on log-viewer
Once #2174 (MongoDB 16 MiB BSON limit for search results) is resolved and large log events can be retrieved through the WebUI, the log-viewer's extraction path could also be affected. Currently:
- The clp_s log-viewer extracts ordered JSON chunks via JsonConstructor (clp-s x --ordered), which does not go through the KV-pair IR serializer. However, if this extraction path is ever changed to use KV-pair IR, the same limits would apply.
- The clp engine log-viewer extracts text IR streams via clo i, but the text IR logtype/variable limits were already enforced during ingestion, so extraction would not introduce new failures.
CLP version
3b4d13f
Environment
Any environment that serializes string values into KV-pair IR. Today this is the log-converter during unstructured text ingestion in the CLP-JSON package, though the 64 MiB LogConverter buffer limit is hit first.
Reproduction steps
- Bypass the LogConverter's 64 MiB buffer by calling the KV-pair IR Serializer directly (e.g., in a unit test) with a string value larger than INT32_MAX (~2 GiB) that includes at least one space.
- Observe that serialize_logtype() returns false, causing the serialization to fail.
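The input for the first step could be built with a helper along these lines. This is only a sketch: `make_spaced_value` is a hypothetical name, and the call into the KV-pair IR Serializer is deliberately left out because its signature is not shown in this report. The value consists of letters with a single space, on the assumption that such text is encoded as static logtype text rather than as a variable.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: returns a string of exactly `length` bytes made of the
// letter 'a' with one space in the middle, so that the space-containing
// (CLP-encoded) path is taken.
std::string make_spaced_value(std::size_t length) {
    assert(length >= 3);  // need at least one byte on each side of the space
    std::string value(length, 'a');
    value[length / 2] = ' ';
    return value;
}
```

To hit the limit, the length would be `static_cast<std::size_t>(INT32_MAX) + 1` or larger, which requires a 64-bit `std::size_t` and more than 2 GiB of free memory for the string alone.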