Description
Request
The current key-value pair IR format supports only one type of key-value pair. As we plan to extend the format, we have decided to split input key-value pairs into two categories:
- Auto-generated key-value pairs: added by the logging library to serve as metadata of the log event, e.g., the timestamp of the log event.
- User-generated key-value pairs: data that users specify in their logging statements.
This requires the underlying serialization/deserialization to maintain two key namespaces to differentiate auto-generated keys from user-generated keys. The reason is that the same key may exist in both categories. For example, both can have a key named “timestamp.” These two namespaces will be implemented as two individual schema trees inside the serializer/deserializer.
To fully support this feature, we also need to update the serialization/deserialization APIs to receive/return user-generated kv pairs and auto-generated kv pairs as different msgpack objects.
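As a rough illustration of the two-namespace idea, here is a minimal Python sketch. The names `SchemaTree`, `ToySerializer`, and `serialize` are hypothetical and are not the project's actual API; plain dicts stand in for the msgpack objects, and only top-level keys are handled.

```python
class SchemaTree:
    """Toy schema tree: maps (parent_id, key) to a node ID. The root has ID 0."""

    ROOT_ID = 0

    def __init__(self):
        self._nodes = {}
        self._next_id = 1  # ID 0 is reserved for the root

    def get_or_insert(self, parent_id, key):
        """Return the node ID for `key` under `parent_id`, creating it if needed."""
        locator = (parent_id, key)
        if locator not in self._nodes:
            self._nodes[locator] = self._next_id
            self._next_id += 1
        return self._nodes[locator]


class ToySerializer:
    """Hypothetical serializer keeping one schema tree per key namespace."""

    def __init__(self):
        self.auto_gen_tree = SchemaTree()
        self.user_gen_tree = SchemaTree()

    def serialize(self, auto_gen_kv, user_gen_kv):
        """Resolve each top-level key to a node ID in its own tree.

        Assumption: a real serializer would also emit bytes; this sketch
        only returns the per-namespace key -> node ID mappings.
        """
        root = SchemaTree.ROOT_ID
        auto_ids = {k: self.auto_gen_tree.get_or_insert(root, k) for k in auto_gen_kv}
        user_ids = {k: self.user_gen_tree.get_or_insert(root, k) for k in user_gen_kv}
        return auto_ids, user_ids
```

Because each namespace has its own tree, a key such as "timestamp" can appear in both inputs without the two entries colliding.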
Possible implementation
The tricky part is how we serialize schema tree node IDs. The stream maintains two schema trees: one for the auto-generated keys, and one for the user-generated keys. When encoding a schema tree node ID, we don’t want to create two sets of header bytes for two trees because:
- We want to reuse serialization/deserialization logic as much as possible to reduce code duplication;
- The two trees share the same implementation; we only need a way to tell which tree a node ID refers to.
Therefore, we use signed encoded node IDs to differentiate the two schema trees. The convention is the following:
- If the encoded ID i has a non-negative value (>= 0), it belongs to the user-generated key schema tree, and the actual node ID in the tree is i.
- If the encoded ID i has a negative value (< 0), it belongs to the auto-generated key schema tree, and the actual node ID in the tree is ~i, where ~ is the bitwise complement operator. In other words, the encoded ID is the one's complement of the actual node ID.
- We do not take the absolute value |i| of the negative encoded value because we might need to refer to the root, which has a numerical ID 0, before encoding. Negating 0 still yields 0, so the root of the auto-generated key schema tree would be indistinguishable from the root of the user-generated key schema tree. One's complement allows us to refer to 0 using the hex value 0xFFFF (shown here for a 16-bit encoded ID).
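The convention above can be sketched in a few lines of Python. The function names are hypothetical, and `as_uint16` assumes a 16-bit encoded ID purely to reproduce the 0xFFFF example; the actual encoding widths are not specified here.

```python
def encode_node_id(node_id, is_auto_generated):
    """Encode a non-negative tree-local node ID as a signed integer.

    User-generated IDs are kept as-is; auto-generated IDs are stored as
    their one's complement (~node_id), which is always negative.
    """
    if node_id < 0:
        raise ValueError("node IDs must be non-negative")
    return ~node_id if is_auto_generated else node_id


def decode_node_id(encoded_id):
    """Return (node_id, is_auto_generated) for a signed encoded ID."""
    if encoded_id < 0:
        return ~encoded_id, True
    return encoded_id, False


def as_uint16(encoded_id):
    """View a signed encoded ID as its 16-bit pattern (assumed width)."""
    return encoded_id & 0xFFFF
```

With this scheme the auto-generated root (ID 0) encodes to -1, whose 16-bit pattern is 0xFFFF, while the user-generated root stays 0, so the two roots remain distinct.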