ffi: Add support for auto-generated keys in key-value pair IR format. #556

@LinZhihao-723

Request

The current key-value pair IR format has only one type of key-value pair. As we plan to extend the format, we have decided to split input key-value pairs into two categories:

  • Auto-generated key-value pairs: added by logging libraries to serve as metadata of the log event, e.g., the timestamp of the log event.
  • User-generated key-value pairs: data that users specify in their logging statements.

This requires the underlying serialization/deserialization to maintain two key namespaces so that auto-generated keys can be told apart from user-generated keys. The reason is that the same key may exist in both categories. For example, both can have a key named “timestamp.” These two namespaces will be implemented as two individual schema trees inside the serializer/deserializer.
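The two-namespace idea can be sketched as follows. This is only an illustration, not the project's implementation: `SimpleSchemaTree` and `KeyNamespaces` are hypothetical names, and the "tree" is flattened into a key-to-ID map, with ID 0 assumed to be reserved for the root.

```cpp
#include <cstddef>
#include <string>
#include <unordered_map>

// Hypothetical, flattened stand-in for a schema tree: it only maps a key to a
// node ID. ID 0 is assumed to be reserved for the root, so keys start at 1.
class SimpleSchemaTree {
public:
    // Returns the node ID of `key`, inserting the key if it is not yet known.
    std::size_t get_or_insert(std::string const& key) {
        auto const next_id = m_key_to_id.size() + 1;
        return m_key_to_id.try_emplace(key, next_id).first->second;
    }

private:
    std::unordered_map<std::string, std::size_t> m_key_to_id;
};

// One tree per namespace, so the same key can live in both without clashing.
struct KeyNamespaces {
    SimpleSchemaTree auto_gen_keys;
    SimpleSchemaTree user_gen_keys;
};
```

With this layout, inserting “timestamp” into `auto_gen_keys` and into `user_gen_keys` yields two independent node IDs, one per tree.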

To fully support this feature, we also need to update the serialization/deserialization APIs to receive/return user-generated kv pairs and auto-generated kv pairs as different msgpack objects.

Possible implementation

The tricky part is how we serialize schema tree node IDs. The stream maintains two schema trees: one for the auto-generated keys, and one for the user-generated keys. When encoding a schema tree node ID, we don’t want to create a separate set of header bytes for each tree because:

  • We want to reuse serialization/deserialization logic as much as possible to reduce code duplication.
  • The two trees share the same implementation; we only need a way to tell which tree a node ID refers to.

Therefore, we use signed encoded node IDs to differentiate the two schema trees. The convention is the following:

  • If the encoded ID i has a non-negative value (>= 0), it belongs to the user-generated-key schema tree, and the actual node ID in the tree is i.
  • If the encoded ID i has a negative value (< 0), it belongs to the auto-generated-key schema tree, and the actual node ID in the tree is ~i, where ~ is the bitwise complement operator. In other words, the encoded value is the one's complement of the actual node ID.
    • We do not use the absolute value |i| of the negative encoded value because we might need to refer to the root, which has the numerical ID 0, and -0 cannot be distinguished from 0. One's complement lets us refer to node ID 0 of the auto-generated-key tree as ~0 = -1, i.e., the hex value 0xFFFF for a 16-bit encoded ID.
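The convention above can be sketched as a pair of helper functions. This is a minimal sketch under stated assumptions, not the actual serializer code: `TreeKind`, `encode_node_id`, and `decode_node_id` are hypothetical names, and a 16-bit encoded ID is assumed purely to match the 0xFFFF example.

```cpp
#include <cassert>
#include <cstdint>
#include <utility>

// Hypothetical names for illustration only.
enum class TreeKind { UserGenerated, AutoGenerated };

using NodeId = std::uint16_t;         // actual node ID inside a schema tree
using EncodedNodeId = std::int16_t;   // signed ID as written to the stream

// Encodes a node ID so that its sign identifies the tree it belongs to.
// Assumes the ID fits in the non-negative range of the encoded type.
inline EncodedNodeId encode_node_id(TreeKind kind, NodeId id) {
    assert(id <= 0x7FFF);
    int const value = static_cast<int>(id);
    // Auto-generated IDs are stored as the one's complement: 0 -> -1, 1 -> -2.
    return static_cast<EncodedNodeId>(TreeKind::AutoGenerated == kind ? ~value : value);
}

// Recovers the tree and the actual node ID from an encoded ID.
inline std::pair<TreeKind, NodeId> decode_node_id(EncodedNodeId encoded) {
    if (encoded < 0) {
        return {TreeKind::AutoGenerated, static_cast<NodeId>(~static_cast<int>(encoded))};
    }
    return {TreeKind::UserGenerated, static_cast<NodeId>(encoded)};
}
```

Under these assumptions, the root of the auto-generated-key tree (node ID 0) encodes to -1, whose 16-bit pattern is 0xFFFF, while the root of the user-generated-key tree stays 0, so the two roots remain distinguishable.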

Labels: enhancement (New feature or request)