Description
This is the implementation plan for #747. The prototype implementation is in this branch.
Overview
As an extension of #747, this issue discusses the implementation plan and tracks the implementation progress. To support searching kv-pair IR streams, we will break the required features into a series of PRs with the following steps:
- Relevant KQL AST Utilities: make clp-s KQL queries compatible with kv-pair IR streams.
- Projection Handler Interface: define the interface for handling projections.
- Query Handler: handle the core search logic during deserialization.
- Deserializer Integration: integrate the features above into the current IR deserializer.
- clp-s Integration: integrate the overall search feature into clp-s' CLI.
Relevant KQL AST Utilities
The current KQL AST is designed to operate on clp-s archives, whose schema tree implementation differs from that of kv-pair IR streams.
To adapt to these differences, we need utility code to:
- Convert IR's schema tree node type to `clp_s::search::ast::LiteralTypeBitmask`.
- Convert IR's node-type-value pair to `clp_s::search::ast::LiteralType`.
- Evaluate a KQL filter expression against a deserialized IR value.
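The two conversions could look roughly like the following C++17 sketch. Everything in it is a hypothetical stand-in: `IrNodeType`, `IrValue`, `EncodedTextAst`, and the bit values of `LiteralType` are invented for illustration and are not the real clp or clp-s types; the treatment of `Obj` leaves and unstructured arrays is likewise an unverified assumption. The point is the shape: a node type maps to a *set* of possible literal types, while a concrete (type, value) pair maps to exactly one.

```cpp
#include <cstdint>
#include <optional>
#include <string>
#include <variant>

namespace sketch {
// Hypothetical stand-in for the IR schema tree node types.
enum class IrNodeType : std::uint8_t { Int, Float, Bool, Str, UnstructuredArray, Obj };

// Hypothetical stand-in for a CLP-encoded string value.
struct EncodedTextAst {
    std::string logtype;
};

// Hypothetical deserialized IR value; std::monostate models null.
using IrValue = std::variant<std::monostate, std::int64_t, double, bool, std::string, EncodedTextAst>;

// Bit flags modelled on clp_s::search::ast::LiteralType (the real values may differ).
enum LiteralType : std::uint32_t {
    IntegerT = 1U << 0,
    FloatT = 1U << 1,
    BooleanT = 1U << 2,
    VarStringT = 1U << 3,
    ClpStringT = 1U << 4,
    ArrayT = 1U << 5,
    NullT = 1U << 6,
};
using LiteralTypeBitmask = std::uint32_t;

// Schema tree node type -> every literal type a value under that node may have.
inline LiteralTypeBitmask to_literal_type_bitmask(IrNodeType type) {
    switch (type) {
        case IrNodeType::Int: return IntegerT;
        case IrNodeType::Float: return FloatT;
        case IrNodeType::Bool: return BooleanT;
        // A string node may hold either a plain or a CLP-encoded string.
        case IrNodeType::Str: return VarStringT | ClpStringT;
        case IrNodeType::UnstructuredArray: return ArrayT;
        // Assumption: an Obj leaf can only carry a null value.
        case IrNodeType::Obj: return NullT;
    }
    return 0;
}

// (node type, value) pair -> the single literal type of this value, or
// std::nullopt if the value is not valid for the node type.
inline std::optional<LiteralType> to_literal_type(IrNodeType type, IrValue const& value) {
    switch (type) {
        case IrNodeType::Int:
            if (std::holds_alternative<std::int64_t>(value)) { return IntegerT; }
            break;
        case IrNodeType::Float:
            if (std::holds_alternative<double>(value)) { return FloatT; }
            break;
        case IrNodeType::Bool:
            if (std::holds_alternative<bool>(value)) { return BooleanT; }
            break;
        case IrNodeType::Str:
            if (std::holds_alternative<std::string>(value)) { return VarStringT; }
            if (std::holds_alternative<EncodedTextAst>(value)) { return ClpStringT; }
            break;
        case IrNodeType::UnstructuredArray:
            // Assumption: an unstructured array arrives as a (possibly encoded) string.
            if (std::holds_alternative<std::string>(value)
                || std::holds_alternative<EncodedTextAst>(value)) {
                return ArrayT;
            }
            break;
        case IrNodeType::Obj:
            if (std::holds_alternative<std::monostate>(value)) { return NullT; }
            break;
    }
    return std::nullopt;
}
}  // namespace sketch
```

Keeping the two conversions separate would let column resolution filter candidate nodes by bitmask once per schema tree node, while the exact per-value type is only computed when a log event is actually evaluated.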
Projection Handler Interface
As discussed in #747, users need to define their own projection resolution handler to maintain the full-key-to-node-ID mapping across the stream.
This mapping should be applied by user-level code that lives outside the deserializer.
The Projection Handler is a concept that defines the interface users implement to supply their own projection resolution logic.
Query Handler
The query handler is an object that:
- Holds all the data structures needed to perform streaming IR search and projection.
- Evaluates the query on the deserialized node-ID-value pairs.
Since it is designed to be part of the deserializer, it needs the following APIs:
- `column_resolution_update`: Handles column resolution and updates the key-to-node-ID mappings required by the query or the projection (possibly by calling the projection handler). Called whenever a schema tree node insertion IR unit is deserialized.
- `evaluate_node_id_value_pairs`: Executes the query on the given node-ID-value pairs. Called whenever a log event IR unit is deserialized.
This object implements the core search logic, so most of the engineering effort will be spent here.
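The following is a heavily simplified sketch of the two APIs, assuming the query is a conjunction of `key == value` filters and that every value is a string. `QueryHandlerSketch` and `EqualityFilter` are hypothetical names; a real handler would evaluate the full clp-s KQL AST, handle typed values, and forward projected keys to the projection handler, all of which are omitted here. It illustrates why column resolution has to be incremental: a full key can resolve to more than one node ID as new schema tree nodes arrive.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical simplification: one `full_key == expected` filter with string values.
struct EqualityFilter {
    std::string full_key;
    std::string expected;
};

class QueryHandlerSketch {
public:
    explicit QueryHandlerSketch(std::vector<EqualityFilter> filters)
            : m_filters(std::move(filters)) {}

    // Called once per deserialized schema tree node insertion IR unit.
    void column_resolution_update(std::uint32_t node_id, std::string const& full_key) {
        for (std::size_t i = 0; i < m_filters.size(); ++i) {
            if (m_filters[i].full_key == full_key) {
                m_resolved[node_id].push_back(i);
            }
        }
    }

    // Called once per deserialized log event IR unit.
    [[nodiscard]] bool evaluate_node_id_value_pairs(
            std::unordered_map<std::uint32_t, std::string> const& pairs) const {
        std::vector<bool> satisfied(m_filters.size(), false);
        for (auto const& [node_id, value] : pairs) {
            auto const it = m_resolved.find(node_id);
            if (it == m_resolved.end()) {
                continue;  // This node is not referenced by the query.
            }
            for (auto const filter_idx : it->second) {
                if (m_filters[filter_idx].expected == value) {
                    satisfied[filter_idx] = true;
                }
            }
        }
        // Conjunction: every filter must be matched by at least one resolved node.
        for (bool s : satisfied) {
            if (!s) {
                return false;
            }
        }
        return true;
    }

private:
    std::vector<EqualityFilter> m_filters;
    // Node ID -> indices of the filters whose column resolved to that node.
    std::unordered_map<std::uint32_t, std::vector<std::size_t>> m_resolved;
};
```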
Deserializer Integration
The components above will eventually be integrated into the current IR deserializer. If a query is given, the deserializer will handle it as follows:
- For a schema tree node insertion IR unit, the deserializer calls the query handler's API to update the column resolution.
- For a log event IR unit, the deserializer calls the query handler's API to evaluate the query, and calls the user-defined log event handler only if the query evaluates to true.
- If the query evaluates to false, a dedicated error code is returned to indicate that the log event was deserialized successfully but did not match the query.
To avoid overhead when deserializing without a query, we can use templates to statically determine whether the query handler branch is involved.
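A minimal sketch of the compile-time switch, assuming a tag type `NoQuery` marks the no-query case and a hypothetical `DeserializerErrc::QueryNotMatched` plays the role of the dedicated error code. With `if constexpr`, the query branch is discarded entirely when `QueryHandler` is `NoQuery`, so deserialization without a query pays nothing for it at runtime. `DeserializerSketch`, its method names, and the string-valued `NodeIdValuePairs` are illustrative only.

```cpp
#include <cstdint>
#include <string>
#include <type_traits>
#include <unordered_map>
#include <utility>

// Hypothetical simplification: node-ID-value pairs with string values only.
using NodeIdValuePairs = std::unordered_map<std::uint32_t, std::string>;

// Hypothetical error codes; the real deserializer has its own error-code type.
enum class DeserializerErrc { Success, QueryNotMatched };

// Tag type meaning "no query was given".
struct NoQuery {};

template <typename QueryHandler, typename LogEventHandler>
class DeserializerSketch {
public:
    DeserializerSketch(QueryHandler query_handler, LogEventHandler log_event_handler)
            : m_query_handler(std::move(query_handler)),
              m_log_event_handler(std::move(log_event_handler)) {}

    // Schema tree node insertion IR unit: update column resolution only if querying.
    void handle_schema_tree_node_insertion(std::uint32_t node_id, std::string const& full_key) {
        if constexpr (!std::is_same_v<QueryHandler, NoQuery>) {
            m_query_handler.column_resolution_update(node_id, full_key);
        }
    }

    // Log event IR unit: forward the event to the user's handler only if it matches.
    DeserializerErrc handle_log_event(NodeIdValuePairs const& pairs) {
        if constexpr (!std::is_same_v<QueryHandler, NoQuery>) {
            if (!m_query_handler.evaluate_node_id_value_pairs(pairs)) {
                // Deserialized successfully, but filtered out by the query.
                return DeserializerErrc::QueryNotMatched;
            }
        }
        m_log_event_handler(pairs);
        return DeserializerErrc::Success;
    }

private:
    QueryHandler m_query_handler;
    LogEventHandler m_log_event_handler;
};
```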
clp-s Integration
This is the final step in this PR series: it integrates the components above into clp-s and exposes the basic search features through the command line.
Dependency Graph
flowchart LR
utilities["Relevant KQL AST Utilities"]
projection_handler["Projection Handler Interface"]
query_handler_interface["Query Handler Interface"]
query_handler_implementation["Query Handler Implementation"]
query_handler_complete["Query Handler Complete"]
deserializer_integration["Deserializer Integration"]
clp_s_integration["clp-s Integration"]
utilities --> query_handler_implementation
projection_handler --> query_handler_interface
query_handler_implementation --> query_handler_complete
query_handler_interface --> query_handler_complete
query_handler_interface --> deserializer_integration
deserializer_integration --> clp_s_integration
PRs should be scheduled according to this flowchart.
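One way to derive a valid PR order from the flowchart is a topological sort. The sketch below uses Kahn's algorithm, seeded in the order the nodes are declared in the flowchart; `schedule_prs` is a hypothetical helper, and it assumes every edge refers to a listed node.

```cpp
#include <cstddef>
#include <queue>
#include <stdexcept>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Kahn's algorithm: returns one PR order that respects every dependency edge.
// Assumes every edge endpoint also appears in `prs`.
std::vector<std::string> schedule_prs(
        std::vector<std::string> const& prs,
        std::vector<std::pair<std::string, std::string>> const& edges) {
    std::unordered_map<std::string, std::vector<std::string>> successors;
    std::unordered_map<std::string, std::size_t> in_degree;
    for (auto const& pr : prs) {
        in_degree[pr] = 0;
    }
    for (auto const& [from, to] : edges) {
        successors[from].push_back(to);
        ++in_degree[to];
    }

    // PRs with no unmet dependencies, processed in declaration order.
    std::queue<std::string> ready;
    for (auto const& pr : prs) {
        if (in_degree[pr] == 0) {
            ready.push(pr);
        }
    }

    std::vector<std::string> order;
    while (!ready.empty()) {
        auto pr = ready.front();
        ready.pop();
        order.push_back(pr);
        for (auto const& next : successors[pr]) {
            if (--in_degree[next] == 0) {
                ready.push(next);
            }
        }
    }
    if (order.size() != prs.size()) {
        throw std::runtime_error{"dependency cycle between PRs"};
    }
    return order;
}
```

With the nodes and edges listed in flowchart order, this yields utilities, projection_handler, query_handler_implementation, query_handler_interface, query_handler_complete, deserializer_integration, clp_s_integration. PRs that become ready at the same time (such as the first two) have no dependency on each other and could be developed in parallel.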