-
Notifications
You must be signed in to change notification settings - Fork 82
feat(clp-s): Add support for searching against MPT subtrees with specific types. #908
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
WalkthroughThis change unifies the management of subtree IDs in the Changes
Sequence Diagram(s)sequenceDiagram
participant User
participant QueryRunner
participant SchemaMatch
participant SchemaTree
participant ColumnDescriptor
User->>QueryRunner: Submit search query
QueryRunner->>SchemaMatch: Request column mapping
SchemaMatch->>ColumnDescriptor: Check for subtree_type
alt subtree_type is set
SchemaMatch->>SchemaTree: get_subtree_for_namespace_and_type(namespace, subtree_type)
SchemaTree-->>SchemaMatch: Return subtree node ID
SchemaMatch->>SchemaTree: Access children for mapping
else subtree_type not set
SchemaMatch->>SchemaTree: get_subtrees()
SchemaTree-->>SchemaMatch: Return all subtrees
SchemaMatch->>SchemaTree: Iterate non-metadata subtrees
end
SchemaMatch-->>QueryRunner: Return column mapping
QueryRunner->>QueryRunner: Evaluate filters (skip metadata columns unless targeted)
QueryRunner-->>User: Return results
Suggested reviewers
Note ⚡️ AI Code Reviews for VS Code, Cursor, WindsurfCodeRabbit now has a plugin for VS Code, Cursor and Windsurf. This brings AI code reviews directly in the code editor. Each commit is reviewed immediately, finding bugs before the PR is raised. Seamless context handoff to your AI code agent ensures that you can easily incorporate review feedback. 📜 Recent review detailsConfiguration used: CodeRabbit UI 📒 Files selected for processing (4)
🧰 Additional context used📓 Path-based instructions (1)`**/*.{cpp,hpp,java,js,jsx,tpp,ts,tsx}`: - Prefer `false == ` rather than `!`.
🧬 Code Graph Analysis (1)components/core/src/clp_s/search/SchemaMatch.cpp (4)
⏰ Context from checks skipped due to timeout of 90000ms (11)
🔇 Additional comments (12)
✨ Finishing Touches
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
SupportNeed help? Create a ticket on our support page for assistance with any issues or questions. Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
CodeRabbit Configuration File (
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
📜 Review details
Configuration used: CodeRabbit UI
Review profile: ASSERTIVE
Plan: Pro
📒 Files selected for processing (9)
components/core/src/clp_s/SchemaTree.cpp
(2 hunks)components/core/src/clp_s/SchemaTree.hpp
(5 hunks)components/core/src/clp_s/archive_constants.hpp
(1 hunks)components/core/src/clp_s/search/QueryRunner.cpp
(4 hunks)components/core/src/clp_s/search/SchemaMatch.cpp
(4 hunks)components/core/src/clp_s/search/ast/ColumnDescriptor.cpp
(2 hunks)components/core/src/clp_s/search/ast/ColumnDescriptor.hpp
(3 hunks)components/core/tests/test-clp_s-search.cpp
(5 hunks)components/core/tests/test-kql.cpp
(9 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
`**/*.{cpp,hpp,java,js,jsx,tpp,ts,tsx}`: - Prefer `false == ` rather than `!`.
**/*.{cpp,hpp,java,js,jsx,tpp,ts,tsx}
: - Preferfalse == <expression>
rather than!<expression>
.
components/core/src/clp_s/archive_constants.hpp
components/core/tests/test-kql.cpp
components/core/src/clp_s/search/ast/ColumnDescriptor.hpp
components/core/src/clp_s/search/QueryRunner.cpp
components/core/tests/test-clp_s-search.cpp
components/core/src/clp_s/search/ast/ColumnDescriptor.cpp
components/core/src/clp_s/SchemaTree.cpp
components/core/src/clp_s/search/SchemaMatch.cpp
components/core/src/clp_s/SchemaTree.hpp
🧬 Code Graph Analysis (3)
components/core/src/clp_s/search/ast/ColumnDescriptor.cpp (1)
components/core/src/clp_s/search/ast/ColumnDescriptor.hpp (2)
m_subtree_type
(285-285)m_namespace
(279-279)
components/core/src/clp_s/SchemaTree.cpp (1)
components/core/src/clp_s/SchemaTree.hpp (8)
m_namespace_and_type_to_subtree_id
(169-172)get_subtree_for_namespace_and_type
(178-180)subtree_namespace
(147-149)subtree_namespace
(147-147)subtree_namespace
(166-167)field_name
(157-157)m_nodes
(182-182)m_nodes
(195-199)
components/core/src/clp_s/search/SchemaMatch.cpp (3)
components/core/src/clp_s/search/ast/ColumnDescriptor.hpp (2)
subtree_type
(287-289)subtree_type
(287-287)components/core/src/clp_s/search/SchemaMatch.hpp (4)
column
(94-94)column
(101-101)column
(154-154)column
(161-161)components/core/src/clp_s/Schema.hpp (2)
node_type
(174-176)node_type
(174-174)
🪛 Cppcheck (2.10-2)
components/core/src/clp_s/SchemaTree.cpp
[style] 75-75: The function 'get_metadata_field_id' is never used.
(unusedFunction)
⏰ Context from checks skipped due to timeout of 90000ms (11)
- GitHub Check: centos-stream-9-static-linked-bins
- GitHub Check: centos-stream-9-dynamic-linked-bins
- GitHub Check: ubuntu-jammy-static-linked-bins
- GitHub Check: ubuntu-jammy-dynamic-linked-bins
- GitHub Check: ubuntu-jammy-lint
- GitHub Check: build-macos (macos-15, false)
- GitHub Check: build-macos (macos-15, true)
- GitHub Check: build-macos (macos-13, true)
- GitHub Check: build-macos (macos-13, false)
- GitHub Check: lint-check (ubuntu-latest)
- GitHub Check: lint-check (macos-latest)
🔇 Additional comments (25)
components/core/src/clp_s/archive_constants.hpp (1)
37-38
: Constants added to support the new subtree type feature.The new constants
cObjectSubtreeType
andcMetadataSubtreeType
align well with the PR objective to enable searching against specific MPT subtrees. These string constants define the supported values for the newsubtree_type
property inColumnDescriptor
.components/core/tests/test-kql.cpp (1)
54-54
: Test assertions verify default behavior for subtree types.The new assertions ensure that KQL parsing doesn't set
subtree_type
by default. This is important to maintain backward compatibility and confirm that columns will resolve against non-metadata subtrees by default, as specified in the PR objectives.Also applies to: 73-73, 93-93, 143-143, 195-195, 225-225, 281-281, 301-301, 325-325
components/core/src/clp_s/search/ast/ColumnDescriptor.cpp (2)
86-89
: Updated print method to include subtree type information.The changes to the
print()
method properly include the newsubtree_type
property in the output when it's set, which is helpful for debugging and logging.
98-98
: Updated closing delimiter to match new token list format.The closing delimiter has been updated to
])
to properly match the new tokens list format using square brackets.components/core/src/clp_s/search/ast/ColumnDescriptor.hpp (3)
5-5
: Added necessary includes for new functionality.Added
<optional>
and<utility>
headers to support the newstd::optional
member andstd::move
usage in the setter.Also applies to: 9-9
285-289
: Added getter and setter for the new subtree_type property.These methods properly implement access to the new optional subtree type property. The setter uses
std::move
for efficiency when transferring ownership of the string.
299-299
: Added member variable for subtree type.The new
m_subtree_type
member variable is appropriately typed asstd::optional<std::string>
to represent an optional subtree type for column resolution, as required by the PR objective.components/core/src/clp_s/search/QueryRunner.cpp (2)
198-202
:matches_metadata
can be calculated once, but be wary of transitive‐include dependenceThe new logic is clear and correct.
A minor point: this file now depends onconstants::cMetadataSubtreeType
, which is only visible through the transitive inclusion chain (QueryRunner.cpp
→SchemaTree.hpp
→archive_constants.hpp
).
An explicit include improves readability and shields us from future header-ordering regressions.+#include "../archive_constants.hpp"
[ suggest_nitpick ]
205-208
: Repeated metadata-skip guards – consider centralisingThe four almost-identical
if (false == matches_metadata && m_metadata_columns.contains(...))
checks introduce duplication and make future behavioural tweaks error-prone.A small helper keeps intent crystal-clear:
+auto should_skip_metadata = [&](int32_t column_id) -> bool { + return false == matches_metadata && m_metadata_columns.contains(column_id); +}; … - if (false == matches_metadata && m_metadata_columns.contains(entry.first)) { - continue; - } + if (should_skip_metadata(entry.first)) { + continue; + }This removes ~12 redundant lines without affecting performance.
[ suggest_optional_refactor ]Also applies to: 217-220, 228-231, 239-242
components/core/src/clp_s/SchemaTree.cpp (2)
47-52
: String-view keys may dangle afterm_nodes
reallocation
m_namespace_and_type_to_subtree_id
storesstd::string_view
s that aliasnode.get_key_name()
insidem_nodes
(astd::vector
).
Wheneverm_nodes
grows past its current capacity, reallocation invalidates all earlier pointers, leaving the map with dangling views and undefined behaviour.Two safe alternatives:
- m_namespace_and_type_to_subtree_id.emplace( - std::make_pair(node.get_key_name(), type), - node_id - ); + // Make a owning copy of the key to keep the map stable. + m_namespace_and_type_to_subtree_id.emplace( + std::make_pair(std::string{node.get_key_name()}, type), + node_id);or pre-reserve
m_nodes
capacity and document the guarantee.Given the subtlety, I recommend the first approach.
[ raise_critical_issue ]
64-73
: Compact, idiomatic accessor – looks good
get_subtree_for_namespace_and_type
cleanly wraps the lookup and sentinel value, improving readability. No concerns here.
[ approve_code_changes ]components/core/src/clp_s/search/SchemaMatch.cpp (2)
43-52
: Gracefully handle unknownsubtree_type
strings
get_node_type_for_subtree_type
returnsNodeType::Unknown
for unrecognised strings.
Down-stream, anUnknown
lookup inevitably fails, silently causing the column to be unmapped and the expression pruned. Consider surfacing an explicit error (e.g., via status return or logged warning) so users know they passed an unsupported value rather than getting an empty result set.[ request_verification ]
185-206
: Short-circuit once a matching column is foundInside the two resolution paths we continue iterating even after
matched
becomestrue
. With wide schemas this wastes work.-resolve_against_subtree(root_node); +if (false == matched) { // stop once we have at least one hit + resolve_against_subtree(root_node); +}Same optimisation applies inside the
for (get_subtrees)
loop. This is an easy perf win without altering semantics.
[ suggest_optional_refactor ]components/core/tests/test-clp_s-search.cpp (6)
16-27
: Well-organized imports to support new functionality.The added imports properly support the new AST creation capabilities needed for testing subtree type-specific queries.
46-46
: Good addition of a utility function for test setup.This function declaration appropriately sets up the creation of a complex test expression that will target the metadata subtree.
50-54
: Good refactoring to support direct AST expression testing.This overloaded
search()
function allows testing with both string queries and programmatically constructed AST expressions, which is valuable for testing the new subtree type feature.
154-156
: Good refactoring to simplify code maintenance.The original
search()
function now properly delegates to the new implementation, avoiding code duplication.
158-162
: Well-structured function signature that maintains consistency.The overloaded
search()
function maintains the same parameter order and naming conventions as the original, making it easy to understand and maintain.
243-245
: Effective test case that validates the core functionality.This test properly verifies that:
- The expression creation succeeds without exceptions
- The search with the metadata-specific expression returns only the expected result (index 0)
This validates that subtree type filtering works correctly in the search implementation.
components/core/src/clp_s/SchemaTree.hpp (6)
15-15
: Proper include for constants needed by the implementation.Added the include for archive_constants.hpp which contains necessary constants for subtree types.
147-149
: Good refactoring to use the new generalized method.The function now delegates to the new
get_subtree_for_namespace_and_type
method, maintaining the same behavior while supporting the unified subtree management approach.
157-172
: Well-designed API for unified subtree management.The new methods provide a clean, consistent interface for accessing subtrees by namespace and type:
get_subtree_for_namespace_and_type
provides low-level accessget_subtrees
offers access to the full collectionThis design properly encapsulates the implementation details while providing the necessary functionality.
178-180
: Consistent refactoring to use the new API.The
get_metadata_subtree_node_id
method now correctly delegates to the new generalized method, maintaining the same behavior while leveraging the unified implementation.
195-199
: Properly updated clear method to handle new data structure.The
clear
method now correctly clears the newm_namespace_and_type_to_subtree_id
map, preventing potential memory leaks or stale data issues.
218-219
: Well-chosen data structure for unified subtree management.Using
absl::btree_map
with astd::pair<std::string_view, NodeType>
key provides:
- Efficient lookup by both namespace and type
- Ordered iteration capabilities (unlike flat_hash_map)
- Memory efficiency by using string_view rather than std::string
This is an excellent choice for the unified storage of subtree IDs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A good PR overall! Just have a few questions about set_subtree_type
and the unit tests.
|
||
auto get_subtree_type() const -> std::optional<std::string> const& { return m_subtree_type; } | ||
|
||
auto set_subtree_type(std::optional<std::string> subtree_type) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The subtree type is set only in the unit test? I wonder what's the future use of this method.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We discussed it in the thread last week, but you can also just take a look at this part of the follow-up PR.
It allows us to transform queries against the metadata range index into queries against the log_event_idx
column from the metadata subtree in that transformation pass.
m_namespace = descriptor_namespace; | ||
} | ||
|
||
auto get_subtree_type() const -> std::optional<std::string> const& { return m_subtree_type; } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we get rid of get_node_type_for_subtree_type
and have a method get_subtree_node_type
instead?
{std::string{clp_s::constants::cLogEventIdxName}}, | ||
std::string{clp_s::constants::cDefaultNamespace} | ||
); | ||
auto object_subtree_non_matching_idx = non_matching_idx->copy(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe change idx to column? I personally prefer something like column
, column_with_object_subtree_type
, column_with_metadata_subtree_type
return (tests_dir / get_test_input_path_relative_to_tests_dir()).string(); | ||
} | ||
|
||
auto create_first_record_match_metadata_query() -> std::shared_ptr<clp_s::search::ast::Expression> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I didn't get the point of this test. Seems that we create an OR expression with three operands. And one operand matches a record. But if so, what's the use of other two operands? Or can we create another two test cases with the other two operands
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It tests that column resolution is treating subtree type as expected in all cases. Each column has name "log_event_idx" and different subtree type. The column that matches against 0 has subtree type "metadata" and both columns matching against 1 have "object" and no subtree type respectively. If column resolution respects subtree type correctly then only the filter against 0 should get resolved and the only matching record should be record 0, but if either of the two other columns are erroneously resolved then the results will incorrectly also contain record 1.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, I see your point. It's a compact way to test three filters in one expression
auto zero_literal = clp_s::search::ast::Integral::create_from_int(0); | ||
auto one_literal = clp_s::search::ast::Integral::create_from_int(1); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we need to create two literals?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See other comment.
Co-authored-by: wraymo <37269683+wraymo@users.noreply.github.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Approving based on @wraymo's review.
Description
This PR adds support to the clp-s AST for specifying search against specific subtrees of the MPT. The immediate use-case is to allow search on the
log_event_idx
column in the metadata subtree.Within
ColumnDescriptor
we now have asubtree_type
string property wrapped in anoptional
. The associated search semantics are:subtree_type
is not set then resolve the column against every subtree with matching namespace except theNodeType::Metadata
subtreesubtree_type
is set then resolve only against the subtree with matching namespace and typeThe current supported mappings for
subtree_type
are"object" ->
NodeType::Objectand
"metadata"->
NodeType::Metadata`.We modify
SchemaMatch
to implement these semantics during column resolution andQueryRunner
to make dynamically expanded wildcards respectsubtree_type
. Note that this dynamic expansion runs into some existing limitations with how we expand wildcards: see #907.Finally we add some search tests which check that we search the metadata subtree iff
subtree_type
ismetadata
.Checklist
breaking change.
Validation performed
Summary by CodeRabbit
New Features
Improvements
Bug Fixes
Tests