Skip to content

Conversation

gibber9809
Copy link
Contributor

@gibber9809 gibber9809 commented May 15, 2025

Description

This PR adds support to the clp-s AST for specifying search against specific subtrees of the MPT. The immediate use-case is to allow search on the log_event_idx column in the metadata subtree.

Within ColumnDescriptor we now have a subtree_type string property wrapped in an optional. The associated search semantics are:

  1. If subtree_type is not set then resolve the column against every subtree with matching namespace except the NodeType::Metadata subtree
  2. If subtree_type is set then resolve only against the subtree with matching namespace and type

The current supported mappings for subtree_type are "object" -> NodeType::Objectand"metadata"->NodeType::Metadata`.

We modify SchemaMatch to implement these semantics during column resolution and QueryRunner to make dynamically expanded wildcards respect subtree_type. Note that this dynamic expansion runs into some existing limitations with how we expand wildcards: see #907.

Finally we add some search tests which check that we search the metadata subtree iff subtree_type is metadata.

Checklist

  • The PR satisfies the contribution guidelines.
  • This is a breaking change and that has been indicated in the PR title, OR this isn't a
    breaking change.
  • Necessary docs have been updated, OR no docs need to be updated.

Validation performed

  • Added test for search against metadata subtree

Summary by CodeRabbit

  • New Features

    • Added support for specifying and resolving columns against multiple subtree types, including metadata, in search queries.
    • Introduced new constants for subtree types to improve clarity in configuration and output.
  • Improvements

    • Enhanced the detail and structure of column descriptor output for better debugging and readability.
    • Unified and streamlined internal management of subtree identifiers for improved consistency.
    • Improved safety and const-correctness in node access during schema operations.
  • Bug Fixes

    • Prevented wildcard filters from being incorrectly applied to metadata columns unless explicitly intended.
  • Tests

    • Added new tests for metadata-aware queries and direct AST-based search validation.
    • Extended existing tests to verify correct handling of subtree type properties in parsed queries.

@gibber9809 gibber9809 requested review from a team and wraymo as code owners May 15, 2025 18:07
Copy link
Contributor

coderabbitai bot commented May 15, 2025

Walkthrough

This change unifies the management of subtree IDs in the SchemaTree class by replacing separate storage for object and metadata subtrees with a single map keyed by (namespace, node type). It introduces optional subtree type support in column descriptors, generalizes column mapping logic, updates related search and query runner logic, and extends tests accordingly.

Changes

File(s) Change Summary
components/core/src/clp_s/SchemaTree.cpp
components/core/src/clp_s/SchemaTree.hpp
Refactored subtree ID management: replaced separate object and metadata subtree members with a unified map keyed by (namespace, node type). Added accessor methods, updated related logic, and improved const-correctness and safety.
components/core/src/clp_s/archive_constants.hpp Added constants for object and metadata subtree types.
components/core/src/clp_s/search/QueryRunner.cpp Updated initialization and wildcard filter logic to respect metadata subtree types, skipping metadata columns unless explicitly targeted.
components/core/src/clp_s/search/SchemaMatch.cpp Generalized column mapping to support multiple subtree types per namespace, resolving columns against all non-metadata subtrees if no subtree type is specified.
components/core/src/clp_s/search/ast/ColumnDescriptor.cpp Enhanced print() to show subtree type and namespace, updated token formatting.
components/core/src/clp_s/search/ast/ColumnDescriptor.hpp Added optional subtree type member to ColumnDescriptor with getter and setter.
components/core/tests/test-clp_s-search.cpp Added tests for AST-based search queries, including those targeting metadata subtree types.
components/core/tests/test-kql.cpp Added assertions verifying that parsed column descriptors do not have a subtree type set in various scenarios.

Sequence Diagram(s)

sequenceDiagram
    participant User
    participant QueryRunner
    participant SchemaMatch
    participant SchemaTree
    participant ColumnDescriptor

    User->>QueryRunner: Submit search query
    QueryRunner->>SchemaMatch: Request column mapping
    SchemaMatch->>ColumnDescriptor: Check for subtree_type
    alt subtree_type is set
        SchemaMatch->>SchemaTree: get_subtree_for_namespace_and_type(namespace, subtree_type)
        SchemaTree-->>SchemaMatch: Return subtree node ID
        SchemaMatch->>SchemaTree: Access children for mapping
    else subtree_type not set
        SchemaMatch->>SchemaTree: get_subtrees()
        SchemaTree-->>SchemaMatch: Return all subtrees
        SchemaMatch->>SchemaTree: Iterate non-metadata subtrees
    end
    SchemaMatch-->>QueryRunner: Return column mapping
    QueryRunner->>QueryRunner: Evaluate filters (skip metadata columns unless targeted)
    QueryRunner-->>User: Return results
Loading

Suggested reviewers

  • wraymo
  • kirkrodrigues

Note

⚡️ AI Code Reviews for VS Code, Cursor, Windsurf

CodeRabbit now has a plugin for VS Code, Cursor and Windsurf. This brings AI code reviews directly in the code editor. Each commit is reviewed immediately, finding bugs before the PR is raised. Seamless context handoff to your AI code agent ensures that you can easily incorporate review feedback.
Learn more here.


📜 Recent review details

Configuration used: CodeRabbit UI
Review profile: ASSERTIVE
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 62e3e5d and 4b2c21e.

📒 Files selected for processing (4)
  • components/core/src/clp_s/SchemaTree.cpp (2 hunks)
  • components/core/src/clp_s/SchemaTree.hpp (5 hunks)
  • components/core/src/clp_s/search/SchemaMatch.cpp (4 hunks)
  • components/core/tests/test-clp_s-search.cpp (5 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
`**/*.{cpp,hpp,java,js,jsx,tpp,ts,tsx}`: - Prefer `false == ` rather than `!`.

**/*.{cpp,hpp,java,js,jsx,tpp,ts,tsx}: - Prefer false == <expression> rather than !<expression>.

  • components/core/tests/test-clp_s-search.cpp
  • components/core/src/clp_s/search/SchemaMatch.cpp
  • components/core/src/clp_s/SchemaTree.hpp
  • components/core/src/clp_s/SchemaTree.cpp
🧬 Code Graph Analysis (1)
components/core/src/clp_s/search/SchemaMatch.cpp (4)
components/core/src/clp_s/search/ast/ColumnDescriptor.hpp (2)
  • subtree_type (287-289)
  • subtree_type (287-287)
components/core/src/clp_s/ColumnReader.hpp (1)
  • Unknown (39-39)
components/core/src/clp_s/search/SchemaMatch.hpp (4)
  • column (94-94)
  • column (101-101)
  • column (154-154)
  • column (161-161)
components/core/src/clp_s/Schema.hpp (2)
  • node_type (174-176)
  • node_type (174-174)
⏰ Context from checks skipped due to timeout of 90000ms (11)
  • GitHub Check: centos-stream-9-dynamic-linked-bins
  • GitHub Check: ubuntu-jammy-static-linked-bins
  • GitHub Check: centos-stream-9-static-linked-bins
  • GitHub Check: ubuntu-jammy-lint
  • GitHub Check: ubuntu-jammy-dynamic-linked-bins
  • GitHub Check: build-macos (macos-14, false)
  • GitHub Check: build-macos (macos-13, false)
  • GitHub Check: build-macos (macos-14, true)
  • GitHub Check: build-macos (macos-13, true)
  • GitHub Check: lint-check (macos-latest)
  • GitHub Check: lint-check (ubuntu-latest)
🔇 Additional comments (12)
components/core/src/clp_s/SchemaTree.cpp (3)

48-51: Excellent unification of subtree ID management.

The replacement of separate object and metadata subtree ID storage with a unified map keyed by (namespace, type) pairs is a clean architectural improvement that reduces code duplication and provides a consistent interface for all subtree types.


64-71: Well-implemented generic subtree accessor method.

The new get_subtree_node_id method provides a clean, type-safe interface for retrieving subtree IDs by namespace and type, replacing the previous type-specific methods with a more flexible approach.


73-88: Improved safety and const-correctness.

The updates to get_metadata_field_id include several improvements:

  • Method is now properly const-qualified
  • Uses bounds-checked .at() calls instead of direct indexing
  • Leverages the new unified subtree ID retrieval method

These changes enhance both safety and maintainability.

components/core/src/clp_s/search/SchemaMatch.cpp (2)

37-52: Clean and straightforward subtree type mapping.

The helper function provides a clear mapping from subtree type strings to NodeType enums, with appropriate fallback to Unknown for unrecognized types. The implementation correctly maps the expected constants.


179-206: Well-designed subtree type-aware column resolution.

The enhanced column resolution logic correctly handles both scenarios:

  1. When subtree_type is specified: resolves against the specific subtree matching both namespace and type
  2. When subtree_type is not specified: resolves against all non-metadata subtrees with matching namespace

The lambda approach avoids code duplication and provides clean separation of concerns. The logic properly respects the requirement to exclude metadata subtrees by default while allowing explicit targeting when specified.

components/core/tests/test-clp_s-search.cpp (3)

70-102: Comprehensive test for subtree type discrimination.

The function creates a well-designed test expression that validates three scenarios:

  1. A matching filter for the metadata subtree (should match record 0)
  2. A non-matching filter for default subtree type (should not match)
  3. A non-matching filter for object subtree type (should not match)

This thoroughly tests that column resolution correctly respects subtree type specifications and only resolves the metadata column when explicitly targeted.


132-187: Useful addition of direct AST expression testing capability.

The new search overload allows testing with pre-built AST expressions rather than requiring string parsing, providing more precise control over test scenarios. The implementation correctly follows the same search pipeline as the string-based version.


223-225: Good integration test for the new metadata query functionality.

The test case properly exercises the new AST-based search capability with the metadata subtree query, validating that the expected single result is returned.

components/core/src/clp_s/SchemaTree.hpp (4)

148-148: Clean delegation to the unified subtree access method.

The update maintains backward compatibility while leveraging the new unified subtree ID retrieval approach.


166-171: Well-designed generic subtree access interface.

The new get_subtree_node_id method provides a flexible interface for accessing any subtree by namespace and type, while get_subtrees offers direct access to the underlying map for iteration scenarios. Both methods have clear, descriptive signatures and documentation.


177-179: Consistent delegation pattern for metadata subtree access.

The update to use the new unified method with the default namespace constant maintains the same interface while leveraging the improved underlying implementation.


217-218: Proper declaration of the unified subtree ID map.

The btree_map with (string_view, NodeType) key pairs provides efficient storage and lookup for subtree IDs, supporting the enhanced subtree type functionality across the codebase.

✨ Finishing Touches
  • 📝 Generate Docstrings

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share
🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>, please review it.
    • Explain this complex logic.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
    • @coderabbitai explain this code block.
    • @coderabbitai modularize this function.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read src/utils.ts and explain its main purpose.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
    • @coderabbitai help me debug CodeRabbit configuration file.

Support

Need help? Create a ticket on our support page for assistance with any issues or questions.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai full review to do a full review from scratch and review all the files again.
  • @coderabbitai summary to regenerate the summary of the PR.
  • @coderabbitai generate docstrings to generate docstrings for this PR.
  • @coderabbitai generate sequence diagram to generate a sequence diagram of the changes in this PR.
  • @coderabbitai resolve resolve all the CodeRabbit review comments.
  • @coderabbitai configuration to show the current CodeRabbit configuration for the repository.
  • @coderabbitai help to get help.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • Please see the configuration documentation for more information.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Documentation and Community

  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

📜 Review details

Configuration used: CodeRabbit UI
Review profile: ASSERTIVE
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 47ff53a and ecd6192.

📒 Files selected for processing (9)
  • components/core/src/clp_s/SchemaTree.cpp (2 hunks)
  • components/core/src/clp_s/SchemaTree.hpp (5 hunks)
  • components/core/src/clp_s/archive_constants.hpp (1 hunks)
  • components/core/src/clp_s/search/QueryRunner.cpp (4 hunks)
  • components/core/src/clp_s/search/SchemaMatch.cpp (4 hunks)
  • components/core/src/clp_s/search/ast/ColumnDescriptor.cpp (2 hunks)
  • components/core/src/clp_s/search/ast/ColumnDescriptor.hpp (3 hunks)
  • components/core/tests/test-clp_s-search.cpp (5 hunks)
  • components/core/tests/test-kql.cpp (9 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
`**/*.{cpp,hpp,java,js,jsx,tpp,ts,tsx}`: - Prefer `false == ` rather than `!`.

**/*.{cpp,hpp,java,js,jsx,tpp,ts,tsx}: - Prefer false == <expression> rather than !<expression>.

  • components/core/src/clp_s/archive_constants.hpp
  • components/core/tests/test-kql.cpp
  • components/core/src/clp_s/search/ast/ColumnDescriptor.hpp
  • components/core/src/clp_s/search/QueryRunner.cpp
  • components/core/tests/test-clp_s-search.cpp
  • components/core/src/clp_s/search/ast/ColumnDescriptor.cpp
  • components/core/src/clp_s/SchemaTree.cpp
  • components/core/src/clp_s/search/SchemaMatch.cpp
  • components/core/src/clp_s/SchemaTree.hpp
🧬 Code Graph Analysis (3)
components/core/src/clp_s/search/ast/ColumnDescriptor.cpp (1)
components/core/src/clp_s/search/ast/ColumnDescriptor.hpp (2)
  • m_subtree_type (285-285)
  • m_namespace (279-279)
components/core/src/clp_s/SchemaTree.cpp (1)
components/core/src/clp_s/SchemaTree.hpp (8)
  • m_namespace_and_type_to_subtree_id (169-172)
  • get_subtree_for_namespace_and_type (178-180)
  • subtree_namespace (147-149)
  • subtree_namespace (147-147)
  • subtree_namespace (166-167)
  • field_name (157-157)
  • m_nodes (182-182)
  • m_nodes (195-199)
components/core/src/clp_s/search/SchemaMatch.cpp (3)
components/core/src/clp_s/search/ast/ColumnDescriptor.hpp (2)
  • subtree_type (287-289)
  • subtree_type (287-287)
components/core/src/clp_s/search/SchemaMatch.hpp (4)
  • column (94-94)
  • column (101-101)
  • column (154-154)
  • column (161-161)
components/core/src/clp_s/Schema.hpp (2)
  • node_type (174-176)
  • node_type (174-174)
🪛 Cppcheck (2.10-2)
components/core/src/clp_s/SchemaTree.cpp

[style] 75-75: The function 'get_metadata_field_id' is never used.

(unusedFunction)

⏰ Context from checks skipped due to timeout of 90000ms (11)
  • GitHub Check: centos-stream-9-static-linked-bins
  • GitHub Check: centos-stream-9-dynamic-linked-bins
  • GitHub Check: ubuntu-jammy-static-linked-bins
  • GitHub Check: ubuntu-jammy-dynamic-linked-bins
  • GitHub Check: ubuntu-jammy-lint
  • GitHub Check: build-macos (macos-15, false)
  • GitHub Check: build-macos (macos-15, true)
  • GitHub Check: build-macos (macos-13, true)
  • GitHub Check: build-macos (macos-13, false)
  • GitHub Check: lint-check (ubuntu-latest)
  • GitHub Check: lint-check (macos-latest)
🔇 Additional comments (25)
components/core/src/clp_s/archive_constants.hpp (1)

37-38: Constants added to support the new subtree type feature.

The new constants cObjectSubtreeType and cMetadataSubtreeType align well with the PR objective to enable searching against specific MPT subtrees. These string constants define the supported values for the new subtree_type property in ColumnDescriptor.

components/core/tests/test-kql.cpp (1)

54-54: Test assertions verify default behavior for subtree types.

The new assertions ensure that KQL parsing doesn't set subtree_type by default. This is important to maintain backward compatibility and confirm that columns will resolve against non-metadata subtrees by default, as specified in the PR objectives.

Also applies to: 73-73, 93-93, 143-143, 195-195, 225-225, 281-281, 301-301, 325-325

components/core/src/clp_s/search/ast/ColumnDescriptor.cpp (2)

86-89: Updated print method to include subtree type information.

The changes to the print() method properly include the new subtree_type property in the output when it's set, which is helpful for debugging and logging.


98-98: Updated closing delimiter to match new token list format.

The closing delimiter has been updated to ]) to properly match the new tokens list format using square brackets.

components/core/src/clp_s/search/ast/ColumnDescriptor.hpp (3)

5-5: Added necessary includes for new functionality.

Added <optional> and <utility> headers to support the new std::optional member and std::move usage in the setter.

Also applies to: 9-9


285-289: Added getter and setter for the new subtree_type property.

These methods properly implement access to the new optional subtree type property. The setter uses std::move for efficiency when transferring ownership of the string.


299-299: Added member variable for subtree type.

The new m_subtree_type member variable is appropriately typed as std::optional<std::string> to represent an optional subtree type for column resolution, as required by the PR objective.

components/core/src/clp_s/search/QueryRunner.cpp (2)

198-202: matches_metadata can be calculated once, but be wary of transitive‐include dependence

The new logic is clear and correct.
A minor point: this file now depends on constants::cMetadataSubtreeType, which is only visible through the transitive inclusion chain (QueryRunner.cppSchemaTree.hpparchive_constants.hpp).
An explicit include improves readability and shields us from future header-ordering regressions.

+#include "../archive_constants.hpp"

[ suggest_nitpick ]


205-208: Repeated metadata-skip guards – consider centralising

The four almost-identical if (false == matches_metadata && m_metadata_columns.contains(...)) checks introduce duplication and make future behavioural tweaks error-prone.

A small helper keeps intent crystal-clear:

+auto should_skip_metadata = [&](int32_t column_id) -> bool {
+    return false == matches_metadata && m_metadata_columns.contains(column_id);
+};- if (false == matches_metadata && m_metadata_columns.contains(entry.first)) {
-     continue;
- }
+ if (should_skip_metadata(entry.first)) {
+     continue;
+ }

This removes ~12 redundant lines without affecting performance.
[ suggest_optional_refactor ]

Also applies to: 217-220, 228-231, 239-242

components/core/src/clp_s/SchemaTree.cpp (2)

47-52: String-view keys may dangle after m_nodes reallocation

m_namespace_and_type_to_subtree_id stores std::string_views that alias node.get_key_name() inside m_nodes (a std::vector).
Whenever m_nodes grows past its current capacity, reallocation invalidates all earlier pointers, leaving the map with dangling views and undefined behaviour.

Two safe alternatives:

-        m_namespace_and_type_to_subtree_id.emplace(
-                std::make_pair(node.get_key_name(), type),
-                node_id
-        );
+        // Make a owning copy of the key to keep the map stable.
+        m_namespace_and_type_to_subtree_id.emplace(
+                std::make_pair(std::string{node.get_key_name()}, type),
+                node_id);

or pre-reserve m_nodes capacity and document the guarantee.

Given the subtlety, I recommend the first approach.
[ raise_critical_issue ]


64-73: Compact, idiomatic accessor – looks good

get_subtree_for_namespace_and_type cleanly wraps the lookup and sentinel value, improving readability. No concerns here.
[ approve_code_changes ]

components/core/src/clp_s/search/SchemaMatch.cpp (2)

43-52: Gracefully handle unknown subtree_type strings

get_node_type_for_subtree_type returns NodeType::Unknown for unrecognised strings.
Down-stream, an Unknown lookup inevitably fails, silently causing the column to be unmapped and the expression pruned. Consider surfacing an explicit error (e.g., via status return or logged warning) so users know they passed an unsupported value rather than getting an empty result set.

[ request_verification ]


185-206: Short-circuit once a matching column is found

Inside the two resolution paths we continue iterating even after matched becomes true. With wide schemas this wastes work.

-resolve_against_subtree(root_node);
+if (false == matched) {  // stop once we have at least one hit
+    resolve_against_subtree(root_node);
+}

Same optimisation applies inside the for (get_subtrees) loop. This is an easy perf win without altering semantics.
[ suggest_optional_refactor ]

components/core/tests/test-clp_s-search.cpp (6)

16-27: Well-organized imports to support new functionality.

The added imports properly support the new AST creation capabilities needed for testing subtree type-specific queries.


46-46: Good addition of a utility function for test setup.

This function declaration appropriately sets up the creation of a complex test expression that will target the metadata subtree.


50-54: Good refactoring to support direct AST expression testing.

This overloaded search() function allows testing with both string queries and programmatically constructed AST expressions, which is valuable for testing the new subtree type feature.


154-156: Good refactoring to simplify code maintenance.

The original search() function now properly delegates to the new implementation, avoiding code duplication.


158-162: Well-structured function signature that maintains consistency.

The overloaded search() function maintains the same parameter order and naming conventions as the original, making it easy to understand and maintain.


243-245: Effective test case that validates the core functionality.

This test properly verifies that:

  1. The expression creation succeeds without exceptions
  2. The search with the metadata-specific expression returns only the expected result (index 0)

This validates that subtree type filtering works correctly in the search implementation.

components/core/src/clp_s/SchemaTree.hpp (6)

15-15: Proper include for constants needed by the implementation.

Added the include for archive_constants.hpp which contains necessary constants for subtree types.


147-149: Good refactoring to use the new generalized method.

The function now delegates to the new get_subtree_for_namespace_and_type method, maintaining the same behavior while supporting the unified subtree management approach.


157-172: Well-designed API for unified subtree management.

The new methods provide a clean, consistent interface for accessing subtrees by namespace and type:

  • get_subtree_for_namespace_and_type provides low-level access
  • get_subtrees offers access to the full collection

This design properly encapsulates the implementation details while providing the necessary functionality.


178-180: Consistent refactoring to use the new API.

The get_metadata_subtree_node_id method now correctly delegates to the new generalized method, maintaining the same behavior while leveraging the unified implementation.


195-199: Properly updated clear method to handle new data structure.

The clear method now correctly clears the new m_namespace_and_type_to_subtree_id map, preventing potential memory leaks or stale data issues.


218-219: Well-chosen data structure for unified subtree management.

Using absl::btree_map with a std::pair<std::string_view, NodeType> key provides:

  1. Efficient lookup by both namespace and type
  2. Ordered iteration capabilities (unlike flat_hash_map)
  3. Memory efficiency by using string_view rather than std::string

This is an excellent choice for the unified storage of subtree IDs.

Copy link
Member

@wraymo wraymo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A good PR overall! Just have a few questions about set_subtree_type and the unit tests.


auto get_subtree_type() const -> std::optional<std::string> const& { return m_subtree_type; }

auto set_subtree_type(std::optional<std::string> subtree_type) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The subtree type is set only in the unit test? I wonder what's the future use of this method.

Copy link
Contributor Author

@gibber9809 gibber9809 May 23, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We discussed it in the thread last week, but you can also just take a look at this part of the follow-up PR.

It allows us to transform queries against the metadata range index into queries against the log_event_idx column from the metadata subtree in that transformation pass.

m_namespace = descriptor_namespace;
}

auto get_subtree_type() const -> std::optional<std::string> const& { return m_subtree_type; }
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we get rid of get_node_type_for_subtree_type and have a method get_subtree_node_type instead?

{std::string{clp_s::constants::cLogEventIdxName}},
std::string{clp_s::constants::cDefaultNamespace}
);
auto object_subtree_non_matching_idx = non_matching_idx->copy();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe change idx to column? I personally prefer something like column, column_with_object_subtree_type, column_with_metadata_subtree_type

return (tests_dir / get_test_input_path_relative_to_tests_dir()).string();
}

auto create_first_record_match_metadata_query() -> std::shared_ptr<clp_s::search::ast::Expression> {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't get the point of this test. Seems that we create an OR expression with three operands. And one operand matches a record. But if so, what's the use of other two operands? Or can we create another two test cases with the other two operands

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It tests that column resolution is treating subtree type as expected in all cases. Each column has name "log_event_idx" and different subtree type. The column that matches against 0 has subtree type "metadata" and both columns matching against 1 have "object" and no subtree type respectively. If column resolution respects subtree type correctly then only the filter against 0 should get resolved and the only matching record should be record 0, but if either of the two other columns are erroneously resolved then the results will incorrectly also contain record 1.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, I see your point. It's a compact way to test three filters in one expression

Comment on lines +71 to +72
auto zero_literal = clp_s::search::ast::Integral::create_from_int(0);
auto one_literal = clp_s::search::ast::Integral::create_from_int(1);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need to create two literals?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See other comment.

gibber9809 and others added 2 commits May 23, 2025 11:51
Co-authored-by: wraymo <37269683+wraymo@users.noreply.github.com>
@gibber9809 gibber9809 requested a review from wraymo May 23, 2025 16:31
Copy link
Member

@kirkrodrigues kirkrodrigues left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approving based on @wraymo's review.

@kirkrodrigues kirkrodrigues merged commit 3c12c30 into y-scope:main May 26, 2025
13 of 19 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants