Add dynamic programming approach for schema search. #448

Draft
wants to merge 283 commits into
base: main

SharafMohamed
Contributor

@SharafMohamed SharafMohamed commented Jun 17, 2024

Description

  1. Use a dynamic programming approach to efficiently search archives compressed using Log Surgeon's schema.
    • The dynamic programming approach determines all substrings of the search query that can be variables, then uses these variables to build every logtype that can match the query. Each logtype + variable combination forms a subquery used to search the archive.
    • Add a class for storing each subquery.
    • Move heuristic-only logic into the heuristic case.
  2. Handle the archive-writing case where, in the future, we decide to switch from a timestamped to a non-timestamped log event.
  3. StringReader class fixes.
    • Correctly reset m_pos to 0 when closing.
    • Fix naming and initialization of member variables.
  4. Remove unused code.
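
To illustrate the idea in item 1, here is a minimal sketch of the dynamic programming step. It is not the code in this PR: `VariableType`, `is_int`, `is_hex`, `is_delimiter`, and `interpret` are hypothetical names, the "schema" is reduced to two hand-written predicates rather than Log Surgeon's schema, wildcards are ignored, and a variable is assumed to be bounded by spaces or by the ends of the query. For each prefix of the query, the table stores every logtype that could produce that prefix.

```cpp
#include <cctype>
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// Hypothetical stand-in for a schema entry; the PR itself relies on Log Surgeon.
struct VariableType {
    std::string name;
    bool (*matches)(std::string const&);
};

// Assumed variable type: one or more decimal digits.
bool is_int(std::string const& s) {
    if (s.empty()) { return false; }
    for (unsigned char c : s) {
        if (!std::isdigit(c)) { return false; }
    }
    return true;
}

// Assumed variable type: "0x" followed by one or more hex digits.
bool is_hex(std::string const& s) {
    if (s.size() < 3 || s[0] != '0' || s[1] != 'x') { return false; }
    for (std::size_t i = 2; i < s.size(); ++i) {
        if (!std::isxdigit(static_cast<unsigned char>(s[i]))) { return false; }
    }
    return true;
}

// Assumption for this sketch: a single space is the only delimiter.
bool is_delimiter(char c) { return ' ' == c; }

// Returns every logtype that can match `query`, with variables written as <name>.
std::set<std::string>
interpret(std::string const& query, std::vector<VariableType> const& schema) {
    std::size_t const n = query.size();
    // interpretations[i] holds every logtype for the prefix query[0, i).
    std::vector<std::set<std::string>> interpretations(n + 1);
    interpretations[0].insert("");

    for (std::size_t end = 1; end <= n; ++end) {
        // Case 1: the last character of the prefix is static text.
        for (auto const& prefix : interpretations[end - 1]) {
            interpretations[end].insert(prefix + query[end - 1]);
        }

        // Case 2: some substring query[begin, end) is a variable.
        bool const right_bounded = (end == n) || is_delimiter(query[end]);
        if (!right_bounded) { continue; }
        for (std::size_t begin = end; begin-- > 0;) {
            if (is_delimiter(query[begin])) { break; }  // variables hold no delimiters
            bool const left_bounded = (0 == begin) || is_delimiter(query[begin - 1]);
            if (!left_bounded) { continue; }
            std::string const candidate = query.substr(begin, end - begin);
            for (auto const& type : schema) {
                if (!type.matches(candidate)) { continue; }
                // Extend every logtype of the shorter prefix with this variable.
                for (auto const& prefix : interpretations[begin]) {
                    interpretations[end].insert(prefix + "<" + type.name + ">");
                }
            }
        }
    }
    return interpretations[n];
}
```

Under these assumptions, `interpret("took 42 ms", {{"int", is_int}, {"hex", is_hex}})` yields two logtypes, `took 42 ms` and `took <int> ms`. In the approach described above, each such logtype, paired with the variable values it implies, would become one subquery against the archive.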

Validation performed

  1. CLP test: test logs generate decoded archives that match the expected ground truth.
  2. CLT test: Correctly compresses and searches the 258 GB Hadoop dataset using the queries from the CLP paper.

Summary by CodeRabbit

  • New Features

    • Introduced enhanced query interpretation and wildcard expression handling capabilities.
    • Added functionality for processing and validating wildcard expressions in queries.
    • Implemented a method to check if variable sequences match subqueries.
  • Bug Fixes

    • Improved handling of lexers and query processing logic to streamline operations and reduce complexity.
  • Tests

    • Expanded test coverage for the Grep class, including validation of wildcard expressions and interpretations.
  • Documentation

    • Updated documentation for new classes and methods related to query interpretation and wildcard expressions.

…reviously causing a crash in log_surgeon::Buffer::read(); fixed unit test for failing to find a file
… causing the heuristic to not store variable segment indices correctly
… with schema; Ideally should use a set, but it's not currently initialized
…ns_for_whole_wildcard_expr; Rename possible_substr_types to interpretations.
…tended_search_string_view -> extended_wildcard_expr.
…riable-type variables to differentiate ID and name.
… returns to reduce indentation and complexity; Edit some comments.
…dcard expressions that match encodable-variable schemas.
coderabbitai[bot]

This comment was marked as outdated.

…eadability of errors when unit-test fails; Move variable_type_name to more relevant location; Rename method to compare_log_types_with_expected.

@SharafMohamed SharafMohamed marked this pull request as draft October 7, 2024 16:56
@y-scope y-scope deleted a comment from coderabbitai bot Oct 7, 2024
…tions_for_whole_wildcard_expr; Add notes explaining why ?* interpretations don't have all possible variable types.
3 participants