Skip to content

Conversation

SharafMohamed
Copy link
Contributor

@SharafMohamed SharafMohamed commented Jun 25, 2025

Description

Update log-surgeon to the latest release:

Reason for update:

  • Allows for the use of capture-groups.
  • Several bug-fixes inside log-surgeon.

Code changes:

  • Use new lexer scan API (now returns an optional token, instead of taking a token by reference).
  • Update to match log-surgeon renames:
    • Change all instances of enum class SymbolID to SymbolId, and update corresponding enum names (drop the Id/ID suffix).
    • Drop Regex prefix for ByteNfaState.
    • Change lexer method add_delimiters to set_delimiters.
  • Update timestamps in schema.txt and update equals variable to use capture groups.
  • Remove all instances of reverse_lexer as its no longer supported by log-surgeon. As a result, fall back to grepping all files if search variable has a leading wildcard.
  • Rename all instances of forward_lexer to just lexer.
  • Update is_possible_input to use an std::array in place of a c-style array.

Unit-test changes

  • > is now in the action table of int, so is a suggested character to proceed in the failure test.
  • ? is now used in capture groups, so has to be escaped in the schema.

Taskfile changes

  • Add a task to install microsoft.gsl.
  • Update log-surgeon task to:
    • depend on fmt and microsoft.gsl.
    • disable testing.
    • use the newest version.

Validation performed

  • Updated unit-tests pass.
  • CI passes.

Summary by CodeRabbit

  • New Features

    • Expanded support for a wide variety of timestamp formats in schema configuration.
    • Added Microsoft GSL as a supported dependency.
  • Bug Fixes

    • Improved handling of delimiters and special characters in schema files.
    • Corrected error messages and token end symbol usage in schema parser tests.
    • Added runtime error handling for timestamp encoding failures.
  • Refactor

    • Updated naming conventions for token type enums to improve clarity.
    • Streamlined lexer loading by removing unnecessary boolean parameters and simplifying function signatures.
    • Consolidated lexer usage from separate forward and reverse lexers to a single lexer instance across core components.
    • Improved internal data structures for lexer utilities.
  • Chores

    • Updated dependency management configuration and library versions.

@SharafMohamed SharafMohamed changed the title build: Update to the latest version of log-surgeon/ build: Update to the latest version of log-surgeon. Jun 25, 2025
Copy link
Contributor

coderabbitai bot commented Jun 25, 2025

## Walkthrough

The changes update schema configuration regexes, refactor lexer loading to remove direction flags, and update enum names for token types. Dependency management now includes Microsoft GSL. Tests and application code are updated to match the new lexer interface and enums, with no changes to public APIs except for the `load_lexer_from_file` signature.

## Changes

| File(s)                                                                                   | Change Summary                                                                                                                        |
|-------------------------------------------------------------------------------------------|---------------------------------------------------------------------------------------------------------------------------------------|
| components/core/config/schemas.txt                                                        | Expanded timestamp regex patterns, reordered and refined dictionary variable regexes, updated delimiters line.                        |
| components/core/src/clp/Grep.cpp                                                          | Updated token scanning logic to new lexer interface, corrected enum names, improved error checking, consolidated lexer parameters.    |
| components/core/src/clp/Utils.cpp, components/core/src/clp/Utils.hpp                      | Removed direction flag from `load_lexer_from_file`, updated data structures, enum names, and documentation.                          |
| components/core/src/clp/streaming_archive/writer/Archive.cpp                              | Updated enum type and member names for token types, added runtime error throw on timestamp pattern failure.                          |
| components/core/src/clp/clg/clg.cpp, components/core/src/clp/clo/clo.cpp                  | Removed boolean direction argument from `load_lexer_from_file` calls, consolidated lexer usage to single instance.                    |
| components/core/tests/test-ParserWithUserSchema.cpp, components/core/tests/test-Grep.cpp  | Updated lexer loading and token scanning to new interfaces, corrected symbol constants, updated error message in one test.            |
| components/core/tests/test_schema_files/search_schema.txt                                 | Escaped question mark in delimiters line.                                                                                            |
| taskfiles/deps/main.yaml                                                                  | Added Microsoft GSL dependency and installation task, updated log-surgeon dependency, and extended CMake arguments for dependencies.  |

## Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant MainApp
    participant Utils
    participant Lexer

    MainApp->>Utils: load_lexer_from_file(schema_file_path, lexer)
    Utils->>Lexer: set_delimiters(...)
    Utils->>Lexer: generate()
    Utils-->>MainApp: (lexer ready)
sequenceDiagram
    participant Test
    participant Utils
    participant Lexer

    Test->>Utils: load_lexer_from_file(schema_file_path, lexer)
    Utils->>Lexer: set_delimiters(...)
    Utils->>Lexer: generate()
    Utils-->>Test: (lexer ready)
    loop Token Scanning
        Test->>Lexer: scan(input)
        Lexer-->>Test: (error_code, optional_token)
    end
Loading

Estimated code review effort

3 (120 minutes)

Suggested reviewers

  • kirkrodrigues

<!-- walkthrough_end -->


---

<details>
<summary>📜 Recent review details</summary>

**Configuration used: CodeRabbit UI**
**Review profile: ASSERTIVE**
**Plan: Pro**


<details>
<summary>📥 Commits</summary>

Reviewing files that changed from the base of the PR and between dd5a8a4129649ecb61dc874fd0a738cc30009950 and 922053be5c86349c7e3662b471c4d18f37da0c1b.

</details>

<details>
<summary>📒 Files selected for processing (1)</summary>

* `components/core/config/schemas.txt` (1 hunks)

</details>

<details>
<summary>🧰 Additional context used</summary>

<details>
<summary>🧠 Learnings (2)</summary>

<details>
<summary>📓 Common learnings</summary>

Learnt from: gibber9809
PR: #955
File: components/core/src/clp_s/search/sql/CMakeLists.txt:8-26
Timestamp: 2025-06-02T18:22:24.060Z
Learning: In the clp project, ANTLR code generation at build time is being removed by another PR. When reviewing CMake files, be aware that some temporary suboptimal configurations may exist to reduce merge conflicts between concurrent PRs, especially around ANTLR_TARGET calls.


Learnt from: SharafMohamed
PR: #1033
File: components/core/config/schemas.txt:42-43
Timestamp: 2025-07-17T16:08:23.185Z
Learning: In log-surgeon, regex patterns like (?:\d{1,2}| \d) are not supported. Log-surgeon has its own regex implementation with different capabilities compared to standard regex engines. Patterns like [ 0-9]{2} that allow character classes with spaces are valid in log-surgeon schemas.


</details>
<details>
<summary>components/core/config/schemas.txt (5)</summary>

Learnt from: SharafMohamed
PR: y-scope/clp#1033
File: components/core/config/schemas.txt:42-43
Timestamp: 2025-07-17T16:08:23.185Z
Learning: In log-surgeon, regex patterns like `(?:\d{1,2}| \d)` are not supported. Log-surgeon has its own regex implementation with different capabilities compared to standard regex engines. Patterns like `[ 0-9]{2}` that allow character classes with spaces are valid in log-surgeon schemas.

Learnt from: SharafMohamed
PR: y-scope/clp#1033
File: components/core/config/schemas.txt:96-97
Timestamp: 2025-07-17T16:02:57.688Z
Learning: In log-surgeon, the `.` character is treated as a literal non-delimiter character rather than a regex wildcard that matches any character. This means patterns like `.*` in log-surgeon schemas match sequences of non-delimiter characters, not arbitrary characters like in standard regex.

Learnt from: gibber9809
PR: y-scope/clp#955
File: components/core/src/clp_s/search/sql/CMakeLists.txt:8-26
Timestamp: 2025-06-02T18:22:24.060Z
Learning: In the clp project, ANTLR code generation at build time is being removed by another PR. When reviewing CMake files, be aware that some temporary suboptimal configurations may exist to reduce merge conflicts between concurrent PRs, especially around ANTLR_TARGET calls.

Learnt from: Bill-hbrhbr
PR: y-scope/clp#0
File: :0-0
Timestamp: 2025-03-18T07:27:54.738Z
Learning: Double parentheses in the codebase are intentional and required for clang-tidy to pass. These should not be flagged as style issues in code reviews.

Learnt from: Bill-hbrhbr
PR: y-scope/clp#0
File: :0-0
Timestamp: 2025-03-18T07:27:54.738Z
Learning: Double parentheses in REQUIRE statements (e.g., `REQUIRE((expression))`) and in conditional expressions are intentional and required for clang-tidy to pass. These should not be flagged as style issues in code reviews.

</details>

</details><details>
<summary>🪛 LanguageTool</summary>

<details>
<summary>components/core/config/schemas.txt</summary>

[uncategorized] ~65-~65: Although a hyphen is possible, it is not necessary in a compound modifier in which the first word is an adverb that ends in ‘ly’.
Context: ...\*,:;\<=\>\?@\[\]\^_`\{\|\}~  // === // Specially-encoded variables (using the `int` and `float` ...

(HYPHENATED_LY_ADVERB_ADJECTIVE)

</details>

</details>

</details>

<details>
<summary>⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (10)</summary>

* GitHub Check: build-macos (macos-15, false)
* GitHub Check: build-macos (macos-14, true)
* GitHub Check: build-macos (macos-15, true)
* GitHub Check: build-macos (macos-14, false)
* GitHub Check: centos-stream-9-static-linked-bins
* GitHub Check: centos-stream-9-dynamic-linked-bins
* GitHub Check: ubuntu-jammy-static-linked-bins
* GitHub Check: ubuntu-jammy-dynamic-linked-bins
* GitHub Check: ubuntu-jammy-lint
* GitHub Check: lint-check (macos-latest)

</details>

<details>
<summary>🔇 Additional comments (1)</summary><blockquote>

<details>
<summary>components/core/config/schemas.txt (1)</summary>

`16-16`: **Verify that leading “<<<” prefix is portable**

The sub-pattern `(\<{3}|\[){0,1}` relies on `\<` escaping `<`.  
Please double-check that log-surgeon interprets `\<` as a literal `<` and not as word-boundary syntax (as in some flavours), and that `{3}` repetition works after an escaped token. A quick grep through integration tests would confirm.

</details>

</blockquote></details>

</details>
<!-- internal state start -->


<!-- DwQgtGAEAqAWCWBnSTIEMB26CuAXA9mAOYCmGJATmriQCaQDG+Ats2bgFyQAOFk+AIwBWJBrngA3EsgEBPRvlqU0AgfFwA6NPEgQAfACgjoCEYDEZyAAUASpETZWaCrKNxU3bABsvkCiQBHbGlcABpIcVwvOkgAIgFseC9aLgBVblpqEgj8CNhsryzEXEgpCkR4fCx8ADNIL3wiMAcKUirQ2MgAdzRkBwFmdRp6OTzs7ERKSABlWGc0GoBZfDm2el7rOwxHASmARgAGAGYjjRh8yGwMrMvuIiolZFwLhqaWtqwCfF8CFFxkQo0Yp+EjRXrZX5kFTRMaXSb8OoMNDcXDYfzECj4K7ITD0eAYJgUbj4Kg0ewkMpoXwJIiQGrwAAe0g0RgA0iR5EwlIw5hhSMh8QwvNhuWhaPgUfjac9suQuvUSEy+AADRBIjDKyAAQSsAElwl0EAxYJAMPh5f5URQMDjqpKqlScgBrMgoG00MUIiJoJ1S9DO12jfw1ShkBgkM5anxu4qYcPIWqwsiORiFRDIZXTWTMATfXUAEU1XVDIIwaDWOUgmezua8utoyoN6hNhP8iGJGFofuTzFN5ekt0yw0go1omO43D9Mqr9eVAHplQXNQ4avSGWc4NllTYSKQGZreCQ190Nv5mPgpPQapje8qAEKyGgAORqaGmuCyjfQndh0SVkDYZ5FCrMVaAAfSULx4EGGhyiLU8oQrX5VRIXAINBaD1EoRBlUjaN8VjAkB0TZV/DKSYwL/SgixLM8LxiDZ1BQZAzXqKpSD4BwJxJYYAG4FBtQJggwKJZHCacuWyV8fBkNAGCdSt7hICc/SpXx6WiAU6mnSZnGNUpnHgaFsjmHEFTFP0uiSWgkQoWhI1oLtxAdHwxPQfD3TjYi6mVGoSR6OzKMVajuloxCYmQqiKFw84tyQMDiXTeABGiMD8U8XBNRqbACWcrAemQK4hwi3IJmyTAq2KFIOGcKhZE1AiPXoRM0EgABhZpcFkGFarQWQWQMVIMCYoESmNTB+RBGpojEPJqCrPQGpY803SFEUYnxWE5Ly70UuyEj8Uy788TMhwiH5YdxqoMQpl+Xh8HDDasFfJI0QhEJEHCXEqwAfiW00VrKvEsCRFE3sge4sW4T6QSCeAKD9Ebcl2SBpFBp7YTVfJmDQAboF6J0NPGa4gVW4VRUc1TvUQBTfkatSq0GBhMUQWpNCIRAvC/IrqCnC5lVeZo0Q+TUPxpyslG4MhmqwXzmCO77lSZlm2Y0DmufCLtEGhKcQj9Py+GeVAxadL6fwmPnZRIYtgXIypqjqQX3hIKoBoANSpeAh3tk9kCligDYrIMcuGvlByyegcpGj7joE8RtixAURN3Ul7fCXNni9Q14H07hekmegHAYeNEGy1y8YuTxo38IIQkYCqUbI+BrYYzhIFgXBcGhjg5znIhm2wAQNCYZg51kZomCluchW4Ocq68OdDhOAb9GMcAoGlr00DwQhSHIUkYhHtgRK4Xh+GEURxCkGROUUZRVHULQdFXkwoHcAVbRwAhiDIZRLpYY+bcqDygcE4FwI5b5KCoA/TQ2hdBgEMGvUwBgR4dnYIgaeJISCYIwPSIgc4sYkBxogDQuAGScAMLEKhBgLDal1D/feEd7COBxuAxM40+TSDcBcLsq5UbkLIBUKQXh5CKjzp2J4/NxBsFjMwbgmp/B7kgEoekw08o3xBNwQoDAqYhhAdBLREJoIhHLNwHg1BYI2m6M2f0AhMSekmCURMSgPxJBiIo4KyAmBlCplZbkEhDKoXkImaRJi5F0hJDjf4G58jwjlOYzulArFcWJBQEozBvDiEMcom4nhcrYF5lUZAAAKWAshuD5AwOETmvRYDhBJADcgABKTWWQwChPJHnUkJISntjkiQcIZSKlkHqXwAA5NAMZLT+D2jLL4MMDQKhh0QDWb4JTDxrgHFBF0cRgB7M6A02IABtTo30VybIBPAHZsQAC6sRpnTUVMlGEsAsR8C7P3EogdqAw2+oeHRFQqgKmEcga8LAIjsFgE8XIyt8CTCYBI6xmdeEhn8CJTp8wCDlEgMUo+aBNb4DCAoBoGAmkOScvbR0oTZFmO+f8Mm61IC6mmAAeUgAADgAGwHD2AZBGmB/jhC1HnY0JBOrdQKI0CIxiaXICspnOxckXTDmKjDalH45FypsSoOxFIjLDgaWXXw54RImjLDIoViw5xWEWADD8eUzbNRlHwI+QLXjSpkRq2lkSflIpNAElwfpIKYVgraeg/ykD20jBig+5IxA+1QKBGIBsPVhOhowNEaLRIASQEs2k7rcw5UyOAlJPFwhQWKKpSmeVHSus+DKr1II9zMm4VuYNMFsIKN3IqBJljmLh2HHTAk5MIQXDGQAXjGTyeYN0+DfWnGjZEMQxnACnd9MZegp1XW2thcIYjcSW3JE4uo/gmBEGGgALxiO2rCLq5jXVDTErcQkqQ4SbT2vOiTrT9vPLw5u9BfhlX9Oaw+yIrTZEhlcKsxSfrAACV4PQGgkNNKyg06c8HgjoBqLBJMQRX32HgOe2OqB/AkigQxXMUhYTKlMk+HYIVP2WKfXSNx/bSOtEwPAK99B5UmiTfQI+7ARwNHksgCQSACmuRjbzZZl97agrQw20xMMb2huqVLHRalx5hjvvQANRk9q/J/F2eNDpwH6eMiQyAT5ciEvyHwbKuVKW+A4ZNZw2QcZKAGrQqMsFClWN+NOJQQpMXya9GIniMQGmeBSjnVGIl1DN0QEYKAixMnwGyfE5U6rTFdqUYxpJOJHKH3ogjMOXZyJGM9aYiJFAolJyrKgqo6DMH+BwXgghoriGkPIcqAwUAdz5YsUkgddAuDKlU523tI3XPJpvFWAAOrgBbFAFsYBALEMwAASAApAAMjGQt4pC2mkLYAFShA4LxYAegfoAAEFtHIWzchbAA9MCyoFsAG8FsAB8FsAF8AB+otcifeW6t9bm3dsHaOyd87l3eILeAOOhbegFv3ce89t7H3vt/cByDt0jWWBoJEhgwk2CEUdcId1shmV+uQEGx+4b36ebDHGy+rwb6Cvftm1eebuEzvjo0Gdo5aAwAXq1GAAAWgcMAABOG5IvQdVhF+OmDcGqSIdF+LyXMu5eK5Fyh2O7HyPrEo1uWj9HopE+VE18gZPWuU6qNTrrvQev04MAAUUrVEkr3Im7W1RquHiXBFh0HgI4Sh1D+vIPty1inBCKAMGnl4WeABxfw3Bh4Tg4NH2INDLBanoXvP+MRQGsOCYiXk/JW3Ts4YVYaNR5DTiinCNApBbeZ+UhwDg90S5gWAWBWuLhNSK279wXvpA0KFokWBWoYFyDkISoS9gRkvBgQDZqIMyltF+kmF0m4vl/LOHAlFMfP5SIUmwiQIKSoDzzEAthP1/o80wgFsFG3h+n8UDwi5tSYaU0oYRE0KVYBsAUZ+n+F+9AV+lWd+IU7mmi2iMQvGVY5+Zwuoss0+YEs+tAiA8+NQi+ioaExINACWVIm+zgX4rejQcWKaaomAoc0o+ALoViqBVkyQtkeB/AfAvGWIJQHBNkp+H8BGciUE9IJUcI+0GAIiaBUBzGDBGATBXoBArBWqQEeA1inBwhAM8oZUGYUUGgihxSXSkwFAaUGAGUOB2Aq4lAxu2c+kloaIViFUlAmILqd8scqhroqIhi/E8AdQihyhL0XO4kFwjmpmWATh1oGY0kkwMU6QQKRcJcX0poQemYJAeksA0ALBZA/0CKxQFA2AN0/O4K043hGAYyYmVIwQzGkkYwWABElA9K049I5QJQXUUsTK+YdIAuukyesAYEFRGgzAQx5St+Xs+BKINuh0YO/RxoQxuRGAIxYxUsaUeBYEjiMUm4AkFKQKvwrCCkrUFR6AZkFme0/abOMQO+544mYcEk+Q8ktWzCq4jIWhQhdkCgOY+IKBNiGybxghXBxG7omRMsshoq8k+sDSgJp+RO04gwjkMI0WwYAJ1kXBZwXu2wvY30BRH46KIGCmhsSxEQ4xOI/go2pRt4rwmxwsLsGAveWYOYeY+YveORrBQ0SI2ARAHc74ZWRAS4scyobJZAmJtAApkUjQNJrQdJDJqydY1UwpGAHJ283JuAvJUo0BVYipopX4wYM08ctIrU5qforYcmrssUXxgCdAYA/BCg3IKaEJvo9xFwnR2QBYfQqEbGRC9E9k2o0YbebYqEgBihfsTcicsh4o5A0mNAshKa4BsJ30ZEN+CoSoZJ2QBR3wXsTCgG8IOkUo7+5+MYeJ4Y3mReXgfm6ilYQWoghQqcRS4WDIqSBqfAMWUEDA8WkQSWKWkAAAYjlJEQRuetQODFcSkJAJAAYOOaqLgNVBKDWl4MAAAIrBAuB6CQAT69797SD4FD4j6yDFJagDGSDplFK4B7bVK4B8kCTFDnmozEjGihJDF7r3kdzGJPkjj4DfDlqSnOxVC95RSIC94Pg0AAAyn+t5uY3wxuE545fOvRZRPoU4XQuQAFOKx+FAEB8B0U4QsBN+WFyoTSlYrUb+W46B3ZfZTmSRhGZY4GFJXAMFVYkFvgG5HA2BuB+BC+S+JBq+5BG+AaxShRJpp5t5FQV6QxIlXGt+Z56cn5Xgt51Jv59JHAAFQFj4JAYFSoe20Fk59epAlJ3oTp0oyFKZz+xS6FmF5+OFSZ5Qt+5+hFvwxF+ZpF8h+eKWceJOzWjuieiAyeqes8qQ4gXOOek+rlPmJev8B8hcLCzgVeulXCBgOxERO0As+AYoWFYEYKoxRM8EyAv6AR/6ECIItxh6uFNl2+slmRWA3+qEUw306gOI6YD0+qJWnYiWLkbE/c7Zzw80SIPgMQJAzYUwH+SoasEVWQg+1+NlxSxuDSw1lAo1jCNA01MUNmXQYRUk/ZO0/ZVQexcy4JABchI1peB8y1zGPYQBaK8YzxKyTJvg7poU5Jo58FVJP5tJf5HAjJtY4pYOClb1Sln1eYDY4QepcmYcrUaK/YPBU020DSISFwPYzGgErykcBcRFjkyiGEHa2KgWLwn+cV+lyooE6EUEWNb6yEjixNIanazGmytAzQkl7U4qMITFlV6AFAdUjFslzEK+SUe0FhGURyv10p71DAQ0Ocd8iwaADINyOVSB/SPG2qlUM5vevUsgwATF35RAUpHwveDA0wklLKNQYtkki0boiWns3GL+Ki285ZBkwozIFpNAYhuSj+NVDms1TODIWo0w0AIFt6VIstT1mVVYntT4PZWowFJA74n4lY94alL4b49qJAMUWo1azmIi61lw2wqNImjowWtZ/mXomYbuVgzgZh9g8ivsRVPpT68Is2yA4odq7kOGEkWCccmI6kDQ8oDSbhDSvItAUEYcCQJQVxfYgwfIrSH49gl5xRtFEwHeAyJu3pVGggrM0QZIDSOUQMdp2Qn6UKpZfpFZYWON2Q+doW9ZiYEWaSUWLZg8bZHZiW8VUAFFA5FQQ5c9JMY245OlyoEg+AXsbEaVUUGVN4GVbiAlytHAglYcuJe29gbuYDqUe9Ml3wIIlWmt2tMpyln+gFHAkdGllAcDUU2l45gASYTf1Tl/0AMNBAOf4gMsCIMkAQPVTQO0iwPwPYxoCMMJTUB1IdWYPvUqV4NqUEMUBEOf4kNE4EkipbgzwNSyzx5eVYJJ4p4zxzgBVJAkIMATh9ZUIF6x5gBGCKP/BO4qN+UEKXmZHj1a1ZHHlzhdAIywRziHnGjHnBV556OF50IMJl5RVgKxX111711uiXmKDFHlR+A5QdKOlwn8wONYRgTMCIBa0Wx8ibFu6ahI3ARw1bguMIBSCaghbpiVg+LN5zQdEulKbhI87PRYg/ijBClVPcCl1fpKXzGDFOhmhdAYBDFTEs42j/RmglDKjbA+DTExS6h1BMSoAjPeBp6XlfitTTnVRFEJZsBgS90zGSKYjdMv6tQuLaDRD0AyLayd6CjkyHo5RRKioAZNPT18nkptV7VuQMzBjAFXUn3xaOB/w+wpVa2KWym3UCl1X0psA5imURhEAaA4WKnKlck8mXlShLgwtLFPjWyD0kCzgousGYG4BYtalLE9k0N4sNiEWIFZ7IEK2Zx/MCP/Vymzixymntg7V+hgu7DYp8GaHlFUBJB+ixAFgHJ8D8u0CdDnJvHFKQvQsEvskEgqkIt8lfhCmovos/GKuKm4tquEvEsEW13ZDnVstTAUv9KAIJjFxohE4Iq7W1qPFOjBnyrGhgBIjwixhO3oLlPty4gwgnFblJR8jMalYMyvBxY4lVChOd0rRnjaAgxFJIBkElCKjhgojPHTj8aozs192et+j7F3M1N0iHPgzzrw1YkARELsuljlhSgH2+Y/P1mfNn11lWKX2NmRbNS32xbtlr7iDxU2bkADQ+7SJML1GB7yhHgGxtzh5dhR6eMGNGMeUO4mOJ6jQYKjRgCl02UUAADqzYqQZh0wbu7joVRe4Vi15e0VbC1eE08VOxo0LG7+K7a7ZhW7zwO7lAe7nDwVgdn9AGuQYibZ6gsh5zjK04xdnDD781sAOj7coJ81FpSVvzx141NOXDvQR0BUAEig+VkhvVPwFwLrcWWTMBl5sg6TnDjDmoXYp62aQKNLALH1JdZdIUnLY0/gMmhpbEvVRZXkGBnwLpdcTrUkDSrUgwvrbD3wQKm0Okbut7i9C6jZl8/V6bfAJzC9VdT1Q6a0ooWAoETzjoYi8n37rBVYG6Yy8jSYin9zlbdesQ0Adc8ZnxUUpyP41ndc1l8IDnEKwI/HcqJYGZUExUwMDlBGfIMIznwI7nN7/HdRDMEptDSo9DWVbistFQYhmHIw8gdEdxbHLNFUzgRA3z6Kfy+ch6beggIgs0FHl8IizGJxQRfoQb7ZaHcFOZ2QhRs9b0Iw+IXYfIoBOUIqtMFw0R5A4a2gDmfR6o8jDlWAmzO9scFUc5zmAYVSctOizpu9YZEw0236iYecIn/oJxO+qKYYEYjtxJqAXT15M9JRz1sI837V+3R4bd7AlAxXn5Zi30T1yXSQzgAHNoXso6kqEo1XxJm8N1tYF3AqVdppV3wd1YQLLJHA2pnY31VYAN8prJSxOpurcVOIths0rdIkHddIXdKhxJtXQ98gYoYGU4uQRp6RihE3lAr44YIIVoyhrUec8Mezdo85i3fgvDt0vI6HvCsgVMJxuX+XJQm0Wi/SA0Nm/AzqdIm1C3wTDSb9NF4MeV9ISIlZxY5Jh0mItA4T9kRg5gZZR9tbuQ1ZIWDbCYdQV9zZPAd9cWnbXZBgPbEYrlM7KCc7CeyjS7c4o0JHRCXDRMGC7TQfOMHuHj1CYVPjkVzC/jXogTCVFws2qMDBUski2QtclaQKhx06D6UwEnPCmNt6lyUZO+0vK3tIyof0L+n2f0MSqAmvSWsIk2nEGm+V2vPsgWLHFT2fwQufWA+fGwrUUEfmLm96O6fApI9mc0WnFdog6+76DIkAeGCWEhfAsN/s1AJIA0pvh9NbAWlvJf1v/mtv/CTZN9jv7bD9XbyWbvzWnvEAhjBgJsofc4ksGCOM+IGgsg5YXgaPgXlj4IdhwFeGKknxrxXseEAROoKBD6BXAmyybC4IsBzgqxsM65aYCBXqDJQqA4CYvqfWUjSwww8gHGGWFICAIBIeCNEP5gwKS8CeYTK6rT3lDnF386eMCOniwFgQQKuoO8GBCfBahFgXuZcJ6V+CxBUBzMOFKrE4EgVOgRaKYFBEVTgICSzGO3FgjABqQwAh+PqlzGphOgqiGNKWJ2DDCt9ECxBYgd+wZSig0i8oQ6EkipQExGYaAqQdhjVicxcIRgLUDYL0FODJBrMVwerH+gqIfi6wJOJYkdCJkco/AIiHQKkJZ9+EsbP0HgF5ZdQqwyQrnBwHpg+AwAdEGgI6xxgugwAiglXOKG6Y0N1gP4LIb4AkHoCSgMg0oNhB9gAAWDQAcFaHXd6q65dQAAAlB4IIMELmWcACA1ITYTOI6VAQNCEYWvWgRaTaiS0dkoAn2OLxNaL8dEZTVqDuAGHZAEg1kEklLHLRoAL08gKoSW3TAL0jM9ALWMZEwHYCl2FcUio0CFjC0NQPg1AGp1yCSxN4QKDOCaDlgKwfw04LLLCn8Hsx3BegugcgDmE+gIMY1HaMsLdbndAO3IKnIRhoE7RQ+zxH4WAXliakahLgzQDIMdQecDS/aK4XtD0xGQqwYAfMELQ+BgQ7wqQXUCBXzBgRoAXuH2rqCfDp5x0LKHsj2RTpp0XIGdWEB+AoDDDowqQGwNgO+jTBuhWoAAEwABWTlDyFEC2sUwKaP5k8JFhs1iY/nIijYKmB2xXYRgOXnZlugEwEwfAVgQOEQIt86A+/LxtWxt5VlT+BdSsk2yv6tsb+99F3k/UgAex+UFxfjONnYEyDuBvA/gYIK9xcBxBzgkERoBkGxBTOv9MuiIIHIkR3+biDBF/znA/9li//ZgFzAZz4xxYIY3wbULcG6Dikdg60FSGBpRCqg4YQiptGyyWj0xO0TMQTA/65j8xf/AAX1jfiOC1B6ITQdoOiC6DWxJsN9PCi2pfs0alwogcYIJCyBxswI1WIEJLHDinYf1FMdOI7E+xRyeecclAFTpKBFxRgpQASCSzjYagOInCuuICHuCdKUAdIPqKhELDYRSw1oBL1AJIitweIhMTIMKYFCWuqEA0oATJFes9YfIF8ZADfHZkhhDMSUdKJ/CyiFRyo1UaJinYx4X+s7ORJ5QXbKMfKqjNPOo0CokIIOIVTxiANhGntE+7CKAQ/0SpK8gU9bQupqPKHpVMqZHKuvaLS7V1Mu1GJisomawP4qAP+JesaVW5gET8gUYBuM3MQSS3asdBSQs2Yz5wmqTCcUAwAl6F1BM6KNDk9TkiEguuRAWQr8Ay7/cqwd2b/KJPICagMWS9KvluFsmP5asFlOhopNQLKg3JEkkyuYXGZnAzRCvYJp8zg5UV364MBpAZPpS68PMYoD3k6PLJH9QCVvd0WFk9EtsoarZZ3glnv7kVWJWANXsOUepftxsVDegFxOAY8SiYzDXvKwwu5wMkO3DZBh+VQaRlF6tIrBkI3wbgUPJp+dKtMWNzkNf6/9KqalUgJxdap4DKqA1MRYwNhKHDYPq1N4YYNaOvUkRv1LUkUBjcm0aRv0kaxp4UxxjcnMRN8pqMNGQVKibo1wnuUCJ87M6W1hInmMhQ+CN6QexolHs4+TCcAee2x5BMmJgk6yYeHEyJxlEkmIoXjWRD3Q5IJobqukijauIhu0ZAThhQTI/hXOBQHBs5OQJ/A9mQXcydjP/D0wiIeQTEPC1tKt0vMFpFCFkSyhFTBy6vPXoVHnEOVi4ykJxFGTjqgVtpUBJSf2BwyNRQSKhYymZTs5TSECl+LGfhWmTfQGYtYuZPXBkix12mY+ckvxPQDFwyM+ZfqEyllj5jqCLpYyjjHTTmVBpwDU2ZqVKoURLZyII2QNTnSNVNMBqcgO0mMRgAgMAFIkcSFrFpllu1xCnoTILJ0MrZ9SN2aEk9ludP8Ps/+snF/wWl6uzxcoXVxxnB0kO0nEQsly0SpdKwQGKoLIWawBSeAUwDOQijjaQAXQbkDLoemZiZE4RP4FOWHETAH5MUaMiAkvUqwBSrMbUA6jF0lnmEZp0QdWXqOzK2YYIsIbLlVVdot05gJQUGZUAmARlEhfIRIIgHyBXg5JtAOcFjO7l1FQ2hPaaCtBTRt465/mcIEiFcYT1Y489M5n0EsYACQhec+EI5WC7Eyi+nkMmf7GWk4wiROkTIgMTdB/1u+QKfQnkFQAkUApwU2zKFKBmQhzO/dQerSBikHzUGR8+UIgX16MCHRJvZKeb2P5t8ay59Rtnb2bbX1vRuUjtvlNd7P1GZJUj+v53oo/08O7ZESe02KRSAxAJIYAKwzXLsNw+rDGGG1AAS4g/a5AQ8nlxNZwM8UnYSiD8TAi5cYYeTY8nA1sZSB1pf1f8jg1Uq8zNKA0+STHP4YbTtFwjXRYQzQZ4UooKDXwGVDAj5A0QiQhgJI3IYMVpyvMVhVzXYWcKsUPChaUQD4VLSBFfioRSIs7BiKSAEi38dItCVn55FiioVEeSkCqLElXU16s8K0WpkdF6lPmUqGsVSE7FJABxZWicV9Z+20EQdp4WHbB4x2YeCPDhP0Z4TveD033s9IulkShQ+AVPPgE+kx9vpoA+iZXkgGXtmJ/McPmouTqK9KK+UDYJrMsm75Dp04fOdpGMqtzY0PM7Jffk44gE0KEs/CoKRll2UCZkC9ZaI3kaERmeIGGAugUTm0F2yJ8yaanNTLXdJOnDaTlXWzniF/0XAYWZ6HYR99VIjch5fcWMqoVUC+zeGGaWeiFBaQxSCWZ3JvzTIC5wSKMm3lQBnzhw30cob8Q0IlBMA8gCrgOWqqwQzgwi9mtIA7BmThREkaLmDhYo+sdyaALoMPhXL1RVObM3IDt3iHHLCyPy5qNpGNm5BVlNwACqoNsX2KEYxSrKNCqKpRsPSJQYYajX2Lw0GQsbI7ioVw5SciY0C+XnPzCm5By5h8runugQWZtr5DSJOfFIAiJSq2KUl0XWyIUuisp5CnKU7yoWdlu2zWPtr7gqUB49UQeUdqHkgAgVzQRgMpX7gEyVK/VI7EPGki4DBqugz/e6aTiIltZ/eK7CfD0uAF9K6JfjQZYxOGV14IunhTUWxTqYbFOKxBFfHG3XyUEbcaHYMDDX8CWCjWzPRZfCBCQrLlIbcqsH1M2Wkyrq4sreXssVgHKoChFLFVSxNAEqee00DvDDDBXBytwvakKP2ucqbKJ1fqW0niuUQQqBys6qFn6X/wqzkILFUtUWg4qEEuKVatfBQS3wPVR5GK7WXZF1mVhOV0QiyQgD6BOVDqK6kEr8rqBYjp1C3EVY7Trj1cvo6YZovJiJEGq0FxPSNptH7JQDaA4QQ8GYSEnTgSQhGfEFSjrhqgoQCMOFLHD05Xd+CI8FtCbx8y2rz+rowgWfw9GkKvRLq2/n6If7u8WQ07RpadNMYvS1GGaqiUAKdHHtfGCfPNRewbyAzhlohHOWU2nC1ime+0flVuAnyFM0wGiFtUhUFVdq1ly6r/DPNMrmyDFmykdZNVtljqjl36k5fzKJVTBLlP6mYjx38AeZUIyNDGvRvkzjZ6Vg+RlcysoD1Q/5Aq/gGUHKEX9lQZ6ufBWuXykEb1fFKgiSpJBtgKVQaB6HpJ2ixTvOZU/UXMseR49P1AsySb5yzJ5RtV5ohzEVMdAoKGB4beUME0tWeYkpB/Z0TRvtXuaL6jG7KdFldV38aFkAcPEBELjUVSpD6r+m4vEDtk5pHAW7nMiXIsq1yXm3ciyuKRIYNAe0kGGpqrAqbbc3G7ym0ozxZ4NAN0yAIABQCAOcN2Umho0CaSnWtg0yWmKNl5i3ZVYqMWaKbt2ELJaIzgajrNlPk7qYIxMW6bxG9+FkKlhc3AR6FI5cqUrXcXtTfA4W8tZesrXRbeKta+qVAz8VNS7awQapJJXEojhdw+IHmtjrEpnl4s4ERKNYrgbxQA0Gi9Ja9vKDvacldhUzkUwzCbbWx2286aRL23KQDtkHE7S2piA2bsUNHF7ZtLMViN9FA8mnddrF33aJdX2xjjYhF207ZdH2gKR4JB39amZQ2uilDrG0w6IYqEHAmWovVEEotPFcQLeucCo7Gp7DDDIvVEpSU4GuwfuD03J0EZidcDaWITph2U78CAaVbamHzgbas8J0n3ko1aVc65w/GvnaJN0mAJC67w4GRdSO6gFDNA862Qrr03nbsIHGu6Yglfik6t4O8H6f/FYDsAuAwCUTRANGCSRoEagWBM/AQQGBC9R8dQOsR3JRq6AmxUUSUFXgt714kATlEISEJ7ASA8uBgAAHYmhaAJoQwHlGcpFR7KWgOygECT6agM+2gPLj2BNCagAgJoXsHZR77aABweBIYEL0HB2UTQpoQcAP2T7MgwwifQwFfDsoGA8uAQHsHlxoB5R8otffKKOBND2UCwTIkcCv3soz9A+iAJACODz6DguwSfQwAOCcokQ8o+XDUFAMAGADk+nfQcEn1X7RAaARUQwE5ST60AaARfafpfiD6SAy+xUXvoEDy5FR2+7A2gHX17BuUDARUUcFoBHB0DxcI4PKPX0kBFRuB2gKoFahUGoDiBmoFynH2HBJ9iowA2QZ4NNCSAqgeUXsEP0HBaAu+moJyn/0HA6AP+tAxAcL3qGjgaAdlLwdoMAHVw2+poQYYOBwHFRpBgQDUC30MAj9AgR6OvtMOD6bI6ByfQAcENMGv9DhifZPvlwIGaDnKb/ScDoM1B5c/+iffKPZS+HJDUAPYC/rX3b7DD8o0QJPvlHP6BALh9lHsFUCKiaDhR6I6uFYM1BKj2QDIyOGQMHBf9IYFfeyiSOMHSDnKTo2vv0NZG99BwPQ0gcX0CGBAtAFUU0Z0MwG59pB9w7waOBwGmhJ+zw94aaGcpVw5R1Q9ob2CsG4DuwRUX4agO0BFRLhjY4UcEPyj0D7KI8HsFOO4GmhRweXDZBIDaH0Dq4blCUaeNTHm9hej4zAaYNHAP9ah2feyjAMb6SA+hzlJyhn01AijYoG/e/un2pHjjUAPfXoc/2KjNDwh9lFwdOMkA9gChw4GgEMNyQRDwJkgD/qhNqHtDaJyANgc5RVG9gio3YB/vuNAHGDJASI8PqhOT66AewAA+CaaFcG8j7ZJo5Pu0MCAkD6Bg4PLkAMTHUjJBwQ8gapMzGzjwwiw3CZqBOH5R9J2E4gY0NMmGAIp8o9oZ4MCGv9whuA6BEIMhgGAFhok0iHANNGZ9BwJfUcGQPIHyj7Bq/fPsci0A9gRR8MDfpUAkAb9qhvfUkd5SunWDSx2fXKflGz6SDQBvQwvpOD3G9gSIBIxocAO76jghJxo38f8OL7d9Gx//TUHn2KikzMJlk0EdJMIHFRaAT/VygP2EGnj0pvU00dJP5GQwJAKE5ymBONmNjgB8o+/s5SDGuD1Z3o5YaTNPHJ99JxyE2aAMH7UDsJ+XKIAEATmbIaR3fSfrjOv6HTThuU4wcoPFmoDSR+Ue6cpNEGuUABiIyQE9P6H992Bk04GaP1HB79pJhgOUYgOQGoAbetCJMQmriYW44ETePoCAA=== -->

<!-- internal state end -->
<!-- tips_start -->

---

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

<details>
<summary>❤️ Share</summary>

- [X](https://twitter.com/intent/tweet?text=I%20just%20used%20%40coderabbitai%20for%20my%20code%20review%2C%20and%20it%27s%20fantastic%21%20It%27s%20free%20for%20OSS%20and%20offers%20a%20free%20trial%20for%20the%20proprietary%20code.%20Check%20it%20out%3A&url=https%3A//coderabbit.ai)
- [Mastodon](https://mastodon.social/share?text=I%20just%20used%20%40coderabbitai%20for%20my%20code%20review%2C%20and%20it%27s%20fantastic%21%20It%27s%20free%20for%20OSS%20and%20offers%20a%20free%20trial%20for%20the%20proprietary%20code.%20Check%20it%20out%3A%20https%3A%2F%2Fcoderabbit.ai)
- [Reddit](https://www.reddit.com/submit?title=Great%20tool%20for%20code%20review%20-%20CodeRabbit&text=I%20just%20used%20CodeRabbit%20for%20my%20code%20review%2C%20and%20it%27s%20fantastic%21%20It%27s%20free%20for%20OSS%20and%20offers%20a%20free%20trial%20for%20proprietary%20code.%20Check%20it%20out%3A%20https%3A//coderabbit.ai)
- [LinkedIn](https://www.linkedin.com/sharing/share-offsite/?url=https%3A%2F%2Fcoderabbit.ai&mini=true&title=Great%20tool%20for%20code%20review%20-%20CodeRabbit&summary=I%20just%20used%20CodeRabbit%20for%20my%20code%20review%2C%20and%20it%27s%20fantastic%21%20It%27s%20free%20for%20OSS%20and%20offers%20a%20free%20trial%20for%20proprietary%20code)

</details>

<details>
<summary>🪧 Tips</summary>

### Chat

There are 3 ways to chat with [CodeRabbit](https://coderabbit.ai?utm_source=oss&utm_medium=github&utm_campaign=y-scope/clp&utm_content=1033):

- Review comments: Directly reply to a review comment made by CodeRabbit. Example:
  - `I pushed a fix in commit <commit_id>, please review it.`
  - `Explain this complex logic.`
  - `Open a follow-up GitHub issue for this discussion.`
- Files and specific lines of code (under the "Files changed" tab): Tag `@coderabbitai` in a new review comment at the desired location with your query. Examples:
  - `@coderabbitai explain this code block.`
  -	`@coderabbitai modularize this function.`
- PR comments: Tag `@coderabbitai` in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
  - `@coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.`
  - `@coderabbitai read src/utils.ts and explain its main purpose.`
  - `@coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.`
  - `@coderabbitai help me debug CodeRabbit configuration file.`

### Support

Need help? Create a ticket on our [support page](https://www.coderabbit.ai/contact-us/support) for assistance with any issues or questions.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

### CodeRabbit Commands (Invoked using PR comments)

- `@coderabbitai pause` to pause the reviews on a PR.
- `@coderabbitai resume` to resume the paused reviews.
- `@coderabbitai review` to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
- `@coderabbitai full review` to do a full review from scratch and review all the files again.
- `@coderabbitai summary` to regenerate the summary of the PR.
- `@coderabbitai generate docstrings` to [generate docstrings](https://docs.coderabbit.ai/finishing-touches/docstrings) for this PR.
- `@coderabbitai generate sequence diagram` to generate a sequence diagram of the changes in this PR.
- `@coderabbitai resolve` resolve all the CodeRabbit review comments.
- `@coderabbitai configuration` to show the current CodeRabbit configuration for the repository.
- `@coderabbitai help` to get help.

### Other keywords and placeholders

- Add `@coderabbitai ignore` anywhere in the PR description to prevent this PR from being reviewed.
- Add `@coderabbitai summary` to generate the high-level summary at a specific location in the PR description.
- Add `@coderabbitai` anywhere in the PR title to generate the title automatically.

### CodeRabbit Configuration File (`.coderabbit.yaml`)

- You can programmatically configure CodeRabbit by adding a `.coderabbit.yaml` file to the root of your repository.
- Please see the [configuration documentation](https://docs.coderabbit.ai/guides/configure-coderabbit) for more information.
- If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: `# yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json`

### Documentation and Community

- Visit our [Documentation](https://docs.coderabbit.ai) for detailed information on how to use CodeRabbit.
- Join our [Discord Community](http://discord.gg/coderabbit) to get help, request features, and share feedback.
- Follow us on [X/Twitter](https://twitter.com/coderabbitai) for updates and announcements.

</details>

<!-- tips_end -->

@SharafMohamed SharafMohamed marked this pull request as ready for review July 16, 2025 07:58
@SharafMohamed SharafMohamed requested a review from a team as a code owner July 16, 2025 07:58
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🔭 Outside diff range comments (1)
components/core/src/clp/streaming_archive/writer/Archive.cpp (1)

365-404: Cast token type once to the enum to avoid repeated static_cast

You now compare token_type (an int) against many SymbolId values. Casting once avoids repetition and guards against accidental value mismatches:

-        int token_type = token.m_type_ids_ptr->at(0);
+        auto token_type = static_cast<log_surgeon::SymbolId>(token.m_type_ids_ptr->at(0));-        && token_type != static_cast<int>(log_surgeon::SymbolId::TokenUncaughtString)
-        && token_type != static_cast<int>(log_surgeon::SymbolId::TokenNewline))
+        && token_type != log_surgeon::SymbolId::TokenUncaughtString
+        && token_type != log_surgeon::SymbolId::TokenNewline)-            case static_cast<int>(log_surgeon::SymbolId::TokenNewline):
-            case static_cast<int>(log_surgeon::SymbolId::TokenUncaughtString):
+            case log_surgeon::SymbolId::TokenNewline:
+            case log_surgeon::SymbolId::TokenUncaughtString:-            case static_cast<int>(log_surgeon::SymbolId::TokenInt): {
+            case log_surgeon::SymbolId::TokenInt: {-            case static_cast<int>(log_surgeon::SymbolId::TokenFloat): {
+            case log_surgeon::SymbolId::TokenFloat: {

This keeps the switch strongly-typed and prevents silent breakage if enum underlying values ever shift.

♻️ Duplicate comments (1)
components/core/src/clp/clp/FileCompressor.cpp (1)

189-190: Mirror comment above – logging missing here as well

📜 Review details

Configuration used: CodeRabbit UI
Review profile: ASSERTIVE
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 46aa1e1 and 6dcadca.

📒 Files selected for processing (11)
  • components/core/config/schemas.txt (1 hunks)
  • components/core/src/clp/Grep.cpp (1 hunks)
  • components/core/src/clp/Profiler.hpp (1 hunks)
  • components/core/src/clp/Utils.cpp (4 hunks)
  • components/core/src/clp/clp/FileCompressor.cpp (2 hunks)
  • components/core/src/clp/clp/run.cpp (1 hunks)
  • components/core/src/clp/make_dictionaries_readable/CMakeLists.txt (1 hunks)
  • components/core/src/clp/streaming_archive/writer/Archive.cpp (3 hunks)
  • components/core/tests/test-ParserWithUserSchema.cpp (5 hunks)
  • components/core/tests/test_schema_files/search_schema.txt (1 hunks)
  • taskfiles/deps/main.yaml (3 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.{cpp,hpp,java,js,jsx,tpp,ts,tsx}

Instructions used from:

Sources:
⚙️ CodeRabbit Configuration File

🧠 Learnings (10)
📓 Common learnings
Learnt from: gibber9809
PR: y-scope/clp#955
File: components/core/src/clp_s/search/sql/CMakeLists.txt:8-26
Timestamp: 2025-06-02T18:22:24.060Z
Learning: In the clp project, ANTLR code generation at build time is being removed by another PR. When reviewing CMake files, be aware that some temporary suboptimal configurations may exist to reduce merge conflicts between concurrent PRs, especially around ANTLR_TARGET calls.
components/core/src/clp/clp/run.cpp (5)
Learnt from: AVMatthews
PR: y-scope/clp#543
File: components/core/src/clp_s/clp-s.cpp:196-265
Timestamp: 2024-10-07T20:10:08.254Z
Learning: In `clp-s.cpp`, the `run_serializer` function interleaves serialization and writing of IR files, making it difficult to restructure it into separate functions.
Learnt from: AVMatthews
PR: y-scope/clp#543
File: components/core/src/clp_s/clp-s.cpp:196-265
Timestamp: 2024-10-08T15:52:50.753Z
Learning: In `clp-s.cpp`, the `run_serializer` function interleaves serialization and writing of IR files, making it difficult to restructure it into separate functions.
Learnt from: AVMatthews
PR: y-scope/clp#595
File: components/core/tests/test-end_to_end.cpp:59-65
Timestamp: 2024-11-19T17:30:04.970Z
Learning: In 'components/core/tests/test-end_to_end.cpp', during the 'clp-s_compression_and_extraction_no_floats' test, files and directories are intentionally removed at the beginning of the test to ensure that any existing content doesn't influence the test results.
Learnt from: haiqi96
PR: y-scope/clp#523
File: components/core/src/clp/BufferedFileReader.cpp:96-106
Timestamp: 2024-10-24T14:45:26.265Z
Learning: In `components/core/src/clp/BufferedFileReader.cpp`, refactoring the nested error handling conditions may not apply due to the specific logic in the original code.
Learnt from: Bill-hbrhbr
PR: y-scope/clp#614
File: components/core/tests/test-StreamingCompression.cpp:45-54
Timestamp: 2024-12-04T15:50:37.827Z
Learning: In `components/core/tests/test-StreamingCompression.cpp`, within the `compress` function, compressing the same data repeatedly by passing the same `src` pointer without advancing is intentional to test the compressor with the same data multiple times.
components/core/src/clp/make_dictionaries_readable/CMakeLists.txt (4)
Learnt from: gibber9809
PR: y-scope/clp#955
File: components/core/src/clp_s/search/sql/CMakeLists.txt:8-26
Timestamp: 2025-06-02T18:22:24.060Z
Learning: In the clp project, ANTLR code generation at build time is being removed by another PR. When reviewing CMake files, be aware that some temporary suboptimal configurations may exist to reduce merge conflicts between concurrent PRs, especially around ANTLR_TARGET calls.
Learnt from: LinZhihao-723
PR: y-scope/clp#873
File: components/core/src/clp/ffi/ir_stream/search/test/utils.hpp:15-18
Timestamp: 2025-05-02T22:11:55.711Z
Learning: The team plans to systematically improve include paths by adding `${PROJECT_SOURCE_DIR}/src` to CMake targets and using direct include paths instead of deep relative ones in a future PR.
Learnt from: jackluo923
PR: y-scope/clp#1054
File: components/core/src/glt/SQLitePreparedStatement.hpp:4-4
Timestamp: 2025-07-01T14:49:07.333Z
Learning: In the CLP codebase, standard headers like `<cstdint>` and `<string>` in alphabetical order (as seen in components/core/src/glt/SQLitePreparedStatement.hpp) are correctly ordered and do not need adjustment.
Learnt from: anlowee
PR: y-scope/clp#925
File: components/core/src/clp_s/search/sql/CMakeLists.txt:8-19
Timestamp: 2025-05-26T18:39:51.727Z
Learning: In CMakeLists.txt files, anlowee prefers to explicitly list source files one by one rather than using file(GLOB ...) or file(GLOB_RECURSE ...) to automatically include files, maintaining consistency with other CMake files in the project.
components/core/src/clp/clp/FileCompressor.cpp (7)
Learnt from: AVMatthews
PR: y-scope/clp#595
File: components/core/tests/test-end_to_end.cpp:59-65
Timestamp: 2024-11-19T17:30:04.970Z
Learning: In 'components/core/tests/test-end_to_end.cpp', during the 'clp-s_compression_and_extraction_no_floats' test, files and directories are intentionally removed at the beginning of the test to ensure that any existing content doesn't influence the test results.
Learnt from: haiqi96
PR: y-scope/clp#619
File: components/core/src/clp/clp/decompression.cpp:313-313
Timestamp: 2024-12-05T16:32:21.507Z
Learning: In the C++ `FileDecompressor` implementations at `components/core/src/clp/clp/FileDecompressor.cpp` and `components/core/src/glt/glt/FileDecompressor.cpp`, the `temp_output_path` variable and associated logic are used to handle multiple compressed files with the same name, and should be kept. This logic is separate from the temporary output directory code removed in PR #619 and is necessary for proper file handling.
Learnt from: haiqi96
PR: y-scope/clp#523
File: components/core/src/clp/BufferedFileReader.cpp:96-106
Timestamp: 2024-10-24T14:45:26.265Z
Learning: In `components/core/src/clp/BufferedFileReader.cpp`, refactoring the nested error handling conditions may not apply due to the specific logic in the original code.
Learnt from: Bill-hbrhbr
PR: y-scope/clp#614
File: components/core/src/clp/streaming_compression/lzma/Compressor.hpp:65-75
Timestamp: 2024-12-19T01:40:13.272Z
Learning: "try_get_pos" method in `Compressor.hpp` does not throw exceptions, as clarified by Bill-hbrhbr.
Learnt from: Bill-hbrhbr
PR: y-scope/clp#614
File: components/core/tests/test-StreamingCompression.cpp:45-54
Timestamp: 2024-12-04T15:50:37.827Z
Learning: In `components/core/tests/test-StreamingCompression.cpp`, within the `compress` function, compressing the same data repeatedly by passing the same `src` pointer without advancing is intentional to test the compressor with the same data multiple times.
Learnt from: AVMatthews
PR: y-scope/clp#543
File: components/core/src/clp_s/JsonParser.cpp:756-765
Timestamp: 2024-10-08T15:52:50.753Z
Learning: In `components/core/src/clp_s/JsonParser.cpp`, within the `parse_from_ir()` function, reaching the end of log events in a given IR is not considered an error case. The errors `std::errc::no_message_available` and `std::errc::result_out_of_range` are expected signals to break the deserialization loop and proceed accordingly.
Learnt from: AVMatthews
PR: y-scope/clp#543
File: components/core/src/clp_s/JsonParser.cpp:717-726
Timestamp: 2024-09-27T23:19:17.079Z
Learning: In the `JsonParser::parse_from_IR` function, when deserializing, it's intentional to break out of the loop without additional error handling when `kv_log_event_result.has_error()` returns true for errors other than `no_message_available` and `result_out_of_range`, as this signifies the end of the IR file.
components/core/src/clp/streaming_archive/writer/Archive.cpp (11)
Learnt from: gibber9809
PR: y-scope/clp#630
File: components/core/src/clp_s/JsonParser.cpp:702-703
Timestamp: 2024-12-10T16:03:13.322Z
Learning: In `components/core/src/clp_s/JsonParser.cpp`, validation and exception throwing are unnecessary in the `get_archive_node_id` method when processing nodes, and should not be added.
Learnt from: gibber9809
PR: y-scope/clp#630
File: components/core/src/clp_s/JsonParser.cpp:702-703
Timestamp: 2024-12-10T16:03:08.691Z
Learning: In `components/core/src/clp_s/JsonParser.cpp`, within the `get_archive_node_id` function, validation and exception throwing for UTF-8 compliance of `curr_node.get_key_name()` are unnecessary and should be omitted.
Learnt from: haiqi96
PR: y-scope/clp#646
File: components/core/src/clp/streaming_archive/writer/Archive.hpp:354-354
Timestamp: 2025-01-14T16:06:54.692Z
Learning: Member variables in C++ classes should be explicitly initialized in the constructor to prevent undefined behavior, as demonstrated in the Archive class where `m_use_single_file_archive` is initialized to `false`.
Learnt from: AVMatthews
PR: y-scope/clp#543
File: components/core/src/clp_s/JsonParser.cpp:581-627
Timestamp: 2024-10-07T21:38:35.979Z
Learning: The variable `ir_node_to_archive_node_unordered_map` is intentionally named to reflect its type change, even if the actual variable type has been altered.
Learnt from: AVMatthews
PR: y-scope/clp#543
File: components/core/src/clp_s/JsonParser.cpp:581-627
Timestamp: 2024-10-07T21:38:35.979Z
Learning: In `components/core/src/clp_s/JsonParser.cpp`, within the `get_archive_node_id` method, throwing a string literal as an exception is acceptable practice.
Learnt from: AVMatthews
PR: y-scope/clp#543
File: components/core/src/clp_s/JsonParser.cpp:581-627
Timestamp: 2024-10-08T15:52:50.753Z
Learning: In `components/core/src/clp_s/JsonParser.cpp`, within the `get_archive_node_id` method, throwing a string literal as an exception is acceptable practice.
Learnt from: LinZhihao-723
PR: y-scope/clp#557
File: components/core/src/clp/ffi/ir_stream/utils.hpp:0-0
Timestamp: 2024-10-18T02:31:18.595Z
Learning: In `components/core/src/clp/ffi/ir_stream/utils.hpp`, the function `size_dependent_encode_and_serialize_schema_tree_node_id` assumes that the caller checks that `node_id` fits within the range of `encoded_node_id_t` before casting.
Learnt from: LinZhihao-723
PR: y-scope/clp#554
File: components/core/src/clp/ffi/KeyValuePairLogEvent.cpp:109-111
Timestamp: 2024-10-10T15:19:52.408Z
Learning: In `KeyValuePairLogEvent.cpp`, for the class `JsonSerializationIterator`, it's acceptable to use raw pointers for member variables like `m_schema_tree_node`, and there's no need to replace them with references or smart pointers in this use case.
Learnt from: LinZhihao-723
PR: y-scope/clp#557
File: components/core/tests/test-ir_encoding_methods.cpp:1216-1286
Timestamp: 2024-10-13T09:27:43.408Z
Learning: In the unit test case `ffi_ir_stream_serialize_schema_tree_node_id` in `test-ir_encoding_methods.cpp`, suppressing the `readability-function-cognitive-complexity` warning is acceptable due to the expansion of Catch2 macros in C++ tests, and such test cases may not have readability issues.
Learnt from: LinZhihao-723
PR: y-scope/clp#557
File: components/core/tests/test-ir_encoding_methods.cpp:1283-1286
Timestamp: 2024-10-13T09:29:56.553Z
Learning: `clp::ffi::SchemaTree::Node::id_t` is of type `uint32_t`, so adding `1` to `INT32_MAX` is safe and does not cause integer overflow.
Learnt from: AVMatthews
PR: y-scope/clp#543
File: components/core/src/clp_s/JsonParser.cpp:756-765
Timestamp: 2024-10-08T15:52:50.753Z
Learning: In `components/core/src/clp_s/JsonParser.cpp`, within the `parse_from_ir()` function, reaching the end of log events in a given IR is not considered an error case. The errors `std::errc::no_message_available` and `std::errc::result_out_of_range` are expected signals to break the deserialization loop and proceed accordingly.
components/core/src/clp/Utils.cpp (19)
Learnt from: haiqi96
PR: y-scope/clp#523
File: components/core/src/clp/BufferedFileReader.cpp:96-106
Timestamp: 2024-10-24T14:45:26.265Z
Learning: In `components/core/src/clp/BufferedFileReader.cpp`, refactoring the nested error handling conditions may not apply due to the specific logic in the original code.
Learnt from: AVMatthews
PR: y-scope/clp#543
File: components/core/src/clp_s/JsonParser.cpp:769-779
Timestamp: 2024-10-07T21:16:41.660Z
Learning: In `components/core/src/clp_s/JsonParser.cpp`, when handling errors in `parse_from_ir`, prefer to maintain the current mix of try-catch and if-statements because specific messages are returned back up in some cases.
Learnt from: gibber9809
PR: y-scope/clp#630
File: components/core/src/clp_s/JsonParser.cpp:702-703
Timestamp: 2024-12-10T16:03:08.691Z
Learning: In `components/core/src/clp_s/JsonParser.cpp`, within the `get_archive_node_id` function, validation and exception throwing for UTF-8 compliance of `curr_node.get_key_name()` are unnecessary and should be omitted.
Learnt from: jackluo923
PR: y-scope/clp#1054
File: components/core/src/glt/SQLitePreparedStatement.hpp:4-4
Timestamp: 2025-07-01T14:49:07.333Z
Learning: In the CLP codebase, standard headers like `<cstdint>` and `<string>` in alphabetical order (as seen in components/core/src/glt/SQLitePreparedStatement.hpp) are correctly ordered and do not need adjustment.
Learnt from: gibber9809
PR: y-scope/clp#955
File: components/core/src/clp_s/search/sql/CMakeLists.txt:8-26
Timestamp: 2025-06-02T18:22:24.060Z
Learning: In the clp project, ANTLR code generation at build time is being removed by another PR. When reviewing CMake files, be aware that some temporary suboptimal configurations may exist to reduce merge conflicts between concurrent PRs, especially around ANTLR_TARGET calls.
Learnt from: AVMatthews
PR: y-scope/clp#543
File: components/core/src/clp_s/JsonParser.cpp:735-794
Timestamp: 2024-10-07T21:35:04.362Z
Learning: In `components/core/src/clp_s/JsonParser.cpp`, within the `parse_from_ir` method, encountering errors from `kv_log_event_result.error()` aside from `std::errc::no_message_available` and `std::errc::result_out_of_range` is anticipated behavior and does not require additional error handling or logging.
Learnt from: AVMatthews
PR: y-scope/clp#543
File: components/core/src/clp_s/clp-s.cpp:196-265
Timestamp: 2024-10-08T15:52:50.753Z
Learning: In `clp-s.cpp`, the `run_serializer` function interleaves serialization and writing of IR files, making it difficult to restructure it into separate functions.
Learnt from: AVMatthews
PR: y-scope/clp#543
File: components/core/src/clp_s/clp-s.cpp:196-265
Timestamp: 2024-10-07T20:10:08.254Z
Learning: In `clp-s.cpp`, the `run_serializer` function interleaves serialization and writing of IR files, making it difficult to restructure it into separate functions.
Learnt from: LinZhihao-723
PR: y-scope/clp#557
File: components/core/src/clp/ffi/ir_stream/utils.hpp:0-0
Timestamp: 2024-10-18T02:31:18.595Z
Learning: In `components/core/src/clp/ffi/ir_stream/utils.hpp`, the function `size_dependent_encode_and_serialize_schema_tree_node_id` assumes that the caller checks that `node_id` fits within the range of `encoded_node_id_t` before casting.
Learnt from: gibber9809
PR: y-scope/clp#630
File: components/core/src/clp_s/JsonParser.cpp:702-703
Timestamp: 2024-12-10T16:03:13.322Z
Learning: In `components/core/src/clp_s/JsonParser.cpp`, validation and exception throwing are unnecessary in the `get_archive_node_id` method when processing nodes, and should not be added.
Learnt from: gibber9809
PR: y-scope/clp#584
File: components/core/src/clp_s/SchemaTree.hpp:171-171
Timestamp: 2024-11-12T18:46:20.933Z
Learning: In `components/core/src/clp_s/SchemaTree.hpp`, it's acceptable to use `std::string_view` as keys in `m_node_map` because `SchemaNode`'s `m_key_name` remains valid even after move operations or reallocations, preventing dangling references.
Learnt from: AVMatthews
PR: y-scope/clp#543
File: components/core/src/clp_s/JsonParser.cpp:581-627
Timestamp: 2024-10-07T21:38:35.979Z
Learning: In `components/core/src/clp_s/JsonParser.cpp`, within the `get_archive_node_id` method, throwing a string literal as an exception is acceptable practice.
Learnt from: AVMatthews
PR: y-scope/clp#543
File: components/core/src/clp_s/JsonParser.cpp:581-627
Timestamp: 2024-10-08T15:52:50.753Z
Learning: In `components/core/src/clp_s/JsonParser.cpp`, within the `get_archive_node_id` method, throwing a string literal as an exception is acceptable practice.
Learnt from: gibber9809
PR: y-scope/clp#584
File: components/core/src/clp_s/SchemaTree.hpp:91-94
Timestamp: 2024-11-12T18:47:03.828Z
Learning: In `components/core/src/clp_s/SchemaTree.hpp`, the `SchemaNode` class uses `std::unique_ptr<char[]> m_key_buf` and `std::string_view m_key_name` to ensure that references to `m_key_name` remain valid even after `SchemaNode` is move-constructed.
Learnt from: LinZhihao-723
PR: y-scope/clp#558
File: components/core/src/clp/ffi/ir_stream/Deserializer.hpp:119-120
Timestamp: 2024-10-14T03:39:35.585Z
Learning: `KeyValuePairLogEvent::create` requires `std::shared_ptr<SchemaTree>`, so `m_auto_generated_schema_tree` and `m_user_generated_schema_tree` need to be `std::shared_ptr`.
Learnt from: LinZhihao-723
PR: y-scope/clp#558
File: components/core/tests/test-ffi_KeyValuePairLogEvent.cpp:14-14
Timestamp: 2024-10-14T03:42:10.355Z
Learning: In the file `components/core/tests/test-ffi_KeyValuePairLogEvent.cpp`, including `<json/single_include/nlohmann/json.hpp>` is consistent with the project's coding standards.
Learnt from: gibber9809
PR: y-scope/clp#584
File: components/core/src/clp_s/SchemaTree.hpp:40-55
Timestamp: 2024-11-12T18:56:31.067Z
Learning: In `components/core/src/clp_s/SchemaTree.hpp`, within the `SchemaNode` class, the use of `std::string_view` for `m_key_name` referencing `m_key_buf` is intentional to ensure that references to the key name remain valid even after move construction.
Learnt from: davemarco
PR: y-scope/clp#700
File: components/core/src/clp/streaming_archive/ArchiveMetadata.hpp:153-155
Timestamp: 2025-01-30T19:26:33.869Z
Learning: When working with constexpr strings (string literals with static storage duration), std::string_view is the preferred choice for member variables as it's more efficient and safe, avoiding unnecessary memory allocations.
Learnt from: LinZhihao-723
PR: y-scope/clp#558
File: components/core/src/clp/ffi/KeyValuePairLogEvent.cpp:474-480
Timestamp: 2024-12-09T15:25:14.043Z
Learning: In `components/core/src/clp/ffi/KeyValuePairLogEvent.cpp`, node IDs are validated before accessing `child_schema_tree_node` in the function `serialize_node_id_value_pairs_to_json`, ensuring `get_node` does not throw exceptions due to invalid node IDs.
components/core/src/clp/Grep.cpp (4)
Learnt from: haiqi96
PR: y-scope/clp#523
File: components/core/src/clp/BufferedFileReader.cpp:96-106
Timestamp: 2024-10-24T14:45:26.265Z
Learning: In `components/core/src/clp/BufferedFileReader.cpp`, refactoring the nested error handling conditions may not apply due to the specific logic in the original code.
Learnt from: LinZhihao-723
PR: y-scope/clp#856
File: components/core/src/clp/ffi/ir_stream/search/utils.cpp:258-266
Timestamp: 2025-04-26T02:21:22.021Z
Learning: In the clp::ffi::ir_stream::search namespace, the design principle is that callers are responsible for type checking, not the called functions. If control flow reaches a function, input types should already be validated by the caller.
Learnt from: AVMatthews
PR: y-scope/clp#543
File: components/core/src/clp_s/JsonParser.cpp:756-765
Timestamp: 2024-10-08T15:52:50.753Z
Learning: In `components/core/src/clp_s/JsonParser.cpp`, within the `parse_from_ir()` function, reaching the end of log events in a given IR is not considered an error case. The errors `std::errc::no_message_available` and `std::errc::result_out_of_range` are expected signals to break the deserialization loop and proceed accordingly.
Learnt from: AVMatthews
PR: y-scope/clp#543
File: components/core/src/clp_s/JsonParser.cpp:735-794
Timestamp: 2024-10-07T21:35:04.362Z
Learning: In `components/core/src/clp_s/JsonParser.cpp`, within the `parse_from_ir` method, encountering errors from `kv_log_event_result.error()` aside from `std::errc::no_message_available` and `std::errc::result_out_of_range` is anticipated behavior and does not require additional error handling or logging.
components/core/tests/test-ParserWithUserSchema.cpp (21)
Learnt from: LinZhihao-723
PR: y-scope/clp#557
File: components/core/tests/test-ir_encoding_methods.cpp:1216-1286
Timestamp: 2024-10-13T09:27:43.408Z
Learning: In the unit test case `ffi_ir_stream_serialize_schema_tree_node_id` in `test-ir_encoding_methods.cpp`, suppressing the `readability-function-cognitive-complexity` warning is acceptable due to the expansion of Catch2 macros in C++ tests, and such test cases may not have readability issues.
Learnt from: LinZhihao-723
PR: y-scope/clp#558
File: components/core/src/clp/ffi/ir_stream/Deserializer.hpp:119-120
Timestamp: 2024-10-14T03:39:35.585Z
Learning: `KeyValuePairLogEvent::create` requires `std::shared_ptr<SchemaTree>`, so `m_auto_generated_schema_tree` and `m_user_generated_schema_tree` need to be `std::shared_ptr`.
Learnt from: AVMatthews
PR: y-scope/clp#543
File: components/core/src/clp_s/JsonParser.cpp:769-779
Timestamp: 2024-10-07T21:16:41.660Z
Learning: In `components/core/src/clp_s/JsonParser.cpp`, when handling errors in `parse_from_ir`, prefer to maintain the current mix of try-catch and if-statements because specific messages are returned back up in some cases.
Learnt from: LinZhihao-723
PR: y-scope/clp#873
File: components/core/src/clp/ffi/ir_stream/search/test/utils.cpp:78-84
Timestamp: 2025-05-02T23:49:30.798Z
Learning: The function `trivial_new_projected_schema_tree_node_callback` in the KV-IR codebase is properly documented in the header file to indicate its purpose for testing, which addresses static analyzer warnings about unused functions.
Learnt from: LinZhihao-723
PR: y-scope/clp#558
File: components/core/tests/test-ffi_KeyValuePairLogEvent.cpp:14-14
Timestamp: 2024-10-14T03:42:10.355Z
Learning: In the file `components/core/tests/test-ffi_KeyValuePairLogEvent.cpp`, including `<json/single_include/nlohmann/json.hpp>` is consistent with the project's coding standards.
Learnt from: LinZhihao-723
PR: y-scope/clp#593
File: components/core/tests/test-NetworkReader.cpp:216-219
Timestamp: 2024-11-15T03:15:45.919Z
Learning: In the `network_reader_with_valid_http_header_kv_pairs` test case in `components/core/tests/test-NetworkReader.cpp`, additional error handling for JSON parsing failures is not necessary, as the current error message is considered sufficient.
Learnt from: LinZhihao-723
PR: y-scope/clp#549
File: components/core/tests/test-ir_encoding_methods.cpp:110-110
Timestamp: 2024-10-08T15:52:50.753Z
Learning: The function `handle_schema_tree_node_insertion` in `IrUnitHandler` must accept `clp::ffi::SchemaTree::NodeLocator` by value to match the `IrUnitHandlerInterface`.
Learnt from: LinZhihao-723
PR: y-scope/clp#570
File: components/core/tests/test-ir_encoding_methods.cpp:376-399
Timestamp: 2024-11-01T03:26:26.386Z
Learning: In the test code (`components/core/tests/test-ir_encoding_methods.cpp`), exception handling for `msgpack::unpack` can be omitted because the Catch2 testing framework captures exceptions if they occur.
Learnt from: LinZhihao-723
PR: y-scope/clp#558
File: components/core/src/clp/ffi/KeyValuePairLogEvent.hpp:62-68
Timestamp: 2024-10-14T03:38:57.921Z
Learning: In `KeyValuePairLogEvent.hpp`, the methods `get_auto_generated_schema_tree()` and `get_user_generated_schema_tree()` do not need to check for null shared pointers because the factory function ensures they are not null.
Learnt from: gibber9809
PR: y-scope/clp#584
File: components/core/src/clp_s/SchemaTree.hpp:91-94
Timestamp: 2024-11-12T18:47:03.828Z
Learning: In `components/core/src/clp_s/SchemaTree.hpp`, the `SchemaNode` class uses `std::unique_ptr<char[]> m_key_buf` and `std::string_view m_key_name` to ensure that references to `m_key_name` remain valid even after `SchemaNode` is move-constructed.
Learnt from: AVMatthews
PR: y-scope/clp#543
File: components/core/src/clp_s/JsonParser.cpp:581-627
Timestamp: 2024-10-07T21:38:35.979Z
Learning: In `components/core/src/clp_s/JsonParser.cpp`, within the `get_archive_node_id` method, throwing a string literal as an exception is acceptable practice.
Learnt from: AVMatthews
PR: y-scope/clp#543
File: components/core/src/clp_s/JsonParser.cpp:581-627
Timestamp: 2024-10-08T15:52:50.753Z
Learning: In `components/core/src/clp_s/JsonParser.cpp`, within the `get_archive_node_id` method, throwing a string literal as an exception is acceptable practice.
Learnt from: AVMatthews
PR: y-scope/clp#543
File: components/core/src/clp_s/JsonParser.cpp:735-794
Timestamp: 2024-10-07T21:35:04.362Z
Learning: In `components/core/src/clp_s/JsonParser.cpp`, within the `parse_from_ir` method, encountering errors from `kv_log_event_result.error()` aside from `std::errc::no_message_available` and `std::errc::result_out_of_range` is anticipated behavior and does not require additional error handling or logging.
Learnt from: AVMatthews
PR: y-scope/clp#595
File: components/core/tests/test-clp_s-end_to_end.cpp:109-110
Timestamp: 2024-11-29T22:50:17.206Z
Learning: In `components/core/tests/test-clp_s-end_to_end.cpp`, the success of `constructor.store()` is verified through `REQUIRE` statements and subsequent comparisons.
Learnt from: AVMatthews
PR: y-scope/clp#543
File: components/core/src/clp_s/JsonParser.cpp:629-733
Timestamp: 2024-10-08T15:52:50.753Z
Learning: In the `JsonParser::parse_kv_log_event` method in `components/core/src/clp_s/JsonParser.cpp`, prefer passing exceptions to higher levels for more graceful handling, rather than handling them at that level.
Learnt from: gibber9809
PR: y-scope/clp#584
File: components/core/src/clp_s/SchemaTree.hpp:171-171
Timestamp: 2024-11-12T18:46:20.933Z
Learning: In `components/core/src/clp_s/SchemaTree.hpp`, it's acceptable to use `std::string_view` as keys in `m_node_map` because `SchemaNode`'s `m_key_name` remains valid even after move operations or reallocations, preventing dangling references.
Learnt from: AVMatthews
PR: y-scope/clp#543
File: components/core/src/clp_s/JsonParser.cpp:756-765
Timestamp: 2024-10-08T15:52:50.753Z
Learning: In `components/core/src/clp_s/JsonParser.cpp`, within the `parse_from_ir()` function, reaching the end of log events in a given IR is not considered an error case. The errors `std::errc::no_message_available` and `std::errc::result_out_of_range` are expected signals to break the deserialization loop and proceed accordingly.
Learnt from: gibber9809
PR: y-scope/clp#584
File: components/core/src/clp_s/SchemaTree.hpp:40-55
Timestamp: 2024-11-12T18:56:31.067Z
Learning: In `components/core/src/clp_s/SchemaTree.hpp`, within the `SchemaNode` class, the use of `std::string_view` for `m_key_name` referencing `m_key_buf` is intentional to ensure that references to the key name remain valid even after move construction.
Learnt from: LinZhihao-723
PR: y-scope/clp#558
File: components/core/src/clp/ffi/KeyValuePairLogEvent.cpp:474-480
Timestamp: 2024-12-09T15:25:14.043Z
Learning: In `components/core/src/clp/ffi/KeyValuePairLogEvent.cpp`, node IDs are validated before accessing `child_schema_tree_node` in the function `serialize_node_id_value_pairs_to_json`, ensuring `get_node` does not throw exceptions due to invalid node IDs.
Learnt from: haiqi96
PR: y-scope/clp#523
File: components/core/src/clp/BufferedFileReader.cpp:96-106
Timestamp: 2024-10-24T14:45:26.265Z
Learning: In `components/core/src/clp/BufferedFileReader.cpp`, refactoring the nested error handling conditions may not apply due to the specific logic in the original code.
Learnt from: AVMatthews
PR: y-scope/clp#543
File: components/core/src/clp_s/JsonParser.cpp:717-726
Timestamp: 2024-09-27T23:19:17.079Z
Learning: In the `JsonParser::parse_from_IR` function, when deserializing, it's intentional to break out of the loop without additional error handling when `kv_log_event_result.has_error()` returns true for errors other than `no_message_available` and `result_out_of_range`, as this signifies the end of the IR file.
taskfiles/deps/main.yaml (1)
Learnt from: anlowee
PR: y-scope/clp#925
File: taskfiles/deps/main.yaml:97-106
Timestamp: 2025-05-28T18:33:30.155Z
Learning: In the taskfiles dependency system (taskfiles/deps/main.yaml), echo commands are used to generate .cmake settings files that are consumed by the main CMake build process. These files set variables like ANTLR_RUNTIME_HEADER to point to dependency locations for use during compilation.
components/core/src/clp/Profiler.hpp (3)
Learnt from: LinZhihao-723
PR: y-scope/clp#856
File: components/core/src/clp/ffi/ir_stream/search/utils.cpp:258-266
Timestamp: 2025-04-26T02:21:22.021Z
Learning: In the clp::ffi::ir_stream::search namespace, the design principle is that callers are responsible for type checking, not the called functions. If control flow reaches a function, input types should already be validated by the caller.
Learnt from: AVMatthews
PR: y-scope/clp#543
File: components/core/src/clp_s/JsonParser.cpp:756-765
Timestamp: 2024-10-08T15:52:50.753Z
Learning: In `components/core/src/clp_s/JsonParser.cpp`, within the `parse_from_ir()` function, reaching the end of log events in a given IR is not considered an error case. The errors `std::errc::no_message_available` and `std::errc::result_out_of_range` are expected signals to break the deserialization loop and proceed accordingly.
Learnt from: davemarco
PR: y-scope/clp#700
File: components/core/src/clp/streaming_archive/ArchiveMetadata.hpp:153-155
Timestamp: 2025-01-30T19:26:33.869Z
Learning: When working with constexpr strings (string literals with static storage duration), std::string_view is the preferred choice for member variables as it's more efficient and safe, avoiding unnecessary memory allocations.
🪛 LanguageTool
components/core/config/schemas.txt

[typographical] ~6-~6: Unpaired symbol: ']' seems to be missing
Context: ...-\d{2}T\d{2}:\d{2}:\d{2},\d{3} // E.g. [2015-01-31T15:50:45 timestamp:[\d{4}-...

(UNPAIRED_BRACKETS)


[typographical] ~7-~7: Unpaired symbol: ']' seems to be missing
Context: ...// E.g. [2015-01-31T15:50:45 timestamp:[\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2} //...

(UNPAIRED_BRACKETS)


[typographical] ~35-~35: Unpaired symbol: ']' seems to be missing
Context: ...-z]+ \d{2}, \d{4} \d{2}:\d{2} // E.g. E [31/Jan/2015:15:50:45 // E.g. localhost -...

(UNPAIRED_BRACKETS)


[typographical] ~36-~36: Unpaired symbol: ']' seems to be missing
Context: ...Jan/2015:15:50:45 // E.g. localhost - - [01/Jan/2016:15:50:17 // E.g. 192.168.4.5...

(UNPAIRED_BRACKETS)


[typographical] ~37-~37: Unpaired symbol: ']' seems to be missing
Context: ...n/2016:15:50:17 // E.g. 192.168.4.5 - - [01/Jan/2016:15:50:17 timestamp:[\d{2}/[...

(UNPAIRED_BRACKETS)


[typographical] ~38-~38: Unpaired symbol: ']' seems to be missing
Context: ....5 - - [01/Jan/2016:15:50:17 timestamp:[\d{2}/[A-Z][a-z]{2}/\d{4}:\d{2}:\d{2}:\d...

(UNPAIRED_BRACKETS)


[typographical] ~39-~39: Unpaired symbol: ']' seems to be missing
Context: ...{2}:\d{2}:\d{2} // E.g. 192.168.4.5 - - [01/01/2016:15:50:17 timestamp:[\d{2}/\d...

(UNPAIRED_BRACKETS)


[typographical] ~40-~40: Unpaired symbol: ']' seems to be missing
Context: ...4.5 - - [01/01/2016:15:50:17 timestamp:[\d{2}/\d{2}/\d{4}:\d{2}:\d{2}:\d{2} // E...

(UNPAIRED_BRACKETS)


[uncategorized] ~43-~43: Pentru limba română se folosește simbolul "«".
Context: ...0-9]{2} \d{2}:\d{2}:\d{2} \d{4} // E.g. <<<2016-11-10 03:02:29:936 timestamp:<<...

(GHILIMELE_DUBLE_INTERIOR_INCEPUT)


[duplication] ~52-~52: Posibilă greșeală: ați repetat un cuvânt
Context: ...d{2}-\d{2} \d{2}:\d{2}:\d{2}.\d{6} // Delimiters delimiters: \t\r\n!"#$%&'()*,:;<>?@[]^_`...

(WORD_REPEAT_RULE)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (10)
  • GitHub Check: ubuntu-jammy-static-linked-bins
  • GitHub Check: ubuntu-jammy-lint
  • GitHub Check: centos-stream-9-static-linked-bins
  • GitHub Check: centos-stream-9-dynamic-linked-bins
  • GitHub Check: ubuntu-jammy-dynamic-linked-bins
  • GitHub Check: build-macos (macos-14, false)
  • GitHub Check: build-macos (macos-13, true)
  • GitHub Check: build-macos (macos-13, false)
  • GitHub Check: build-macos (macos-14, true)
  • GitHub Check: build-macos (macos-15, true)
🔇 Additional comments (23)
components/core/tests/test_schema_files/search_schema.txt (1)

2-2: Confirm that the escaped question-mark behaves as intended

The added backslash makes ? a literal delimiter instead of a regex quantifier. Double-check that the parser indeed expects a raw character in this position; otherwise the backslash may be taken literally and break delimiter detection.

components/core/src/clp/make_dictionaries_readable/CMakeLists.txt (1)

20-28: Validate new source entries

Profiler.hpp/cpp and Stopwatch.hpp/cpp are already part of several other targets. Adding them here is fine but please ensure:

  1. They are not compiled twice into different objects with conflicting ODR definitions.
  2. Header-only files (*.hpp) need not be listed in MAKE_DICTIONARIES_READABLE_SOURCES unless you rely on IDE source-grouping.

No action required if both points are already covered.

components/core/src/clp/clp/FileCompressor.cpp (1)

122-123: Start/stop calls look correct but remember to emit the log

You changed to fragmented measurement but removed the corresponding LOG_FRAGMENTED_MEASUREMENT(ParseLogFile) call. Without it the timing data will be gathered yet never printed. If that’s intentional, ignore; otherwise add:

LOG_FRAGMENTED_MEASUREMENT(Profiler::FragmentedMeasurementIndex::ParseLogFile);

just after the stop_fragmented_measurement call.

components/core/src/clp/Utils.cpp (4)

137-166: LGTM: Enum renaming is consistent and correct.

The systematic renaming from log_surgeon::SymbolID to log_surgeon::SymbolId throughout the symbol mapping initialization is well-executed and maintains consistency with the log-surgeon library update.


182-182: LGTM: API method name change is properly handled.

The change from add_delimiters to set_delimiters aligns with the updated log-surgeon API and maintains the same functionality.


206-206: LGTM: Array type improvement.

The change from a fixed-size boolean array to std::array<bool, log_surgeon::cSizeOfUnicode> is a good improvement that provides better type safety and clearer intent.


246-247: LGTM: Appropriate deprecation handling.

The commented-out lexer.generate_reverse() call with a clear TODO comment explaining it's no longer used is a reasonable approach for deprecated functionality.

components/core/src/clp/Grep.cpp (2)

869-874: LGTM: Proper adaptation to new scan interface.

The structured bindings usage to unpack the error code and optional token from the new scan method is correct. The error handling checks for success before accessing the token value, which is appropriate defensive programming.


886-887: LGTM: Consistent enum name updates.

The enum references have been correctly updated from log_surgeon::SymbolId::TokenUncaughtStringID to TokenUncaughtString and from TokenEndID to TokenEnd, maintaining consistency with the log-surgeon library changes.

components/core/config/schemas.txt (3)

1-51: LGTM: Comprehensive timestamp pattern expansion.

The significantly expanded timestamp regex patterns provide much better coverage of common timestamp formats, including ISO 8601 variants, bracketed timestamps, and various separator formats. This enhancement will improve log parsing capabilities across diverse log formats.


53-53: LGTM: Delimiter configuration enhancement.

The addition of the '=' character to the delimiters list is appropriate and supports the dictionary variable patterns that use equals signs as delimiters.


61-61: LGTM: Named capture group addition.

The addition of the named capture group (?<val>...) in the equals pattern enhances the parsing functionality by enabling capture-groups, which aligns with the PR objectives.

components/core/tests/test-ParserWithUserSchema.cpp (4)

12-12: LGTM: Appropriate header inclusion.

The addition of #include <log_surgeon/SchemaParser.hpp> is necessary for the static method call and follows good practice.


31-31: LGTM: Correct API update.

The change from an instance method to the static method call log_surgeon::SchemaParser::try_schema_file properly adapts to the updated log-surgeon API.


168-180: LGTM: Proper test adaptation to new scan interface.

The structured bindings usage in the test cases correctly adapts to the new scan method that returns a tuple of error code and optional token. The pattern is consistent and properly handles the API change.


171-171: LGTM: Consistent enum update.

The enum reference update from SymbolID::TokenEndID to SymbolId::TokenEnd is consistent with the library-wide enum renaming.

taskfiles/deps/main.yaml (4)

22-22: LGTM: Appropriate new dependency variable.

The addition of G_GSL_LIB_NAME: "Microsoft.GSL" follows the existing pattern for library name variables and supports the new dependency.


70-70: LGTM: Dependency integration.

Adding microsoft.gsl to the core-all-parallel dependencies list properly integrates the new dependency into the build system.


225-239: LGTM: Log-surgeon dependency updates.

The updates to the log-surgeon task properly configure dependencies on both fmt and microsoft.gsl, include the necessary CMake configuration files, and update to the new tarball with appropriate checksum verification.


241-253: LGTM: Well-configured Microsoft GSL dependency.

The new microsoft.gsl task follows the established patterns for dependency management, includes appropriate build configuration (disabling tests), and uses proper checksum verification for security.

components/core/src/clp/Profiler.hpp (3)

48-56: LGTM: Well-structured addition of new fragmented measurement indices.

The new enumerators follow consistent naming conventions and provide granular profiling capabilities for different parsing operations. The additions align with the transition from continuous to fragmented measurements mentioned in the PR objectives.


64-65: LGTM: Consistent disabling of continuous measurements.

The disabling of both Compression and Search continuous measurements is consistent with the overall strategy to transition profiling to fragmented measurements, as indicated by the AI summary.


70-78: LGTM: Selective enabling strategy for fragmented measurements.

The configuration appropriately enables only Watch and TokenCreation measurements while keeping others disabled. This selective approach suggests a performance-conscious profiling strategy that focuses on key operations without overwhelming the system with excessive measurement overhead.

davidlion and others added 6 commits July 21, 2025 00:55
Co-authored-by: kirkrodrigues <2454684+kirkrodrigues@users.noreply.github.com>
Co-authored-by: kirkrodrigues <2454684+kirkrodrigues@users.noreply.github.com>
Co-authored-by: kirkrodrigues <2454684+kirkrodrigues@users.noreply.github.com>
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

♻️ Duplicate comments (2)
components/core/config/schemas.txt (2)

84-84: Delimiter list readability issue already raised

The inconsistent escaping in the delimiter string (mix of raw and escaped characters) remains. See previous discussion; consider adopting a single, consistent style.


97-97: hex pattern still excludes digits 0-9

Standard hex values like deadbeef and 0F0F should be accepted. Prior feedback hasn’t been applied.

-hex:[a-fA-F]+
+hex:[0-9a-fA-F]+
📜 Review details

Configuration used: CodeRabbit UI
Review profile: ASSERTIVE
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between fbf6195 and 716e871.

📒 Files selected for processing (1)
  • components/core/config/schemas.txt (1 hunks)
🧰 Additional context used
🧠 Learnings (2)
📓 Common learnings
Learnt from: gibber9809
PR: y-scope/clp#955
File: components/core/src/clp_s/search/sql/CMakeLists.txt:8-26
Timestamp: 2025-06-02T18:22:24.060Z
Learning: In the clp project, ANTLR code generation at build time is being removed by another PR. When reviewing CMake files, be aware that some temporary suboptimal configurations may exist to reduce merge conflicts between concurrent PRs, especially around ANTLR_TARGET calls.
Learnt from: SharafMohamed
PR: y-scope/clp#1033
File: components/core/config/schemas.txt:42-43
Timestamp: 2025-07-17T16:08:23.185Z
Learning: In log-surgeon, regex patterns like `(?:\d{1,2}| \d)` are not supported. Log-surgeon has its own regex implementation with different capabilities compared to standard regex engines. Patterns like `[ 0-9]{2}` that allow character classes with spaces are valid in log-surgeon schemas.
components/core/config/schemas.txt (4)

Learnt from: SharafMohamed
PR: #1033
File: components/core/config/schemas.txt:42-43
Timestamp: 2025-07-17T16:08:23.185Z
Learning: In log-surgeon, regex patterns like (?:\d{1,2}| \d) are not supported. Log-surgeon has its own regex implementation with different capabilities compared to standard regex engines. Patterns like [ 0-9]{2} that allow character classes with spaces are valid in log-surgeon schemas.

Learnt from: SharafMohamed
PR: #1033
File: components/core/config/schemas.txt:96-97
Timestamp: 2025-07-17T16:02:57.688Z
Learning: In log-surgeon, the . character is treated as a literal non-delimiter character rather than a regex wildcard that matches any character. This means patterns like .* in log-surgeon schemas match sequences of non-delimiter characters, not arbitrary characters like in standard regex.

Learnt from: gibber9809
PR: #955
File: components/core/src/clp_s/search/sql/CMakeLists.txt:8-26
Timestamp: 2025-06-02T18:22:24.060Z
Learning: In the clp project, ANTLR code generation at build time is being removed by another PR. When reviewing CMake files, be aware that some temporary suboptimal configurations may exist to reduce merge conflicts between concurrent PRs, especially around ANTLR_TARGET calls.

Learnt from: Bill-hbrhbr
PR: y-scope/clp#0
File: :0-0
Timestamp: 2025-03-18T07:27:54.738Z
Learning: Double parentheses in the codebase are intentional and required for clang-tidy to pass. These should not be flagged as style issues in code reviews.

🪛 LanguageTool
components/core/config/schemas.txt

[typographical] ~25-~25: Αταίριαστο σύμβολο: το ']' φαίνεται πως λείπει
Context: ...-/]\d{2}[T ]\d{2}:\d{2}:\d{2} // E.g. [2015-01-31T15:50:45 timestamp:[\d{4}-...

(EL_UNPAIRED_BRACKETS)


[typographical] ~26-~26: Αταίριαστο σύμβολο: το ']' φαίνεται πως λείπει
Context: ...// E.g. [2015-01-31T15:50:45 timestamp:[\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2} /...

(EL_UNPAIRED_BRACKETS)


[typographical] ~57-~57: Αταίριαστο σύμβολο: το ']' φαίνεται πως λείπει
Context: ...z]+ \d{2}, \d{4} \d{2}:\d{2} // E.g. E [31/Jan/2015:15:50:45 // E.g. localhost -...

(EL_UNPAIRED_BRACKETS)


[typographical] ~58-~58: Αταίριαστο σύμβολο: το ']' φαίνεται πως λείπει
Context: ...Jan/2015:15:50:45 // E.g. localhost - - [01/Jan/2016:15:50:17 // E.g. 192.168.4.5...

(EL_UNPAIRED_BRACKETS)


[typographical] ~59-~59: Αταίριαστο σύμβολο: το ']' φαίνεται πως λείπει
Context: ...n/2016:15:50:17 // E.g. 192.168.4.5 - - [01/Jan/2016:15:50:17 timestamp:[\d{2}/[...

(EL_UNPAIRED_BRACKETS)


[typographical] ~60-~60: Αταίριαστο σύμβολο: το ']' φαίνεται πως λείπει
Context: ....5 - - [01/Jan/2016:15:50:17 timestamp:[\d{2}/[A-Z][a-z]{2}/\d{4}:\d{2}:\d{2}:\d...

(EL_UNPAIRED_BRACKETS)


[typographical] ~62-~62: Αταίριαστο σύμβολο: το ']' φαίνεται πως λείπει
Context: ...2}:\d{2}:\d{2} // E.g. 192.168.4.5 - - [01/01/2016:15:50:17 timestamp:[\d{2}/\d...

(EL_UNPAIRED_BRACKETS)


[typographical] ~63-~63: Αταίριαστο σύμβολο: το ']' φαίνεται πως λείπει
Context: ...4.5 - - [01/01/2016:15:50:17 timestamp:[\d{2}/\d{2}/\d{4}:\d{2}:\d{2}:\d{2} // ...

(EL_UNPAIRED_BRACKETS)


[typographical] ~84-84: Αταίριαστο σύμβολο: το '"' φαίνεται πως λείπει
Context: .../ Delimiters // === delimiters: \t\r\n!"#$%&'()*,:;<=>?@[]^_`{|}
/...

(EL_UNPAIRED_BRACKETS)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (10)
  • GitHub Check: ubuntu-jammy-dynamic-linked-bins
  • GitHub Check: ubuntu-jammy-lint
  • GitHub Check: centos-stream-9-dynamic-linked-bins
  • GitHub Check: ubuntu-jammy-static-linked-bins
  • GitHub Check: centos-stream-9-static-linked-bins
  • GitHub Check: lint-check (macos-latest)
  • GitHub Check: build (macos-latest)
  • GitHub Check: build-macos (macos-14, true)
  • GitHub Check: build-macos (macos-13, false)
  • GitHub Check: build-macos (macos-13, true)

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

♻️ Duplicate comments (3)
components/core/config/schemas.txt (3)

68-68: Digits still excluded from hex pattern
Issue already raised previously; pattern remains [a-fA-F]+, rejecting valid hex like 7E57.


34-38: Add a note explaining the intentionally-missing closing ]

The pattern stops before \] because ] is a delimiter. An inline comment would save future readers from thinking the bracket was forgotten.


55-55: Mixed escaping in delimiter list hurts readability
Earlier nitpick about consistently escaping or un-escaping special chars is still unresolved.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: ASSERTIVE
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 716e871 and 70db06f.

📒 Files selected for processing (1)
  • components/core/config/schemas.txt (1 hunks)
🧰 Additional context used
🧠 Learnings (2)
📓 Common learnings
Learnt from: gibber9809
PR: y-scope/clp#955
File: components/core/src/clp_s/search/sql/CMakeLists.txt:8-26
Timestamp: 2025-06-02T18:22:24.060Z
Learning: In the clp project, ANTLR code generation at build time is being removed by another PR. When reviewing CMake files, be aware that some temporary suboptimal configurations may exist to reduce merge conflicts between concurrent PRs, especially around ANTLR_TARGET calls.
Learnt from: SharafMohamed
PR: y-scope/clp#1033
File: components/core/config/schemas.txt:42-43
Timestamp: 2025-07-17T16:08:23.185Z
Learning: In log-surgeon, regex patterns like `(?:\d{1,2}| \d)` are not supported. Log-surgeon has its own regex implementation with different capabilities compared to standard regex engines. Patterns like `[ 0-9]{2}` that allow character classes with spaces are valid in log-surgeon schemas.
components/core/config/schemas.txt (5)

Learnt from: SharafMohamed
PR: #1033
File: components/core/config/schemas.txt:42-43
Timestamp: 2025-07-17T16:08:23.185Z
Learning: In log-surgeon, regex patterns like (?:\d{1,2}| \d) are not supported. Log-surgeon has its own regex implementation with different capabilities compared to standard regex engines. Patterns like [ 0-9]{2} that allow character classes with spaces are valid in log-surgeon schemas.

Learnt from: SharafMohamed
PR: #1033
File: components/core/config/schemas.txt:96-97
Timestamp: 2025-07-17T16:02:57.688Z
Learning: In log-surgeon, the . character is treated as a literal non-delimiter character rather than a regex wildcard that matches any character. This means patterns like .* in log-surgeon schemas match sequences of non-delimiter characters, not arbitrary characters like in standard regex.

Learnt from: gibber9809
PR: #955
File: components/core/src/clp_s/search/sql/CMakeLists.txt:8-26
Timestamp: 2025-06-02T18:22:24.060Z
Learning: In the clp project, ANTLR code generation at build time is being removed by another PR. When reviewing CMake files, be aware that some temporary suboptimal configurations may exist to reduce merge conflicts between concurrent PRs, especially around ANTLR_TARGET calls.

Learnt from: Bill-hbrhbr
PR: y-scope/clp#0
File: :0-0
Timestamp: 2025-03-18T07:27:54.738Z
Learning: Double parentheses in the codebase are intentional and required for clang-tidy to pass. These should not be flagged as style issues in code reviews.

Learnt from: Bill-hbrhbr
PR: y-scope/clp#0
File: :0-0
Timestamp: 2025-03-18T07:27:54.738Z
Learning: Double parentheses in REQUIRE statements (e.g., REQUIRE((expression))) and in conditional expressions are intentional and required for clang-tidy to pass. These should not be flagged as style issues in code reviews.

🪛 LanguageTool
components/core/config/schemas.txt

[uncategorized] ~58-~58: Although a hyphen is possible, it is not necessary in a compound modifier in which the first word is an adverb that ends in ‘ly’.
Context: ...*,:;<=>?@[]^_\{\|\}~ // === // Specially-encoded variables (using the intandfloat` ...

(HYPHENATED_LY_ADVERB_ADJECTIVE)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (11)
  • GitHub Check: centos-stream-9-dynamic-linked-bins
  • GitHub Check: centos-stream-9-static-linked-bins
  • GitHub Check: ubuntu-jammy-dynamic-linked-bins
  • GitHub Check: ubuntu-jammy-lint
  • GitHub Check: ubuntu-jammy-static-linked-bins
  • GitHub Check: build-macos (macos-13, true)
  • GitHub Check: build-macos (macos-15, false)
  • GitHub Check: build-macos (macos-15, true)
  • GitHub Check: build-macos (macos-13, false)
  • GitHub Check: lint-check (macos-latest)
  • GitHub Check: lint-check (ubuntu-latest)

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🔭 Outside diff range comments (1)
components/core/src/clp/clo/clo.cpp (1)

22-29: Incorrect namespace qualification for FileDecompressor

using clp::clp::FileDecompressor; is illegal – there is no nested clp::clp namespace.
This will fail to compile and should be simplified to a single-level qualification.

-using clp::clp::FileDecompressor;
+using clp::FileDecompressor;
📜 Review details

Configuration used: CodeRabbit UI
Review profile: ASSERTIVE
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 64c0216 and 4a05836.

📒 Files selected for processing (1)
  • components/core/src/clp/clo/clo.cpp (2 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.{cpp,hpp,java,js,jsx,tpp,ts,tsx}

⚙️ CodeRabbit Configuration File

  • Prefer false == <expression> rather than !<expression>.

Files:

  • components/core/src/clp/clo/clo.cpp
🧠 Learnings (2)
📓 Common learnings
Learnt from: gibber9809
PR: y-scope/clp#955
File: components/core/src/clp_s/search/sql/CMakeLists.txt:8-26
Timestamp: 2025-06-02T18:22:24.060Z
Learning: In the clp project, ANTLR code generation at build time is being removed by another PR. When reviewing CMake files, be aware that some temporary suboptimal configurations may exist to reduce merge conflicts between concurrent PRs, especially around ANTLR_TARGET calls.
Learnt from: SharafMohamed
PR: y-scope/clp#1033
File: components/core/config/schemas.txt:42-43
Timestamp: 2025-07-17T16:08:23.185Z
Learning: In log-surgeon, regex patterns like `(?:\d{1,2}| \d)` are not supported. Log-surgeon has its own regex implementation with different capabilities compared to standard regex engines. Patterns like `[ 0-9]{2}` that allow character classes with spaces are valid in log-surgeon schemas.
components/core/src/clp/clo/clo.cpp (13)

Learnt from: haiqi96
PR: #523
File: components/core/src/clp/BufferedFileReader.cpp:96-106
Timestamp: 2024-10-24T14:45:26.265Z
Learning: In components/core/src/clp/BufferedFileReader.cpp, refactoring the nested error handling conditions may not apply due to the specific logic in the original code.

Learnt from: gibber9809
PR: #955
File: components/core/src/clp_s/search/sql/CMakeLists.txt:8-26
Timestamp: 2025-06-02T18:22:24.060Z
Learning: In the clp project, ANTLR code generation at build time is being removed by another PR. When reviewing CMake files, be aware that some temporary suboptimal configurations may exist to reduce merge conflicts between concurrent PRs, especially around ANTLR_TARGET calls.

Learnt from: LinZhihao-723
PR: #873
File: components/core/src/clp/ffi/ir_stream/search/QueryHandlerImpl.hpp:320-324
Timestamp: 2025-05-05T01:12:18.561Z
Learning: In the CLP codebase, the m_case_sensitive_search flag is used only for actual string value comparisons during query evaluation, not for schema key name matching. Schema keys are always compared case-sensitively regardless of this flag's setting.

Learnt from: LinZhihao-723
PR: #882
File: components/core/src/clp/ffi/ir_stream/search/QueryHandlerImpl.cpp:428-450
Timestamp: 2025-05-07T16:56:53.901Z
Learning: The QueryHandlerImpl class in clp/ffi/ir_stream/search is not designed to be thread-safe and is assumed to not be used across multiple threads.

Learnt from: jackluo923
PR: #1054
File: components/core/src/glt/SQLitePreparedStatement.hpp:4-4
Timestamp: 2025-07-01T14:49:07.333Z
Learning: In the CLP codebase, standard headers like <cstdint> and <string> in alphabetical order (as seen in components/core/src/glt/SQLitePreparedStatement.hpp) are correctly ordered and do not need adjustment.

Learnt from: LinZhihao-723
PR: #856
File: components/core/src/clp/ffi/ir_stream/search/utils.cpp:258-266
Timestamp: 2025-04-26T02:21:22.021Z
Learning: In the clp::ffi::ir_stream::search namespace, the design principle is that callers are responsible for type checking, not the called functions. If control flow reaches a function, input types should already be validated by the caller.

Learnt from: gibber9809
PR: #630
File: components/core/src/clp_s/JsonParser.cpp:702-703
Timestamp: 2024-12-10T16:03:13.322Z
Learning: In components/core/src/clp_s/JsonParser.cpp, validation and exception throwing are unnecessary in the get_archive_node_id method when processing nodes, and should not be added.

Learnt from: AVMatthews
PR: #543
File: components/core/src/clp_s/clp-s.cpp:196-265
Timestamp: 2024-10-08T15:52:50.753Z
Learning: In clp-s.cpp, the run_serializer function interleaves serialization and writing of IR files, making it difficult to restructure it into separate functions.

Learnt from: AVMatthews
PR: #543
File: components/core/src/clp_s/clp-s.cpp:196-265
Timestamp: 2024-10-07T20:10:08.254Z
Learning: In clp-s.cpp, the run_serializer function interleaves serialization and writing of IR files, making it difficult to restructure it into separate functions.

Learnt from: AVMatthews
PR: #543
File: components/core/src/clp_s/JsonParser.cpp:769-779
Timestamp: 2024-10-07T21:16:41.660Z
Learning: In components/core/src/clp_s/JsonParser.cpp, when handling errors in parse_from_ir, prefer to maintain the current mix of try-catch and if-statements because specific messages are returned back up in some cases.

Learnt from: gibber9809
PR: #584
File: components/core/src/clp_s/SchemaTree.hpp:91-94
Timestamp: 2024-11-12T18:47:03.828Z
Learning: In components/core/src/clp_s/SchemaTree.hpp, the SchemaNode class uses std::unique_ptr<char[]> m_key_buf and std::string_view m_key_name to ensure that references to m_key_name remain valid even after SchemaNode is move-constructed.

Learnt from: LinZhihao-723
PR: #544
File: components/core/src/clp/ffi/ir_stream/ir_unit_deserialization_methods.cpp:170-186
Timestamp: 2024-09-24T22:33:59.488Z
Learning: Only trivially destructible global objects are allowed in the coding standard, so avoid suggesting the use of non-trivially destructible global objects like static std::unordered_map.

Learnt from: haiqi96
PR: #646
File: components/core/src/clp/streaming_archive/writer/Archive.hpp:354-354
Timestamp: 2025-01-14T16:06:54.692Z
Learning: Member variables in C++ classes should be explicitly initialized in the constructor to prevent undefined behavior, as demonstrated in the Archive class where m_use_single_file_archive is initialized to false.

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
  • GitHub Check: ubuntu-jammy-lint
  • GitHub Check: ubuntu-jammy-dynamic-linked-bins
  • GitHub Check: ubuntu-jammy-static-linked-bins
  • GitHub Check: centos-stream-9-dynamic-linked-bins
  • GitHub Check: centos-stream-9-static-linked-bins
🔇 Additional comments (1)
components/core/src/clp/clo/clo.cpp (1)

493-501: Guard against unusable lexer when schema is missing

Grep::process_raw_query is always given a lexer reference, yet when no schema exists the lexer has no delimiters or token table.
If process_raw_query relies on a populated lexer this may lead to undefined behaviour or silent mis-parses.

Either:

  1. Confirm that an “empty” lexer is acceptable and document the assumption, or
  2. Skip lexer loading entirely and signal the absence with std::nullopt / overloads.

If option 2 is chosen, the earlier refactor to remove the heap allocation enables:

log_surgeon::lexers::ByteLexer* lexer_ptr = nullptr;
if (std::filesystem::exists(schema_file_path)) {
    lexer_ptr = &lexer;
    load_lexer_from_file(schema_file_path.string(), *lexer_ptr);
}
...
Grep::process_raw_query(..., lexer_ptr ? *lexer_ptr : default_lexer, use_heuristic);

Validate the expectations inside process_raw_query before merging.

@davidlion davidlion requested a review from kirkrodrigues July 22, 2025 01:01
@davidlion davidlion self-requested a review July 25, 2025 14:37
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants