-
Notifications
You must be signed in to change notification settings - Fork 81
build: Update to the latest version of log-surgeon. #1033
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
## Walkthrough
The changes update schema configuration regexes, refactor lexer loading to remove direction flags, and update enum names for token types. Dependency management now includes Microsoft GSL. Tests and application code are updated to match the new lexer interface and enums, with no changes to public APIs except for the `load_lexer_from_file` signature.
## Changes
| File(s) | Change Summary |
|-------------------------------------------------------------------------------------------|---------------------------------------------------------------------------------------------------------------------------------------|
| components/core/config/schemas.txt | Expanded timestamp regex patterns, reordered and refined dictionary variable regexes, updated delimiters line. |
| components/core/src/clp/Grep.cpp | Updated token scanning logic to new lexer interface, corrected enum names, improved error checking, consolidated lexer parameters. |
| components/core/src/clp/Utils.cpp, components/core/src/clp/Utils.hpp | Removed direction flag from `load_lexer_from_file`, updated data structures, enum names, and documentation. |
| components/core/src/clp/streaming_archive/writer/Archive.cpp | Updated enum type and member names for token types, added runtime error throw on timestamp pattern failure. |
| components/core/src/clp/clg/clg.cpp, components/core/src/clp/clo/clo.cpp | Removed boolean direction argument from `load_lexer_from_file` calls, consolidated lexer usage to single instance. |
| components/core/tests/test-ParserWithUserSchema.cpp, components/core/tests/test-Grep.cpp | Updated lexer loading and token scanning to new interfaces, corrected symbol constants, updated error message in one test. |
| components/core/tests/test_schema_files/search_schema.txt | Escaped question mark in delimiters line. |
| taskfiles/deps/main.yaml | Added Microsoft GSL dependency and installation task, updated log-surgeon dependency, and extended CMake arguments for dependencies. |
## Sequence Diagram(s)
```mermaid
sequenceDiagram
participant MainApp
participant Utils
participant Lexer
MainApp->>Utils: load_lexer_from_file(schema_file_path, lexer)
Utils->>Lexer: set_delimiters(...)
Utils->>Lexer: generate()
Utils-->>MainApp: (lexer ready) sequenceDiagram
participant Test
participant Utils
participant Lexer
Test->>Utils: load_lexer_from_file(schema_file_path, lexer)
Utils->>Lexer: set_delimiters(...)
Utils->>Lexer: generate()
Utils-->>Test: (lexer ready)
loop Token Scanning
Test->>Lexer: scan(input)
Lexer-->>Test: (error_code, optional_token)
end
Estimated code review effort3 (120 minutes) Suggested reviewers
Learnt from: gibber9809
Learnt from: SharafMohamed
|
…into new-log-surgeon
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
🔭 Outside diff range comments (1)
components/core/src/clp/streaming_archive/writer/Archive.cpp (1)
365-404
: Cast token type once to the enum to avoid repeatedstatic_cast
You now compare
token_type
(anint
) against manySymbolId
values. Casting once avoids repetition and guards against accidental value mismatches:- int token_type = token.m_type_ids_ptr->at(0); + auto token_type = static_cast<log_surgeon::SymbolId>(token.m_type_ids_ptr->at(0)); … - && token_type != static_cast<int>(log_surgeon::SymbolId::TokenUncaughtString) - && token_type != static_cast<int>(log_surgeon::SymbolId::TokenNewline)) + && token_type != log_surgeon::SymbolId::TokenUncaughtString + && token_type != log_surgeon::SymbolId::TokenNewline) … - case static_cast<int>(log_surgeon::SymbolId::TokenNewline): - case static_cast<int>(log_surgeon::SymbolId::TokenUncaughtString): + case log_surgeon::SymbolId::TokenNewline: + case log_surgeon::SymbolId::TokenUncaughtString: … - case static_cast<int>(log_surgeon::SymbolId::TokenInt): { + case log_surgeon::SymbolId::TokenInt: { … - case static_cast<int>(log_surgeon::SymbolId::TokenFloat): { + case log_surgeon::SymbolId::TokenFloat: {This keeps the switch strongly-typed and prevents silent breakage if enum underlying values ever shift.
♻️ Duplicate comments (1)
components/core/src/clp/clp/FileCompressor.cpp (1)
189-190
: Mirror comment above – logging missing here as well
📜 Review details
Configuration used: CodeRabbit UI
Review profile: ASSERTIVE
Plan: Pro
📒 Files selected for processing (11)
components/core/config/schemas.txt
(1 hunks)components/core/src/clp/Grep.cpp
(1 hunks)components/core/src/clp/Profiler.hpp
(1 hunks)components/core/src/clp/Utils.cpp
(4 hunks)components/core/src/clp/clp/FileCompressor.cpp
(2 hunks)components/core/src/clp/clp/run.cpp
(1 hunks)components/core/src/clp/make_dictionaries_readable/CMakeLists.txt
(1 hunks)components/core/src/clp/streaming_archive/writer/Archive.cpp
(3 hunks)components/core/tests/test-ParserWithUserSchema.cpp
(5 hunks)components/core/tests/test_schema_files/search_schema.txt
(1 hunks)taskfiles/deps/main.yaml
(3 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.{cpp,hpp,java,js,jsx,tpp,ts,tsx}
Instructions used from:
Sources:
⚙️ CodeRabbit Configuration File
🧠 Learnings (10)
📓 Common learnings
Learnt from: gibber9809
PR: y-scope/clp#955
File: components/core/src/clp_s/search/sql/CMakeLists.txt:8-26
Timestamp: 2025-06-02T18:22:24.060Z
Learning: In the clp project, ANTLR code generation at build time is being removed by another PR. When reviewing CMake files, be aware that some temporary suboptimal configurations may exist to reduce merge conflicts between concurrent PRs, especially around ANTLR_TARGET calls.
components/core/src/clp/clp/run.cpp (5)
Learnt from: AVMatthews
PR: y-scope/clp#543
File: components/core/src/clp_s/clp-s.cpp:196-265
Timestamp: 2024-10-07T20:10:08.254Z
Learning: In `clp-s.cpp`, the `run_serializer` function interleaves serialization and writing of IR files, making it difficult to restructure it into separate functions.
Learnt from: AVMatthews
PR: y-scope/clp#543
File: components/core/src/clp_s/clp-s.cpp:196-265
Timestamp: 2024-10-08T15:52:50.753Z
Learning: In `clp-s.cpp`, the `run_serializer` function interleaves serialization and writing of IR files, making it difficult to restructure it into separate functions.
Learnt from: AVMatthews
PR: y-scope/clp#595
File: components/core/tests/test-end_to_end.cpp:59-65
Timestamp: 2024-11-19T17:30:04.970Z
Learning: In 'components/core/tests/test-end_to_end.cpp', during the 'clp-s_compression_and_extraction_no_floats' test, files and directories are intentionally removed at the beginning of the test to ensure that any existing content doesn't influence the test results.
Learnt from: haiqi96
PR: y-scope/clp#523
File: components/core/src/clp/BufferedFileReader.cpp:96-106
Timestamp: 2024-10-24T14:45:26.265Z
Learning: In `components/core/src/clp/BufferedFileReader.cpp`, refactoring the nested error handling conditions may not apply due to the specific logic in the original code.
Learnt from: Bill-hbrhbr
PR: y-scope/clp#614
File: components/core/tests/test-StreamingCompression.cpp:45-54
Timestamp: 2024-12-04T15:50:37.827Z
Learning: In `components/core/tests/test-StreamingCompression.cpp`, within the `compress` function, compressing the same data repeatedly by passing the same `src` pointer without advancing is intentional to test the compressor with the same data multiple times.
components/core/src/clp/make_dictionaries_readable/CMakeLists.txt (4)
Learnt from: gibber9809
PR: y-scope/clp#955
File: components/core/src/clp_s/search/sql/CMakeLists.txt:8-26
Timestamp: 2025-06-02T18:22:24.060Z
Learning: In the clp project, ANTLR code generation at build time is being removed by another PR. When reviewing CMake files, be aware that some temporary suboptimal configurations may exist to reduce merge conflicts between concurrent PRs, especially around ANTLR_TARGET calls.
Learnt from: LinZhihao-723
PR: y-scope/clp#873
File: components/core/src/clp/ffi/ir_stream/search/test/utils.hpp:15-18
Timestamp: 2025-05-02T22:11:55.711Z
Learning: The team plans to systematically improve include paths by adding `${PROJECT_SOURCE_DIR}/src` to CMake targets and using direct include paths instead of deep relative ones in a future PR.
Learnt from: jackluo923
PR: y-scope/clp#1054
File: components/core/src/glt/SQLitePreparedStatement.hpp:4-4
Timestamp: 2025-07-01T14:49:07.333Z
Learning: In the CLP codebase, standard headers like `<cstdint>` and `<string>` in alphabetical order (as seen in components/core/src/glt/SQLitePreparedStatement.hpp) are correctly ordered and do not need adjustment.
Learnt from: anlowee
PR: y-scope/clp#925
File: components/core/src/clp_s/search/sql/CMakeLists.txt:8-19
Timestamp: 2025-05-26T18:39:51.727Z
Learning: In CMakeLists.txt files, anlowee prefers to explicitly list source files one by one rather than using file(GLOB ...) or file(GLOB_RECURSE ...) to automatically include files, maintaining consistency with other CMake files in the project.
components/core/src/clp/clp/FileCompressor.cpp (7)
Learnt from: AVMatthews
PR: y-scope/clp#595
File: components/core/tests/test-end_to_end.cpp:59-65
Timestamp: 2024-11-19T17:30:04.970Z
Learning: In 'components/core/tests/test-end_to_end.cpp', during the 'clp-s_compression_and_extraction_no_floats' test, files and directories are intentionally removed at the beginning of the test to ensure that any existing content doesn't influence the test results.
Learnt from: haiqi96
PR: y-scope/clp#619
File: components/core/src/clp/clp/decompression.cpp:313-313
Timestamp: 2024-12-05T16:32:21.507Z
Learning: In the C++ `FileDecompressor` implementations at `components/core/src/clp/clp/FileDecompressor.cpp` and `components/core/src/glt/glt/FileDecompressor.cpp`, the `temp_output_path` variable and associated logic are used to handle multiple compressed files with the same name, and should be kept. This logic is separate from the temporary output directory code removed in PR #619 and is necessary for proper file handling.
Learnt from: haiqi96
PR: y-scope/clp#523
File: components/core/src/clp/BufferedFileReader.cpp:96-106
Timestamp: 2024-10-24T14:45:26.265Z
Learning: In `components/core/src/clp/BufferedFileReader.cpp`, refactoring the nested error handling conditions may not apply due to the specific logic in the original code.
Learnt from: Bill-hbrhbr
PR: y-scope/clp#614
File: components/core/src/clp/streaming_compression/lzma/Compressor.hpp:65-75
Timestamp: 2024-12-19T01:40:13.272Z
Learning: "try_get_pos" method in `Compressor.hpp` does not throw exceptions, as clarified by Bill-hbrhbr.
Learnt from: Bill-hbrhbr
PR: y-scope/clp#614
File: components/core/tests/test-StreamingCompression.cpp:45-54
Timestamp: 2024-12-04T15:50:37.827Z
Learning: In `components/core/tests/test-StreamingCompression.cpp`, within the `compress` function, compressing the same data repeatedly by passing the same `src` pointer without advancing is intentional to test the compressor with the same data multiple times.
Learnt from: AVMatthews
PR: y-scope/clp#543
File: components/core/src/clp_s/JsonParser.cpp:756-765
Timestamp: 2024-10-08T15:52:50.753Z
Learning: In `components/core/src/clp_s/JsonParser.cpp`, within the `parse_from_ir()` function, reaching the end of log events in a given IR is not considered an error case. The errors `std::errc::no_message_available` and `std::errc::result_out_of_range` are expected signals to break the deserialization loop and proceed accordingly.
Learnt from: AVMatthews
PR: y-scope/clp#543
File: components/core/src/clp_s/JsonParser.cpp:717-726
Timestamp: 2024-09-27T23:19:17.079Z
Learning: In the `JsonParser::parse_from_IR` function, when deserializing, it's intentional to break out of the loop without additional error handling when `kv_log_event_result.has_error()` returns true for errors other than `no_message_available` and `result_out_of_range`, as this signifies the end of the IR file.
components/core/src/clp/streaming_archive/writer/Archive.cpp (11)
Learnt from: gibber9809
PR: y-scope/clp#630
File: components/core/src/clp_s/JsonParser.cpp:702-703
Timestamp: 2024-12-10T16:03:13.322Z
Learning: In `components/core/src/clp_s/JsonParser.cpp`, validation and exception throwing are unnecessary in the `get_archive_node_id` method when processing nodes, and should not be added.
Learnt from: gibber9809
PR: y-scope/clp#630
File: components/core/src/clp_s/JsonParser.cpp:702-703
Timestamp: 2024-12-10T16:03:08.691Z
Learning: In `components/core/src/clp_s/JsonParser.cpp`, within the `get_archive_node_id` function, validation and exception throwing for UTF-8 compliance of `curr_node.get_key_name()` are unnecessary and should be omitted.
Learnt from: haiqi96
PR: y-scope/clp#646
File: components/core/src/clp/streaming_archive/writer/Archive.hpp:354-354
Timestamp: 2025-01-14T16:06:54.692Z
Learning: Member variables in C++ classes should be explicitly initialized in the constructor to prevent undefined behavior, as demonstrated in the Archive class where `m_use_single_file_archive` is initialized to `false`.
Learnt from: AVMatthews
PR: y-scope/clp#543
File: components/core/src/clp_s/JsonParser.cpp:581-627
Timestamp: 2024-10-07T21:38:35.979Z
Learning: The variable `ir_node_to_archive_node_unordered_map` is intentionally named to reflect its type change, even if the actual variable type has been altered.
Learnt from: AVMatthews
PR: y-scope/clp#543
File: components/core/src/clp_s/JsonParser.cpp:581-627
Timestamp: 2024-10-07T21:38:35.979Z
Learning: In `components/core/src/clp_s/JsonParser.cpp`, within the `get_archive_node_id` method, throwing a string literal as an exception is acceptable practice.
Learnt from: AVMatthews
PR: y-scope/clp#543
File: components/core/src/clp_s/JsonParser.cpp:581-627
Timestamp: 2024-10-08T15:52:50.753Z
Learning: In `components/core/src/clp_s/JsonParser.cpp`, within the `get_archive_node_id` method, throwing a string literal as an exception is acceptable practice.
Learnt from: LinZhihao-723
PR: y-scope/clp#557
File: components/core/src/clp/ffi/ir_stream/utils.hpp:0-0
Timestamp: 2024-10-18T02:31:18.595Z
Learning: In `components/core/src/clp/ffi/ir_stream/utils.hpp`, the function `size_dependent_encode_and_serialize_schema_tree_node_id` assumes that the caller checks that `node_id` fits within the range of `encoded_node_id_t` before casting.
Learnt from: LinZhihao-723
PR: y-scope/clp#554
File: components/core/src/clp/ffi/KeyValuePairLogEvent.cpp:109-111
Timestamp: 2024-10-10T15:19:52.408Z
Learning: In `KeyValuePairLogEvent.cpp`, for the class `JsonSerializationIterator`, it's acceptable to use raw pointers for member variables like `m_schema_tree_node`, and there's no need to replace them with references or smart pointers in this use case.
Learnt from: LinZhihao-723
PR: y-scope/clp#557
File: components/core/tests/test-ir_encoding_methods.cpp:1216-1286
Timestamp: 2024-10-13T09:27:43.408Z
Learning: In the unit test case `ffi_ir_stream_serialize_schema_tree_node_id` in `test-ir_encoding_methods.cpp`, suppressing the `readability-function-cognitive-complexity` warning is acceptable due to the expansion of Catch2 macros in C++ tests, and such test cases may not have readability issues.
Learnt from: LinZhihao-723
PR: y-scope/clp#557
File: components/core/tests/test-ir_encoding_methods.cpp:1283-1286
Timestamp: 2024-10-13T09:29:56.553Z
Learning: `clp::ffi::SchemaTree::Node::id_t` is of type `uint32_t`, so adding `1` to `INT32_MAX` is safe and does not cause integer overflow.
Learnt from: AVMatthews
PR: y-scope/clp#543
File: components/core/src/clp_s/JsonParser.cpp:756-765
Timestamp: 2024-10-08T15:52:50.753Z
Learning: In `components/core/src/clp_s/JsonParser.cpp`, within the `parse_from_ir()` function, reaching the end of log events in a given IR is not considered an error case. The errors `std::errc::no_message_available` and `std::errc::result_out_of_range` are expected signals to break the deserialization loop and proceed accordingly.
components/core/src/clp/Utils.cpp (19)
Learnt from: haiqi96
PR: y-scope/clp#523
File: components/core/src/clp/BufferedFileReader.cpp:96-106
Timestamp: 2024-10-24T14:45:26.265Z
Learning: In `components/core/src/clp/BufferedFileReader.cpp`, refactoring the nested error handling conditions may not apply due to the specific logic in the original code.
Learnt from: AVMatthews
PR: y-scope/clp#543
File: components/core/src/clp_s/JsonParser.cpp:769-779
Timestamp: 2024-10-07T21:16:41.660Z
Learning: In `components/core/src/clp_s/JsonParser.cpp`, when handling errors in `parse_from_ir`, prefer to maintain the current mix of try-catch and if-statements because specific messages are returned back up in some cases.
Learnt from: gibber9809
PR: y-scope/clp#630
File: components/core/src/clp_s/JsonParser.cpp:702-703
Timestamp: 2024-12-10T16:03:08.691Z
Learning: In `components/core/src/clp_s/JsonParser.cpp`, within the `get_archive_node_id` function, validation and exception throwing for UTF-8 compliance of `curr_node.get_key_name()` are unnecessary and should be omitted.
Learnt from: jackluo923
PR: y-scope/clp#1054
File: components/core/src/glt/SQLitePreparedStatement.hpp:4-4
Timestamp: 2025-07-01T14:49:07.333Z
Learning: In the CLP codebase, standard headers like `<cstdint>` and `<string>` in alphabetical order (as seen in components/core/src/glt/SQLitePreparedStatement.hpp) are correctly ordered and do not need adjustment.
Learnt from: gibber9809
PR: y-scope/clp#955
File: components/core/src/clp_s/search/sql/CMakeLists.txt:8-26
Timestamp: 2025-06-02T18:22:24.060Z
Learning: In the clp project, ANTLR code generation at build time is being removed by another PR. When reviewing CMake files, be aware that some temporary suboptimal configurations may exist to reduce merge conflicts between concurrent PRs, especially around ANTLR_TARGET calls.
Learnt from: AVMatthews
PR: y-scope/clp#543
File: components/core/src/clp_s/JsonParser.cpp:735-794
Timestamp: 2024-10-07T21:35:04.362Z
Learning: In `components/core/src/clp_s/JsonParser.cpp`, within the `parse_from_ir` method, encountering errors from `kv_log_event_result.error()` aside from `std::errc::no_message_available` and `std::errc::result_out_of_range` is anticipated behavior and does not require additional error handling or logging.
Learnt from: AVMatthews
PR: y-scope/clp#543
File: components/core/src/clp_s/clp-s.cpp:196-265
Timestamp: 2024-10-08T15:52:50.753Z
Learning: In `clp-s.cpp`, the `run_serializer` function interleaves serialization and writing of IR files, making it difficult to restructure it into separate functions.
Learnt from: AVMatthews
PR: y-scope/clp#543
File: components/core/src/clp_s/clp-s.cpp:196-265
Timestamp: 2024-10-07T20:10:08.254Z
Learning: In `clp-s.cpp`, the `run_serializer` function interleaves serialization and writing of IR files, making it difficult to restructure it into separate functions.
Learnt from: LinZhihao-723
PR: y-scope/clp#557
File: components/core/src/clp/ffi/ir_stream/utils.hpp:0-0
Timestamp: 2024-10-18T02:31:18.595Z
Learning: In `components/core/src/clp/ffi/ir_stream/utils.hpp`, the function `size_dependent_encode_and_serialize_schema_tree_node_id` assumes that the caller checks that `node_id` fits within the range of `encoded_node_id_t` before casting.
Learnt from: gibber9809
PR: y-scope/clp#630
File: components/core/src/clp_s/JsonParser.cpp:702-703
Timestamp: 2024-12-10T16:03:13.322Z
Learning: In `components/core/src/clp_s/JsonParser.cpp`, validation and exception throwing are unnecessary in the `get_archive_node_id` method when processing nodes, and should not be added.
Learnt from: gibber9809
PR: y-scope/clp#584
File: components/core/src/clp_s/SchemaTree.hpp:171-171
Timestamp: 2024-11-12T18:46:20.933Z
Learning: In `components/core/src/clp_s/SchemaTree.hpp`, it's acceptable to use `std::string_view` as keys in `m_node_map` because `SchemaNode`'s `m_key_name` remains valid even after move operations or reallocations, preventing dangling references.
Learnt from: AVMatthews
PR: y-scope/clp#543
File: components/core/src/clp_s/JsonParser.cpp:581-627
Timestamp: 2024-10-07T21:38:35.979Z
Learning: In `components/core/src/clp_s/JsonParser.cpp`, within the `get_archive_node_id` method, throwing a string literal as an exception is acceptable practice.
Learnt from: AVMatthews
PR: y-scope/clp#543
File: components/core/src/clp_s/JsonParser.cpp:581-627
Timestamp: 2024-10-08T15:52:50.753Z
Learning: In `components/core/src/clp_s/JsonParser.cpp`, within the `get_archive_node_id` method, throwing a string literal as an exception is acceptable practice.
Learnt from: gibber9809
PR: y-scope/clp#584
File: components/core/src/clp_s/SchemaTree.hpp:91-94
Timestamp: 2024-11-12T18:47:03.828Z
Learning: In `components/core/src/clp_s/SchemaTree.hpp`, the `SchemaNode` class uses `std::unique_ptr<char[]> m_key_buf` and `std::string_view m_key_name` to ensure that references to `m_key_name` remain valid even after `SchemaNode` is move-constructed.
Learnt from: LinZhihao-723
PR: y-scope/clp#558
File: components/core/src/clp/ffi/ir_stream/Deserializer.hpp:119-120
Timestamp: 2024-10-14T03:39:35.585Z
Learning: `KeyValuePairLogEvent::create` requires `std::shared_ptr<SchemaTree>`, so `m_auto_generated_schema_tree` and `m_user_generated_schema_tree` need to be `std::shared_ptr`.
Learnt from: LinZhihao-723
PR: y-scope/clp#558
File: components/core/tests/test-ffi_KeyValuePairLogEvent.cpp:14-14
Timestamp: 2024-10-14T03:42:10.355Z
Learning: In the file `components/core/tests/test-ffi_KeyValuePairLogEvent.cpp`, including `<json/single_include/nlohmann/json.hpp>` is consistent with the project's coding standards.
Learnt from: gibber9809
PR: y-scope/clp#584
File: components/core/src/clp_s/SchemaTree.hpp:40-55
Timestamp: 2024-11-12T18:56:31.067Z
Learning: In `components/core/src/clp_s/SchemaTree.hpp`, within the `SchemaNode` class, the use of `std::string_view` for `m_key_name` referencing `m_key_buf` is intentional to ensure that references to the key name remain valid even after move construction.
Learnt from: davemarco
PR: y-scope/clp#700
File: components/core/src/clp/streaming_archive/ArchiveMetadata.hpp:153-155
Timestamp: 2025-01-30T19:26:33.869Z
Learning: When working with constexpr strings (string literals with static storage duration), std::string_view is the preferred choice for member variables as it's more efficient and safe, avoiding unnecessary memory allocations.
Learnt from: LinZhihao-723
PR: y-scope/clp#558
File: components/core/src/clp/ffi/KeyValuePairLogEvent.cpp:474-480
Timestamp: 2024-12-09T15:25:14.043Z
Learning: In `components/core/src/clp/ffi/KeyValuePairLogEvent.cpp`, node IDs are validated before accessing `child_schema_tree_node` in the function `serialize_node_id_value_pairs_to_json`, ensuring `get_node` does not throw exceptions due to invalid node IDs.
components/core/src/clp/Grep.cpp (4)
Learnt from: haiqi96
PR: y-scope/clp#523
File: components/core/src/clp/BufferedFileReader.cpp:96-106
Timestamp: 2024-10-24T14:45:26.265Z
Learning: In `components/core/src/clp/BufferedFileReader.cpp`, refactoring the nested error handling conditions may not apply due to the specific logic in the original code.
Learnt from: LinZhihao-723
PR: y-scope/clp#856
File: components/core/src/clp/ffi/ir_stream/search/utils.cpp:258-266
Timestamp: 2025-04-26T02:21:22.021Z
Learning: In the clp::ffi::ir_stream::search namespace, the design principle is that callers are responsible for type checking, not the called functions. If control flow reaches a function, input types should already be validated by the caller.
Learnt from: AVMatthews
PR: y-scope/clp#543
File: components/core/src/clp_s/JsonParser.cpp:756-765
Timestamp: 2024-10-08T15:52:50.753Z
Learning: In `components/core/src/clp_s/JsonParser.cpp`, within the `parse_from_ir()` function, reaching the end of log events in a given IR is not considered an error case. The errors `std::errc::no_message_available` and `std::errc::result_out_of_range` are expected signals to break the deserialization loop and proceed accordingly.
Learnt from: AVMatthews
PR: y-scope/clp#543
File: components/core/src/clp_s/JsonParser.cpp:735-794
Timestamp: 2024-10-07T21:35:04.362Z
Learning: In `components/core/src/clp_s/JsonParser.cpp`, within the `parse_from_ir` method, encountering errors from `kv_log_event_result.error()` aside from `std::errc::no_message_available` and `std::errc::result_out_of_range` is anticipated behavior and does not require additional error handling or logging.
components/core/tests/test-ParserWithUserSchema.cpp (21)
Learnt from: LinZhihao-723
PR: y-scope/clp#557
File: components/core/tests/test-ir_encoding_methods.cpp:1216-1286
Timestamp: 2024-10-13T09:27:43.408Z
Learning: In the unit test case `ffi_ir_stream_serialize_schema_tree_node_id` in `test-ir_encoding_methods.cpp`, suppressing the `readability-function-cognitive-complexity` warning is acceptable due to the expansion of Catch2 macros in C++ tests, and such test cases may not have readability issues.
Learnt from: LinZhihao-723
PR: y-scope/clp#558
File: components/core/src/clp/ffi/ir_stream/Deserializer.hpp:119-120
Timestamp: 2024-10-14T03:39:35.585Z
Learning: `KeyValuePairLogEvent::create` requires `std::shared_ptr<SchemaTree>`, so `m_auto_generated_schema_tree` and `m_user_generated_schema_tree` need to be `std::shared_ptr`.
Learnt from: AVMatthews
PR: y-scope/clp#543
File: components/core/src/clp_s/JsonParser.cpp:769-779
Timestamp: 2024-10-07T21:16:41.660Z
Learning: In `components/core/src/clp_s/JsonParser.cpp`, when handling errors in `parse_from_ir`, prefer to maintain the current mix of try-catch and if-statements because specific messages are returned back up in some cases.
Learnt from: LinZhihao-723
PR: y-scope/clp#873
File: components/core/src/clp/ffi/ir_stream/search/test/utils.cpp:78-84
Timestamp: 2025-05-02T23:49:30.798Z
Learning: The function `trivial_new_projected_schema_tree_node_callback` in the KV-IR codebase is properly documented in the header file to indicate its purpose for testing, which addresses static analyzer warnings about unused functions.
Learnt from: LinZhihao-723
PR: y-scope/clp#558
File: components/core/tests/test-ffi_KeyValuePairLogEvent.cpp:14-14
Timestamp: 2024-10-14T03:42:10.355Z
Learning: In the file `components/core/tests/test-ffi_KeyValuePairLogEvent.cpp`, including `<json/single_include/nlohmann/json.hpp>` is consistent with the project's coding standards.
Learnt from: LinZhihao-723
PR: y-scope/clp#593
File: components/core/tests/test-NetworkReader.cpp:216-219
Timestamp: 2024-11-15T03:15:45.919Z
Learning: In the `network_reader_with_valid_http_header_kv_pairs` test case in `components/core/tests/test-NetworkReader.cpp`, additional error handling for JSON parsing failures is not necessary, as the current error message is considered sufficient.
Learnt from: LinZhihao-723
PR: y-scope/clp#549
File: components/core/tests/test-ir_encoding_methods.cpp:110-110
Timestamp: 2024-10-08T15:52:50.753Z
Learning: The function `handle_schema_tree_node_insertion` in `IrUnitHandler` must accept `clp::ffi::SchemaTree::NodeLocator` by value to match the `IrUnitHandlerInterface`.
Learnt from: LinZhihao-723
PR: y-scope/clp#570
File: components/core/tests/test-ir_encoding_methods.cpp:376-399
Timestamp: 2024-11-01T03:26:26.386Z
Learning: In the test code (`components/core/tests/test-ir_encoding_methods.cpp`), exception handling for `msgpack::unpack` can be omitted because the Catch2 testing framework captures exceptions if they occur.
Learnt from: LinZhihao-723
PR: y-scope/clp#558
File: components/core/src/clp/ffi/KeyValuePairLogEvent.hpp:62-68
Timestamp: 2024-10-14T03:38:57.921Z
Learning: In `KeyValuePairLogEvent.hpp`, the methods `get_auto_generated_schema_tree()` and `get_user_generated_schema_tree()` do not need to check for null shared pointers because the factory function ensures they are not null.
Learnt from: gibber9809
PR: y-scope/clp#584
File: components/core/src/clp_s/SchemaTree.hpp:91-94
Timestamp: 2024-11-12T18:47:03.828Z
Learning: In `components/core/src/clp_s/SchemaTree.hpp`, the `SchemaNode` class uses `std::unique_ptr<char[]> m_key_buf` and `std::string_view m_key_name` to ensure that references to `m_key_name` remain valid even after `SchemaNode` is move-constructed.
Learnt from: AVMatthews
PR: y-scope/clp#543
File: components/core/src/clp_s/JsonParser.cpp:581-627
Timestamp: 2024-10-07T21:38:35.979Z
Learning: In `components/core/src/clp_s/JsonParser.cpp`, within the `get_archive_node_id` method, throwing a string literal as an exception is acceptable practice.
Learnt from: AVMatthews
PR: y-scope/clp#543
File: components/core/src/clp_s/JsonParser.cpp:581-627
Timestamp: 2024-10-08T15:52:50.753Z
Learning: In `components/core/src/clp_s/JsonParser.cpp`, within the `get_archive_node_id` method, throwing a string literal as an exception is acceptable practice.
Learnt from: AVMatthews
PR: y-scope/clp#543
File: components/core/src/clp_s/JsonParser.cpp:735-794
Timestamp: 2024-10-07T21:35:04.362Z
Learning: In `components/core/src/clp_s/JsonParser.cpp`, within the `parse_from_ir` method, encountering errors from `kv_log_event_result.error()` aside from `std::errc::no_message_available` and `std::errc::result_out_of_range` is anticipated behavior and does not require additional error handling or logging.
Learnt from: AVMatthews
PR: y-scope/clp#595
File: components/core/tests/test-clp_s-end_to_end.cpp:109-110
Timestamp: 2024-11-29T22:50:17.206Z
Learning: In `components/core/tests/test-clp_s-end_to_end.cpp`, the success of `constructor.store()` is verified through `REQUIRE` statements and subsequent comparisons.
Learnt from: AVMatthews
PR: y-scope/clp#543
File: components/core/src/clp_s/JsonParser.cpp:629-733
Timestamp: 2024-10-08T15:52:50.753Z
Learning: In the `JsonParser::parse_kv_log_event` method in `components/core/src/clp_s/JsonParser.cpp`, prefer passing exceptions to higher levels for more graceful handling, rather than handling them at that level.
Learnt from: gibber9809
PR: y-scope/clp#584
File: components/core/src/clp_s/SchemaTree.hpp:171-171
Timestamp: 2024-11-12T18:46:20.933Z
Learning: In `components/core/src/clp_s/SchemaTree.hpp`, it's acceptable to use `std::string_view` as keys in `m_node_map` because `SchemaNode`'s `m_key_name` remains valid even after move operations or reallocations, preventing dangling references.
Learnt from: AVMatthews
PR: y-scope/clp#543
File: components/core/src/clp_s/JsonParser.cpp:756-765
Timestamp: 2024-10-08T15:52:50.753Z
Learning: In `components/core/src/clp_s/JsonParser.cpp`, within the `parse_from_ir()` function, reaching the end of log events in a given IR is not considered an error case. The errors `std::errc::no_message_available` and `std::errc::result_out_of_range` are expected signals to break the deserialization loop and proceed accordingly.
Learnt from: gibber9809
PR: y-scope/clp#584
File: components/core/src/clp_s/SchemaTree.hpp:40-55
Timestamp: 2024-11-12T18:56:31.067Z
Learning: In `components/core/src/clp_s/SchemaTree.hpp`, within the `SchemaNode` class, the use of `std::string_view` for `m_key_name` referencing `m_key_buf` is intentional to ensure that references to the key name remain valid even after move construction.
Learnt from: LinZhihao-723
PR: y-scope/clp#558
File: components/core/src/clp/ffi/KeyValuePairLogEvent.cpp:474-480
Timestamp: 2024-12-09T15:25:14.043Z
Learning: In `components/core/src/clp/ffi/KeyValuePairLogEvent.cpp`, node IDs are validated before accessing `child_schema_tree_node` in the function `serialize_node_id_value_pairs_to_json`, ensuring `get_node` does not throw exceptions due to invalid node IDs.
Learnt from: haiqi96
PR: y-scope/clp#523
File: components/core/src/clp/BufferedFileReader.cpp:96-106
Timestamp: 2024-10-24T14:45:26.265Z
Learning: In `components/core/src/clp/BufferedFileReader.cpp`, refactoring the nested error handling conditions may not apply due to the specific logic in the original code.
Learnt from: AVMatthews
PR: y-scope/clp#543
File: components/core/src/clp_s/JsonParser.cpp:717-726
Timestamp: 2024-09-27T23:19:17.079Z
Learning: In the `JsonParser::parse_from_IR` function, when deserializing, it's intentional to break out of the loop without additional error handling when `kv_log_event_result.has_error()` returns true for errors other than `no_message_available` and `result_out_of_range`, as this signifies the end of the IR file.
taskfiles/deps/main.yaml (1)
Learnt from: anlowee
PR: y-scope/clp#925
File: taskfiles/deps/main.yaml:97-106
Timestamp: 2025-05-28T18:33:30.155Z
Learning: In the taskfiles dependency system (taskfiles/deps/main.yaml), echo commands are used to generate .cmake settings files that are consumed by the main CMake build process. These files set variables like ANTLR_RUNTIME_HEADER to point to dependency locations for use during compilation.
components/core/src/clp/Profiler.hpp (3)
Learnt from: LinZhihao-723
PR: y-scope/clp#856
File: components/core/src/clp/ffi/ir_stream/search/utils.cpp:258-266
Timestamp: 2025-04-26T02:21:22.021Z
Learning: In the clp::ffi::ir_stream::search namespace, the design principle is that callers are responsible for type checking, not the called functions. If control flow reaches a function, input types should already be validated by the caller.
Learnt from: AVMatthews
PR: y-scope/clp#543
File: components/core/src/clp_s/JsonParser.cpp:756-765
Timestamp: 2024-10-08T15:52:50.753Z
Learning: In `components/core/src/clp_s/JsonParser.cpp`, within the `parse_from_ir()` function, reaching the end of log events in a given IR is not considered an error case. The errors `std::errc::no_message_available` and `std::errc::result_out_of_range` are expected signals to break the deserialization loop and proceed accordingly.
Learnt from: davemarco
PR: y-scope/clp#700
File: components/core/src/clp/streaming_archive/ArchiveMetadata.hpp:153-155
Timestamp: 2025-01-30T19:26:33.869Z
Learning: When working with constexpr strings (string literals with static storage duration), std::string_view is the preferred choice for member variables as it's more efficient and safe, avoiding unnecessary memory allocations.
🪛 LanguageTool
components/core/config/schemas.txt
[typographical] ~6-~6: Unpaired symbol: ']' seems to be missing
Context: ...-\d{2}T\d{2}:\d{2}:\d{2},\d{3} // E.g. [2015-01-31T15:50:45 timestamp:[\d{4}-...
(UNPAIRED_BRACKETS)
[typographical] ~7-~7: Unpaired symbol: ']' seems to be missing
Context: ...// E.g. [2015-01-31T15:50:45 timestamp:[\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2} //...
(UNPAIRED_BRACKETS)
[typographical] ~35-~35: Unpaired symbol: ']' seems to be missing
Context: ...-z]+ \d{2}, \d{4} \d{2}:\d{2} // E.g. E [31/Jan/2015:15:50:45 // E.g. localhost -...
(UNPAIRED_BRACKETS)
[typographical] ~36-~36: Unpaired symbol: ']' seems to be missing
Context: ...Jan/2015:15:50:45 // E.g. localhost - - [01/Jan/2016:15:50:17 // E.g. 192.168.4.5...
(UNPAIRED_BRACKETS)
[typographical] ~37-~37: Unpaired symbol: ']' seems to be missing
Context: ...n/2016:15:50:17 // E.g. 192.168.4.5 - - [01/Jan/2016:15:50:17 timestamp:[\d{2}/[...
(UNPAIRED_BRACKETS)
[typographical] ~38-~38: Unpaired symbol: ']' seems to be missing
Context: ....5 - - [01/Jan/2016:15:50:17 timestamp:[\d{2}/[A-Z][a-z]{2}/\d{4}:\d{2}:\d{2}:\d...
(UNPAIRED_BRACKETS)
[typographical] ~39-~39: Unpaired symbol: ']' seems to be missing
Context: ...{2}:\d{2}:\d{2} // E.g. 192.168.4.5 - - [01/01/2016:15:50:17 timestamp:[\d{2}/\d...
(UNPAIRED_BRACKETS)
[typographical] ~40-~40: Unpaired symbol: ']' seems to be missing
Context: ...4.5 - - [01/01/2016:15:50:17 timestamp:[\d{2}/\d{2}/\d{4}:\d{2}:\d{2}:\d{2} // E...
(UNPAIRED_BRACKETS)
[uncategorized] ~43-~43: Pentru limba română se folosește simbolul "«".
Context: ...0-9]{2} \d{2}:\d{2}:\d{2} \d{4} // E.g. <<<2016-11-10 03:02:29:936 timestamp:<<...
(GHILIMELE_DUBLE_INTERIOR_INCEPUT)
[duplication] ~52-~52: Posibilă greșeală: ați repetat un cuvânt
Context: ...d{2}-\d{2} \d{2}:\d{2}:\d{2}.\d{6} // Delimiters delimiters: \t\r\n!"#$%&'()*,:;<>?@[]^_`...
(WORD_REPEAT_RULE)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (10)
- GitHub Check: ubuntu-jammy-static-linked-bins
- GitHub Check: ubuntu-jammy-lint
- GitHub Check: centos-stream-9-static-linked-bins
- GitHub Check: centos-stream-9-dynamic-linked-bins
- GitHub Check: ubuntu-jammy-dynamic-linked-bins
- GitHub Check: build-macos (macos-14, false)
- GitHub Check: build-macos (macos-13, true)
- GitHub Check: build-macos (macos-13, false)
- GitHub Check: build-macos (macos-14, true)
- GitHub Check: build-macos (macos-15, true)
🔇 Additional comments (23)
components/core/tests/test_schema_files/search_schema.txt (1)
2-2
: Confirm that the escaped question-mark behaves as intendedThe added backslash makes
?
a literal delimiter instead of a regex quantifier. Double-check that the parser indeed expects a raw character in this position; otherwise the backslash may be taken literally and break delimiter detection.components/core/src/clp/make_dictionaries_readable/CMakeLists.txt (1)
20-28
: Validate new source entries
Profiler.hpp/cpp
andStopwatch.hpp/cpp
are already part of several other targets. Adding them here is fine but please ensure:
- They are not compiled twice into different objects with conflicting ODR definitions.
- Header-only files (
*.hpp
) need not be listed inMAKE_DICTIONARIES_READABLE_SOURCES
unless you rely on IDE source-grouping.No action required if both points are already covered.
components/core/src/clp/clp/FileCompressor.cpp (1)
122-123
: Start/stop calls look correct but remember to emit the logYou changed to fragmented measurement but removed the corresponding
LOG_FRAGMENTED_MEASUREMENT(ParseLogFile)
call. Without it the timing data will be gathered yet never printed. If that’s intentional, ignore; otherwise add:LOG_FRAGMENTED_MEASUREMENT(Profiler::FragmentedMeasurementIndex::ParseLogFile);
just after the
stop_fragmented_measurement
call.components/core/src/clp/Utils.cpp (4)
137-166
: LGTM: Enum renaming is consistent and correct.The systematic renaming from
log_surgeon::SymbolID
tolog_surgeon::SymbolId
throughout the symbol mapping initialization is well-executed and maintains consistency with the log-surgeon library update.
182-182
: LGTM: API method name change is properly handled.The change from
add_delimiters
toset_delimiters
aligns with the updated log-surgeon API and maintains the same functionality.
206-206
: LGTM: Array type improvement.The change from a fixed-size boolean array to
std::array<bool, log_surgeon::cSizeOfUnicode>
is a good improvement that provides better type safety and clearer intent.
246-247
: LGTM: Appropriate deprecation handling.The commented-out
lexer.generate_reverse()
call with a clear TODO comment explaining it's no longer used is a reasonable approach for deprecated functionality.components/core/src/clp/Grep.cpp (2)
869-874
: LGTM: Proper adaptation to new scan interface.The structured bindings usage to unpack the error code and optional token from the new
scan
method is correct. The error handling checks for success before accessing the token value, which is appropriate defensive programming.
886-887
: LGTM: Consistent enum name updates.The enum references have been correctly updated from
log_surgeon::SymbolId::TokenUncaughtStringID
toTokenUncaughtString
and fromTokenEndID
toTokenEnd
, maintaining consistency with the log-surgeon library changes.components/core/config/schemas.txt (3)
1-51
: LGTM: Comprehensive timestamp pattern expansion.The significantly expanded timestamp regex patterns provide much better coverage of common timestamp formats, including ISO 8601 variants, bracketed timestamps, and various separator formats. This enhancement will improve log parsing capabilities across diverse log formats.
53-53
: LGTM: Delimiter configuration enhancement.The addition of the '=' character to the delimiters list is appropriate and supports the dictionary variable patterns that use equals signs as delimiters.
61-61
: LGTM: Named capture group addition.The addition of the named capture group
(?<val>...)
in the equals pattern enhances the parsing functionality by enabling capture-groups, which aligns with the PR objectives.components/core/tests/test-ParserWithUserSchema.cpp (4)
12-12
: LGTM: Appropriate header inclusion.The addition of
#include <log_surgeon/SchemaParser.hpp>
is necessary for the static method call and follows good practice.
31-31
: LGTM: Correct API update.The change from an instance method to the static method call
log_surgeon::SchemaParser::try_schema_file
properly adapts to the updated log-surgeon API.
168-180
: LGTM: Proper test adaptation to new scan interface.The structured bindings usage in the test cases correctly adapts to the new
scan
method that returns a tuple of error code and optional token. The pattern is consistent and properly handles the API change.
171-171
: LGTM: Consistent enum update.The enum reference update from
SymbolID::TokenEndID
toSymbolId::TokenEnd
is consistent with the library-wide enum renaming.taskfiles/deps/main.yaml (4)
22-22
: LGTM: Appropriate new dependency variable.The addition of
G_GSL_LIB_NAME: "Microsoft.GSL"
follows the existing pattern for library name variables and supports the new dependency.
70-70
: LGTM: Dependency integration.Adding
microsoft.gsl
to thecore-all-parallel
dependencies list properly integrates the new dependency into the build system.
225-239
: LGTM: Log-surgeon dependency updates.The updates to the log-surgeon task properly configure dependencies on both
fmt
andmicrosoft.gsl
, include the necessary CMake configuration files, and update to the new tarball with appropriate checksum verification.
241-253
: LGTM: Well-configured Microsoft GSL dependency.The new
microsoft.gsl
task follows the established patterns for dependency management, includes appropriate build configuration (disabling tests), and uses proper checksum verification for security.components/core/src/clp/Profiler.hpp (3)
48-56
: LGTM: Well-structured addition of new fragmented measurement indices.The new enumerators follow consistent naming conventions and provide granular profiling capabilities for different parsing operations. The additions align with the transition from continuous to fragmented measurements mentioned in the PR objectives.
64-65
: LGTM: Consistent disabling of continuous measurements.The disabling of both
Compression
andSearch
continuous measurements is consistent with the overall strategy to transition profiling to fragmented measurements, as indicated by the AI summary.
70-78
: LGTM: Selective enabling strategy for fragmented measurements.The configuration appropriately enables only
Watch
andTokenCreation
measurements while keeping others disabled. This selective approach suggests a performance-conscious profiling strategy that focuses on key operations without overwhelming the system with excessive measurement overhead.
Co-authored-by: kirkrodrigues <2454684+kirkrodrigues@users.noreply.github.com>
Co-authored-by: kirkrodrigues <2454684+kirkrodrigues@users.noreply.github.com>
Co-authored-by: kirkrodrigues <2454684+kirkrodrigues@users.noreply.github.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
♻️ Duplicate comments (2)
components/core/config/schemas.txt (2)
84-84
: Delimiter list readability issue already raisedThe inconsistent escaping in the delimiter string (mix of raw and escaped characters) remains. See previous discussion; consider adopting a single, consistent style.
97-97
:hex
pattern still excludes digits 0-9Standard hex values like
deadbeef
and0F0F
should be accepted. Prior feedback hasn’t been applied.-hex:[a-fA-F]+ +hex:[0-9a-fA-F]+
📜 Review details
Configuration used: CodeRabbit UI
Review profile: ASSERTIVE
Plan: Pro
📒 Files selected for processing (1)
components/core/config/schemas.txt
(1 hunks)
🧰 Additional context used
🧠 Learnings (2)
📓 Common learnings
Learnt from: gibber9809
PR: y-scope/clp#955
File: components/core/src/clp_s/search/sql/CMakeLists.txt:8-26
Timestamp: 2025-06-02T18:22:24.060Z
Learning: In the clp project, ANTLR code generation at build time is being removed by another PR. When reviewing CMake files, be aware that some temporary suboptimal configurations may exist to reduce merge conflicts between concurrent PRs, especially around ANTLR_TARGET calls.
Learnt from: SharafMohamed
PR: y-scope/clp#1033
File: components/core/config/schemas.txt:42-43
Timestamp: 2025-07-17T16:08:23.185Z
Learning: In log-surgeon, regex patterns like `(?:\d{1,2}| \d)` are not supported. Log-surgeon has its own regex implementation with different capabilities compared to standard regex engines. Patterns like `[ 0-9]{2}` that allow character classes with spaces are valid in log-surgeon schemas.
components/core/config/schemas.txt (4)
Learnt from: SharafMohamed
PR: #1033
File: components/core/config/schemas.txt:42-43
Timestamp: 2025-07-17T16:08:23.185Z
Learning: In log-surgeon, regex patterns like (?:\d{1,2}| \d)
are not supported. Log-surgeon has its own regex implementation with different capabilities compared to standard regex engines. Patterns like [ 0-9]{2}
that allow character classes with spaces are valid in log-surgeon schemas.
Learnt from: SharafMohamed
PR: #1033
File: components/core/config/schemas.txt:96-97
Timestamp: 2025-07-17T16:02:57.688Z
Learning: In log-surgeon, the .
character is treated as a literal non-delimiter character rather than a regex wildcard that matches any character. This means patterns like .*
in log-surgeon schemas match sequences of non-delimiter characters, not arbitrary characters like in standard regex.
Learnt from: gibber9809
PR: #955
File: components/core/src/clp_s/search/sql/CMakeLists.txt:8-26
Timestamp: 2025-06-02T18:22:24.060Z
Learning: In the clp project, ANTLR code generation at build time is being removed by another PR. When reviewing CMake files, be aware that some temporary suboptimal configurations may exist to reduce merge conflicts between concurrent PRs, especially around ANTLR_TARGET calls.
Learnt from: Bill-hbrhbr
PR: y-scope/clp#0
File: :0-0
Timestamp: 2025-03-18T07:27:54.738Z
Learning: Double parentheses in the codebase are intentional and required for clang-tidy to pass. These should not be flagged as style issues in code reviews.
🪛 LanguageTool
components/core/config/schemas.txt
[typographical] ~25-~25: Αταίριαστο σύμβολο: το ']' φαίνεται πως λείπει
Context: ...-/]\d{2}[T ]\d{2}:\d{2}:\d{2} // E.g. [2015-01-31T15:50:45 timestamp:[\d{4}-...
(EL_UNPAIRED_BRACKETS)
[typographical] ~26-~26: Αταίριαστο σύμβολο: το ']' φαίνεται πως λείπει
Context: ...// E.g. [2015-01-31T15:50:45 timestamp:[\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2} /...
(EL_UNPAIRED_BRACKETS)
[typographical] ~57-~57: Αταίριαστο σύμβολο: το ']' φαίνεται πως λείπει
Context: ...z]+ \d{2}, \d{4} \d{2}:\d{2} // E.g. E [31/Jan/2015:15:50:45 // E.g. localhost -...
(EL_UNPAIRED_BRACKETS)
[typographical] ~58-~58: Αταίριαστο σύμβολο: το ']' φαίνεται πως λείπει
Context: ...Jan/2015:15:50:45 // E.g. localhost - - [01/Jan/2016:15:50:17 // E.g. 192.168.4.5...
(EL_UNPAIRED_BRACKETS)
[typographical] ~59-~59: Αταίριαστο σύμβολο: το ']' φαίνεται πως λείπει
Context: ...n/2016:15:50:17 // E.g. 192.168.4.5 - - [01/Jan/2016:15:50:17 timestamp:[\d{2}/[...
(EL_UNPAIRED_BRACKETS)
[typographical] ~60-~60: Αταίριαστο σύμβολο: το ']' φαίνεται πως λείπει
Context: ....5 - - [01/Jan/2016:15:50:17 timestamp:[\d{2}/[A-Z][a-z]{2}/\d{4}:\d{2}:\d{2}:\d...
(EL_UNPAIRED_BRACKETS)
[typographical] ~62-~62: Αταίριαστο σύμβολο: το ']' φαίνεται πως λείπει
Context: ...2}:\d{2}:\d{2} // E.g. 192.168.4.5 - - [01/01/2016:15:50:17 timestamp:[\d{2}/\d...
(EL_UNPAIRED_BRACKETS)
[typographical] ~63-~63: Αταίριαστο σύμβολο: το ']' φαίνεται πως λείπει
Context: ...4.5 - - [01/01/2016:15:50:17 timestamp:[\d{2}/\d{2}/\d{4}:\d{2}:\d{2}:\d{2} // ...
(EL_UNPAIRED_BRACKETS)
[typographical] ~84-84: Αταίριαστο σύμβολο: το '"' φαίνεται πως λείπει /...
Context: .../ Delimiters // === delimiters: \t\r\n!"#$%&'()*,:;<=>?@[]^_`{|}
(EL_UNPAIRED_BRACKETS)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (10)
- GitHub Check: ubuntu-jammy-dynamic-linked-bins
- GitHub Check: ubuntu-jammy-lint
- GitHub Check: centos-stream-9-dynamic-linked-bins
- GitHub Check: ubuntu-jammy-static-linked-bins
- GitHub Check: centos-stream-9-static-linked-bins
- GitHub Check: lint-check (macos-latest)
- GitHub Check: build (macos-latest)
- GitHub Check: build-macos (macos-14, true)
- GitHub Check: build-macos (macos-13, false)
- GitHub Check: build-macos (macos-13, true)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 2
♻️ Duplicate comments (3)
components/core/config/schemas.txt (3)
68-68
: Digits still excluded fromhex
pattern
Issue already raised previously; pattern remains[a-fA-F]+
, rejecting valid hex like7E57
.
34-38
: Add a note explaining the intentionally-missing closing]
The pattern stops before
\]
because]
is a delimiter. An inline comment would save future readers from thinking the bracket was forgotten.
55-55
: Mixed escaping in delimiter list hurts readability
Earlier nitpick about consistently escaping or un-escaping special chars is still unresolved.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: ASSERTIVE
Plan: Pro
📒 Files selected for processing (1)
components/core/config/schemas.txt
(1 hunks)
🧰 Additional context used
🧠 Learnings (2)
📓 Common learnings
Learnt from: gibber9809
PR: y-scope/clp#955
File: components/core/src/clp_s/search/sql/CMakeLists.txt:8-26
Timestamp: 2025-06-02T18:22:24.060Z
Learning: In the clp project, ANTLR code generation at build time is being removed by another PR. When reviewing CMake files, be aware that some temporary suboptimal configurations may exist to reduce merge conflicts between concurrent PRs, especially around ANTLR_TARGET calls.
Learnt from: SharafMohamed
PR: y-scope/clp#1033
File: components/core/config/schemas.txt:42-43
Timestamp: 2025-07-17T16:08:23.185Z
Learning: In log-surgeon, regex patterns like `(?:\d{1,2}| \d)` are not supported. Log-surgeon has its own regex implementation with different capabilities compared to standard regex engines. Patterns like `[ 0-9]{2}` that allow character classes with spaces are valid in log-surgeon schemas.
components/core/config/schemas.txt (5)
Learnt from: SharafMohamed
PR: #1033
File: components/core/config/schemas.txt:42-43
Timestamp: 2025-07-17T16:08:23.185Z
Learning: In log-surgeon, regex patterns like (?:\d{1,2}| \d)
are not supported. Log-surgeon has its own regex implementation with different capabilities compared to standard regex engines. Patterns like [ 0-9]{2}
that allow character classes with spaces are valid in log-surgeon schemas.
Learnt from: SharafMohamed
PR: #1033
File: components/core/config/schemas.txt:96-97
Timestamp: 2025-07-17T16:02:57.688Z
Learning: In log-surgeon, the .
character is treated as a literal non-delimiter character rather than a regex wildcard that matches any character. This means patterns like .*
in log-surgeon schemas match sequences of non-delimiter characters, not arbitrary characters like in standard regex.
Learnt from: gibber9809
PR: #955
File: components/core/src/clp_s/search/sql/CMakeLists.txt:8-26
Timestamp: 2025-06-02T18:22:24.060Z
Learning: In the clp project, ANTLR code generation at build time is being removed by another PR. When reviewing CMake files, be aware that some temporary suboptimal configurations may exist to reduce merge conflicts between concurrent PRs, especially around ANTLR_TARGET calls.
Learnt from: Bill-hbrhbr
PR: y-scope/clp#0
File: :0-0
Timestamp: 2025-03-18T07:27:54.738Z
Learning: Double parentheses in the codebase are intentional and required for clang-tidy to pass. These should not be flagged as style issues in code reviews.
Learnt from: Bill-hbrhbr
PR: y-scope/clp#0
File: :0-0
Timestamp: 2025-03-18T07:27:54.738Z
Learning: Double parentheses in REQUIRE statements (e.g., REQUIRE((expression))
) and in conditional expressions are intentional and required for clang-tidy to pass. These should not be flagged as style issues in code reviews.
🪛 LanguageTool
components/core/config/schemas.txt
[uncategorized] ~58-~58: Although a hyphen is possible, it is not necessary in a compound modifier in which the first word is an adverb that ends in ‘ly’.
Context: ...*,:;<=>?@[]^_\{\|\}~ // === // Specially-encoded variables (using the
intand
float` ...
(HYPHENATED_LY_ADVERB_ADJECTIVE)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (11)
- GitHub Check: centos-stream-9-dynamic-linked-bins
- GitHub Check: centos-stream-9-static-linked-bins
- GitHub Check: ubuntu-jammy-dynamic-linked-bins
- GitHub Check: ubuntu-jammy-lint
- GitHub Check: ubuntu-jammy-static-linked-bins
- GitHub Check: build-macos (macos-13, true)
- GitHub Check: build-macos (macos-15, false)
- GitHub Check: build-macos (macos-15, true)
- GitHub Check: build-macos (macos-13, false)
- GitHub Check: lint-check (macos-latest)
- GitHub Check: lint-check (ubuntu-latest)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
🔭 Outside diff range comments (1)
components/core/src/clp/clo/clo.cpp (1)
22-29
: Incorrect namespace qualification for FileDecompressor
using clp::clp::FileDecompressor;
is illegal – there is no nestedclp::clp
namespace.
This will fail to compile and should be simplified to a single-level qualification.-using clp::clp::FileDecompressor; +using clp::FileDecompressor;
📜 Review details
Configuration used: CodeRabbit UI
Review profile: ASSERTIVE
Plan: Pro
📒 Files selected for processing (1)
components/core/src/clp/clo/clo.cpp
(2 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.{cpp,hpp,java,js,jsx,tpp,ts,tsx}
⚙️ CodeRabbit Configuration File
- Prefer
false == <expression>
rather than!<expression>
.
Files:
components/core/src/clp/clo/clo.cpp
🧠 Learnings (2)
📓 Common learnings
Learnt from: gibber9809
PR: y-scope/clp#955
File: components/core/src/clp_s/search/sql/CMakeLists.txt:8-26
Timestamp: 2025-06-02T18:22:24.060Z
Learning: In the clp project, ANTLR code generation at build time is being removed by another PR. When reviewing CMake files, be aware that some temporary suboptimal configurations may exist to reduce merge conflicts between concurrent PRs, especially around ANTLR_TARGET calls.
Learnt from: SharafMohamed
PR: y-scope/clp#1033
File: components/core/config/schemas.txt:42-43
Timestamp: 2025-07-17T16:08:23.185Z
Learning: In log-surgeon, regex patterns like `(?:\d{1,2}| \d)` are not supported. Log-surgeon has its own regex implementation with different capabilities compared to standard regex engines. Patterns like `[ 0-9]{2}` that allow character classes with spaces are valid in log-surgeon schemas.
components/core/src/clp/clo/clo.cpp (13)
Learnt from: haiqi96
PR: #523
File: components/core/src/clp/BufferedFileReader.cpp:96-106
Timestamp: 2024-10-24T14:45:26.265Z
Learning: In components/core/src/clp/BufferedFileReader.cpp
, refactoring the nested error handling conditions may not apply due to the specific logic in the original code.
Learnt from: gibber9809
PR: #955
File: components/core/src/clp_s/search/sql/CMakeLists.txt:8-26
Timestamp: 2025-06-02T18:22:24.060Z
Learning: In the clp project, ANTLR code generation at build time is being removed by another PR. When reviewing CMake files, be aware that some temporary suboptimal configurations may exist to reduce merge conflicts between concurrent PRs, especially around ANTLR_TARGET calls.
Learnt from: LinZhihao-723
PR: #873
File: components/core/src/clp/ffi/ir_stream/search/QueryHandlerImpl.hpp:320-324
Timestamp: 2025-05-05T01:12:18.561Z
Learning: In the CLP codebase, the m_case_sensitive_search
flag is used only for actual string value comparisons during query evaluation, not for schema key name matching. Schema keys are always compared case-sensitively regardless of this flag's setting.
Learnt from: LinZhihao-723
PR: #882
File: components/core/src/clp/ffi/ir_stream/search/QueryHandlerImpl.cpp:428-450
Timestamp: 2025-05-07T16:56:53.901Z
Learning: The QueryHandlerImpl
class in clp/ffi/ir_stream/search is not designed to be thread-safe and is assumed to not be used across multiple threads.
Learnt from: jackluo923
PR: #1054
File: components/core/src/glt/SQLitePreparedStatement.hpp:4-4
Timestamp: 2025-07-01T14:49:07.333Z
Learning: In the CLP codebase, standard headers like <cstdint>
and <string>
in alphabetical order (as seen in components/core/src/glt/SQLitePreparedStatement.hpp) are correctly ordered and do not need adjustment.
Learnt from: LinZhihao-723
PR: #856
File: components/core/src/clp/ffi/ir_stream/search/utils.cpp:258-266
Timestamp: 2025-04-26T02:21:22.021Z
Learning: In the clp::ffi::ir_stream::search namespace, the design principle is that callers are responsible for type checking, not the called functions. If control flow reaches a function, input types should already be validated by the caller.
Learnt from: gibber9809
PR: #630
File: components/core/src/clp_s/JsonParser.cpp:702-703
Timestamp: 2024-12-10T16:03:13.322Z
Learning: In components/core/src/clp_s/JsonParser.cpp
, validation and exception throwing are unnecessary in the get_archive_node_id
method when processing nodes, and should not be added.
Learnt from: AVMatthews
PR: #543
File: components/core/src/clp_s/clp-s.cpp:196-265
Timestamp: 2024-10-08T15:52:50.753Z
Learning: In clp-s.cpp
, the run_serializer
function interleaves serialization and writing of IR files, making it difficult to restructure it into separate functions.
Learnt from: AVMatthews
PR: #543
File: components/core/src/clp_s/clp-s.cpp:196-265
Timestamp: 2024-10-07T20:10:08.254Z
Learning: In clp-s.cpp
, the run_serializer
function interleaves serialization and writing of IR files, making it difficult to restructure it into separate functions.
Learnt from: AVMatthews
PR: #543
File: components/core/src/clp_s/JsonParser.cpp:769-779
Timestamp: 2024-10-07T21:16:41.660Z
Learning: In components/core/src/clp_s/JsonParser.cpp
, when handling errors in parse_from_ir
, prefer to maintain the current mix of try-catch and if-statements because specific messages are returned back up in some cases.
Learnt from: gibber9809
PR: #584
File: components/core/src/clp_s/SchemaTree.hpp:91-94
Timestamp: 2024-11-12T18:47:03.828Z
Learning: In components/core/src/clp_s/SchemaTree.hpp
, the SchemaNode
class uses std::unique_ptr<char[]> m_key_buf
and std::string_view m_key_name
to ensure that references to m_key_name
remain valid even after SchemaNode
is move-constructed.
Learnt from: LinZhihao-723
PR: #544
File: components/core/src/clp/ffi/ir_stream/ir_unit_deserialization_methods.cpp:170-186
Timestamp: 2024-09-24T22:33:59.488Z
Learning: Only trivially destructible global objects are allowed in the coding standard, so avoid suggesting the use of non-trivially destructible global objects like static std::unordered_map
.
Learnt from: haiqi96
PR: #646
File: components/core/src/clp/streaming_archive/writer/Archive.hpp:354-354
Timestamp: 2025-01-14T16:06:54.692Z
Learning: Member variables in C++ classes should be explicitly initialized in the constructor to prevent undefined behavior, as demonstrated in the Archive class where m_use_single_file_archive
is initialized to false
.
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
- GitHub Check: ubuntu-jammy-lint
- GitHub Check: ubuntu-jammy-dynamic-linked-bins
- GitHub Check: ubuntu-jammy-static-linked-bins
- GitHub Check: centos-stream-9-dynamic-linked-bins
- GitHub Check: centos-stream-9-static-linked-bins
🔇 Additional comments (1)
components/core/src/clp/clo/clo.cpp (1)
493-501
: Guard against unusable lexer when schema is missing
Grep::process_raw_query
is always given a lexer reference, yet when no schema exists the lexer has no delimiters or token table.
Ifprocess_raw_query
relies on a populated lexer this may lead to undefined behaviour or silent mis-parses.Either:
- Confirm that an “empty” lexer is acceptable and document the assumption, or
- Skip lexer loading entirely and signal the absence with
std::nullopt
/ overloads.If option 2 is chosen, the earlier refactor to remove the heap allocation enables:
log_surgeon::lexers::ByteLexer* lexer_ptr = nullptr; if (std::filesystem::exists(schema_file_path)) { lexer_ptr = &lexer; load_lexer_from_file(schema_file_path.string(), *lexer_ptr); } ... Grep::process_raw_query(..., lexer_ptr ? *lexer_ptr : default_lexer, use_heuristic);Validate the expectations inside
process_raw_query
before merging.
…into new-log-surgeon
Description
Update log-surgeon to the latest release:
Reason for update:
Code changes:
scan
API (now returns an optional token, instead of taking a token by reference).SymbolID
toSymbolId
, and update corresponding enum names (drop theId
/ID
suffix).Regex
prefix forByteNfaState
.add_delimiters
toset_delimiters
.schema.txt
and updateequals
variable to use capture groups.reverse_lexer
as its no longer supported bylog-surgeon
. As a result, fall back to grepping all files if search variable has a leading wildcard.forward_lexer
to justlexer
.is_possible_input
to use anstd::array
in place of a c-style array.Unit-test changes
>
is now in the action table ofint
, so is a suggested character to proceed?
is now used in capture groups, so has to be escaped in the schema.Taskfile changes
microsoft.gsl
.log-surgeon
task to:fmt
andmicrosoft.gsl
.Validation performed
Summary by CodeRabbit
New Features
Bug Fixes
Refactor
Chores