-
Notifications
You must be signed in to change notification settings - Fork 81
feat(clp-s)!: Use core clp parsing and search code in clp-s; Bump archive version to 0.4.0. #1163
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
WalkthroughThis change removes the legacy Changes
Sequence Diagram(s)sequenceDiagram
participant Client
participant ColumnWriter
participant EncodedVariableInterpreter
participant LogTypeDictWriter
participant VarDictWriter
Client->>ColumnWriter: add_value(message)
ColumnWriter->>EncodedVariableInterpreter: encode_and_add_to_dictionary(message, ...)
EncodedVariableInterpreter->>VarDictWriter: add_entry(variable)
EncodedVariableInterpreter-->>ColumnWriter: encoded_vars, temp_var_dict_ids
ColumnWriter->>LogTypeDictWriter: add_entry(logtype_entry, ...)
ColumnWriter-->>Client: encoded_id
sequenceDiagram
participant ColumnReader
participant EncodedVariableInterpreter
participant LogTypeDictReader
participant VarDictReader
ColumnReader->>LogTypeDictReader: get_entry(logtype_id)
ColumnReader->>VarDictReader: get_entry(var_id)
ColumnReader->>EncodedVariableInterpreter: decode_variables_into_message(logtype_entry, var_dict, encoded_vars)
EncodedVariableInterpreter-->>ColumnReader: message
Estimated code review effort🎯 5 (Critical) | ⏱️ ~90 minutes Possibly related PRs
Suggested reviewers
Note ⚡️ Unit Test Generation is now available in beta!Learn more here, or try it out under "Finishing Touches" below. ✨ Finishing Touches
🧪 Generate unit tests
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
SupportNeed help? Create a ticket on our support page for assistance with any issues or questions. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
CodeRabbit Configuration File (
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
🔭 Outside diff range comments (1)
components/core/src/clp_s/DictionaryEntry.cpp (1)
235-244
: Remove or document unused parameter.The
lazy
parameter is accepted but not used inVariableDictionaryEntry::read_from_file
. Either remove it or add a comment explaining why it's kept for API consistency withLogTypeDictionaryEntry
.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: ASSERTIVE
Plan: Pro
📒 Files selected for processing (32)
components/core/CMakeLists.txt
(0 hunks)components/core/cmake/Options/options.cmake
(2 hunks)components/core/src/clp/StringReader.cpp
(0 hunks)components/core/src/clp/ir/parsing.cpp
(0 hunks)components/core/src/clp/ir/parsing.hpp
(2 hunks)components/core/src/clp_s/CMakeLists.txt
(4 hunks)components/core/src/clp_s/ColumnReader.cpp
(3 hunks)components/core/src/clp_s/ColumnWriter.cpp
(3 hunks)components/core/src/clp_s/ColumnWriter.hpp
(5 hunks)components/core/src/clp_s/DictionaryEntry.cpp
(5 hunks)components/core/src/clp_s/DictionaryEntry.hpp
(9 hunks)components/core/src/clp_s/DictionaryReader.hpp
(7 hunks)components/core/src/clp_s/DictionaryWriter.cpp
(2 hunks)components/core/src/clp_s/DictionaryWriter.hpp
(6 hunks)components/core/src/clp_s/SingleFileArchiveDefs.hpp
(1 hunks)components/core/src/clp_s/Utils.hpp
(1 hunks)components/core/src/clp_s/VariableDecoder.cpp
(0 hunks)components/core/src/clp_s/VariableDecoder.hpp
(0 hunks)components/core/src/clp_s/VariableEncoder.cpp
(0 hunks)components/core/src/clp_s/VariableEncoder.hpp
(0 hunks)components/core/src/clp_s/indexer/CMakeLists.txt
(2 hunks)components/core/src/clp_s/search/CMakeLists.txt
(1 hunks)components/core/src/clp_s/search/Output.cpp
(0 hunks)components/core/src/clp_s/search/Output.hpp
(0 hunks)components/core/src/clp_s/search/QueryRunner.cpp
(5 hunks)components/core/src/clp_s/search/QueryRunner.hpp
(4 hunks)components/core/src/clp_s/search/clp_search/EncodedVariableInterpreter.cpp
(0 hunks)components/core/src/clp_s/search/clp_search/EncodedVariableInterpreter.hpp
(0 hunks)components/core/src/clp_s/search/clp_search/Grep.cpp
(0 hunks)components/core/src/clp_s/search/clp_search/Grep.hpp
(0 hunks)components/core/src/clp_s/search/clp_search/Query.cpp
(0 hunks)components/core/src/clp_s/search/clp_search/Query.hpp
(0 hunks)
💤 Files with no reviewable changes (15)
- components/core/src/clp/StringReader.cpp
- components/core/src/clp_s/search/Output.cpp
- components/core/src/clp/ir/parsing.cpp
- components/core/src/clp_s/search/Output.hpp
- components/core/src/clp_s/search/clp_search/Grep.hpp
- components/core/src/clp_s/VariableDecoder.hpp
- components/core/CMakeLists.txt
- components/core/src/clp_s/VariableEncoder.hpp
- components/core/src/clp_s/search/clp_search/EncodedVariableInterpreter.hpp
- components/core/src/clp_s/search/clp_search/Query.hpp
- components/core/src/clp_s/search/clp_search/Query.cpp
- components/core/src/clp_s/search/clp_search/EncodedVariableInterpreter.cpp
- components/core/src/clp_s/VariableDecoder.cpp
- components/core/src/clp_s/search/clp_search/Grep.cpp
- components/core/src/clp_s/VariableEncoder.cpp
🧰 Additional context used
📓 Path-based instructions (1)
**/*.{cpp,hpp,java,js,jsx,tpp,ts,tsx}
⚙️ CodeRabbit Configuration File
- Prefer
false == <expression>
rather than!<expression>
.
Files:
components/core/src/clp_s/SingleFileArchiveDefs.hpp
components/core/src/clp_s/ColumnReader.cpp
components/core/src/clp_s/search/QueryRunner.cpp
components/core/src/clp/ir/parsing.hpp
components/core/src/clp_s/Utils.hpp
components/core/src/clp_s/DictionaryWriter.cpp
components/core/src/clp_s/DictionaryWriter.hpp
components/core/src/clp_s/DictionaryReader.hpp
components/core/src/clp_s/ColumnWriter.cpp
components/core/src/clp_s/search/QueryRunner.hpp
components/core/src/clp_s/ColumnWriter.hpp
components/core/src/clp_s/DictionaryEntry.cpp
components/core/src/clp_s/DictionaryEntry.hpp
🧠 Learnings (50)
📓 Common learnings
Learnt from: Bill-hbrhbr
PR: y-scope/clp#1122
File: components/core/src/clp/clp/CMakeLists.txt:175-195
Timestamp: 2025-07-23T09:54:45.185Z
Learning: In the CLP project, when reviewing CMakeLists.txt changes that introduce new compression library dependencies (BZip2, LibLZMA, LZ4, ZLIB), the team prefers to address conditional linking improvements in separate PRs rather than expanding the scope of focused migration PRs like the LibArchive task-based installation migration.
Learnt from: haiqi96
PR: y-scope/clp#651
File: components/clp-package-utils/clp_package_utils/scripts/compress.py:0-0
Timestamp: 2025-01-16T16:58:43.190Z
Learning: In the clp-package compression flow, path validation and error handling is performed at the scheduler level rather than in the compress.py script to maintain simplicity and avoid code duplication.
Learnt from: jackluo923
PR: y-scope/clp#718
File: components/core/tools/scripts/utils/create-debian-package.py:41-41
Timestamp: 2025-02-12T22:24:17.723Z
Learning: For the clp-core Debian package creation script, strict version format validation is considered unnecessary complexity and should be avoided.
Learnt from: gibber9809
PR: y-scope/clp#955
File: components/core/src/clp_s/search/sql/CMakeLists.txt:8-26
Timestamp: 2025-06-02T18:22:24.060Z
Learning: In the clp project, ANTLR code generation at build time is being removed by another PR. When reviewing CMake files, be aware that some temporary suboptimal configurations may exist to reduce merge conflicts between concurrent PRs, especially around ANTLR_TARGET calls.
Learnt from: gibber9809
PR: y-scope/clp#584
File: components/core/src/clp_s/SchemaTree.hpp:91-94
Timestamp: 2024-11-12T18:47:03.828Z
Learning: In `components/core/src/clp_s/SchemaTree.hpp`, the `SchemaNode` class uses `std::unique_ptr<char[]> m_key_buf` and `std::string_view m_key_name` to ensure that references to `m_key_name` remain valid even after `SchemaNode` is move-constructed.
Learnt from: gibber9809
PR: y-scope/clp#584
File: components/core/src/clp_s/SchemaTree.hpp:171-171
Timestamp: 2024-11-12T18:46:20.933Z
Learning: In `components/core/src/clp_s/SchemaTree.hpp`, it's acceptable to use `std::string_view` as keys in `m_node_map` because `SchemaNode`'s `m_key_name` remains valid even after move operations or reallocations, preventing dangling references.
Learnt from: gibber9809
PR: y-scope/clp#584
File: components/core/src/clp_s/SchemaTree.hpp:40-55
Timestamp: 2024-11-12T18:56:31.067Z
Learning: In `components/core/src/clp_s/SchemaTree.hpp`, within the `SchemaNode` class, the use of `std::string_view` for `m_key_name` referencing `m_key_buf` is intentional to ensure that references to the key name remain valid even after move construction.
📚 Learning: in the clp project's archive version format (semver), the patch version uses 16 bits (uint16_t), whi...
Learnt from: davemarco
PR: y-scope/clp#698
File: components/core/src/clp/streaming_archive/Constants.hpp:9-9
Timestamp: 2025-01-28T03:02:30.542Z
Learning: In the CLP project's archive version format (semver), the patch version uses 16 bits (uint16_t), while major version uses 8 bits and minor version uses 16 bits.
Applied to files:
components/core/src/clp_s/SingleFileArchiveDefs.hpp
📚 Learning: member variables in c++ classes should be explicitly initialized in the constructor to prevent undef...
Learnt from: haiqi96
PR: y-scope/clp#646
File: components/core/src/clp/streaming_archive/writer/Archive.hpp:354-354
Timestamp: 2025-01-14T16:06:54.692Z
Learning: Member variables in C++ classes should be explicitly initialized in the constructor to prevent undefined behavior, as demonstrated in the Archive class where `m_use_single_file_archive` is initialized to `false`.
Applied to files:
components/core/src/clp_s/SingleFileArchiveDefs.hpp
components/core/src/clp_s/DictionaryReader.hpp
📚 Learning: in the clp project, when reviewing cmakelists.txt changes that introduce new compression library dep...
Learnt from: Bill-hbrhbr
PR: y-scope/clp#1122
File: components/core/src/clp/clp/CMakeLists.txt:175-195
Timestamp: 2025-07-23T09:54:45.185Z
Learning: In the CLP project, when reviewing CMakeLists.txt changes that introduce new compression library dependencies (BZip2, LibLZMA, LZ4, ZLIB), the team prefers to address conditional linking improvements in separate PRs rather than expanding the scope of focused migration PRs like the LibArchive task-based installation migration.
Applied to files:
components/core/src/clp_s/SingleFileArchiveDefs.hpp
components/core/src/clp_s/search/CMakeLists.txt
components/core/cmake/Options/options.cmake
components/core/src/clp_s/indexer/CMakeLists.txt
components/core/src/clp_s/CMakeLists.txt
components/core/src/clp_s/ColumnWriter.hpp
📚 Learning: in 'components/core/tests/test-end_to_end.cpp', during the 'clp-s_compression_and_extraction_no_floa...
Learnt from: AVMatthews
PR: y-scope/clp#595
File: components/core/tests/test-end_to_end.cpp:59-65
Timestamp: 2024-11-19T17:30:04.970Z
Learning: In 'components/core/tests/test-end_to_end.cpp', during the 'clp-s_compression_and_extraction_no_floats' test, files and directories are intentionally removed at the beginning of the test to ensure that any existing content doesn't influence the test results.
Applied to files:
components/core/src/clp_s/SingleFileArchiveDefs.hpp
components/core/src/clp_s/search/CMakeLists.txt
components/core/src/clp_s/indexer/CMakeLists.txt
components/core/src/clp_s/CMakeLists.txt
components/core/src/clp_s/ColumnWriter.hpp
📚 Learning: in `components/core/src/clp_s/jsonparser.cpp`, validation and exception throwing are unnecessary in ...
Learnt from: gibber9809
PR: y-scope/clp#630
File: components/core/src/clp_s/JsonParser.cpp:702-703
Timestamp: 2024-12-10T16:03:13.322Z
Learning: In `components/core/src/clp_s/JsonParser.cpp`, validation and exception throwing are unnecessary in the `get_archive_node_id` method when processing nodes, and should not be added.
Applied to files:
components/core/src/clp_s/SingleFileArchiveDefs.hpp
components/core/src/clp_s/ColumnReader.cpp
components/core/src/clp_s/search/CMakeLists.txt
components/core/src/clp/ir/parsing.hpp
components/core/src/clp_s/CMakeLists.txt
📚 Learning: in the clp codebase, standard headers like `` and `` in alphabetical order (as seen...
Learnt from: jackluo923
PR: y-scope/clp#1054
File: components/core/src/glt/SQLitePreparedStatement.hpp:4-4
Timestamp: 2025-07-01T14:49:07.333Z
Learning: In the CLP codebase, standard headers like `<cstdint>` and `<string>` in alphabetical order (as seen in components/core/src/glt/SQLitePreparedStatement.hpp) are correctly ordered and do not need adjustment.
Applied to files:
components/core/src/clp_s/SingleFileArchiveDefs.hpp
components/core/src/clp_s/search/CMakeLists.txt
components/core/src/clp/ir/parsing.hpp
components/core/src/clp_s/Utils.hpp
components/core/src/clp_s/indexer/CMakeLists.txt
components/core/src/clp_s/CMakeLists.txt
components/core/src/clp_s/DictionaryReader.hpp
components/core/src/clp_s/ColumnWriter.cpp
components/core/src/clp_s/ColumnWriter.hpp
components/core/src/clp_s/DictionaryEntry.hpp
📚 Learning: in the clp project, c++20 is used, allowing the utilization of c++20 features such as class template...
Learnt from: LinZhihao-723
PR: y-scope/clp#486
File: components/core/tests/test-error_handling.cpp:37-38
Timestamp: 2024-11-26T19:12:43.982Z
Learning: In the CLP project, C++20 is used, allowing the utilization of C++20 features such as class template argument deduction (CTAD) with `std::array`.
Applied to files:
components/core/src/clp_s/SingleFileArchiveDefs.hpp
components/core/src/clp_s/search/CMakeLists.txt
components/core/src/clp_s/CMakeLists.txt
components/core/src/clp_s/ColumnWriter.hpp
📚 Learning: in `components/core/src/clp_s/jsonparser.cpp`, within the `get_archive_node_id` method, throwing a s...
Learnt from: AVMatthews
PR: y-scope/clp#543
File: components/core/src/clp_s/JsonParser.cpp:581-627
Timestamp: 2024-10-07T21:38:35.979Z
Learning: In `components/core/src/clp_s/JsonParser.cpp`, within the `get_archive_node_id` method, throwing a string literal as an exception is acceptable practice.
Applied to files:
components/core/src/clp_s/SingleFileArchiveDefs.hpp
components/core/src/clp_s/CMakeLists.txt
components/core/src/clp_s/ColumnWriter.hpp
📚 Learning: in `components/core/src/clp/bufferedfilereader.cpp`, refactoring the nested error handling condition...
Learnt from: haiqi96
PR: y-scope/clp#523
File: components/core/src/clp/BufferedFileReader.cpp:96-106
Timestamp: 2024-10-24T14:45:26.265Z
Learning: In `components/core/src/clp/BufferedFileReader.cpp`, refactoring the nested error handling conditions may not apply due to the specific logic in the original code.
Applied to files:
components/core/src/clp_s/ColumnReader.cpp
components/core/src/clp_s/search/CMakeLists.txt
components/core/src/clp_s/search/QueryRunner.cpp
components/core/src/clp/ir/parsing.hpp
components/core/src/clp_s/indexer/CMakeLists.txt
components/core/src/clp_s/CMakeLists.txt
components/core/src/clp_s/DictionaryReader.hpp
components/core/src/clp_s/ColumnWriter.cpp
components/core/src/clp_s/ColumnWriter.hpp
components/core/src/clp_s/DictionaryEntry.cpp
components/core/src/clp_s/DictionaryEntry.hpp
📚 Learning: in `components/core/src/clp_s/jsonparser.cpp`, when handling errors in `parse_from_ir`, prefer to ma...
Learnt from: AVMatthews
PR: y-scope/clp#543
File: components/core/src/clp_s/JsonParser.cpp:769-779
Timestamp: 2024-10-07T21:16:41.660Z
Learning: In `components/core/src/clp_s/JsonParser.cpp`, when handling errors in `parse_from_ir`, prefer to maintain the current mix of try-catch and if-statements because specific messages are returned back up in some cases.
Applied to files:
components/core/src/clp_s/ColumnReader.cpp
components/core/src/clp/ir/parsing.hpp
components/core/src/clp_s/indexer/CMakeLists.txt
components/core/src/clp_s/CMakeLists.txt
components/core/src/clp_s/ColumnWriter.hpp
📚 Learning: in `components/core/src/clp/ffi/ir_stream/utils.hpp`, the function `size_dependent_encode_and_serial...
Learnt from: LinZhihao-723
PR: y-scope/clp#557
File: components/core/src/clp/ffi/ir_stream/utils.hpp:0-0
Timestamp: 2024-10-18T02:31:18.595Z
Learning: In `components/core/src/clp/ffi/ir_stream/utils.hpp`, the function `size_dependent_encode_and_serialize_schema_tree_node_id` assumes that the caller checks that `node_id` fits within the range of `encoded_node_id_t` before casting.
Applied to files:
components/core/src/clp_s/ColumnReader.cpp
components/core/src/clp_s/CMakeLists.txt
components/core/src/clp_s/ColumnWriter.cpp
components/core/src/clp_s/ColumnWriter.hpp
📚 Learning: in `clp-s.cpp`, the `run_serializer` function interleaves serialization and writing of ir files, mak...
Learnt from: AVMatthews
PR: y-scope/clp#543
File: components/core/src/clp_s/clp-s.cpp:196-265
Timestamp: 2024-10-08T15:52:50.753Z
Learning: In `clp-s.cpp`, the `run_serializer` function interleaves serialization and writing of IR files, making it difficult to restructure it into separate functions.
Applied to files:
components/core/src/clp_s/ColumnReader.cpp
components/core/src/clp_s/search/CMakeLists.txt
components/core/src/clp_s/indexer/CMakeLists.txt
components/core/src/clp_s/CMakeLists.txt
components/core/src/clp_s/ColumnWriter.cpp
components/core/src/clp_s/ColumnWriter.hpp
📚 Learning: in `components/core/src/clp_s/jsonparser.cpp`, within the `parse_from_ir` method, encountering error...
Learnt from: AVMatthews
PR: y-scope/clp#543
File: components/core/src/clp_s/JsonParser.cpp:735-794
Timestamp: 2024-10-07T21:35:04.362Z
Learning: In `components/core/src/clp_s/JsonParser.cpp`, within the `parse_from_ir` method, encountering errors from `kv_log_event_result.error()` aside from `std::errc::no_message_available` and `std::errc::result_out_of_range` is anticipated behavior and does not require additional error handling or logging.
Applied to files:
components/core/src/clp_s/ColumnReader.cpp
components/core/src/clp_s/search/CMakeLists.txt
components/core/src/clp/ir/parsing.hpp
components/core/src/clp_s/DictionaryEntry.cpp
components/core/src/clp_s/DictionaryEntry.hpp
📚 Learning: in `components/core/src/clp/bufferedfilereader.hpp`, the `bufferedfilereader::try_read` function exp...
Learnt from: haiqi96
PR: y-scope/clp#523
File: components/core/src/clp/BufferedFileReader.hpp:158-158
Timestamp: 2024-10-24T14:46:00.664Z
Learning: In `components/core/src/clp/BufferedFileReader.hpp`, the `BufferedFileReader::try_read` function explicitly calls `BufferReader::try_read`, so documentation should reference `BufferReader::try_read`.
Applied to files:
components/core/src/clp_s/ColumnReader.cpp
📚 Learning: in `components/core/src/clp_s/jsonparser.cpp`, within the `get_archive_node_id` function, validation...
Learnt from: gibber9809
PR: y-scope/clp#630
File: components/core/src/clp_s/JsonParser.cpp:702-703
Timestamp: 2024-12-10T16:03:08.691Z
Learning: In `components/core/src/clp_s/JsonParser.cpp`, within the `get_archive_node_id` function, validation and exception throwing for UTF-8 compliance of `curr_node.get_key_name()` are unnecessary and should be omitted.
Applied to files:
components/core/src/clp_s/ColumnReader.cpp
📚 Learning: in the clp project, antlr code generation at build time is being removed by another pr. when reviewi...
Learnt from: gibber9809
PR: y-scope/clp#955
File: components/core/src/clp_s/search/sql/CMakeLists.txt:8-26
Timestamp: 2025-06-02T18:22:24.060Z
Learning: In the clp project, ANTLR code generation at build time is being removed by another PR. When reviewing CMake files, be aware that some temporary suboptimal configurations may exist to reduce merge conflicts between concurrent PRs, especially around ANTLR_TARGET calls.
Applied to files:
components/core/src/clp_s/search/CMakeLists.txt
components/core/cmake/Options/options.cmake
components/core/src/clp_s/search/QueryRunner.cpp
components/core/src/clp_s/indexer/CMakeLists.txt
components/core/src/clp_s/CMakeLists.txt
📚 Learning: in clp's custom cmake find modules, `findstaticlibrarydependencies` populates the `*_dynamic_libs` v...
Learnt from: Bill-hbrhbr
PR: y-scope/clp#1124
File: components/core/cmake/Modules/FindBZip2.cmake:49-65
Timestamp: 2025-08-03T18:56:07.366Z
Learning: In CLP's custom CMake Find modules, `FindStaticLibraryDependencies` populates the `*_DYNAMIC_LIBS` variable even when building with static libraries. When the main library is static, all dependencies must also use static libraries - there's no fallback to shared libraries for dependencies.
Applied to files:
components/core/src/clp_s/search/CMakeLists.txt
components/core/cmake/Options/options.cmake
components/core/src/clp_s/CMakeLists.txt
📚 Learning: in the c++ `filedecompressor` implementations at `components/core/src/clp/clp/filedecompressor.cpp` ...
Learnt from: haiqi96
PR: y-scope/clp#619
File: components/core/src/clp/clp/decompression.cpp:313-313
Timestamp: 2024-12-05T16:32:21.507Z
Learning: In the C++ `FileDecompressor` implementations at `components/core/src/clp/clp/FileDecompressor.cpp` and `components/core/src/glt/glt/FileDecompressor.cpp`, the `temp_output_path` variable and associated logic are used to handle multiple compressed files with the same name, and should be kept. This logic is separate from the temporary output directory code removed in PR #619 and is necessary for proper file handling.
Applied to files:
components/core/src/clp_s/search/CMakeLists.txt
components/core/src/clp_s/indexer/CMakeLists.txt
components/core/src/clp_s/CMakeLists.txt
📚 Learning: in cmakelists.txt files, anlowee prefers to explicitly list source files one by one rather than usin...
Learnt from: anlowee
PR: y-scope/clp#925
File: components/core/src/clp_s/search/sql/CMakeLists.txt:8-19
Timestamp: 2025-05-26T18:39:51.727Z
Learning: In CMakeLists.txt files, anlowee prefers to explicitly list source files one by one rather than using file(GLOB ...) or file(GLOB_RECURSE ...) to automatically include files, maintaining consistency with other CMake files in the project.
Applied to files:
components/core/src/clp_s/search/CMakeLists.txt
components/core/src/clp_s/indexer/CMakeLists.txt
components/core/src/clp_s/CMakeLists.txt
📚 Learning: bill-hbrhbr prefers concise, straightforward code over verbose defensive programming in cmake files....
Learnt from: Bill-hbrhbr
PR: y-scope/clp#1124
File: components/core/CMakeLists.txt:171-172
Timestamp: 2025-08-04T17:23:14.646Z
Learning: Bill-hbrhbr prefers concise, straightforward code over verbose defensive programming in CMake files. When find_package uses REQUIRED flag, additional existence checks are redundant and should be avoided.
Applied to files:
components/core/src/clp_s/search/CMakeLists.txt
components/core/cmake/Options/options.cmake
📚 Learning: in clp's custom cmake find modules (like findbzip2.cmake), when building with dynamic libraries, the...
Learnt from: Bill-hbrhbr
PR: y-scope/clp#1124
File: components/core/cmake/Modules/FindBZip2.cmake:60-60
Timestamp: 2025-07-24T21:56:05.806Z
Learning: In CLP's custom CMake Find modules (like FindBZip2.cmake), when building with dynamic libraries, the code intentionally uses `bzip2_PKGCONF_STATIC_LIBRARIES` as input to `FindDynamicLibraryDependencies`. This is a deliberate fallback mechanism to ensure all library dependencies are resolved even in dynamic builds, using the static library list from pkg-config as a source of dependency information.
Applied to files:
components/core/src/clp_s/search/CMakeLists.txt
components/core/cmake/Options/options.cmake
📚 Learning: the team plans to systematically improve include paths by adding `${project_source_dir}/src` to cmak...
Learnt from: LinZhihao-723
PR: y-scope/clp#873
File: components/core/src/clp/ffi/ir_stream/search/test/utils.hpp:15-18
Timestamp: 2025-05-02T22:11:55.711Z
Learning: The team plans to systematically improve include paths by adding `${PROJECT_SOURCE_DIR}/src` to CMake targets and using direct include paths instead of deep relative ones in a future PR.
Applied to files:
components/core/src/clp_s/search/CMakeLists.txt
components/core/src/clp_s/indexer/CMakeLists.txt
📚 Learning: in the clp project, manually setting ldflags for library paths can cause runtime errors because it i...
Learnt from: sitaowang1998
PR: y-scope/clp#1044
File: .github/workflows/clp-core-build-macos.yaml:0-0
Timestamp: 2025-06-27T01:49:15.724Z
Learning: In the clp project, manually setting LDFLAGS for library paths can cause runtime errors because it interferes with CMake's built-in library discovery and linking mechanisms. CMake can identify and correctly link to libraries on its own without requiring manual LDFLAGS configuration.
Applied to files:
components/core/src/clp_s/search/CMakeLists.txt
components/core/cmake/Options/options.cmake
components/core/src/clp_s/indexer/CMakeLists.txt
📚 Learning: in `components/core/src/clp_s/jsonparser.cpp`, within the `parse_from_ir()` function, reaching the e...
Learnt from: AVMatthews
PR: y-scope/clp#543
File: components/core/src/clp_s/JsonParser.cpp:756-765
Timestamp: 2024-10-08T15:52:50.753Z
Learning: In `components/core/src/clp_s/JsonParser.cpp`, within the `parse_from_ir()` function, reaching the end of log events in a given IR is not considered an error case. The errors `std::errc::no_message_available` and `std::errc::result_out_of_range` are expected signals to break the deserialization loop and proceed accordingly.
Applied to files:
components/core/src/clp_s/search/CMakeLists.txt
components/core/src/clp_s/DictionaryEntry.cpp
📚 Learning: in the clp::ffi::ir_stream::search namespace, the design principle is that callers are responsible f...
Learnt from: LinZhihao-723
PR: y-scope/clp#856
File: components/core/src/clp/ffi/ir_stream/search/utils.cpp:258-266
Timestamp: 2025-04-26T02:21:22.021Z
Learning: In the clp::ffi::ir_stream::search namespace, the design principle is that callers are responsible for type checking, not the called functions. If control flow reaches a function, input types should already be validated by the caller.
Applied to files:
components/core/src/clp_s/search/CMakeLists.txt
components/core/src/clp/ir/parsing.hpp
📚 Learning: in the file `components/core/tests/test-ffi_keyvaluepairlogevent.cpp`, including `
Learnt from: LinZhihao-723
PR: y-scope/clp#558
File: components/core/tests/test-ffi_KeyValuePairLogEvent.cpp:14-14
Timestamp: 2024-10-14T03:42:10.355Z
Learning: In the file `components/core/tests/test-ffi_KeyValuePairLogEvent.cpp`, including `<json/single_include/nlohmann/json.hpp>` is consistent with the project's coding standards.
Applied to files:
components/core/src/clp_s/search/CMakeLists.txt
components/core/src/clp_s/ColumnWriter.cpp
Learnt from: LinZhihao-723
PR: y-scope/clp#558
File: components/core/tests/test-ffi_KeyValuePairLogEvent.cpp:14-14
Timestamp: 2024-10-14T03:42:10.355Z
Learning: In the file `components/core/tests/test-ffi_KeyValuePairLogEvent.cpp`, including `<json/single_include/nlohmann/json.hpp>` is consistent with the project's coding standards.
components/core/src/clp_s/search/CMakeLists.txt
components/core/src/clp_s/ColumnWriter.cpp
📚 Learning: in `components/core/src/clp_s/schematree.hpp`, it's acceptable to use `std::string_view` as keys in ...
Learnt from: gibber9809
PR: y-scope/clp#584
File: components/core/src/clp_s/SchemaTree.hpp:171-171
Timestamp: 2024-11-12T18:46:20.933Z
Learning: In `components/core/src/clp_s/SchemaTree.hpp`, it's acceptable to use `std::string_view` as keys in `m_node_map` because `SchemaNode`'s `m_key_name` remains valid even after move operations or reallocations, preventing dangling references.
Applied to files:
components/core/src/clp_s/search/CMakeLists.txt
components/core/src/clp_s/search/QueryRunner.cpp
components/core/src/clp/ir/parsing.hpp
components/core/src/clp_s/DictionaryWriter.cpp
components/core/src/clp_s/DictionaryWriter.hpp
components/core/src/clp_s/DictionaryReader.hpp
components/core/src/clp_s/search/QueryRunner.hpp
components/core/src/clp_s/ColumnWriter.hpp
components/core/src/clp_s/DictionaryEntry.hpp
📚 Learning: in clp's cmake configuration for bzip2, bill-hbrhbr prefers using hints parameter for path-based res...
Learnt from: Bill-hbrhbr
PR: y-scope/clp#1124
File: components/core/CMakeLists.txt:167-170
Timestamp: 2025-08-04T17:26:17.098Z
Learning: In CLP's CMake configuration for BZip2, Bill-hbrhbr prefers using HINTS parameter for path-based resolution rather than version pinning in find_package(). The primary concern is ensuring the task-installed BZip2 is found instead of system copies, not enforcing specific versions.
Applied to files:
components/core/cmake/Options/options.cmake
📚 Learning: for installation scripts in the clp project, prefer explicit error handling over automatic dependenc...
Learnt from: kirkrodrigues
PR: y-scope/clp#881
File: components/core/tools/scripts/lib_install/ubuntu-jammy/install-prebuilt-packages.sh:35-41
Timestamp: 2025-05-06T09:48:55.408Z
Learning: For installation scripts in the CLP project, prefer explicit error handling over automatic dependency resolution (like `apt-get install -f`) when installing packages to give users more control over their system.
Applied to files:
components/core/cmake/Options/options.cmake
📚 Learning: the `queryhandlerimpl` class in clp/ffi/ir_stream/search is not designed to be thread-safe and is as...
Learnt from: LinZhihao-723
PR: y-scope/clp#882
File: components/core/src/clp/ffi/ir_stream/search/QueryHandlerImpl.cpp:428-450
Timestamp: 2025-05-07T16:56:53.901Z
Learning: The `QueryHandlerImpl` class in clp/ffi/ir_stream/search is not designed to be thread-safe and is assumed to not be used across multiple threads.
Applied to files:
components/core/src/clp_s/search/QueryRunner.cpp
components/core/src/clp_s/search/QueryRunner.hpp
📚 Learning: in `components/core/src/clp_s/schematree.hpp`, the `schemanode` class uses `std::unique_ptr ...
Learnt from: gibber9809
PR: y-scope/clp#584
File: components/core/src/clp_s/SchemaTree.hpp:91-94
Timestamp: 2024-11-12T18:47:03.828Z
Learning: In `components/core/src/clp_s/SchemaTree.hpp`, the `SchemaNode` class uses `std::unique_ptr<char[]> m_key_buf` and `std::string_view m_key_name` to ensure that references to `m_key_name` remain valid even after `SchemaNode` is move-constructed.
Applied to files:
components/core/src/clp_s/search/QueryRunner.cpp
components/core/src/clp_s/DictionaryReader.hpp
components/core/src/clp_s/search/QueryRunner.hpp
components/core/src/clp_s/ColumnWriter.hpp
components/core/src/clp_s/DictionaryEntry.hpp
📚 Learning: in `components/core/src/clp_s/schematree.hpp`, within the `schemanode` class, the use of `std::strin...
Learnt from: gibber9809
PR: y-scope/clp#584
File: components/core/src/clp_s/SchemaTree.hpp:40-55
Timestamp: 2024-11-12T18:56:31.067Z
Learning: In `components/core/src/clp_s/SchemaTree.hpp`, within the `SchemaNode` class, the use of `std::string_view` for `m_key_name` referencing `m_key_buf` is intentional to ensure that references to the key name remain valid even after move construction.
Applied to files:
components/core/src/clp_s/search/QueryRunner.cpp
components/core/src/clp_s/DictionaryWriter.cpp
components/core/src/clp_s/search/QueryRunner.hpp
components/core/src/clp_s/ColumnWriter.hpp
components/core/src/clp_s/DictionaryEntry.hpp
📚 Learning: for the `queryhandlerimpl` class, reusing member variables like `m_ast_dfs_stack` is preferred over ...
Learnt from: LinZhihao-723
PR: y-scope/clp#882
File: components/core/src/clp/ffi/ir_stream/search/QueryHandlerImpl.hpp:164-166
Timestamp: 2025-05-11T22:22:49.286Z
Learning: For the `QueryHandlerImpl` class, reusing member variables like `m_ast_dfs_stack` is preferred over making methods `const` and moving state to function scope to avoid allocation/deallocation overhead, especially for frequently called methods like `evaluate_kv_pair_log_event`.
Applied to files:
components/core/src/clp_s/search/QueryRunner.cpp
components/core/src/clp_s/search/QueryRunner.hpp
📚 Learning: in the `queryhandlerimpl.cpp` file, the `unique_projected_columns` set (using `std::string_view`) is...
Learnt from: LinZhihao-723
PR: y-scope/clp#873
File: components/core/src/clp/ffi/ir_stream/search/QueryHandlerImpl.cpp:148-157
Timestamp: 2025-05-02T22:27:59.347Z
Learning: In the `QueryHandlerImpl.cpp` file, the `unique_projected_columns` set (using `std::string_view`) is intentionally designed to only check for duplications within the local scope of the `create_projected_columns_and_projection_map` function. The team decided this is an acceptable use of `std::string_view` in a container since the referenced strings remain valid throughout the function's execution.
Applied to files:
components/core/src/clp_s/search/QueryRunner.cpp
components/core/src/clp_s/search/QueryRunner.hpp
📚 Learning: in the clp codebase, the filteroperation::exists and filteroperation::nexists operations have specif...
Learnt from: LinZhihao-723
PR: y-scope/clp#856
File: components/core/src/clp/ffi/ir_stream/search/utils.cpp:340-347
Timestamp: 2025-04-26T02:19:33.275Z
Learning: In the CLP codebase, the FilterOperation::EXISTS and FilterOperation::NEXISTS operations have specific semantics that differ from simply checking if a value exists. EXISTS always returns true and NEXISTS always returns false in the evaluate_filter_against_literal_type_value_pair function. The optional::has_value() is used specifically to differentiate empty objects from concrete values, not for existence checks in filter operations.
Applied to files:
components/core/src/clp_s/search/QueryRunner.cpp
components/core/src/clp_s/search/QueryRunner.hpp
📚 Learning: in `components/core/tests/test-error_handling.cpp` (c++), when specializing template methods of `err...
Learnt from: LinZhihao-723
PR: y-scope/clp#486
File: components/core/tests/test-error_handling.cpp:44-91
Timestamp: 2024-11-26T19:16:19.933Z
Learning: In `components/core/tests/test-error_handling.cpp` (C++), when specializing template methods of `ErrorCategory` via type aliases (e.g., `AlwaysSuccessErrorCategory`), it's acceptable to define these specializations in the current namespace, even though the primary template is in the `clp::error_handling` namespace, due to the use of type aliases and template instantiation.
Applied to files:
components/core/src/clp/ir/parsing.hpp
📚 Learning: the function `handle_schema_tree_node_insertion` in `irunithandler` must accept `clp::ffi::schematre...
Learnt from: LinZhihao-723
PR: y-scope/clp#549
File: components/core/tests/test-ir_encoding_methods.cpp:110-110
Timestamp: 2024-10-08T15:52:50.753Z
Learning: The function `handle_schema_tree_node_insertion` in `IrUnitHandler` must accept `clp::ffi::SchemaTree::NodeLocator` by value to match the `IrUnitHandlerInterface`.
Applied to files:
components/core/src/clp/ir/parsing.hpp
📚 Learning: when working with constexpr strings (string literals with static storage duration), std::string_view...
Learnt from: davemarco
PR: y-scope/clp#700
File: components/core/src/clp/streaming_archive/ArchiveMetadata.hpp:153-155
Timestamp: 2025-01-30T19:26:33.869Z
Learning: When working with constexpr strings (string literals with static storage duration), std::string_view is the preferred choice for member variables as it's more efficient and safe, avoiding unnecessary memory allocations.
Applied to files:
components/core/src/clp/ir/parsing.hpp
components/core/src/clp_s/DictionaryWriter.cpp
📚 Learning: in `components/core/src/clp/streaming_archive/reader/segment.cpp`, the method `m_decompressor.open()...
Learnt from: LinZhihao-723
PR: y-scope/clp#450
File: components/core/src/clp/streaming_archive/reader/Segment.cpp:57-58
Timestamp: 2024-11-14T01:03:27.144Z
Learning: In `components/core/src/clp/streaming_archive/reader/Segment.cpp`, the method `m_decompressor.open()` does not have a return value.
Applied to files:
components/core/src/clp_s/CMakeLists.txt
components/core/src/clp_s/DictionaryEntry.cpp
components/core/src/clp_s/DictionaryEntry.hpp
📚 Learning: in `components/core/tests/test-clp_s-end_to_end.cpp`, the success of `constructor.store()` is verifi...
Learnt from: AVMatthews
PR: y-scope/clp#595
File: components/core/tests/test-clp_s-end_to_end.cpp:109-110
Timestamp: 2024-11-29T22:50:17.206Z
Learning: In `components/core/tests/test-clp_s-end_to_end.cpp`, the success of `constructor.store()` is verified through `REQUIRE` statements and subsequent comparisons.
Applied to files:
components/core/src/clp_s/CMakeLists.txt
📚 Learning: in the test code (`components/core/tests/test-ir_encoding_methods.cpp`), exception handling for `msg...
Learnt from: LinZhihao-723
PR: y-scope/clp#570
File: components/core/tests/test-ir_encoding_methods.cpp:376-399
Timestamp: 2024-11-01T03:26:26.386Z
Learning: In the test code (`components/core/tests/test-ir_encoding_methods.cpp`), exception handling for `msgpack::unpack` can be omitted because the Catch2 testing framework captures exceptions if they occur.
Applied to files:
components/core/src/clp_s/CMakeLists.txt
📚 Learning: the variable `ir_node_to_archive_node_unordered_map` is intentionally named to reflect its type chan...
Learnt from: AVMatthews
PR: y-scope/clp#543
File: components/core/src/clp_s/JsonParser.cpp:581-627
Timestamp: 2024-10-07T21:38:35.979Z
Learning: The variable `ir_node_to_archive_node_unordered_map` is intentionally named to reflect its type change, even if the actual variable type has been altered.
Applied to files:
components/core/src/clp_s/DictionaryWriter.hpp
📚 Learning: `keyvaluepairlogevent::create` requires `std::shared_ptr`, so `m_auto_generated_schema_t...
Learnt from: LinZhihao-723
PR: y-scope/clp#558
File: components/core/src/clp/ffi/ir_stream/Deserializer.hpp:119-120
Timestamp: 2024-10-14T03:39:35.585Z
Learning: `KeyValuePairLogEvent::create` requires `std::shared_ptr<SchemaTree>`, so `m_auto_generated_schema_tree` and `m_user_generated_schema_tree` need to be `std::shared_ptr`.
Applied to files:
components/core/src/clp_s/search/QueryRunner.hpp
components/core/src/clp_s/DictionaryEntry.hpp
📚 Learning: in the clp codebase, the `m_case_sensitive_search` flag is used only for actual string value compari...
Learnt from: LinZhihao-723
PR: y-scope/clp#873
File: components/core/src/clp/ffi/ir_stream/search/QueryHandlerImpl.hpp:320-324
Timestamp: 2025-05-05T01:12:18.561Z
Learning: In the CLP codebase, the `m_case_sensitive_search` flag is used only for actual string value comparisons during query evaluation, not for schema key name matching. Schema keys are always compared case-sensitively regardless of this flag's setting.
Applied to files:
components/core/src/clp_s/search/QueryRunner.hpp
📚 Learning: in the clp search component, the `evaluate_wildcard_filter` function should return `astevaluationres...
Learnt from: LinZhihao-723
PR: y-scope/clp#882
File: components/core/src/clp/ffi/ir_stream/search/QueryHandlerImpl.cpp:378-402
Timestamp: 2025-05-07T16:56:35.687Z
Learning: In the CLP search component, the `evaluate_wildcard_filter` function should return `AstEvaluationResult::Pruned` when `node_id_value_pairs` is empty, not `AstEvaluationResult::False`. Empty node sets should be treated as "undetermined" rather than definitive non-matches.
Applied to files:
components/core/src/clp_s/search/QueryRunner.hpp
📚 Learning: `clp::ffi::schematree::node::id_t` is of type `uint32_t`, so adding `1` to `int32_max` is safe and d...
Learnt from: LinZhihao-723
PR: y-scope/clp#557
File: components/core/tests/test-ir_encoding_methods.cpp:1283-1286
Timestamp: 2024-10-13T09:29:56.553Z
Learning: `clp::ffi::SchemaTree::Node::id_t` is of type `uint32_t`, so adding `1` to `INT32_MAX` is safe and does not cause integer overflow.
Applied to files:
components/core/src/clp_s/ColumnWriter.hpp
📚 Learning: in `keyvaluepairlogevent.cpp`, for the class `jsonserializationiterator`, it's acceptable to use raw...
Learnt from: LinZhihao-723
PR: y-scope/clp#554
File: components/core/src/clp/ffi/KeyValuePairLogEvent.cpp:109-111
Timestamp: 2024-10-10T15:19:52.408Z
Learning: In `KeyValuePairLogEvent.cpp`, for the class `JsonSerializationIterator`, it's acceptable to use raw pointers for member variables like `m_schema_tree_node`, and there's no need to replace them with references or smart pointers in this use case.
Applied to files:
components/core/src/clp_s/DictionaryEntry.cpp
components/core/src/clp_s/DictionaryEntry.hpp
📚 Learning: in the `jsonparser::parse_kv_log_event` method in `components/core/src/clp_s/jsonparser.cpp`, prefer...
Learnt from: AVMatthews
PR: y-scope/clp#543
File: components/core/src/clp_s/JsonParser.cpp:629-733
Timestamp: 2024-10-08T15:52:50.753Z
Learning: In the `JsonParser::parse_kv_log_event` method in `components/core/src/clp_s/JsonParser.cpp`, prefer passing exceptions to higher levels for more graceful handling, rather than handling them at that level.
Applied to files:
components/core/src/clp_s/DictionaryEntry.cpp
📚 Learning: "try_get_pos" method in `compressor.hpp` does not throw exceptions, as clarified by bill-hbrhbr....
Learnt from: Bill-hbrhbr
PR: y-scope/clp#614
File: components/core/src/clp/streaming_compression/lzma/Compressor.hpp:65-75
Timestamp: 2024-12-19T01:40:13.272Z
Learning: "try_get_pos" method in `Compressor.hpp` does not throw exceptions, as clarified by Bill-hbrhbr.
Applied to files:
components/core/src/clp_s/DictionaryEntry.hpp
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (8)
- GitHub Check: ubuntu-jammy-lint
- GitHub Check: ubuntu-jammy-static-linked-bins
- GitHub Check: centos-stream-9-static-linked-bins
- GitHub Check: ubuntu-jammy-dynamic-linked-bins
- GitHub Check: centos-stream-9-dynamic-linked-bins
- GitHub Check: lint-check (ubuntu-24.04)
- GitHub Check: build-macos (macos-15, false)
- GitHub Check: build-macos (macos-15, true)
🔇 Additional comments (54)
components/core/src/clp_s/SingleFileArchiveDefs.hpp (1)
13-14
: LGTM! Archive version bump aligns with PR objectives.The version increment from 0.3.2 to 0.4.0 properly follows semantic versioning conventions and reflects the significant internal changes mentioned in the PR description regarding the integration of core clp parsing and search code into clp-s.
components/core/cmake/Options/options.cmake (2)
252-252
: LGTM! Addition of log_surgeon dependency for clp_dependencies.The addition of
CLP_NEED_LOG_SURGEON
aligns with the broader refactoring to integrate the log_surgeon library into the clp_s component.
315-315
: LGTM! Addition of log_surgeon dependency for search.The addition of
CLP_NEED_LOG_SURGEON
to the search dependencies is consistent with the architectural changes replacing legacy variable encoding/decoding implementations.components/core/src/clp_s/search/CMakeLists.txt (1)
43-43
: LGTM! Addition of log_surgeon library dependency.The public linking to
log_surgeon::log_surgeon
is consistent with the dependency flags added inoptions.cmake
and supports the architectural migration from legacy variable encoding implementations.components/core/src/clp_s/ColumnReader.cpp (4)
3-3
: LGTM! Updated include for new variable interpreter.The replacement of
VariableDecoder.hpp
with../clp/EncodedVariableInterpreter.hpp
aligns with the migration from legacy variable decoding to the new implementation.
116-116
: LGTM! Updated method name for consistency.The change from
get_num_vars()
toget_num_variables()
reflects API improvements in the dictionary entry interface.
118-123
: LGTM! Migration to new variable decoding implementation.The replacement of
VariableDecoder::decode_variables_into_message
withclp::EncodedVariableInterpreter::decode_variables_into_message
properly adopts the new encoding/decoding infrastructure while maintaining the same functional behaviour.
157-157
: LGTM! Consistent method name update.The change from
get_num_vars()
toget_num_variables()
maintains consistency with the updated dictionary entry API.components/core/src/clp_s/Utils.hpp (1)
215-225
: LGTM! Const-correctness improvements for UnalignedMemSpan.Adding const qualification to the
size()
,operator[]
, andsub_span()
methods is appropriate since these are accessor methods that don't modify the object state. This enhances the usability of the class by allowing these operations on const instances.components/core/src/clp_s/indexer/CMakeLists.txt (1)
16-62
: Excellent integration of core CLP components into the indexer build.The extensive addition of core CLP parsing, encoding, and search components creates a unified architecture that eliminates code duplication and ensures consistency across the CLP ecosystem. The added components cover all essential functionality: variable interpretation, FFI encoding, IR stream processing, schema handling, and query processing.
components/core/src/clp/ir/parsing.hpp (2)
16-17
: Necessary includes for the inline function implementation.The added includes support the new inline
is_variable_placeholder
function by providing access to theenum_to_underlying_type
utility andVariablePlaceholder
enum types.
29-38
: Excellent performance optimization through inlining.The inline implementation of
is_variable_placeholder
provides a significant performance boost (~50% improvement) for a frequently called function. The logic correctly identifies variable placeholder characters using safe enum-to-underlying conversion, and the inline placement is appropriate for this small, performance-critical function.components/core/src/clp_s/CMakeLists.txt (2)
20-96
: Comprehensive integration of core CLP components into clp_s build.The addition of core CLP parsing, encoding, and search components creates a unified architecture that supports the PR's objective of using core CLP code in clp-s. The additions cover all essential functionality including variable interpretation, FFI encoding, IR stream processing, and query handling.
116-116
: Necessary dependency for new lexer functionality.The addition of
log_surgeon::log_surgeon
as a public dependency supports the new lexer-based query processing architecture mentioned throughout the codebase changes.components/core/src/clp_s/search/QueryRunner.cpp (3)
8-10
: Proper migration to core CLP query components.The updated includes correctly replace the old clp_search namespace components with core CLP equivalents, supporting the architectural migration described in the PR.
205-205
: Successful migration to clp::Query type.The updated pointer type correctly uses the new
clp::Query*
from the core CLP components, maintaining functionality while adopting the new architecture.Also applies to: 278-278, 408-408
861-874
: Successful adaptation to new GrepCore interface.The updated
clp::GrepCore::process_raw_query
call correctly handles the new interface requirements including timestamp parameters, dictionary references instead of pointers, and the lexer parameter. The functionality is preserved while adopting the new architecture.components/core/src/clp_s/DictionaryWriter.cpp (3)
5-10
: Necessary includes for type safety improvements.The added includes properly support the strongly typed dictionary IDs and
std::string_view
parameter changes, enabling better type safety and performance.
14-14
: Excellent type safety and performance improvements.The updated signature using
std::string_view
for the value parameter improves performance by avoiding unnecessary string copies, whileclp::variable_dictionary_id_t
provides strong typing to prevent ID confusion. The explicit string construction in line 33 correctly handles the type conversion.Also applies to: 33-33
45-47
: Consistent type safety improvement for logtype dictionary.The strongly typed
clp::logtype_dictionary_id_t
parameter provides type safety consistent with the variable dictionary changes, preventing accidental mixing of different dictionary ID types.components/core/src/clp_s/ColumnWriter.cpp (4)
3-10
: LGTM! Clean header organization and appropriate includes.The new includes properly support the migration to core clp components and strongly typed dictionary IDs.
85-90
: Strong typing improves type safety.The change from generic
uint64_t
toclp::variable_dictionary_id_t
enhances type safety and makes the code more self-documenting. The use ofstd::string_view
for the dictionary lookup is also a good performance optimization.
92-95
: Size calculation correctly updated for strong types.The size calculation properly uses
sizeof(clp::variable_dictionary_id_t)
to match the new strongly typed member variable.
59-74
: EncodedVariableInterpreter interface usage validatedThe call in ClpStringColumnWriter::add_value matches the template overload:
- encode_and_add_to_dictionary(std::string_view, LogTypeDictionaryEntryType&, VariableDictionaryWriterType&, std::vector<encoded_variable_t>&, std::vector<variable_dictionary_id_t>&)
- m_temp_var_dict_ids is cleared before invocation to collect new IDs.
No further changes required.
components/core/src/clp_s/search/QueryRunner.hpp (4)
17-20
: Clean migration to core clp Query components.The inclusion of log_surgeon lexer and clp Query header properly supports the migration from the legacy clp_search namespace.
133-136
: Consistent type migration for query handling.The update of member variables to use
clp::Query
instead of the local Query type is consistent with the broader migration. The explicit namespace qualification improves code clarity.
155-156
: New lexer member supports core clp integration.The addition of
log_surgeon::lexers::ByteLexer m_dummy_lexer
aligns with the migration to core clp processing components that require lexer functionality.
247-251
: Method signature correctly updated for core clp Query.The parameter type change from local Query to
clp::Query*
maintains the same interface semantics while using the new type system.components/core/src/clp_s/DictionaryReader.hpp (4)
6-11
: LGTM! Proper includes for modernized interface.The new includes support the interface changes to use
std::string_view
and the updated string handling implementation.
31-33
: Type aliases improve template interface clarity.The addition of
dictionary_id_t
andentry_t
type aliases makes the templated interface more consistent and easier to use.
77-78
: Interface modernization with string_view improves performance.The change from
const std::string&
tostd::string_view
parameters avoids unnecessary string copies and provides better performance for dictionary lookups.Also applies to: 86-90
168-197
: Improved case-insensitive matching implementation.The refactored case-insensitive matching using
boost::algorithm::to_upper_copy
withstd::back_inserter
is more efficient than creating temporary strings. The implementation correctly handles both case-sensitive and case-insensitive scenarios.components/core/src/clp_s/DictionaryWriter.hpp (5)
6-8
: Performance improvement with absl::flat_hash_map.The migration from
std::unordered_map
toabsl::flat_hash_map
should provide better performance characteristics for dictionary lookups. The inclusion ofclp/Defs.h
properly supports the strongly typed dictionary IDs.
23-25
: Type aliases improve template interface consistency.The addition of
dictionary_id_t
andentry_t
type aliases matches the pattern used inDictionaryReader.hpp
and improves template interface clarity.
59-59
: Hash map type updated for performance.The change to
absl::flat_hash_map<std::string, DictionaryIdType>
should provide better performance than the previousstd::unordered_map
.
76-77
: Strongly typed dictionary IDs enhance type safety.The migration to
clp::variable_dictionary_id_t
and the interface change tostd::string_view
forVariableDictionaryWriter::add_entry
improve both type safety and performance.Also applies to: 91-91
94-95
: Consistent strong typing for logtype dictionary.The use of
clp::logtype_dictionary_id_t
forLogTypeDictionaryWriter
maintains consistency with the type safety improvements throughout the dictionary system.Also applies to: 109-109
components/core/src/clp_s/ColumnWriter.hpp (4)
7-7
: Proper include for strongly typed dictionary IDs.The inclusion of
../clp/Defs.h
correctly provides access to the strongly typed dictionary ID aliases used throughout the file.
144-146
: Static method signatures updated for type safety.The migration of static methods to use
clp::logtype_dictionary_id_t
instead of raw integer types improves type safety and makes the interface more explicit about the expected data types.Also applies to: 152-154, 163-165
177-177
: New member supports updated encoding interface.The addition of
m_temp_var_dict_ids
vector with strongly typedclp::variable_dictionary_id_t
elements supports the newEncodedVariableInterpreter
interface requirements shown in the corresponding.cpp
file.
197-197
: Consistent strong typing for variable storage.The change of
m_variables
to useclp::variable_dictionary_id_t
maintains consistency with the type safety improvements and matches the updatedadd_value
return type calculation.components/core/src/clp_s/DictionaryEntry.hpp (7)
6-13
: LGTM!The include changes are appropriate for the refactoring - adding necessary headers for
std::string_view
, size types, and the new CLP type definitions.
44-44
: Good type safety improvement!Using the strongly typed
clp::logtype_dictionary_id_t
instead of rawuint64_t
improves type safety and prevents accidental mixing of different ID types.
65-84
: Clear API improvement for placeholder handling!The distinction between total placeholders and actual variables is well-documented. The unified
get_placeholder_info()
method provides a cleaner interface than the previous multiple getter methods.
86-109
: Well-structured API for variable type handling!The explicit methods for each variable type (
add_int_var
,add_float_var
,add_dictionary_var
) and escape characters make the API clearer. Usingstd::string_view
inadd_constant
improves performance by avoiding unnecessary copies.
127-132
: Verify string_view lifetime for the output parameter.The
var
output parameter is now astring_view
which references the inputmsg
. Ensure callers understand thatvar
is only valid as long as the underlying string formsg
remains unchanged.
189-192
: Good internal representation improvement!Storing placeholder positions directly provides more flexibility than the previous length-based approach. The separate counter for escaped placeholders enables efficient calculation of actual variable count.
194-209
: Consistent type safety improvements!The
VariableDictionaryEntry
class correctly uses the strongly typedclp::variable_dictionary_id_t
, maintaining consistency with theLogTypeDictionaryEntry
changes.components/core/src/clp_s/DictionaryEntry.cpp (6)
5-24
: LGTM!The updated includes and using declarations are well-organized and appropriate for the refactored implementation using the new CLP encoding/decoding infrastructure.
26-31
: Correct size calculation with helpful documentation!The data size calculation properly includes the placeholder positions vector, and the comment clearly explains the compile-time
sizeof
evaluation.
33-45
: Implementation looks correct.The bounds checking and placeholder extraction logic is sound. The cast to
VariablePlaceholder
assumes the logtype is well-formed, which should be guaranteed by the construction methods.
55-74
: Well-structured variable type handling!Each method correctly tracks the placeholder position before delegating to
EncodedVariableInterpreter
. The escape counter is properly maintained for accurate variable count calculations.
76-109
: Elegant escape handling with efficient string processing!The escape handler lambda cleanly integrates escape sequence tracking. Using
string_view
throughout and delegating to IR parsing functions provides an efficient and maintainable implementation.
164-204
: Clean state machine implementation for log type decoding!The character-by-character processing with escape state tracking is clear and correct. The two overloads provide flexibility while avoiding unnecessary allocations.
../clp/ir/LogEvent.hpp | ||
../clp/ir/types.hpp |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
let's alphabetically reorder these two
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Left with four files to review
@@ -152,6 +152,8 @@ class QueryRunner : public FilterClass { | |||
std::vector<std::pair<ExpressionType, ast::OpList::iterator>>> | |||
m_expression_state; | |||
|
|||
log_surgeon::lexers::ByteLexer m_dummy_lexer; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe m_placeholder_lexer
?
} | ||
|
||
void VariableStringColumnWriter::store(ZstdCompressor& compressor) { | ||
size_t size = m_variables.size() * sizeof(int64_t); | ||
size_t size = m_variables.size() * sizeof(clp::variable_dictionary_id_t); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we can use auto?
it's not related to your change, but since we already have some auto used in this file, replacing size_t with auto doesn't harm
VariableEncoder::encode_and_add_to_dictionary( | ||
string_var, | ||
uint64_t offset{m_encoded_vars.size()}; | ||
m_temp_var_dict_ids.clear(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I feel that making this to be local is better than creating a class member specifically for a temporary usage purpose.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I prefer class member over local in this instance since it helps avoid a large number of memory allocations in some frequently run code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fair
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fwiw, I agree with Haiqi. Unless we have metrics to back up the importance of this it feels very much like a premature optimization.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's fair -- I'm kicking off some compression benchmarks now that I'll check on later today. I'll make the change depending on the results.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah it seems to help a bit with some datasets but probably not enough to justify it. I'll get rid of the class member.
@@ -174,6 +174,7 @@ class ClpStringColumnWriter : public BaseColumnWriter { | |||
|
|||
std::vector<int64_t> m_logtypes; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
seems to be a type mismatch?
at here, the return type of encode_log_dict_id is uint64_t.
You might also want to double check type for m_encoded_vars.
Not sure if we need to make a specific type for encode_log_dict_id
like we do in clp.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch -- I'm adding a using encoded_log_dict_id_t = uint64_t
that I'll use all over and changed the encoded variables to clp::encoded_variable_t
respectively.``
@@ -193,7 +194,7 @@ class VariableStringColumnWriter : public BaseColumnWriter { | |||
|
|||
private: | |||
std::shared_ptr<VariableDictionaryWriter> m_var_dict; | |||
std::vector<int64_t> m_variables; | |||
std::vector<clp::variable_dictionary_id_t> m_variables; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
shall we rename this to m_variable_ids
(or m_variables_ids
?)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I'll go with m_var_dict_ids
since I see that use xyz_var_dict_ids
used in clp/Query.hpp
at least.
@@ -174,6 +174,7 @@ class ClpStringColumnWriter : public BaseColumnWriter { | |||
|
|||
std::vector<int64_t> m_logtypes; | |||
std::vector<int64_t> m_encoded_vars; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
somewhere in encode_and_add_to_dictionary we are implicilty converting encoded_var_t to int64_t. when we push an encoded_var to this vector This doesn't cause any issue because encoded_var_t
is currently the same as int64_t, but to prevent future issues, I think we should either do a explicit cast, or use clp::encoded_var_t
as the vector datatype?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changing this to clp::encoded_var_t
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For the DictionaryEntry.cpp/hpp, since you are doing a major refactor, I will skip the old version and focus on understanding how's the newer version different from what we use in CLP. (let me know if there;s anything you want me to look at in the older version).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From high-level, I am trying to understand what's the difference between the clp and clp-s's dictionary entry, and what's preventing us to reuse the clp's one (or maybe create a common one that can be shared by the two).
I found out that between CLP and CLP-S, most code of dictionary entry are alike, and Here's the major difference that I noticed;
- Only CLP's dictionary entry has segment support.
- CLP-S' dictionary entry has lazy read support.
- CLP-S uses string_view whereas CLP uses string const& for some function interface.
- They seem to use different errorcode.
And a few other differences that we can potentially work around?
- CLP-S dictionary entry only supports Zstd compressor.
- CLP-S has a decode method, which I think can be supported for CLP as well.
I feel that there can be some refactor/class inheritance we can do to reduce code duplications, but not necessarily in this PR. wonder have you thought about it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think they also have different encoding when writing to file.
It is something I thought about, but I figured it is simpler to not share it for now. We'll probably end up adding lazy initialization support + optional support for encoding the entry the clp-s way to the clp implementation to further deduplicate the core in the future.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
True, and I didn't notice that variable dictionary entries handle the id in slight different way.
If you don't plan to work on the deduplication in short term, I think it might be a good idea to create an issue and document the differences between the CLP and CLP-S, so later on when someone has to work on this, he/she has something to refer to. how do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That makes sense. I can write an issue.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Up as #1165.
Co-authored-by: haiqi96 <14502009+haiqi96@users.noreply.github.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 2
🔭 Outside diff range comments (1)
components/core/src/clp_s/indexer/CMakeLists.txt (1)
2-63
: Source list maintenance is becoming brittleThe
INDEXER_SOURCES
list is hand-enumerating >100 paths, many of which are outside theclp_s
directory (../../clp/**
). As the shared CLP code evolves, keeping this list in sync will be error-prone.Consider:
# In clp core CMakeLists.txt add_library(clp_core OBJECT ...) # In this directory target_link_libraries(indexer PRIVATE clp_core)or at least using
file(GLOB_RECURSE …)
with well-scoped patterns. This reduces merge conflicts and hides implementation churn.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: ASSERTIVE
Plan: Pro
📒 Files selected for processing (1)
components/core/src/clp_s/indexer/CMakeLists.txt
(2 hunks)
🧰 Additional context used
🧠 Learnings (11)
📓 Common learnings
Learnt from: Bill-hbrhbr
PR: y-scope/clp#1122
File: components/core/src/clp/clp/CMakeLists.txt:175-195
Timestamp: 2025-07-23T09:54:45.185Z
Learning: In the CLP project, when reviewing CMakeLists.txt changes that introduce new compression library dependencies (BZip2, LibLZMA, LZ4, ZLIB), the team prefers to address conditional linking improvements in separate PRs rather than expanding the scope of focused migration PRs like the LibArchive task-based installation migration.
Learnt from: haiqi96
PR: y-scope/clp#651
File: components/clp-package-utils/clp_package_utils/scripts/compress.py:0-0
Timestamp: 2025-01-16T16:58:43.190Z
Learning: In the clp-package compression flow, path validation and error handling is performed at the scheduler level rather than in the compress.py script to maintain simplicity and avoid code duplication.
Learnt from: gibber9809
PR: y-scope/clp#955
File: components/core/src/clp_s/search/sql/CMakeLists.txt:8-26
Timestamp: 2025-06-02T18:22:24.060Z
Learning: In the clp project, ANTLR code generation at build time is being removed by another PR. When reviewing CMake files, be aware that some temporary suboptimal configurations may exist to reduce merge conflicts between concurrent PRs, especially around ANTLR_TARGET calls.
Learnt from: gibber9809
PR: y-scope/clp#584
File: components/core/src/clp_s/SchemaTree.hpp:91-94
Timestamp: 2024-11-12T18:47:03.828Z
Learning: In `components/core/src/clp_s/SchemaTree.hpp`, the `SchemaNode` class uses `std::unique_ptr<char[]> m_key_buf` and `std::string_view m_key_name` to ensure that references to `m_key_name` remain valid even after `SchemaNode` is move-constructed.
Learnt from: gibber9809
PR: y-scope/clp#584
File: components/core/src/clp_s/SchemaTree.hpp:171-171
Timestamp: 2024-11-12T18:46:20.933Z
Learning: In `components/core/src/clp_s/SchemaTree.hpp`, it's acceptable to use `std::string_view` as keys in `m_node_map` because `SchemaNode`'s `m_key_name` remains valid even after move operations or reallocations, preventing dangling references.
Learnt from: gibber9809
PR: y-scope/clp#584
File: components/core/src/clp_s/SchemaTree.hpp:40-55
Timestamp: 2024-11-12T18:56:31.067Z
Learning: In `components/core/src/clp_s/SchemaTree.hpp`, within the `SchemaNode` class, the use of `std::string_view` for `m_key_name` referencing `m_key_buf` is intentional to ensure that references to the key name remain valid even after move construction.
📚 Learning: in the clp project, antlr code generation at build time is being removed by another pr. when reviewi...
Learnt from: gibber9809
PR: y-scope/clp#955
File: components/core/src/clp_s/search/sql/CMakeLists.txt:8-26
Timestamp: 2025-06-02T18:22:24.060Z
Learning: In the clp project, ANTLR code generation at build time is being removed by another PR. When reviewing CMake files, be aware that some temporary suboptimal configurations may exist to reduce merge conflicts between concurrent PRs, especially around ANTLR_TARGET calls.
Applied to files:
components/core/src/clp_s/indexer/CMakeLists.txt
📚 Learning: in the clp project, when reviewing cmakelists.txt changes that introduce new compression library dep...
Learnt from: Bill-hbrhbr
PR: y-scope/clp#1122
File: components/core/src/clp/clp/CMakeLists.txt:175-195
Timestamp: 2025-07-23T09:54:45.185Z
Learning: In the CLP project, when reviewing CMakeLists.txt changes that introduce new compression library dependencies (BZip2, LibLZMA, LZ4, ZLIB), the team prefers to address conditional linking improvements in separate PRs rather than expanding the scope of focused migration PRs like the LibArchive task-based installation migration.
Applied to files:
components/core/src/clp_s/indexer/CMakeLists.txt
📚 Learning: in the clp codebase, standard headers like `` and `` in alphabetical order (as seen...
Learnt from: jackluo923
PR: y-scope/clp#1054
File: components/core/src/glt/SQLitePreparedStatement.hpp:4-4
Timestamp: 2025-07-01T14:49:07.333Z
Learning: In the CLP codebase, standard headers like `<cstdint>` and `<string>` in alphabetical order (as seen in components/core/src/glt/SQLitePreparedStatement.hpp) are correctly ordered and do not need adjustment.
Applied to files:
components/core/src/clp_s/indexer/CMakeLists.txt
📚 Learning: the team plans to systematically improve include paths by adding `${project_source_dir}/src` to cmak...
Learnt from: LinZhihao-723
PR: y-scope/clp#873
File: components/core/src/clp/ffi/ir_stream/search/test/utils.hpp:15-18
Timestamp: 2025-05-02T22:11:55.711Z
Learning: The team plans to systematically improve include paths by adding `${PROJECT_SOURCE_DIR}/src` to CMake targets and using direct include paths instead of deep relative ones in a future PR.
Applied to files:
components/core/src/clp_s/indexer/CMakeLists.txt
📚 Learning: in `clp-s.cpp`, the `run_serializer` function interleaves serialization and writing of ir files, mak...
Learnt from: AVMatthews
PR: y-scope/clp#543
File: components/core/src/clp_s/clp-s.cpp:196-265
Timestamp: 2024-10-07T20:10:08.254Z
Learning: In `clp-s.cpp`, the `run_serializer` function interleaves serialization and writing of IR files, making it difficult to restructure it into separate functions.
Applied to files:
components/core/src/clp_s/indexer/CMakeLists.txt
📚 Learning: in the clp project, manually setting ldflags for library paths can cause runtime errors because it i...
Learnt from: sitaowang1998
PR: y-scope/clp#1044
File: .github/workflows/clp-core-build-macos.yaml:0-0
Timestamp: 2025-06-27T01:49:15.724Z
Learning: In the clp project, manually setting LDFLAGS for library paths can cause runtime errors because it interferes with CMake's built-in library discovery and linking mechanisms. CMake can identify and correctly link to libraries on its own without requiring manual LDFLAGS configuration.
Applied to files:
components/core/src/clp_s/indexer/CMakeLists.txt
📚 Learning: in `components/core/src/clp/bufferedfilereader.cpp`, refactoring the nested error handling condition...
Learnt from: haiqi96
PR: y-scope/clp#523
File: components/core/src/clp/BufferedFileReader.cpp:96-106
Timestamp: 2024-10-24T14:45:26.265Z
Learning: In `components/core/src/clp/BufferedFileReader.cpp`, refactoring the nested error handling conditions may not apply due to the specific logic in the original code.
Applied to files:
components/core/src/clp_s/indexer/CMakeLists.txt
📚 Learning: in `components/core/src/clp_s/jsonparser.cpp`, when handling errors in `parse_from_ir`, prefer to ma...
Learnt from: AVMatthews
PR: y-scope/clp#543
File: components/core/src/clp_s/JsonParser.cpp:769-779
Timestamp: 2024-10-07T21:16:41.660Z
Learning: In `components/core/src/clp_s/JsonParser.cpp`, when handling errors in `parse_from_ir`, prefer to maintain the current mix of try-catch and if-statements because specific messages are returned back up in some cases.
Applied to files:
components/core/src/clp_s/indexer/CMakeLists.txt
📚 Learning: in the c++ `filedecompressor` implementations at `components/core/src/clp/clp/filedecompressor.cpp` ...
Learnt from: haiqi96
PR: y-scope/clp#619
File: components/core/src/clp/clp/decompression.cpp:313-313
Timestamp: 2024-12-05T16:32:21.507Z
Learning: In the C++ `FileDecompressor` implementations at `components/core/src/clp/clp/FileDecompressor.cpp` and `components/core/src/glt/glt/FileDecompressor.cpp`, the `temp_output_path` variable and associated logic are used to handle multiple compressed files with the same name, and should be kept. This logic is separate from the temporary output directory code removed in PR #619 and is necessary for proper file handling.
Applied to files:
components/core/src/clp_s/indexer/CMakeLists.txt
📚 Learning: in 'components/core/tests/test-end_to_end.cpp', during the 'clp-s_compression_and_extraction_no_floa...
Learnt from: AVMatthews
PR: y-scope/clp#595
File: components/core/tests/test-end_to_end.cpp:59-65
Timestamp: 2024-11-19T17:30:04.970Z
Learning: In 'components/core/tests/test-end_to_end.cpp', during the 'clp-s_compression_and_extraction_no_floats' test, files and directories are intentionally removed at the beginning of the test to ensure that any existing content doesn't influence the test results.
Applied to files:
components/core/src/clp_s/indexer/CMakeLists.txt
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (7)
- GitHub Check: centos-stream-9-static-linked-bins
- GitHub Check: centos-stream-9-dynamic-linked-bins
- GitHub Check: ubuntu-jammy-static-linked-bins
- GitHub Check: ubuntu-jammy-dynamic-linked-bins
- GitHub Check: ubuntu-jammy-lint
- GitHub Check: lint-check (ubuntu-24.04)
- GitHub Check: build-macos (macos-15, false)
../../clp/ffi/encoding_methods.inc | ||
../../clp/ffi/ir_stream/byteswap.hpp | ||
../../clp/ffi/ir_stream/decoding_methods.cpp | ||
../../clp/ffi/ir_stream/decoding_methods.hpp | ||
../../clp/ffi/ir_stream/decoding_methods.inc | ||
../../clp/ffi/ir_stream/protocol_constants.hpp |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🧹 Nitpick (assertive)
.inc
files should not be listed as sources
The build now lists several *.inc
files (Lines 22–27, 44). CMake does not treat these as compilation units, so they show up as “missing source” warnings in some IDEs and clutter the target. They should instead be referenced via target_sources(... PRIVATE HEADER_FILE_ONLY ...)
or dropped from the list entirely.
Also applies to: 44-44
🤖 Prompt for AI Agents
In components/core/src/clp_s/indexer/CMakeLists.txt around lines 22 to 27 and
line 44, the .inc files are incorrectly listed as source files, causing IDE
warnings and clutter. Remove these .inc files from the main source list and
instead add them using target_sources with the PRIVATE HEADER_FILE_ONLY option,
or omit them entirely if they are included only within other source files.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 3
♻️ Duplicate comments (1)
components/core/src/clp_s/CMakeLists.txt (1)
20-21
: Consider alphabetical ordering of new includes.The
EncodedVariableInterpreter.hpp
should come afterEncodedVariableInterpreter.cpp
to maintain consistency with alphabetical ordering patterns in this file.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: ASSERTIVE
Plan: Pro
📒 Files selected for processing (5)
components/core/src/clp_s/CMakeLists.txt
(4 hunks)components/core/src/clp_s/ColumnWriter.cpp
(3 hunks)components/core/src/clp_s/ColumnWriter.hpp
(6 hunks)components/core/src/clp_s/search/QueryRunner.cpp
(5 hunks)components/core/src/clp_s/search/QueryRunner.hpp
(4 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.{cpp,hpp,java,js,jsx,tpp,ts,tsx}
⚙️ CodeRabbit Configuration File
- Prefer
false == <expression>
rather than!<expression>
.
Files:
components/core/src/clp_s/search/QueryRunner.hpp
components/core/src/clp_s/ColumnWriter.cpp
components/core/src/clp_s/search/QueryRunner.cpp
components/core/src/clp_s/ColumnWriter.hpp
🧠 Learnings (46)
📓 Common learnings
Learnt from: gibber9809
PR: y-scope/clp#955
File: components/core/src/clp_s/search/sql/CMakeLists.txt:8-26
Timestamp: 2025-06-02T18:22:24.060Z
Learning: In the clp project, ANTLR code generation at build time is being removed by another PR. When reviewing CMake files, be aware that some temporary suboptimal configurations may exist to reduce merge conflicts between concurrent PRs, especially around ANTLR_TARGET calls.
Learnt from: gibber9809
PR: y-scope/clp#584
File: components/core/src/clp_s/SchemaTree.hpp:91-94
Timestamp: 2024-11-12T18:47:03.828Z
Learning: In `components/core/src/clp_s/SchemaTree.hpp`, the `SchemaNode` class uses `std::unique_ptr<char[]> m_key_buf` and `std::string_view m_key_name` to ensure that references to `m_key_name` remain valid even after `SchemaNode` is move-constructed.
Learnt from: gibber9809
PR: y-scope/clp#584
File: components/core/src/clp_s/SchemaTree.hpp:171-171
Timestamp: 2024-11-12T18:46:20.933Z
Learning: In `components/core/src/clp_s/SchemaTree.hpp`, it's acceptable to use `std::string_view` as keys in `m_node_map` because `SchemaNode`'s `m_key_name` remains valid even after move operations or reallocations, preventing dangling references.
Learnt from: gibber9809
PR: y-scope/clp#584
File: components/core/src/clp_s/SchemaTree.hpp:40-55
Timestamp: 2024-11-12T18:56:31.067Z
Learning: In `components/core/src/clp_s/SchemaTree.hpp`, within the `SchemaNode` class, the use of `std::string_view` for `m_key_name` referencing `m_key_buf` is intentional to ensure that references to the key name remain valid even after move construction.
📚 Learning: the `queryhandlerimpl` class in clp/ffi/ir_stream/search is not designed to be thread-safe and is as...
Learnt from: LinZhihao-723
PR: y-scope/clp#882
File: components/core/src/clp/ffi/ir_stream/search/QueryHandlerImpl.cpp:428-450
Timestamp: 2025-05-07T16:56:53.901Z
Learning: The `QueryHandlerImpl` class in clp/ffi/ir_stream/search is not designed to be thread-safe and is assumed to not be used across multiple threads.
Applied to files:
components/core/src/clp_s/search/QueryRunner.hpp
📚 Learning: for the `queryhandlerimpl` class, reusing member variables like `m_ast_dfs_stack` is preferred over ...
Learnt from: LinZhihao-723
PR: y-scope/clp#882
File: components/core/src/clp/ffi/ir_stream/search/QueryHandlerImpl.hpp:164-166
Timestamp: 2025-05-11T22:22:49.286Z
Learning: For the `QueryHandlerImpl` class, reusing member variables like `m_ast_dfs_stack` is preferred over making methods `const` and moving state to function scope to avoid allocation/deallocation overhead, especially for frequently called methods like `evaluate_kv_pair_log_event`.
Applied to files:
components/core/src/clp_s/search/QueryRunner.hpp
components/core/src/clp_s/ColumnWriter.cpp
📚 Learning: in `components/core/src/clp_s/schematree.hpp`, the `schemanode` class uses `std::unique_ptr ...
Learnt from: gibber9809
PR: y-scope/clp#584
File: components/core/src/clp_s/SchemaTree.hpp:91-94
Timestamp: 2024-11-12T18:47:03.828Z
Learning: In `components/core/src/clp_s/SchemaTree.hpp`, the `SchemaNode` class uses `std::unique_ptr<char[]> m_key_buf` and `std::string_view m_key_name` to ensure that references to `m_key_name` remain valid even after `SchemaNode` is move-constructed.
Applied to files:
components/core/src/clp_s/search/QueryRunner.hpp
components/core/src/clp_s/ColumnWriter.hpp
📚 Learning: in `components/core/src/clp_s/schematree.hpp`, it's acceptable to use `std::string_view` as keys in ...
Learnt from: gibber9809
PR: y-scope/clp#584
File: components/core/src/clp_s/SchemaTree.hpp:171-171
Timestamp: 2024-11-12T18:46:20.933Z
Learning: In `components/core/src/clp_s/SchemaTree.hpp`, it's acceptable to use `std::string_view` as keys in `m_node_map` because `SchemaNode`'s `m_key_name` remains valid even after move operations or reallocations, preventing dangling references.
Applied to files:
components/core/src/clp_s/search/QueryRunner.hpp
components/core/src/clp_s/ColumnWriter.hpp
📚 Learning: in `components/core/src/clp_s/schematree.hpp`, within the `schemanode` class, the use of `std::strin...
Learnt from: gibber9809
PR: y-scope/clp#584
File: components/core/src/clp_s/SchemaTree.hpp:40-55
Timestamp: 2024-11-12T18:56:31.067Z
Learning: In `components/core/src/clp_s/SchemaTree.hpp`, within the `SchemaNode` class, the use of `std::string_view` for `m_key_name` referencing `m_key_buf` is intentional to ensure that references to the key name remain valid even after move construction.
Applied to files:
components/core/src/clp_s/search/QueryRunner.hpp
components/core/src/clp_s/ColumnWriter.hpp
📚 Learning: in the `queryhandlerimpl.cpp` file, the `unique_projected_columns` set (using `std::string_view`) is...
Learnt from: LinZhihao-723
PR: y-scope/clp#873
File: components/core/src/clp/ffi/ir_stream/search/QueryHandlerImpl.cpp:148-157
Timestamp: 2025-05-02T22:27:59.347Z
Learning: In the `QueryHandlerImpl.cpp` file, the `unique_projected_columns` set (using `std::string_view`) is intentionally designed to only check for duplications within the local scope of the `create_projected_columns_and_projection_map` function. The team decided this is an acceptable use of `std::string_view` in a container since the referenced strings remain valid throughout the function's execution.
Applied to files:
components/core/src/clp_s/search/QueryRunner.hpp
components/core/src/clp_s/search/QueryRunner.cpp
📚 Learning: in the file `components/core/tests/test-ffi_keyvaluepairlogevent.cpp`, including `
Learnt from: LinZhihao-723
PR: y-scope/clp#558
File: components/core/tests/test-ffi_KeyValuePairLogEvent.cpp:14-14
Timestamp: 2024-10-14T03:42:10.355Z
Learning: In the file `components/core/tests/test-ffi_KeyValuePairLogEvent.cpp`, including `<json/single_include/nlohmann/json.hpp>` is consistent with the project's coding standards.
Applied to files:
components/core/src/clp_s/search/QueryRunner.hpp
Learnt from: LinZhihao-723
PR: y-scope/clp#558
File: components/core/tests/test-ffi_KeyValuePairLogEvent.cpp:14-14
Timestamp: 2024-10-14T03:42:10.355Z
Learning: In the file `components/core/tests/test-ffi_KeyValuePairLogEvent.cpp`, including `<json/single_include/nlohmann/json.hpp>` is consistent with the project's coding standards.
components/core/src/clp_s/search/QueryRunner.hpp
📚 Learning: in `components/core/src/clp_s/jsonparser.cpp`, within the `parse_from_ir()` function, reaching the e...
Learnt from: AVMatthews
PR: y-scope/clp#543
File: components/core/src/clp_s/JsonParser.cpp:756-765
Timestamp: 2024-10-08T15:52:50.753Z
Learning: In `components/core/src/clp_s/JsonParser.cpp`, within the `parse_from_ir()` function, reaching the end of log events in a given IR is not considered an error case. The errors `std::errc::no_message_available` and `std::errc::result_out_of_range` are expected signals to break the deserialization loop and proceed accordingly.
Applied to files:
components/core/src/clp_s/search/QueryRunner.hpp
📚 Learning: in `components/core/src/clp_s/jsonparser.cpp`, within the `parse_from_ir` method, encountering error...
Learnt from: AVMatthews
PR: y-scope/clp#543
File: components/core/src/clp_s/JsonParser.cpp:735-794
Timestamp: 2024-10-07T21:35:04.362Z
Learning: In `components/core/src/clp_s/JsonParser.cpp`, within the `parse_from_ir` method, encountering errors from `kv_log_event_result.error()` aside from `std::errc::no_message_available` and `std::errc::result_out_of_range` is anticipated behavior and does not require additional error handling or logging.
Applied to files:
components/core/src/clp_s/search/QueryRunner.hpp
components/core/src/clp_s/ColumnWriter.hpp
📚 Learning: in `components/core/src/clp_s/jsonparser.cpp`, within the `get_archive_node_id` method, throwing a s...
Learnt from: AVMatthews
PR: y-scope/clp#543
File: components/core/src/clp_s/JsonParser.cpp:581-627
Timestamp: 2024-10-07T21:38:35.979Z
Learning: In `components/core/src/clp_s/JsonParser.cpp`, within the `get_archive_node_id` method, throwing a string literal as an exception is acceptable practice.
Applied to files:
components/core/src/clp_s/search/QueryRunner.hpp
components/core/src/clp_s/CMakeLists.txt
components/core/src/clp_s/ColumnWriter.hpp
📚 Learning: in `clp-s.cpp`, the `run_serializer` function interleaves serialization and writing of ir files, mak...
Learnt from: AVMatthews
PR: y-scope/clp#543
File: components/core/src/clp_s/clp-s.cpp:196-265
Timestamp: 2024-10-07T20:10:08.254Z
Learning: In `clp-s.cpp`, the `run_serializer` function interleaves serialization and writing of IR files, making it difficult to restructure it into separate functions.
Applied to files:
components/core/src/clp_s/search/QueryRunner.hpp
components/core/src/clp_s/CMakeLists.txt
components/core/src/clp_s/ColumnWriter.hpp
📚 Learning: in `components/core/src/clp_s/jsonparser.cpp`, when handling errors in `parse_from_ir`, prefer to ma...
Learnt from: AVMatthews
PR: y-scope/clp#543
File: components/core/src/clp_s/JsonParser.cpp:769-779
Timestamp: 2024-10-07T21:16:41.660Z
Learning: In `components/core/src/clp_s/JsonParser.cpp`, when handling errors in `parse_from_ir`, prefer to maintain the current mix of try-catch and if-statements because specific messages are returned back up in some cases.
Applied to files:
components/core/src/clp_s/search/QueryRunner.hpp
components/core/src/clp_s/CMakeLists.txt
components/core/src/clp_s/ColumnWriter.hpp
📚 Learning: in `components/core/src/clp_s/jsonparser.cpp`, validation and exception throwing are unnecessary in ...
Learnt from: gibber9809
PR: y-scope/clp#630
File: components/core/src/clp_s/JsonParser.cpp:702-703
Timestamp: 2024-12-10T16:03:13.322Z
Learning: In `components/core/src/clp_s/JsonParser.cpp`, validation and exception throwing are unnecessary in the `get_archive_node_id` method when processing nodes, and should not be added.
Applied to files:
components/core/src/clp_s/search/QueryRunner.hpp
components/core/src/clp_s/CMakeLists.txt
📚 Learning: in `components/core/src/clp/bufferedfilereader.cpp`, refactoring the nested error handling condition...
Learnt from: haiqi96
PR: y-scope/clp#523
File: components/core/src/clp/BufferedFileReader.cpp:96-106
Timestamp: 2024-10-24T14:45:26.265Z
Learning: In `components/core/src/clp/BufferedFileReader.cpp`, refactoring the nested error handling conditions may not apply due to the specific logic in the original code.
Applied to files:
components/core/src/clp_s/search/QueryRunner.hpp
components/core/src/clp_s/CMakeLists.txt
components/core/src/clp_s/ColumnWriter.hpp
📚 Learning: in the clp codebase, the filteroperation::exists and filteroperation::nexists operations have specif...
Learnt from: LinZhihao-723
PR: y-scope/clp#856
File: components/core/src/clp/ffi/ir_stream/search/utils.cpp:340-347
Timestamp: 2025-04-26T02:19:33.275Z
Learning: In the CLP codebase, the FilterOperation::EXISTS and FilterOperation::NEXISTS operations have specific semantics that differ from simply checking if a value exists. EXISTS always returns true and NEXISTS always returns false in the evaluate_filter_against_literal_type_value_pair function. The optional::has_value() is used specifically to differentiate empty objects from concrete values, not for existence checks in filter operations.
Applied to files:
components/core/src/clp_s/search/QueryRunner.hpp
📚 Learning: in the clp search component, the `evaluate_wildcard_filter` function should return `astevaluationres...
Learnt from: LinZhihao-723
PR: y-scope/clp#882
File: components/core/src/clp/ffi/ir_stream/search/QueryHandlerImpl.cpp:378-402
Timestamp: 2025-05-07T16:56:35.687Z
Learning: In the CLP search component, the `evaluate_wildcard_filter` function should return `AstEvaluationResult::Pruned` when `node_id_value_pairs` is empty, not `AstEvaluationResult::False`. Empty node sets should be treated as "undetermined" rather than definitive non-matches.
Applied to files:
components/core/src/clp_s/search/QueryRunner.hpp
📚 Learning: in the clp::ffi::ir_stream::search namespace, the design principle is that callers are responsible f...
Learnt from: LinZhihao-723
PR: y-scope/clp#856
File: components/core/src/clp/ffi/ir_stream/search/utils.cpp:258-266
Timestamp: 2025-04-26T02:21:22.021Z
Learning: In the clp::ffi::ir_stream::search namespace, the design principle is that callers are responsible for type checking, not the called functions. If control flow reaches a function, input types should already be validated by the caller.
Applied to files:
components/core/src/clp_s/search/QueryRunner.hpp
components/core/src/clp_s/search/QueryRunner.cpp
📚 Learning: `keyvaluepairlogevent::create` requires `std::shared_ptr`, so `m_auto_generated_schema_t...
Learnt from: LinZhihao-723
PR: y-scope/clp#558
File: components/core/src/clp/ffi/ir_stream/Deserializer.hpp:119-120
Timestamp: 2024-10-14T03:39:35.585Z
Learning: `KeyValuePairLogEvent::create` requires `std::shared_ptr<SchemaTree>`, so `m_auto_generated_schema_tree` and `m_user_generated_schema_tree` need to be `std::shared_ptr`.
Applied to files:
components/core/src/clp_s/search/QueryRunner.hpp
📚 Learning: in the clp codebase, the `m_case_sensitive_search` flag is used only for actual string value compari...
Learnt from: LinZhihao-723
PR: y-scope/clp#873
File: components/core/src/clp/ffi/ir_stream/search/QueryHandlerImpl.hpp:320-324
Timestamp: 2025-05-05T01:12:18.561Z
Learning: In the CLP codebase, the `m_case_sensitive_search` flag is used only for actual string value comparisons during query evaluation, not for schema key name matching. Schema keys are always compared case-sensitively regardless of this flag's setting.
Applied to files:
components/core/src/clp_s/search/QueryRunner.hpp
📚 Learning: when using std::array, size checking between arrays is enforced at compile-time through the type sys...
Learnt from: davemarco
PR: y-scope/clp#646
File: components/core/src/clp/streaming_archive/single_file_archive/writer.cpp:161-171
Timestamp: 2025-02-10T22:36:08.496Z
Learning: When using std::array, size checking between arrays is enforced at compile-time through the type system, making additional static_assert size checks redundant.
Applied to files:
components/core/src/clp_s/ColumnWriter.cpp
📚 Learning: only trivially destructible global objects are allowed in the coding standard, so avoid suggesting t...
Learnt from: LinZhihao-723
PR: y-scope/clp#544
File: components/core/src/clp/ffi/ir_stream/ir_unit_deserialization_methods.cpp:170-186
Timestamp: 2024-09-24T22:33:59.488Z
Learning: Only trivially destructible global objects are allowed in the coding standard, so avoid suggesting the use of non-trivially destructible global objects like `static std::unordered_map`.
Applied to files:
components/core/src/clp_s/ColumnWriter.cpp
📚 Learning: when using structured bindings with values that will be moved later (e.g., into a constructor), pref...
Learnt from: LinZhihao-723
PR: y-scope/clp#653
File: components/core/src/clp/ffi/ir_stream/ir_unit_deserialization_methods.cpp:612-613
Timestamp: 2025-01-16T16:48:30.016Z
Learning: When using structured bindings with values that will be moved later (e.g., into a constructor), prefer using references (`auto&`) to avoid unnecessary copies, even when binding to the result of a temporary object that lives until the end of the full expression.
Applied to files:
components/core/src/clp_s/ColumnWriter.cpp
📚 Learning: when working with constexpr strings (string literals with static storage duration), std::string_view...
Learnt from: davemarco
PR: y-scope/clp#700
File: components/core/src/clp/streaming_archive/ArchiveMetadata.hpp:153-155
Timestamp: 2025-01-30T19:26:33.869Z
Learning: When working with constexpr strings (string literals with static storage duration), std::string_view is the preferred choice for member variables as it's more efficient and safe, avoiding unnecessary memory allocations.
Applied to files:
components/core/src/clp_s/ColumnWriter.cpp
components/core/src/clp_s/ColumnWriter.hpp
📚 Learning: in c++ classes, member variables can be initialized using brace initialization in the class definiti...
Learnt from: gibber9809
PR: y-scope/clp#1021
File: components/core/src/clp_s/ColumnWriter.cpp:14-24
Timestamp: 2025-06-19T16:15:50.601Z
Learning: In C++ classes, member variables can be initialized using brace initialization in the class definition (e.g., `int64_t m_cur{};`), which initializes the variable to its default value (0 for integral types). This is equivalent to explicit initialization and eliminates the need for constructor initialization.
Applied to files:
components/core/src/clp_s/ColumnWriter.cpp
📚 Learning: in `keyvaluepairlogevent.cpp`, for the class `jsonserializationiterator`, it's acceptable to use raw...
Learnt from: LinZhihao-723
PR: y-scope/clp#554
File: components/core/src/clp/ffi/KeyValuePairLogEvent.cpp:109-111
Timestamp: 2024-10-10T15:19:52.408Z
Learning: In `KeyValuePairLogEvent.cpp`, for the class `JsonSerializationIterator`, it's acceptable to use raw pointers for member variables like `m_schema_tree_node`, and there's no need to replace them with references or smart pointers in this use case.
Applied to files:
components/core/src/clp_s/ColumnWriter.cpp
📚 Learning: in the clp codebase, when adding elements to a collection inside a loop, it's preferred to use `try_...
Learnt from: LinZhihao-723
PR: y-scope/clp#873
File: components/core/src/clp/ffi/ir_stream/search/QueryHandlerImpl.cpp:200-213
Timestamp: 2025-05-02T23:52:24.294Z
Learning: In the CLP codebase, when adding elements to a collection inside a loop, it's preferred to use `try_emplace` inside the loop (rather than initializing collections outside the loop) to avoid creating empty collections when there might be no elements to add.
Applied to files:
components/core/src/clp_s/ColumnWriter.cpp
📚 Learning: member variables in c++ classes should be explicitly initialized in the constructor to prevent undef...
Learnt from: haiqi96
PR: y-scope/clp#646
File: components/core/src/clp/streaming_archive/writer/Archive.hpp:354-354
Timestamp: 2025-01-14T16:06:54.692Z
Learning: Member variables in C++ classes should be explicitly initialized in the constructor to prevent undefined behavior, as demonstrated in the Archive class where `m_use_single_file_archive` is initialized to `false`.
Applied to files:
components/core/src/clp_s/ColumnWriter.cpp
📚 Learning: in the clp project, antlr code generation at build time is being removed by another pr. when reviewi...
Learnt from: gibber9809
PR: y-scope/clp#955
File: components/core/src/clp_s/search/sql/CMakeLists.txt:8-26
Timestamp: 2025-06-02T18:22:24.060Z
Learning: In the clp project, ANTLR code generation at build time is being removed by another PR. When reviewing CMake files, be aware that some temporary suboptimal configurations may exist to reduce merge conflicts between concurrent PRs, especially around ANTLR_TARGET calls.
Applied to files:
components/core/src/clp_s/CMakeLists.txt
📚 Learning: in the clp project, when reviewing cmakelists.txt changes that introduce new compression library dep...
Learnt from: Bill-hbrhbr
PR: y-scope/clp#1122
File: components/core/src/clp/clp/CMakeLists.txt:175-195
Timestamp: 2025-07-23T09:54:45.185Z
Learning: In the CLP project, when reviewing CMakeLists.txt changes that introduce new compression library dependencies (BZip2, LibLZMA, LZ4, ZLIB), the team prefers to address conditional linking improvements in separate PRs rather than expanding the scope of focused migration PRs like the LibArchive task-based installation migration.
Applied to files:
components/core/src/clp_s/CMakeLists.txt
components/core/src/clp_s/ColumnWriter.hpp
📚 Learning: in the clp codebase, standard headers like `` and `` in alphabetical order (as seen...
Learnt from: jackluo923
PR: y-scope/clp#1054
File: components/core/src/glt/SQLitePreparedStatement.hpp:4-4
Timestamp: 2025-07-01T14:49:07.333Z
Learning: In the CLP codebase, standard headers like `<cstdint>` and `<string>` in alphabetical order (as seen in components/core/src/glt/SQLitePreparedStatement.hpp) are correctly ordered and do not need adjustment.
Applied to files:
components/core/src/clp_s/CMakeLists.txt
components/core/src/clp_s/ColumnWriter.hpp
📚 Learning: in 'components/core/tests/test-end_to_end.cpp', during the 'clp-s_compression_and_extraction_no_floa...
Learnt from: AVMatthews
PR: y-scope/clp#595
File: components/core/tests/test-end_to_end.cpp:59-65
Timestamp: 2024-11-19T17:30:04.970Z
Learning: In 'components/core/tests/test-end_to_end.cpp', during the 'clp-s_compression_and_extraction_no_floats' test, files and directories are intentionally removed at the beginning of the test to ensure that any existing content doesn't influence the test results.
Applied to files:
components/core/src/clp_s/CMakeLists.txt
components/core/src/clp_s/ColumnWriter.hpp
📚 Learning: in the c++ `filedecompressor` implementations at `components/core/src/clp/clp/filedecompressor.cpp` ...
Learnt from: haiqi96
PR: y-scope/clp#619
File: components/core/src/clp/clp/decompression.cpp:313-313
Timestamp: 2024-12-05T16:32:21.507Z
Learning: In the C++ `FileDecompressor` implementations at `components/core/src/clp/clp/FileDecompressor.cpp` and `components/core/src/glt/glt/FileDecompressor.cpp`, the `temp_output_path` variable and associated logic are used to handle multiple compressed files with the same name, and should be kept. This logic is separate from the temporary output directory code removed in PR #619 and is necessary for proper file handling.
Applied to files:
components/core/src/clp_s/CMakeLists.txt
📚 Learning: in clp's custom cmake find modules, `findstaticlibrarydependencies` populates the `*_dynamic_libs` v...
Learnt from: Bill-hbrhbr
PR: y-scope/clp#1124
File: components/core/cmake/Modules/FindBZip2.cmake:49-65
Timestamp: 2025-08-03T18:56:07.366Z
Learning: In CLP's custom CMake Find modules, `FindStaticLibraryDependencies` populates the `*_DYNAMIC_LIBS` variable even when building with static libraries. When the main library is static, all dependencies must also use static libraries - there's no fallback to shared libraries for dependencies.
Applied to files:
components/core/src/clp_s/CMakeLists.txt
📚 Learning: in clp installation scripts within `components/core/tools/scripts/lib_install/`, maintain consistenc...
Learnt from: jackluo923
PR: y-scope/clp#1054
File: components/core/tools/scripts/lib_install/musllinux_1_2/install-packages-from-source.sh:6-8
Timestamp: 2025-07-01T14:51:19.172Z
Learning: In CLP installation scripts within `components/core/tools/scripts/lib_install/`, maintain consistency with existing variable declaration patterns across platforms rather than adding individual improvements like `readonly` declarations.
Applied to files:
components/core/src/clp_s/CMakeLists.txt
📚 Learning: in `components/core/src/clp/ffi/ir_stream/utils.hpp`, the function `size_dependent_encode_and_serial...
Learnt from: LinZhihao-723
PR: y-scope/clp#557
File: components/core/src/clp/ffi/ir_stream/utils.hpp:0-0
Timestamp: 2024-10-18T02:31:18.595Z
Learning: In `components/core/src/clp/ffi/ir_stream/utils.hpp`, the function `size_dependent_encode_and_serialize_schema_tree_node_id` assumes that the caller checks that `node_id` fits within the range of `encoded_node_id_t` before casting.
Applied to files:
components/core/src/clp_s/CMakeLists.txt
components/core/src/clp_s/ColumnWriter.hpp
📚 Learning: in cmakelists.txt files, anlowee prefers to explicitly list source files one by one rather than usin...
Learnt from: anlowee
PR: y-scope/clp#925
File: components/core/src/clp_s/search/sql/CMakeLists.txt:8-19
Timestamp: 2025-05-26T18:39:51.727Z
Learning: In CMakeLists.txt files, anlowee prefers to explicitly list source files one by one rather than using file(GLOB ...) or file(GLOB_RECURSE ...) to automatically include files, maintaining consistency with other CMake files in the project.
Applied to files:
components/core/src/clp_s/CMakeLists.txt
📚 Learning: in `components/core/src/clp/streaming_archive/reader/segment.cpp`, the method `m_decompressor.open()...
Learnt from: LinZhihao-723
PR: y-scope/clp#450
File: components/core/src/clp/streaming_archive/reader/Segment.cpp:57-58
Timestamp: 2024-11-14T01:03:27.144Z
Learning: In `components/core/src/clp/streaming_archive/reader/Segment.cpp`, the method `m_decompressor.open()` does not have a return value.
Applied to files:
components/core/src/clp_s/CMakeLists.txt
📚 Learning: in the clp project, c++20 is used, allowing the utilization of c++20 features such as class template...
Learnt from: LinZhihao-723
PR: y-scope/clp#486
File: components/core/tests/test-error_handling.cpp:37-38
Timestamp: 2024-11-26T19:12:43.982Z
Learning: In the CLP project, C++20 is used, allowing the utilization of C++20 features such as class template argument deduction (CTAD) with `std::array`.
Applied to files:
components/core/src/clp_s/CMakeLists.txt
components/core/src/clp_s/ColumnWriter.hpp
📚 Learning: in `components/core/tests/test-clp_s-end_to_end.cpp`, the success of `constructor.store()` is verifi...
Learnt from: AVMatthews
PR: y-scope/clp#595
File: components/core/tests/test-clp_s-end_to_end.cpp:109-110
Timestamp: 2024-11-29T22:50:17.206Z
Learning: In `components/core/tests/test-clp_s-end_to_end.cpp`, the success of `constructor.store()` is verified through `REQUIRE` statements and subsequent comparisons.
Applied to files:
components/core/src/clp_s/CMakeLists.txt
📚 Learning: in the test code (`components/core/tests/test-ir_encoding_methods.cpp`), exception handling for `msg...
Learnt from: LinZhihao-723
PR: y-scope/clp#570
File: components/core/tests/test-ir_encoding_methods.cpp:376-399
Timestamp: 2024-11-01T03:26:26.386Z
Learning: In the test code (`components/core/tests/test-ir_encoding_methods.cpp`), exception handling for `msgpack::unpack` can be omitted because the Catch2 testing framework captures exceptions if they occur.
Applied to files:
components/core/src/clp_s/CMakeLists.txt
📚 Learning: `clp::ffi::schematree::node::id_t` is of type `uint32_t`, so adding `1` to `int32_max` is safe and d...
Learnt from: LinZhihao-723
PR: y-scope/clp#557
File: components/core/tests/test-ir_encoding_methods.cpp:1283-1286
Timestamp: 2024-10-13T09:29:56.553Z
Learning: `clp::ffi::SchemaTree::Node::id_t` is of type `uint32_t`, so adding `1` to `INT32_MAX` is safe and does not cause integer overflow.
Applied to files:
components/core/src/clp_s/ColumnWriter.hpp
📚 Learning: in `components/core/src/clp/ffi/keyvaluepairlogevent.cpp`, node ids are validated before accessing `...
Learnt from: LinZhihao-723
PR: y-scope/clp#558
File: components/core/src/clp/ffi/KeyValuePairLogEvent.cpp:474-480
Timestamp: 2024-12-09T15:25:14.043Z
Learning: In `components/core/src/clp/ffi/KeyValuePairLogEvent.cpp`, node IDs are validated before accessing `child_schema_tree_node` in the function `serialize_node_id_value_pairs_to_json`, ensuring `get_node` does not throw exceptions due to invalid node IDs.
Applied to files:
components/core/src/clp_s/ColumnWriter.hpp
📚 Learning: in the ir encoding method tests, `int8_t` is used for byte buffers instead of `uint8_t`....
Learnt from: AVMatthews
PR: y-scope/clp#543
File: components/core/src/clp_s/clp-s.cpp:131-139
Timestamp: 2024-09-27T23:07:22.429Z
Learning: In the IR encoding method tests, `int8_t` is used for byte buffers instead of `uint8_t`.
Applied to files:
components/core/src/clp_s/ColumnWriter.hpp
📚 Learning: in `deserialize_int_val`, the integer type needs to be determined at runtime, so refactoring into a ...
Learnt from: LinZhihao-723
PR: y-scope/clp#544
File: components/core/src/clp/ffi/ir_stream/ir_unit_deserialization_methods.cpp:230-261
Timestamp: 2024-10-08T15:52:50.753Z
Learning: In `deserialize_int_val`, the integer type needs to be determined at runtime, so refactoring into a templated helper function does not simplify the logic.
Applied to files:
components/core/src/clp_s/ColumnWriter.hpp
📚 Learning: the variable `ir_node_to_archive_node_unordered_map` is intentionally named to reflect its type chan...
Learnt from: AVMatthews
PR: y-scope/clp#543
File: components/core/src/clp_s/JsonParser.cpp:581-627
Timestamp: 2024-10-07T21:38:35.979Z
Learning: The variable `ir_node_to_archive_node_unordered_map` is intentionally named to reflect its type change, even if the actual variable type has been altered.
Applied to files:
components/core/src/clp_s/ColumnWriter.hpp
🧬 Code Graph Analysis (2)
components/core/src/clp_s/ColumnWriter.cpp (4)
components/core/src/clp_s/ColumnWriter.hpp (18)
value
(28-28)value
(58-58)value
(75-75)value
(93-93)value
(110-110)value
(137-137)value
(195-195)value
(213-213)id
(166-169)id
(167-167)compressor
(34-34)compressor
(60-60)compressor
(77-77)compressor
(95-95)compressor
(112-112)compressor
(139-139)compressor
(197-197)compressor
(215-215)components/core/src/clp_s/DictionaryWriter.hpp (1)
value
(91-91)components/core/src/clp_s/ParsedMessage.hpp (2)
value
(64-66)value
(64-64)components/core/src/clp_s/SchemaWriter.cpp (2)
store
(29-33)store
(29-29)
components/core/src/clp_s/search/QueryRunner.cpp (2)
components/core/src/clp_s/search/QueryRunner.hpp (6)
expr
(176-176)expr
(184-184)expr
(192-192)expr
(401-401)expr
(414-414)expr
(420-420)components/core/src/clp_s/ArchiveReader.hpp (2)
m_log_dict
(108-108)m_var_dict
(106-106)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (7)
- GitHub Check: centos-stream-9-static-linked-bins
- GitHub Check: centos-stream-9-dynamic-linked-bins
- GitHub Check: ubuntu-jammy-dynamic-linked-bins
- GitHub Check: ubuntu-jammy-static-linked-bins
- GitHub Check: ubuntu-jammy-lint
- GitHub Check: lint-check (macos-15)
- GitHub Check: build (macos-15)
🔇 Additional comments (10)
components/core/src/clp_s/search/QueryRunner.cpp (3)
8-10
: LGTM! Header updates align with the namespace refactoring.The inclusion of
clp/Defs.h
,clp/GrepCore.hpp
, andclp/Query.hpp
correctly replaces the removedclp_search
headers, supporting the migration to the core clp implementations.
205-205
: Type updates are correct and consistent.The changes from
Query*
toclp::Query*
properly reflect the migration from the removedclp_s::search::clp_search::Query
toclp::Query
.Also applies to: 278-278
861-873
: API update correctly implements the new GrepCore interface.The call to
clp::GrepCore::process_raw_query
properly includes the new required parameters:
dummy_timestamp
parameters for timestamp range (lines 868-869)m_placeholder_lexer
for lexer processing (line 871)- Dereferenced dictionary arguments (
*m_log_dict
,*m_var_dict
) matching the new reference-based API- Additional boolean parameter for processing mode (line 872)
components/core/src/clp_s/CMakeLists.txt (1)
20-96
: Build system updates correctly support the namespace migration.The addition of core clp source files and the
log_surgeon::log_surgeon
dependency properly supports the removal of legacyclp_s::search::clp_search
components and integration with the new core implementations.Also applies to: 116-116
components/core/src/clp_s/search/QueryRunner.hpp (3)
17-17
: Header updates support the new dependencies correctly.The addition of
log_surgeon/Lexer.hpp
and update toclp/Query.hpp
properly support the new lexer member and migrated query types.Also applies to: 20-20
133-133
: Type updates maintain API consistency.The changes from
Query
toclp::Query
in member variables and method signatures correctly reflect the namespace migration while maintaining the same interfaces.Also applies to: 135-135, 249-249
155-155
: Lexer member addition supports the new API requirements.The
log_surgeon::lexers::ByteLexer m_placeholder_lexer
member properly supports the updatedclp::GrepCore::process_raw_query
interface that requires a lexer parameter.components/core/src/clp_s/ColumnWriter.cpp (3)
3-9
: Updated includes support type safety and API migration.The new includes properly support the strongly typed dictionary IDs (
clp/Defs.h
) and the migrated encoding functionality (clp/EncodedVariableInterpreter.hpp
).
60-68
: Encoding API migration implemented correctly.The switch from
VariableEncoder::encode_and_add_to_dictionary
toclp::EncodedVariableInterpreter::encode_and_add_to_dictionary
properly reflects the removal of the legacy encoder. The addition ofm_temp_var_dict_ids
parameter supports the new interface requirements.
69-69
: Type safety improvements with strongly typed IDs.The changes from generic
uint64_t
toclp::logtype_dictionary_id_t
andclp::variable_dictionary_id_t
provide better type safety and make the code more self-documenting. Thesizeof
calculations are correctly updated to match the new types.Also applies to: 86-89, 93-94
// Types | ||
using encoded_log_dict_id_t = uint64_t; | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🧹 Nitpick (assertive)
Consider hoisting the encoded_log_dict_id_t
alias to a common header
encoded_log_dict_id_t
is introduced here as a class-scoped alias, but the same type is likely needed by multiple compilation units (e.g., readers and tests). Defining it in a shared header (perhaps Defs.h
, alongside the other strong typedefs) would avoid repetitive aliases and ensure a single source of truth.
🤖 Prompt for AI Agents
In components/core/src/clp_s/ColumnWriter.hpp around lines 120 to 122, the type
alias encoded_log_dict_id_t is currently defined locally but should be moved to
a common shared header file such as Defs.h. This change will allow multiple
compilation units like readers and tests to use the same alias consistently. To
fix, remove the alias from ColumnWriter.hpp and add it to the shared header
alongside other typedefs, ensuring a single source of truth for this type.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This header is currently already included in all places this type needs to be used.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@gibber9809, thank you for the clarification! You're right - if ColumnWriter.hpp
is already included in all the places where encoded_log_dict_id_t
is needed, then the current placement as a class-scoped alias is appropriate and avoids unnecessary coupling. I appreciate you providing this context about the project's include structure.
✏️ Learnings added
Learnt from: gibber9809
PR: y-scope/clp#1163
File: components/core/src/clp_s/ColumnWriter.hpp:120-122
Timestamp: 2025-08-06T23:57:49.720Z
Learning: In the CLP codebase, the `encoded_log_dict_id_t` type alias in `components/core/src/clp_s/ColumnWriter.hpp` is appropriately placed as a class-scoped alias because the ColumnWriter.hpp header is already included in all places where this type is used, eliminating the need to move it to a common header.
Note: Learnings are effective only in the context of similar code segments. To apply general coding guidelines, please configure review instructions. You can manage existing learnings in the app.
🧠 Learnings used
Learnt from: LinZhihao-723
PR: y-scope/clp#557
File: components/core/src/clp/ffi/ir_stream/utils.hpp:0-0
Timestamp: 2024-10-18T02:31:18.595Z
Learning: In `components/core/src/clp/ffi/ir_stream/utils.hpp`, the function `size_dependent_encode_and_serialize_schema_tree_node_id` assumes that the caller checks that `node_id` fits within the range of `encoded_node_id_t` before casting.
Learnt from: LinZhihao-723
PR: y-scope/clp#557
File: components/core/tests/test-ir_encoding_methods.cpp:1216-1286
Timestamp: 2024-10-13T09:27:43.408Z
Learning: In the unit test case `ffi_ir_stream_serialize_schema_tree_node_id` in `test-ir_encoding_methods.cpp`, suppressing the `readability-function-cognitive-complexity` warning is acceptable due to the expansion of Catch2 macros in C++ tests, and such test cases may not have readability issues.
Learnt from: jackluo923
PR: y-scope/clp#1054
File: components/core/src/glt/SQLitePreparedStatement.hpp:4-4
Timestamp: 2025-07-01T14:49:07.333Z
Learning: In the CLP codebase, standard headers like `<cstdint>` and `<string>` in alphabetical order (as seen in components/core/src/glt/SQLitePreparedStatement.hpp) are correctly ordered and do not need adjustment.
Learnt from: LinZhihao-723
PR: y-scope/clp#486
File: components/core/tests/test-error_handling.cpp:44-91
Timestamp: 2024-11-26T19:16:19.933Z
Learning: In `components/core/tests/test-error_handling.cpp` (C++), when specializing template methods of `ErrorCategory` via type aliases (e.g., `AlwaysSuccessErrorCategory`), it's acceptable to define these specializations in the current namespace, even though the primary template is in the `clp::error_handling` namespace, due to the use of type aliases and template instantiation.
Learnt from: LinZhihao-723
PR: y-scope/clp#558
File: components/core/tests/test-ffi_KeyValuePairLogEvent.cpp:85-103
Timestamp: 2024-10-14T03:42:53.145Z
Learning: The function `assert_kv_pair_log_event_creation_failure` is correctly placed within the anonymous namespace in `test-ffi_KeyValuePairLogEvent.cpp`.
Learnt from: LinZhihao-723
PR: y-scope/clp#558
File: components/core/tests/test-ffi_KeyValuePairLogEvent.cpp:14-14
Timestamp: 2024-10-14T03:42:10.355Z
Learning: In the file `components/core/tests/test-ffi_KeyValuePairLogEvent.cpp`, including `<json/single_include/nlohmann/json.hpp>` is consistent with the project's coding standards.
Learnt from: haiqi96
PR: y-scope/clp#523
File: components/core/src/clp/BufferedFileReader.cpp:96-106
Timestamp: 2024-10-24T14:45:26.265Z
Learning: In `components/core/src/clp/BufferedFileReader.cpp`, refactoring the nested error handling conditions may not apply due to the specific logic in the original code.
Learnt from: AVMatthews
PR: y-scope/clp#543
File: components/core/src/clp_s/clp-s.cpp:196-265
Timestamp: 2024-10-07T20:10:08.254Z
Learning: In `clp-s.cpp`, the `run_serializer` function interleaves serialization and writing of IR files, making it difficult to restructure it into separate functions.
Learnt from: AVMatthews
PR: y-scope/clp#543
File: components/core/src/clp_s/clp-s.cpp:196-265
Timestamp: 2024-10-08T15:52:50.753Z
Learning: In `clp-s.cpp`, the `run_serializer` function interleaves serialization and writing of IR files, making it difficult to restructure it into separate functions.
Learnt from: gibber9809
PR: y-scope/clp#955
File: components/core/src/clp_s/search/sql/CMakeLists.txt:8-26
Timestamp: 2025-06-02T18:22:24.060Z
Learning: In the clp project, ANTLR code generation at build time is being removed by another PR. When reviewing CMake files, be aware that some temporary suboptimal configurations may exist to reduce merge conflicts between concurrent PRs, especially around ANTLR_TARGET calls.
Learnt from: gibber9809
PR: y-scope/clp#630
File: components/core/src/clp_s/JsonParser.cpp:702-703
Timestamp: 2024-12-10T16:03:08.691Z
Learning: In `components/core/src/clp_s/JsonParser.cpp`, within the `get_archive_node_id` function, validation and exception throwing for UTF-8 compliance of `curr_node.get_key_name()` are unnecessary and should be omitted.
Learnt from: gibber9809
PR: y-scope/clp#630
File: components/core/src/clp_s/JsonParser.cpp:702-703
Timestamp: 2024-12-10T16:03:13.322Z
Learning: In `components/core/src/clp_s/JsonParser.cpp`, validation and exception throwing are unnecessary in the `get_archive_node_id` method when processing nodes, and should not be added.
Learnt from: Bill-hbrhbr
PR: y-scope/clp#1122
File: components/core/src/clp/clp/CMakeLists.txt:175-195
Timestamp: 2025-07-23T09:54:45.185Z
Learning: In the CLP project, when reviewing CMakeLists.txt changes that introduce new compression library dependencies (BZip2, LibLZMA, LZ4, ZLIB), the team prefers to address conditional linking improvements in separate PRs rather than expanding the scope of focused migration PRs like the LibArchive task-based installation migration.
Learnt from: haiqi96
PR: y-scope/clp#651
File: components/clp-package-utils/clp_package_utils/scripts/compress.py:0-0
Timestamp: 2025-01-16T16:58:43.190Z
Learning: In the clp-package compression flow, path validation and error handling is performed at the scheduler level rather than in the compress.py script to maintain simplicity and avoid code duplication.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me. Note that for DictionaryEntry.cpp/hpp, since most parts of code are copied from clp's DictionaryEntry classes, I only reviewed the difference introduced by this PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Deferring to @haiqi96's review.
Description
This PR is the third in a series of PRs (see #1143 for the second) that is attempting to bring the core CLP parsing and search code to a state where it can be shared by both clp and clp-s. The full end to end integration can be seen here.
This PR switches out the core encoding and search code for log-text in clp-s to use the clp implementations. Since the previous two PRs did most of the heavy lifting to make this possible the bulk of this change is comprised of:
clp_s/DictionaryEntry.{cpp,hpp}
) act like clp's log-type dictionary entry (inclp/LogTypeDictionaryEntry.{cpp,hpp}
). This makes clp-s' log-type dictionary entry follow the interface defined by the concept that will be added in a follow-up PR.There is also a small change in the clp parsing utilities to eliminate a performance issue spotted during profiling.
Of these the most complex part is probably the first point, so it might be useful to look at the concept from the final prototype implementation as well as compare the clp-s logtype-dictionary entry implementation with the clp version.
Overall this change leads to an extremely marginal improvement in compression ratio, and compression and decompression performance that ranges from about even to somewhat significantly better across the open source datasets.
I have not yet benchmarked search performance as thoroughly, but ad-hoc tests didn't reveal any issues.
Compression ratio results:
Compression speed results:
Decompression speed results show a similar trend. Overall performance seems better (though before the change to the clp parsing utilities performance was worse in most cases).
Checklist
breaking change.
Validation performed
Summary by CodeRabbit
Summary by CodeRabbit
New Features
Refactor
Bug Fixes
Chores
Style
Documentation