- Standardize hash function calls to use `_ptr_fct` for generic types and `_seed_hash_ptr_fct` for bloom filters, improving consistency.
- Update hash-fct.H and hash-fct.C to replace `dft_hash_fct` and `snd_hash_fct` with `SuperFastHash` and `murmur3hash` respectively, ensuring all hash functions accept pointer/length for raw data or direct object references for trivial types.
- Refactor hash-fct.C to use static FORCE_INLINE helpers and improve readability.
- Update various data structure headers (al-matrix.H, bloom-filter.H, tpl_dynLhash.H, tpl_dynMapOhash.H, tpl_dynSetHash.H, tpl_lhash.H, tpl_linHash.H, tpl_odhash.H, tpl_olhash.H) to reflect the new function naming conventions.
- Add new tests for `PairHashTest` and `MapHashAdaptorTest` in hash_fct_test.cc to verify the correctness of hash combiners.
- Enhance hash_statistical_test.cc to include latency metrics and an improved output format for hash performance and quality analysis.
- Exclude 'hash_statistical_test' from CI coverage and sanitizer jobs due to its computational intensity and the overhead of instrumentation.

Overall, this commit simplifies the hash function API, improves test coverage, and refines CI configurations.
Note: Reviews paused. It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior in your CodeRabbit settings.
Walkthrough
Adds many new data structures (in-memory and persistent B-Tree/B+Tree and map families), Cuckoo/Quotient filters (and compact variants), a large hash API refactor with thread-safe jsw init, extensive examples and tests, CI test exclusions, CMake header exports, and a revised .clang-format.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant Client as rgba(30,144,255,0.5)
    participant FileTree as rgba(34,139,34,0.5)
    participant WAL as rgba(255,165,0,0.5)
    participant FS as rgba(128,0,128,0.5)
    participant Recovery as rgba(220,20,60,0.5)
    Client->>FileTree: insert(key/value)
    FileTree->>WAL: append WAL record
    WAL->>FS: write & fsync
    FileTree->>FS: update main file / checkpoint
    FS-->>FileTree: fsync ack
    Note over FileTree,FS: crash or reopen
    FileTree->>Recovery: detect WAL/journal
    Recovery->>WAL: read WAL records
    Recovery->>FS: read pages / journal image
    Recovery->>FileTree: replay WAL / restore pages
    FileTree-->>Client: opened / consistent state
```
Estimated code review effort: 🎯 5 (Critical) | ⏱️ ~120 minutes
Pull request overview
This PR broadens Aleph-w’s container and hashing ecosystem by adding page-oriented B-Tree/B+Tree documentation and tests, introducing new probabilistic filters (Cuckoo + compact variants, compact quotient filter), and modernizing hash-function plumbing (including new 64-bit hash implementations and pointer-based default hash adapters).
Changes:
- Added documentation, examples, and tests for the B-Tree/B+Tree “family”, including persistent `File_*` variants.
- Introduced `Cuckoo_Filter`, `Compact_Cuckoo_Filter`, and `Compact_Quotient_Filter`, plus expanded hashing implementations (xxHash64/wyHash/SipHash) and statistical hash tests.
- Switched several hash-table defaults to pointer-based hash adapters and refined the Bloom filter API/annotations.
Reviewed changes
Copilot reviewed 46 out of 54 changed files in this pull request and generated 7 comments.
Show a summary per file
| File | Description |
|---|---|
| tpl_dynSetHash.H | Updates default hasher to pointer-based adapter. |
| tpl_dynMapOhash.H | Updates default first/second hashers to pointer-based adapters. |
| tpl_dynLhash.H | Updates default hasher to pointer-based adapter. |
| hash-fct.C | Adds modern hashes + seedable JSW init; refactors helpers. |
| docs/b_tree_family.md | New docs outlining B-Tree/B+Tree and file-backed variants. |
| cuckoo-filter.H | New Cuckoo filter implementation (probabilistic membership with deletion). |
| compact-quotient-filter.H | New bit-packed quotient filter implementation. |
| compact-cuckoo-filter.H | New bit-packed cuckoo filter implementation. |
| bloom-filter.H | Refines API/docs; changes default seeded hash adapter; adds [[nodiscard]]. |
| al-matrix.H | Updates pair hash defaults to pointer-based adapters. |
| Tests/quotient-filter.cc | Adds gtest coverage for Quotient_Filter. |
| Tests/hash_statistical_test.cc | Adds a heavy statistical hash-quality/perf test. |
| Tests/hash_fct_test.cc | Extends hash tests for xxHash64/wyHash/SipHash + init changes. |
| Tests/file_bplus_map_test.cc | Adds tests for persistent B+ map features and iterators. |
| Tests/file_b_map_test.cc | Adds tests for persistent B map features. |
| Tests/cuckoo-filter.cc | Adds gtest coverage for Cuckoo_Filter. |
| Tests/compact-cuckoo-filter.cc | Adds gtest coverage for Compact_Cuckoo_Filter. |
| Tests/bplus_tree_test.cc | Adds gtest coverage for BPlus_Tree. |
| Tests/b_tree_test.cc | Adds gtest coverage for B_Tree. |
| Tests/CMakeLists.txt | Registers new test executables. |
| README.md | Documents new multiway trees + probabilistic filters. |
| README.es.md | Updates Spanish README probabilistic section. |
| Examples/test-odhash.cc | Fixes example hashing to use pair hash combiners; silences unused results. |
| Examples/quotient_filter_example.C | New quotient filter example program. |
| Examples/file_bplus_tree_example.cc | New persistent B+ tree example program. |
| Examples/file_bplus_map_example.cc | New persistent B+ map example program. |
| Examples/file_b_tree_example.cc | New persistent B-tree example program. |
| Examples/file_b_map_example.cc | New persistent B-map example program. |
| Examples/cuckoo_filter_example.cc | New cuckoo filter example program. |
| Examples/compact_quotient_filter_example.C | New compact quotient filter example program. |
| Examples/compact_cuckoo_filter_example.C | New compact cuckoo filter example program. |
| Examples/bplus_tree_example.cc | New in-memory B+ tree example program. |
| Examples/b_tree_example.cc | New in-memory B-tree example program. |
| CMakeLists.txt | Adds new headers to build/header list. |
| .github/workflows/ci.yml | Excludes costly hash_statistical_test from CI/asan/tsan/coverage runs. |
| .clang-format | Switches base style to GNU and adds custom brace/spacing rules. |
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: e092593f08
Actionable comments posted: 17
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (3)
bloom-filter.H (1)
503-510: ⚠️ Potential issue | 🟠 Major
Must fix: don't turn `size()` into a saturation-based estimate in `operator&=`.
`get_n()`/`size()` are documented as insertion counters. After this assignment they instead report `expected_size(get_x())`, which can diverge from actual insertions and even jump to `std::numeric_limits<size_t>::max()` when the bitset saturates. This change extends that contract break to `&=` as well.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@bloom-filter.H` around lines 503-510, operator&= currently overwrites the insertion counter by assigning num_ins = expected_size(), which turns size()/get_n() into a saturation-based estimate; revert this by not changing the insertion counter during intersection: after performing bits &= f.bits just leave num_ins unchanged (or, if you must reconcile counters, compute a safe conservative value without calling expected_size(), e.g. keep the smaller of the two counters). Update the operator&= implementation (function operator&=, bits &= f.bits, and num_ins handling) to preserve the documented insertion-count semantics of get_n()/size().

hash-fct.H (2)
742-787: ⚠️ Potential issue | 🟠 Major
Hash `std::string` by `data()`/`size()`, not through the C-string overloads.
These overloads route `std::string` through the NUL-terminated `const char *` versions, so values like `std::string{"a\0b", 3}` and `"a"` hash identically. The `std::string` overloads should call the length-aware byte-sequence variants, like the Murmur/xxHash/wyhash/SipHash paths already do in this file.
♻️ Fix pattern

```diff
 inline size_t add_hash(const std::string & key) noexcept
 {
-  return add_hash(key.c_str());
+  return add_hash(key.data(), key.size());
 }

 inline size_t jsw_hash(const std::string & key) noexcept
 {
-  return jsw_hash(key.c_str());
+  return jsw_hash(key.data(), key.size());
 }
```

Apply the same change to the other classical `std::string` overloads in this block.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@hash-fct.H` around lines 742-787, the std::string overloads currently forward to the NUL-terminated const char* variants, causing strings containing embedded NULs to hash incorrectly; update each overload (add_hash, xor_hash, rot_hash, djb_hash, sax_hash, fnv_hash, oat_hash, jsw_hash, elf_hash) to call the length-aware byte-sequence variants (pass key.data() and key.size()) instead of key.c_str(); apply the same change to every std::string overload in this block so hashing uses the full byte length.
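A minimal sketch of why the C-string path is lossy, using simple stand-in hash loops rather than the file's real functions: the NUL-terminated path collides strings with embedded NULs, while the length-aware path distinguishes them.

```cpp
#include <cstddef>
#include <string>

// Stand-in for the NUL-terminated overload: stops at the first '\0'.
std::size_t cstr_hash(const char * s) noexcept
{
  std::size_t h = 0;
  for (; *s != '\0'; ++s)
    h = h * 131 + static_cast<unsigned char>(*s);
  return h;
}

// Stand-in for the length-aware overload: hashes every byte.
std::size_t bytes_hash(const void * p, std::size_t len) noexcept
{
  const auto * s = static_cast<const unsigned char *>(p);
  std::size_t h = 0;
  for (std::size_t i = 0; i < len; ++i)
    h = h * 131 + s[i];
  return h;
}
```

With `std::string{"a\0b", 3}` and `std::string{"a"}`, `cstr_hash` over `c_str()` collides while `bytes_hash` over `data()`/`size()` does not, which is exactly the behavior difference the comment describes.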
331-347: ⚠️ Potential issue | 🟠 Major
Replace `std::is_trivially_copyable_v<Key>` with `std::has_unique_object_representations_v<Key>` in all generic hash function overloads.
Raw-byte hashing requires that equal values have identical object representations. `std::is_trivially_copyable_v<Key>` only guarantees that `memcpy` preserves value, not that padding bytes are identical across equivalent objects. Types with padding can hash differently despite comparing equal, breaking hash table lookups. Use `std::has_unique_object_representations_v<Key>` instead, which enforces the necessary guarantee that two objects with the same value have the same byte representation.
Apply this constraint to all affected functions:
- `murmur3hash<Key>` (line 332)
- `add_hash<Key>`, `xor_hash<Key>`, `rot_hash<Key>`, `djb_hash<Key>` and related overloads (lines 508–543)
- `dft_hash_fct<Key>` and seeded variants (lines 819–872; replace the trivially_copyable part of the compound constraint)

Example fix pattern

```diff
-template <typename Key>
-  requires std::is_trivially_copyable_v<Key>
+template <typename Key>
+  requires std::has_unique_object_representations_v<Key>
```

For compound constraints like `dft_hash_fct`, update only the trivially_copyable part:

```diff
-  requires (std::is_trivially_copyable_v<Key> and
+  requires (std::has_unique_object_representations_v<Key> and
             (not std::is_pointer_v<Key>) and
             (not std::is_array_v<Key>))
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@hash-fct.H` around lines 331-347, replace uses of std::is_trivially_copyable_v<Key> with std::has_unique_object_representations_v<Key> in all generic raw-byte hash overloads so hashing is performed only for types with unique object representations; specifically update the constraint on murmur3hash<Key> (function murmur3hash), and do the same for add_hash<Key>, xor_hash<Key>, rot_hash<Key>, djb_hash<Key> and their overloads, and for dft_hash_fct<Key> and its seeded variants (only replace the trivially_copyable part of any compound constraint). Ensure every function template that currently requires std::is_trivially_copyable_v<Key> instead requires std::has_unique_object_representations_v<Key> so memcpy-based hashing is only enabled for types with identical byte representations for equal values.
🧹 Nitpick comments (11)
compact-quotient-filter.H (1)
391-409: Same DRY issue: `from_capacity` and `estimate` duplicate parameter calculation.
Same refactoring opportunity as in `quotient-filter.H`.
Also applies to: 568-583
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@compact-quotient-filter.H` around lines 391-409, from_capacity and estimate duplicate the logic that computes q_v and r_v from expected_n and fp_rate; extract that common calculation into a single helper (e.g., a private static method on Compact_Quotient_Filter such as compute_qr_from or make_qr) that takes (size_t expected_n, double fp_rate) and returns the (q_v, r_v) pair, then call that helper from Compact_Quotient_Filter::from_capacity and from the estimate implementation (and any other sites noted) to remove the duplication and ensure consistent bounds handling (including the q_v + r_v > 64 adjustment and input validation).

tpl_file_bplus_map.H (1)
188-196: Consider using `std::transform` or a range-based approach for `to_pairs`.
The manual loop with an index is correct but verbose. A range-based approach would be more idiomatic per coding guidelines.
Suggested refactor

```diff
 [[nodiscard]] static Array<std::pair<Key, Value>>
 to_pairs(const Array<Record> & records)
 {
   Array<std::pair<Key, Value>> out;
   out.reserve(records.size() == 0 ? 1 : records.size());
-  for (size_t i = 0; i < records.size(); ++i)
-    out.append(to_pair(records[i]));
+  for (const auto & record : records)
+    out.append(to_pair(record));
   return out;
 }
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tpl_file_bplus_map.H` around lines 188-196, the to_pairs function uses an index-based loop which is verbose; refactor to a range-based loop or std::transform to be more idiomatic: iterate over records with "for (const Record &rec : records)" and append to out via out.append(to_pair(rec)) (preserve the reserve call), or use std::transform with std::back_inserter(out) and the to_pair function to populate out; keep the return type (Array<std::pair<Key, Value>>) and existing reserve logic intact and remove the manual index arithmetic.

quotient-filter.H (2)
586-613: `goto` usage is acceptable here but consider a structured alternative.
The `goto found` pattern (lines 589, 597) is a common C idiom for breaking out of nested loops/conditions. While acceptable, a helper lambda or early return could improve readability.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@quotient-filter.H` around lines 586-613, the code uses a goto ("goto found") to jump out of the loop and run a block that handles deletion logic; replace this with a structured alternative by extracting the post-find logic into a helper function or lambda (e.g., a local function delete_at(pos, fr, fq) or lambda named handle_found) and invoke it immediately when get_rem(pos) == fr instead of jumping; keep the loop logic using get_rem, get_cont, incr, has_element to locate the element and then call the helper which performs the same operations (compute was_run_start via get_cont, compute nxt via incr, use has_element/get_cont to set_cont or set_occ, call shift_elements_left, decrement num_elems_) and return the boolean result directly to avoid goto while preserving the same behavior.
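An immediately-invoked lambda version of the pattern might look like this sketch; `locate`, `slots`, and `target` are illustrative stand-ins for the filter's probing state, not the library's API:

```cpp
#include <cstddef>
#include <optional>
#include <vector>

// Replaces a `goto found` jump: the immediately-invoked lambda returns the
// matching position (or nothing), and the caller handles the "found" block
// exactly once after the search.
std::optional<std::size_t> locate(const std::vector<int> & slots, int target)
{
  return [&]() -> std::optional<std::size_t> {
    for (std::size_t i = 0; i < slots.size(); ++i)
      if (slots[i] == target)
        return i;            // was: goto found
    return std::nullopt;     // fell through without a match
  }();
}
```

The caller can then branch once on `locate(...)` and run the deletion logic in a single place, which keeps the control flow structured without changing behavior.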
507-525: DRY: `from_capacity` and `estimate` share identical parameter calculation logic.
Both methods compute `q` and `r` identically. Consider having `from_capacity` call `estimate`:
Suggested refactor

```diff
 static Quotient_Filter from_capacity(size_t expected_n, double fp_rate,
                                      uint32_t seed = 0x5F3759DF)
 {
-  ah_domain_error_if(expected_n == 0)
-    << "Quotient_Filter::from_capacity: expected_n must be > 0";
-  ah_domain_error_if(fp_rate <= 0.0 or fp_rate >= 1.0)
-    << "Quotient_Filter::from_capacity: fp_rate must be in (0, 1) (got "
-    << fp_rate << ")";
-
-  const double slots_needed = static_cast<double>(expected_n) / 0.75;
-  auto q_v = static_cast<uint8_t>(
-      std::max(1.0, std::ceil(std::log2(slots_needed))));
-  auto r_v = static_cast<uint8_t>(
-      std::max(1.0, std::ceil(-std::log2(fp_rate))));
-  if (q_v + r_v > 64)
-    q_v = static_cast<uint8_t>(64 - r_v);
-  return Quotient_Filter(q_v, r_v, seed);
+  auto [q_v, r_v] = estimate(expected_n, fp_rate);
+  return Quotient_Filter(q_v, r_v, seed);
 }
```

Also applies to: 684-699
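A shared helper of the kind suggested above might look like this sketch; the names (`compute_qr`) are hypothetical, while the 0.75 load factor and the 64-bit cap follow the snippet:

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <utility>

// Hypothetical shared helper; both from_capacity and estimate could
// delegate here so the q/r derivation lives in exactly one place.
std::pair<std::uint8_t, std::uint8_t>
compute_qr(std::size_t expected_n, double fp_rate)
{
  const double slots_needed = static_cast<double>(expected_n) / 0.75;
  auto q = static_cast<std::uint8_t>(
      std::max(1.0, std::ceil(std::log2(slots_needed))));
  auto r = static_cast<std::uint8_t>(
      std::max(1.0, std::ceil(-std::log2(fp_rate))));
  if (q + r > 64)           // keep quotient + remainder within 64 hash bits
    q = static_cast<std::uint8_t>(64 - r);
  return {q, r};
}
```

Centralizing the calculation also means the bounds handling (including the `q + r > 64` adjustment) only needs to be validated once.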
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@quotient-filter.H` around lines 507-525, the q/r computation in Quotient_Filter::from_capacity duplicates logic in Quotient_Filter::estimate; refactor by moving the shared calculation (slots_needed, q_v, r_v and the q_v + r_v > 64 adjustment) into a single helper (e.g., compute_qr or make estimate the canonical calculator) and have from_capacity call that helper (or call estimate) to obtain q and r before constructing the Quotient_Filter, ensuring the same seed handling and the 64-bit cap logic is preserved.

tpl_file_b_map.H (1)
491-502: `insert_or_assign` uses a non-atomic remove+insert pattern.
The current implementation removes the existing record, then inserts the new one. If the second `insert()` fails (lines 499-500), the original value is already gone. While the `ah_runtime_error_unless` will throw, this leaves the map in a state where the key was removed but not replaced.
Consider whether this is acceptable or if a different approach (e.g., an in-place update if the tree supports it) would be safer.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tpl_file_b_map.H` around lines 491-502, the current insert_or_assign removes the old record then inserts the new one, which can leave the map empty if tree_.insert(record) fails; update insert_or_assign to be atomic: first probe with make_probe(key) and check for an existing record via tree_.find(probe) (or an equivalent lookup), then if the tree supports an in-place replace/update use that (e.g., tree_.replace/probe/update with the new record); if no in-place API exists, read and save the existing record, attempt tree_.insert(record) only if no existing record, otherwise attempt to remove+insert but on any insert failure reinsert the saved original_record to restore state and rethrow the error; reference functions: insert_or_assign, make_probe, make_record, tree_.find/tree_.replace/tree_.insert/tree_.remove to locate and implement this logic.

Examples/file_b_tree_example.cc (1)
71-77: Should fix: make temp-file cleanup exception-safe with RAII.
Cleanup is currently duplicated and skipped if an exception occurs before the final removal block.
♻️ Suggested refactor (single cleanup guard)

```diff
 namespace
 {
   namespace fs = std::filesystem;
+  struct Cleanup
+  {
+    fs::path path;
+    ~Cleanup()
+    {
+      std::error_code ec;
+      fs::remove(path, ec);
+      fs::remove(path.string() + ".wal", ec);
+      fs::remove(path.string() + ".wal.tmp", ec);
+      fs::remove(path.string() + ".journal", ec);
+      fs::remove(path.string() + ".journal.tmp", ec);
+      fs::remove(path.string() + ".lock", ec);
+    }
+  };
 @@
 int main()
 {
   const fs::path file_path =
     fs::temp_directory_path() / "aleph_file_b_tree_example.idx";
+  Cleanup cleanup{file_path};
   try
     {
-      std::error_code ec;
-      fs::remove(file_path, ec);
-      fs::remove(file_path.string() + ".wal", ec);
-      fs::remove(file_path.string() + ".wal.tmp", ec);
-      fs::remove(file_path.string() + ".journal", ec);
-      fs::remove(file_path.string() + ".journal.tmp", ec);
-      fs::remove(file_path.string() + ".lock", ec);
 @@
-      fs::remove(file_path, ec);
-      fs::remove(file_path.string() + ".wal", ec);
-      fs::remove(file_path.string() + ".wal.tmp", ec);
-      fs::remove(file_path.string() + ".journal", ec);
-      fs::remove(file_path.string() + ".journal.tmp", ec);
-      fs::remove(file_path.string() + ".lock", ec);
 }
```

As per coding guidelines, "All resources (memory, file handles, locks) must follow RAII principles."
Also applies to: 125-130
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@Examples/file_b_tree_example.cc` around lines 71-77, replace the repeated manual cleanup calls that use file_path and fs::remove with a small RAII guard: implement a local class/struct (e.g., TempFileCleaner) that stores the std::filesystem::path file_path and in its destructor attempts fs::remove for file_path, file_path + ".wal", ".wal.tmp", ".journal", ".journal.tmp", and ".lock", swallowing or logging errors via std::error_code to avoid throwing; instantiate this guard at the start of the scope that currently performs cleanup so the files are removed automatically on all exits (normal return or exception), and remove the duplicated fs::remove blocks (including the other similar block that currently appears later) in favor of the RAII guard.

hash-fct.C (2)
252-270: Missing `<cstring>` include for `memcpy`.
The `jen_hash` function uses `memcpy` at line 263, but `<cstring>` is not explicitly included. While this may work due to transitive includes, it's better to explicitly include required headers.
♻️ Suggested fix (add to the includes section)

```diff
 # include <ctime>
+# include <cstring>
 # include <gsl/gsl_rng.h>
 # include <stdexcept>
 # include "hash-fct.H"
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@hash-fct.C` around lines 252-270, the file is missing the header that declares memcpy used in jen_hash; add an explicit `#include <cstring>` (preferably alongside the other includes at the top of the file) so memcpy is properly declared for the jen_hash(const void*, size_t, unsigned) function and avoid relying on transitive includes.
190-201: GSL RNG resource management could use RAII.
The `init_jsw` function manually manages the GSL RNG lifecycle with `gsl_rng_alloc` and `gsl_rng_free`. Per coding guidelines, "All resources must follow RAII principles." Consider using a unique_ptr with a custom deleter.
♻️ Suggested RAII approach

```diff
 void init_jsw(std::uint32_t seed) noexcept
 {
-  gsl_rng * r = gsl_rng_alloc (gsl_rng_mt19937);
-  gsl_rng_set(r, seed % gsl_rng_max(r));
-
-  for (int i = 0; i < 256; ++i)
-    tab[i] = gsl_rng_get(r);
-
-  gsl_rng_free(r);
+  auto deleter = [](gsl_rng * p) { gsl_rng_free(p); };
+  std::unique_ptr<gsl_rng, decltype(deleter)> r(
+      gsl_rng_alloc(gsl_rng_mt19937), deleter);
+  gsl_rng_set(r.get(), seed % gsl_rng_max(r.get()));
+
+  for (int i = 0; i < 256; ++i)
+    tab[i] = gsl_rng_get(r.get());
   init = true;
 }
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@hash-fct.C` around lines 190-201, the init_jsw function currently allocates and frees a gsl_rng* manually; replace this manual lifecycle with RAII by wrapping the gsl_rng pointer in a std::unique_ptr with a custom deleter that calls gsl_rng_free, then use that unique_ptr (e.g., in init_jsw) to set the seed, call gsl_rng_get in the loop, and let the deleter handle cleanup; keep the same seed logic (seed % gsl_rng_max(r)) and ensure init remains set to true at the end.

Tests/compact-quotient-filter.cc (1)
541-549: Consider using an Aleph container for consistency.
Per coding guidelines, `DynSetTree<T>` should be preferred over `std::set<T>`. However, in test code where the purpose is to track inserted values independently of the filter under test, `std::set` is acceptable.
Verify each finding against the current code and only fix it if needed. In `@Tests/compact-quotient-filter.cc` around lines 541-549, replace the test-local container type std::set<int> named inserted with the project's preferred container DynSetTree<int> to follow coding guidelines: update the declaration of inserted to use DynSetTree<int> (and add the appropriate include for DynSetTree), keep all uses of inserted (inserts and lookups) unchanged so qf.insert, dist, and rng calls remain the same, and ensure the test builds by including the header that defines DynSetTree.

Tests/file_bplus_map_test.cc (1)
1-8: Missing full license header.
This file has a minimal header compared to other test files in this PR, which include the full MIT license text. For consistency, consider adding the complete license header.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@Tests/file_bplus_map_test.cc` around lines 1-8, the file Tests/file_bplus_map_test.cc currently has only a short header comment; update it to include the full MIT license header used across other test files to maintain consistency. Replace the minimal comment block at the top of file_bplus_map_test.cc with the project’s standard full license text (the complete MIT license block used in other tests), ensuring the header formatting and author/project lines match the other test files in the PR.

Tests/quotient-filter.cc (1)
42-42: Unused include.
The `<set>` header is included but `std::set` is not used anywhere in this test file.
♻️ Suggested fix

```diff
 #include <cstddef>
 #include <string>
-#include <set>
 #include <random>
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@Tests/quotient-filter.cc` at line 42, remove the unused `#include <set>` from Tests/quotient-filter.cc; the header is not used anywhere in this file, so delete the include line to clean up unused dependencies and avoid warnings.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In @.clang-format:
- Around line 27-56: The .clang-format contains a duplicate settings block that
conflicts with later definitions; remove or consolidate the earlier block that
defines SpaceBeforeParens, SpaceAfterCStyleCast, BreakBeforeBinaryOperators,
BraceWrapping, ContinuationIndentWidth and IndentCaseLabels so each setting
appears only once and keep the intended canonical values (ensure the retained
definition matches the project's desired style).
In `@compact-cuckoo-filter.H`:
- Around line 264-298: In insert(const T & item) the kick loop mutates buckets
via buckets(curr_idx).swap_random(curr_fp, rng_) before knowing if insertion
will succeed, causing silent evictions on final failure; change insert to avoid
in-place mutation until success by simulating or recording each displacement
(store a vector of {index, evicted_fp} or copies of affected bucket state) while
performing swap logic, only apply the recorded mutations to buckets and
increment num_items when an insert succeeds, and if the loop reaches max_kicks
revert/no-op using the recorded data (or discard simulated changes) so
swap_random, curr_fp, curr_idx, buckets(...), num_items and rng_ are not left in
a mutated state on failure.
- Around line 213-223: The constructor Compact_Cuckoo_Filter(size_t capacity)
currently sizes buckets for exactly 'capacity' items; change sizing to account
for a target load factor (e.g., 0.9 or 0.95) so the filter has slack for cuckoo
insertions: compute an adjusted_capacity = ceil(capacity / TARGET_LOAD_FACTOR)
(declare TARGET_LOAD_FACTOR as a constexpr or static const), then compute
needed_buckets = ceil(adjusted_capacity / EntriesPerBucket), round that up with
next_power_of_2, and set num_buckets and bucket_mask as before (keeping the if
(num_buckets == 0) guard).
In `@cuckoo-filter.H`:
- Around line 225-259: The insert() method mutates buckets via
buckets(curr_idx).swap_random(...) before knowing it will succeed, so if the
kick loop exits after max_kicks the filter has been altered (a fingerprint
removed) while num_items is unchanged; to fix, make the kick sequence
rollback-safe by either (A) simulating the kicks on a small local buffer and
only applying them to the actual buckets when an empty slot is found, or (B)
snapshotting the original fingerprint and bucket slot(s) before the first swap
and restoring them if the loop fails; specifically update the logic around
buckets(curr_idx).swap_random, curr_fp, alt_index and rng_ so that no bucket is
permanently changed unless the function will return true, and ensure num_items
is incremented only after a successful insertion.
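Option A (simulate, then apply) can be sketched on a toy one-slot-per-bucket layout; `MiniCuckoo`, its `alt` mapping, and the plan-replay logic are illustrative stand-ins, not the library's API:

```cpp
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Simulation approach: record each planned displacement and apply the plan
// to the real slots only once an empty slot is found. One fingerprint per
// bucket; 0 means empty.
struct MiniCuckoo
{
  std::vector<std::uint8_t> slots;

  std::size_t alt(std::size_t i, std::uint8_t fp) const
  { return (i ^ fp) % slots.size(); }

  bool insert(std::size_t idx, std::uint8_t fp, int max_kicks)
  {
    std::vector<std::pair<std::size_t, std::uint8_t>> plan; // {bucket, fp}
    std::size_t cur = idx;
    std::uint8_t carry = fp;
    for (int k = 0; k <= max_kicks; ++k)
      {
        if (slots[cur] == 0) // found room: commit the recorded plan
          {
            plan.emplace_back(cur, carry);
            for (auto & [b, v] : plan)
              slots[b] = v;
            return true;
          }
        std::uint8_t evicted = slots[cur];
        for (auto & [b, v] : plan)   // honor earlier planned writes
          if (b == cur)
            evicted = v;
        plan.emplace_back(cur, carry);
        cur = alt(cur, evicted);
        carry = evicted;
      }
    return false; // plan discarded: no bucket was mutated on failure
  }
};
```

On failure the filter is byte-for-byte unchanged, so the caller can safely fall back to resizing or rejecting the item; the real implementation would also defer the `num_items` increment to the success path.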
- Around line 183-193: The constructor Cuckoo_Filter currently computes
needed_buckets with ceil(capacity / EntriesPerBucket) which assumes 100% load
and can cause early insert failures; change the calculation in Cuckoo_Filter to
reserve spare capacity by dividing by a target_load_factor (e.g., 0.9 or 0.95)
before calling next_power_of_2 so needed_buckets = ceil(capacity /
(EntriesPerBucket * target_load_factor)), then proceed with num_buckets =
next_power_of_2(needed_buckets), ensure num_buckets >= 1 and recompute
bucket_mask; reference the constructor Cuckoo_Filter, EntriesPerBucket,
next_power_of_2, num_buckets and bucket_mask when making the change and add a
short comment about the chosen load factor.
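The sizing rule above can be sketched as follows; `next_power_of_2`, `buckets_for`, and the 0.95 target load factor are illustrative assumptions, not the library's actual identifiers:

```cpp
#include <cstddef>

// Round n up to the next power of two (n = 0 or 1 yields 1).
constexpr std::size_t next_power_of_2(std::size_t n)
{
  std::size_t p = 1;
  while (p < n)
    p <<= 1;
  return p;
}

// Reserve slack for cuckoo kicks: size for capacity / (entries_per_bucket *
// target_load) buckets instead of assuming a 100% fill.
constexpr std::size_t buckets_for(std::size_t capacity,
                                  std::size_t entries_per_bucket,
                                  double target_load = 0.95)
{
  const auto needed = static_cast<std::size_t>(
      static_cast<double>(capacity) / (entries_per_bucket * target_load)) + 1;
  const std::size_t n = next_power_of_2(needed);
  return n == 0 ? 1 : n;
}
```

Keeping the bucket count a power of two preserves the cheap `index & bucket_mask` addressing, while the load-factor headroom sharply reduces premature insert failures near capacity.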
In `@Examples/compact_quotient_filter_example.C`:
- Around line 2-12: Replace the abbreviated copyright header at the top of
compact_quotient_filter_example.C with the full MIT license header used in the
other new files (e.g., quotient-filter.H and tpl_file_bplus_map.H), keeping the
original author and year range (Leandro Rabindranath Leon, 2002-2026) and
ensuring the complete MIT permission grant and disclaimer text is present
verbatim so the file header matches the project’s standard license block.
In `@Tests/b_tree_test.cc`:
- Around line 106-254: Add two focused tests: one that exercises Tree::clear()
and one that triggers the single-key/root-collapse path via Tree::remove(). For
clear(): build a non-empty Tree (e.g. insert a few values or use initializer),
call tree.clear(), then assert tree.empty()/is_empty(), size()==0, height()==0,
min_key()/max_key() are std::nullopt, lower_bound/upper_bound return
std::nullopt for a sample value, and tree.verify() is true. For the
root-collapse: create a Tree with a single element (e.g. Tree t = {X}), call
EXPECT_TRUE(t.remove(X)), then assert the same postconditions as above
(empty/is_empty, size 0, height 0, min/max none, verify true) to ensure the
one-key removal collapses the root correctly; reference Tree::clear,
Tree::remove, Tree::height, Tree::min_key, Tree::max_key, Tree::is_empty, and
Tree::verify when adding the new TEST_F cases.
In `@Tests/bplus_tree_test.cc`:
- Around line 118-297: Add tests that explicitly exercise the new clear() API
and the single-key leaf-chain reset path: create a Tree, insert multiple keys,
call tree.clear(), then assert tree.empty(), size()==0, height()==0,
min_key()/max_key()/lower_bound()/upper_bound() return no value, to_vector(tree)
is empty, and tree.verify(); also add a test that inserts a single key, removes
that key (using Tree::remove), and then asserts the same post-removal state
(empty/root reset and leaf-chain cleared) to ensure removal of the last element
resets both root and leaf links. Use the existing TEST_F fixture pattern and
helper functions like to_vector(), expect_optional_eq(), and tree.verify() to
validate state.
In `@Tests/cuckoo-filter.cc`:
- Around line 21-34: The tests currently assert exact negatives (e.g., in TEST
CuckooFilterTest.BasicOperations using Cuckoo_Filter::contains) which is invalid
for a probabilistic filter; replace single EXPECT_FALSE checks with a
deterministic false-positive budget test: query a fixed large set of negative
keys (use a reproducible sequence/seed), count how many negatives return true
via Cuckoo_Filter::contains, and assert that the count is <= an allowed
threshold (derived from sample_size and the filter's expected false-positive
rate); update all tests referencing contains negatives (including
CuckooFilterTest.BasicOperations and other cases around lines noted) to perform
this counted/asserted check instead of exact EXPECT_FALSE while still keeping
exact positive assertions for inserted items and verifying size() as before.
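A counted false-positive budget check of the kind described above might look like this sketch; `FakeFilter` and its multiple-of-97 false-positive rule are fabricated stand-ins for the real `Cuckoo_Filter`, used only to exercise the counting logic:

```cpp
#include <cstddef>
#include <cstdint>
#include <unordered_set>

// Stand-in filter: an exact set plus a deterministic fake false positive
// on multiples of 97, so the budget check has something to count.
struct FakeFilter
{
  std::unordered_set<std::uint64_t> exact;
  void insert(std::uint64_t k) { exact.insert(k); }
  bool contains(std::uint64_t k) const
  { return exact.count(k) > 0 or k % 97 == 0; }
};

// Count how many known-negative keys report positive.
template <typename Filter>
std::size_t count_false_positives(const Filter & f,
                                  std::uint64_t first_negative,
                                  std::size_t sample_size)
{
  std::size_t fp = 0;
  for (std::size_t i = 0; i < sample_size; ++i)
    if (f.contains(first_negative + i))
      ++fp;
  return fp;
}
```

A test would then assert `count_false_positives(...) <= budget` for a budget derived from the sample size and the filter's configured false-positive rate, instead of `EXPECT_FALSE` on individual negatives.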
- Around line 151-163: The test wrongly assumes every index < inserted was
actually inserted because `inserted` is only a count; replace the implicit
assumption by recording which keys succeeded and assert on those: when calling
`filter.insert(i)` in the insertion loop, push `i` into a vector (e.g.,
`inserted_keys`) for successful inserts, use `inserted_keys.size()` if you need
the count, and then iterate over `inserted_keys` (not `i < inserted`) in the
verification loop calling `filter.contains(key)` and asserting true for each
recorded key; update any debug message to include the load factor `lf` and the
specific `key`.
In `@Tests/hash_statistical_test.cc`:
- Around line 259-282: The avalanche_ratio currently divides changed_bits by a
dynamic comparisons value computed from effective_output_bits(candidate,
samples), which can undercount bits that only appear when inputs are mutated;
change the denominator to use a fixed output width instead (e.g., use
candidate.output_bit_width() or a constant 64-bit width if that accessor doesn't
exist) so comparisons = samples.size() * flip_count_per_sample *
fixed_output_bits (or compute per-sample flip count as in the loop), and keep
the numerator accumulation in changed_bits unchanged; update avalanche_ratio to
compute output_bits from the fixed width and use that for comparisons so the
avalanche score is normalized against a stable bit width rather than the
observed bits from effective_output_bits().
- Around line 314-315: The test asserts on the optimizer guard variable `sink`
using `EXPECT_NE(sink, 0u)`, which can legitimately be zero and causes flaky
failures; remove the `EXPECT_NE(sink, 0u)` assertion (i.e., delete the
`EXPECT_NE(sink, 0u);` line after the throughput loop) so the test no longer
depends on the optimizer-guard value `sink`.
In `@tpl_b_tree.H`:
- Around line 623-645: The mutable comparator accessors key_comp() and
get_compare() allow external mutation of the comparator and can break tree
invariants; remove the non-const overloads (Compare& key_comp() noexcept and
Compare& get_compare() noexcept) from the class (leaving only the [[nodiscard]]
const Compare& key_comp() const noexcept and const Compare& get_compare() const
noexcept) so callers cannot modify cmp_ after construction, and update any
internal or test code that relied on the mutable overload to use the const
accessors or to accept a comparator at construction instead.
- Around line 606-615: The copy-assignment operator for Gen_B_Tree assigns root_
and size_ before cmp_, so if Compare::operator= throws you'll end up with nodes
ordered for other but still using the old comparator; change to an
exception-safe pattern by first making copies/temporaries (e.g.,
clone_node(other.root_.get()) into a local unique_ptr and copy-construct or
copy-assign a local Compare copy or perform copy-construct of a temporary
Gen_B_Tree), then commit by swapping into *this (or assign cmp_ first then swap
root_/size_), ensuring the final step updates all members only after comparator
copy succeeded; reference operator=, root_, size_, cmp_, and clone_node in your
change.
In `@tpl_bplus_tree.H`:
- Around line 817-826: The assignment operator currently assigns root_ and size_
before cmp_, so if Compare::operator= throws after cloning the tree the object
ends up with other’s structure but the old comparator; fix by using the
copy-and-swap idiom: construct a temporary Gen_BPlus_Tree tmp(other) (so tmp
obtains a safe copy using other’s comparator), then swap(*this, tmp) to
atomically replace root_, size_ and cmp_; ensure swap is noexcept (or implement
a member swap) and keep clone_node only used inside the temporary so no partial
state is left on *this if a throw occurs.
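The copy-and-swap pattern recommended above can be sketched on a stand-in type. `TreeSketch` is hypothetical (a `unique_ptr` plays the role of the cloned node structure); the point is that all throwing work happens on the temporary before the noexcept swap commits `root_`, `size_`, and `cmp_` together:

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <utility>

template <class Compare>
struct TreeSketch
{
  std::unique_ptr<int> root_;  // stand-in for the cloned node structure
  std::size_t size_ = 0;
  Compare cmp_{};

  TreeSketch() = default;
  TreeSketch(const TreeSketch& other)
      : root_(other.root_ ? std::make_unique<int>(*other.root_) : nullptr),
        size_(other.size_),
        cmp_(other.cmp_)  // may throw: only the temporary is affected
  {}

  friend void swap(TreeSketch& a, TreeSketch& b) noexcept
  {
    using std::swap;
    swap(a.root_, b.root_);
    swap(a.size_, b.size_);
    swap(a.cmp_, b.cmp_);
  }

  TreeSketch& operator=(const TreeSketch& other)
  {
    TreeSketch tmp(other);  // all throwing work (clone + comparator copy) here
    swap(*this, tmp);       // commit atomically; *this is never half-assigned
    return *this;
  }
};
```

If the copy constructor throws, `*this` is untouched, which is exactly the guarantee the remove-then-assign order in the current operator= lacks.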
- Around line 690-700: The range constructor and related APIs
(Iterator::Iterator(const Gen_BPlus_Tree&, const Key&, const Key&), plus callers
like get_range_it() and range()) currently stream Key values into the
ah_invalid_argument_if(...) macro which forces operator<< on Key; remove the
streaming of first and last into the error macro and replace with a generic
message that does not reference first/last (e.g. "BPlus_Tree::Iterator():
invalid range") or conditionally include values only when Key is streamable;
update the ah_invalid_argument_if usage in Iterator and any similar checks in
get_range_it()/range() to avoid using operator<< on Key so non-streamable Key
types compile.
In `@tpl_file_bplus_map.H`:
- Around line 570-581: The current insert_or_assign (using
make_probe/make_record and calling tree_.remove(probe) then
tree_.insert(record)) is not atomic and can lose a key if the process crashes
between remove and insert; either (A) document this crash-recovery limitation on
File_BPlus_Map::insert_or_assign (explicitly state that File_BPlus_Tree lacks an
atomic replace and that remove+insert may lose keys on crash) or (B) implement
an atomic update/replace in File_BPlus_Tree (e.g., add a
replace_or_update(probe, record) or atomic_update(key, new_value) method) and
use that new method from insert_or_assign instead of separate remove/insert.
Ensure references to make_probe, make_record, tree_.remove, tree_.insert and
File_BPlus_Tree are updated accordingly and add tests or comments covering
crash-recovery semantics.
---
Outside diff comments:
In `@bloom-filter.H`:
- Around line 503-510: operator&= currently overwrites the insertion counter by
assigning num_ins = expected_size(), which turns size()/get_n() into a
saturation-based estimate; revert this by not changing the insertion counter
during intersection: after performing bits &= f.bits just leave num_ins
unchanged (or, if you must reconcile counters, compute a safe conservative value
without calling expected_size(), e.g. keep the smaller of the two counters).
Update the operator&= implementation (function operator&=, bits &= f.bits, and
num_ins handling) to preserve the documented insertion-count semantics of
get_n()/size().
In `@hash-fct.H`:
- Around line 742-787: The std::string overloads currently forward to the
NUL-terminated const char* variants, causing strings containing embedded NULs to
hash incorrectly; update each overload (add_hash, xor_hash, rot_hash, djb_hash,
sax_hash, fnv_hash, oat_hash, jsw_hash, elf_hash) to call the length-aware
byte-sequence variants (pass key.data() and key.size()) instead of key.c_str();
apply the same change to every std::string overload in this block so hashing
uses the full byte length.
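The embedded-NUL problem above is easy to demonstrate. The sketch below uses FNV-1a as a stand-in for the project's byte-sequence variants; the point is only the `data()`/`size()` forwarding:

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Length-aware byte hash (FNV-1a, stand-in for the project's variants):
// hashing via data()+size() sees embedded NUL bytes that a c_str()-based,
// NUL-terminated scan would silently stop at.
inline std::size_t fnv1a_bytes(const void* key, std::size_t len) noexcept
{
  const auto* p = static_cast<const unsigned char*>(key);
  std::size_t h = 14695981039346656037ull;  // FNV offset basis (truncated on 32-bit)
  for (std::size_t i = 0; i < len; ++i)
    {
      h ^= p[i];
      h *= 1099511628211ull;  // FNV prime
    }
  return h;
}

// std::string overload: forward the full byte length, not c_str().
inline std::size_t fnv1a_hash(const std::string& key) noexcept
{
  return fnv1a_bytes(key.data(), key.size());
}
```

With `c_str()` forwarding, `"ab\0c"` and `"ab\0d"` would collide because both scans stop at the NUL; with `data()+size()` they hash differently.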
- Around line 331-347: Replace uses of std::is_trivially_copyable_v<Key> with
std::has_unique_object_representations_v<Key> in all generic raw-byte hash
overloads so hashing is performed only for types with unique object
representations; specifically update the constraint on murmur3hash<Key>
(function murmur3hash), and do the same for add_hash<Key>, xor_hash<Key>,
rot_hash<Key>, djb_hash<Key> and their overloads, and for dft_hash_fct<Key> and
its seeded variants (only replace the trivially_copyable part of any compound
constraint). Ensure every function template that currently requires
std::is_trivially_copyable_v<Key> instead requires
std::has_unique_object_representations_v<Key> so memcpy-based hashing is only
enabled for types with identical byte representations for equal values.
---
Nitpick comments:
In `@compact-quotient-filter.H`:
- Around line 391-409: from_capacity and estimate duplicate the logic that
computes q_v and r_v from expected_n and fp_rate; extract that common
calculation into a single helper (e.g., a private static method on
Compact_Quotient_Filter such as compute_qr_from or make_qr) that takes (size_t
expected_n, double fp_rate) and returns the (q_v, r_v) pair, then call that
helper from Compact_Quotient_Filter::from_capacity and from the estimate
implementation (and any other sites noted) to remove the duplication and ensure
consistent bounds handling (including the q_v + r_v > 64 adjustment and input
validation).
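A sketch of what the extracted helper could look like. The load factor, rounding, and the 64-bit cap below are assumptions for illustration, not the filter's actual formulas; the point is that `from_capacity` and `estimate` would both call this one function:

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <utility>

// Hypothetical shared helper: derive quotient bits q and remainder bits r
// from an expected element count and target false-positive rate, applying
// the q + r <= 64 adjustment in exactly one place.
inline std::pair<unsigned, unsigned>
compute_qr(std::size_t expected_n, double fp_rate, double max_load = 0.9)
{
  const double slots_needed = static_cast<double>(expected_n) / max_load;
  unsigned q = 1;
  while (static_cast<double>(std::uint64_t{1} << q) < slots_needed && q < 63)
    ++q;  // q = ceil(log2(slots_needed))
  unsigned r = static_cast<unsigned>(std::ceil(std::log2(1.0 / fp_rate)));
  if (q + r > 64)  // keep the full fingerprint within 64 bits
    r = 64 - q;
  return {q, r};
}
```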
In `@Examples/file_b_tree_example.cc`:
- Around line 71-77: Replace the repeated manual cleanup calls that use
file_path and fs::remove with a small RAII guard: implement a local class/struct
(e.g., TempFileCleaner) that stores the std::filesystem::path file_path and in
its destructor attempts fs::remove for file_path, file_path + ".wal",
".wal.tmp", ".journal", ".journal.tmp", and ".lock", swallowing or logging
errors via std::error_code to avoid throwing; instantiate this guard at the
start of the scope that currently performs cleanup so the files are removed
automatically on all exits (normal return or exception), and remove the
duplicated fs::remove blocks (including the other similar block that currently
appears later) in favor of the RAII guard.
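A sketch of the suggested guard. The suffix list mirrors the ones named above; the `std::error_code` overload of `fs::remove` makes the destructor non-throwing, which is mandatory for a type that cleans up during stack unwinding:

```cpp
#include <cassert>
#include <filesystem>
#include <fstream>
#include <string>
#include <system_error>
#include <utility>

namespace fs = std::filesystem;

// RAII guard (hypothetical TempFileCleaner): removes the base file plus its
// sidecar files on scope exit, swallowing errors so the destructor never throws.
struct TempFileCleaner
{
  fs::path base;

  explicit TempFileCleaner(fs::path p) : base(std::move(p)) {}

  ~TempFileCleaner()
  {
    static const char* suffixes[] = {"",        ".wal",         ".wal.tmp",
                                     ".journal", ".journal.tmp", ".lock"};
    for (const char* s : suffixes)
      {
        std::error_code ec;  // best-effort cleanup: ignore failures
        fs::remove(fs::path(base.string() + s), ec);
      }
  }
};
```

Instantiated at the top of the scope, the guard fires on normal return and on exceptions alike, replacing both duplicated cleanup blocks.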
In `@hash-fct.C`:
- Around line 252-270: The file is missing the header that declares memcpy used
in jen_hash; add an explicit `#include <cstring>` (preferably alongside the other
includes at the top of the file) so memcpy is properly declared for the
jen_hash(const void*, size_t, unsigned) function and avoid relying on transitive
includes.
- Around line 190-201: The init_jsw function currently allocates and frees a
gsl_rng* manually; replace this manual lifecycle with RAII by wrapping the
gsl_rng pointer in a std::unique_ptr with a custom deleter that calls
gsl_rng_free, then use that unique_ptr (e.g., in init_jsw) to set the seed, call
gsl_rng_get in the loop, and let the deleter handle cleanup; keep the same seed
logic (seed % gsl_rng_max(r)) and ensure init remains set to true at the end.
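The RAII shape suggested above, shown with a self-contained stand-in C-style API so it compiles without GSL: `rng_alloc`/`rng_free`/`rng_next` play the roles of `gsl_rng_alloc`/`gsl_rng_free`/`gsl_rng_get`, and the seed mixing is an assumption rather than GSL's `gsl_rng_set` semantics. The `unique_ptr` deleter guarantees the free call on every exit path, including exceptions thrown while filling the table:

```cpp
#include <cassert>
#include <cstdint>
#include <memory>

struct rng_state { std::uint64_t s; };

inline rng_state* rng_alloc() { return new rng_state{0x9e3779b97f4a7c15ull}; }
inline void rng_free(rng_state* r) { delete r; }
inline std::uint64_t rng_next(rng_state* r)  // xorshift step, stand-in for gsl_rng_get
{
  r->s ^= r->s << 13;
  r->s ^= r->s >> 7;
  r->s ^= r->s << 17;
  return r->s;
}

using rng_ptr = std::unique_ptr<rng_state, decltype(&rng_free)>;

// Fill a jsw-style table; cleanup is automatic when `r` leaves scope.
inline void init_table(std::uint64_t seed, std::uint64_t (&tab)[256])
{
  rng_ptr r(rng_alloc(), &rng_free);
  r->s ^= seed;  // toy seeding, not GSL's gsl_rng_set
  for (auto& cell : tab)
    cell = rng_next(r.get());
}  // rng_free runs here via the deleter, even if the loop threw
```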
In `@quotient-filter.H`:
- Around line 586-613: The code uses a goto ("goto found") to jump out of the
loop and run a block that handles deletion logic; replace this with a structured
alternative by extracting the post-find logic into a helper function or lambda
(e.g., a local function delete_at(pos, fr, fq) or lambda named handle_found) and
invoke it immediately when get_rem(pos) == fr instead of jumping; keep the loop
logic using get_rem, get_cont, incr, has_element to locate the element and then
call the helper which performs the same operations (compute was_run_start via
get_cont, compute nxt via incr, use has_element/get_cont to set_cont or set_occ,
call shift_elements_left, decrement num_elems_) and return the boolean result
directly to avoid goto while preserving the same behavior.
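The goto-free shape recommended above, reduced to its essentials: the search loop becomes a lambda returning `std::optional`, and the "found" block becomes an ordinary if-branch. A plain vector stands in for the filter's run-scanning logic:

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <vector>

// Structured replacement for the `goto found` pattern: find, then act.
inline bool remove_first_equal(std::vector<int>& slots, int value)
{
  auto find_pos = [&]() -> std::optional<std::size_t> {
    for (std::size_t pos = 0; pos < slots.size(); ++pos)
      if (slots[pos] == value)  // plays the role of get_rem(pos) == fr
        return pos;
    return std::nullopt;  // replaces falling through to `return false`
  };

  if (auto pos = find_pos())
    {
      // This branch is the former `found:` block (shift elements, fix
      // continuation bits, decrement the element count in the real filter).
      slots.erase(slots.begin() + static_cast<std::ptrdiff_t>(*pos));
      return true;
    }
  return false;
}
```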
- Around line 507-525: The q/r computation in Quotient_Filter::from_capacity
duplicates logic in Quotient_Filter::estimate; refactor by moving the shared
calculation (slots_needed, q_v, r_v and the q_v + r_v > 64 adjustment) into a
single helper (e.g., compute_qr or make estimate the canonical calculator) and
have from_capacity call that helper (or call estimate) to obtain q and r before
constructing the Quotient_Filter, ensuring the same seed handling and the 64-bit
cap logic is preserved.
In `@Tests/compact-quotient-filter.cc`:
- Around line 541-549: Replace the test-local container type std::set<int> named
inserted with the project's preferred container DynSetTree<int> to follow coding
guidelines: update the declaration of inserted to use DynSetTree<int> (and add
the appropriate include for DynSetTree), keep all uses of inserted (inserts and
lookups) unchanged so qf.insert, dist, and rng calls remain the same, and ensure
the test builds by including the header that defines DynSetTree.
In `@Tests/file_bplus_map_test.cc`:
- Around line 1-8: The file Tests/file_bplus_map_test.cc currently has only a
short header comment; update it to include the full MIT license header used
across other test files to maintain consistency. Replace the minimal comment
block at the top of file_bplus_map_test.cc with the project’s standard full
license text (the complete MIT license block used in other tests), ensuring the
header formatting and author/project lines match the other test files in the PR.
In `@Tests/quotient-filter.cc`:
- Line 42: Remove the unused `#include <set>` from Tests/quotient-filter.cc; the
header is not used anywhere in this file so delete the include line to clean up
unused dependencies and avoid warnings.
In `@tpl_file_b_map.H`:
- Around line 491-502: The current insert_or_assign removes the old record then
inserts the new one, which can leave the map empty if tree_.insert(record)
fails; update insert_or_assign to be atomic: first probe with make_probe(key)
and check for an existing record via tree_.find(probe) (or an equivalent
lookup), then if the tree supports an in-place replace/update use that (e.g.,
tree_.replace/probe/update with the new record); if no in-place API exists, read
and save the existing record, attempt tree_.insert(record) only if no existing
record, otherwise attempt to remove+insert but on any insert failure reinsert
the saved original_record to restore state and rethrow the error; reference
functions: insert_or_assign, make_probe, make_record,
tree_.find/tree_.replace/tree_.insert/tree_.remove to locate and implement this
logic.
In `@tpl_file_bplus_map.H`:
- Around line 188-196: The to_pairs function uses an index-based loop which is
verbose; refactor to a range-based loop or std::transform to be more idiomatic:
iterate over records with "for (const Record &rec : records)" and append to out
via out.append(to_pair(rec)) (preserve the reserve call), or use std::transform
with std::back_inserter(out) and the to_pair function to populate out; keep the
return type (Array<std::pair<Key, Value>>) and existing reserve logic intact and
remove the manual index arithmetic.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 692b2502-6a9c-49dc-85f8-23829499db18
📒 Files selected for processing (54)
.clang-format, .github/workflows/ci.yml, CMakeLists.txt, Examples/b_tree_example.cc, Examples/bplus_tree_example.cc, Examples/compact_cuckoo_filter_example.C, Examples/compact_quotient_filter_example.C, Examples/cuckoo_filter_example.cc, Examples/file_b_map_example.cc, Examples/file_b_tree_example.cc, Examples/file_bplus_map_example.cc, Examples/file_bplus_tree_example.cc, Examples/quotient_filter_example.C, Examples/test-odhash.cc, README.es.md, README.md, Tests/CMakeLists.txt, Tests/b_tree_test.cc, Tests/bplus_tree_test.cc, Tests/compact-cuckoo-filter.cc, Tests/compact-quotient-filter.cc, Tests/cuckoo-filter.cc, Tests/file_b_map_test.cc, Tests/file_b_tree_test.cc, Tests/file_bplus_map_test.cc, Tests/file_bplus_tree_test.cc, Tests/hash_fct_test.cc, Tests/hash_statistical_test.cc, Tests/quotient-filter.cc, al-matrix.H, bloom-filter.H, compact-cuckoo-filter.H, compact-quotient-filter.H, cuckoo-filter.H, docs/b_tree_family.md, hash-fct.C, hash-fct.H, quotient-filter.H, tpl_b_tree.H, tpl_bplus_tree.H, tpl_dynLhash.H, tpl_dynMapOhash.H, tpl_dynSetHash.H, tpl_file_b_map.H, tpl_file_b_tree.H, tpl_file_bplus_map.H, tpl_file_bplus_tree.H, tpl_lhash.H, tpl_linHash.H, tpl_odhash.H, tpl_olhash.H, tpl_paged_tree_durability.H, tpl_paged_value_codec.H, tpl_tree_snapshot.H
Summary
🤖 Prompt for AI Agents
Refactor default hashing to use xxHash64 for improved performance and distribution.
- Update `dft_hash_fct` and `snd_hash_fct` to use xxHash64.
- Introduce `Aleph_Hash` concept for ADL-based user type hashing.
- Adjust internal hash implementations (`jsw_hash`, `MurmurHash3`, `xxhash64_hash`, `wyhash_hash`, `siphash24_hash`) for consistency and performance.
- Update `Compact_Cuckoo_Filter` and `Cuckoo_Filter` to leverage the new hashing approach, including a stash mechanism for failed kicks.
- Modify `.clang-format` to align with project-specific style, including indentation, brace wrapping, and line limits.
- Update statistical hash tests to validate the new hash functions and bit-width collision scaling, improving test coverage and reliability.
- Update examples for compact cuckoo and quotient filters to reflect new capabilities and better illustrate use cases.
- Address minor formatting in `tpl_bplus_tree.H` to fix inconsistencies.
Refactor cuckoo and quotient filters for clarity and robustness, including test updates for better coverage. Improve B-tree move semantics and fix iterator range validation in File_BPlus_Tree. Adjust hash function overloads for std::string to use data()+size().
Actionable comments posted: 14
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
Examples/set_structures_benchmark.cc (1)
715-737: ⚠️ Potential issue | 🟡 Minor — Help text shows outdated default value.
The default changed to 2,000,000 on line 715, but the help text on line 737 still says
"default: 1000000". This inconsistency will confuse users.
📝 Proposed fix
- << " -n, --count N Number of elements (default: 1000000)\n"
+ << " -n, --count N Number of elements (default: 2000000)\n"
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@Examples/set_structures_benchmark.cc` around lines 715 - 737, The help text is inconsistent with the actual default for n (variable n set to 2000000); update the usage/help message built in the argument parsing block (the cout block that prints "Set Data Structures Benchmark" and "Options:") so the displayed default for -n/--count matches n's actual default (2,000,000); locate the help string near the argument parsing code that references argv[0] and change "default: 1000000" to "default: 2000000" (or dynamically print n) so users see the correct default.
♻️ Duplicate comments (2)
tpl_b_tree.H (1)
626-648: ⚠️ Potential issue | 🟠 Major — Must fix: keep the comparator immutable after construction.
Returning Compare& lets callers change the ordering without rebuilding the tree. After that, contains(), bounds queries, inserts, and removals all operate against broken B-Tree invariants.
Suggested change
- Compare& key_comp() noexcept { return cmp_; }
  /** `@brief` Return the comparison object.
   * `@return` Constant reference to the comparison functor.
   * `@throws` Nothing. */
  [[nodiscard]] const Compare& key_comp() const noexcept { return cmp_; }
- Compare& get_compare() noexcept { return key_comp(); }
  /** `@brief` Return the comparison object.
   * `@return` Constant reference to the comparison functor.
   * `@throws` Nothing. */
  [[nodiscard]] const Compare& get_compare() const noexcept { return key_comp(); }
Based on learnings: Verify balance invariants in tree data structures (BST, AVL, Red-Black, B-trees).
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tpl_b_tree.H` around lines 626 - 648, The comparator must be immutable after construction: stop exposing non-const accessors that allow callers to modify cmp_; change the mutable overloads key_comp() and get_compare() to return const Compare& (or remove/delete the non-const overloads) and make the backing member cmp_ effectively const by initializing it in the constructor (e.g., declare cmp_ as const Compare or otherwise ensure it cannot be changed after construction) so all external access is read-only and ordering cannot be altered at runtime.
Tests/hash_statistical_test.cc (1)
273-301: ⚠️ Potential issue | 🟠 Major — Should fix: avalanche is still normalized against an observed output mask.
effective_output_bits(candidate, samples) only sees bits that happen to be set in the base corpus. If a flipped input activates a previously unseen high bit, Line 295 counts it but Line 296 never budgets for it, so the reported ratio is overstated. Use a fixed per-candidate output width instead of a sample-derived mask. Based on learnings, "Tests must verify algorithmic correctness, not just absence of crashes".
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@Tests/hash_statistical_test.cc` around lines 273 - 301, The avalanche_ratio function currently calls effective_output_bits(candidate, samples) which derives an output mask from the sample set and can undercount the budget when a flipped input activates previously unseen high bits; replace that sample-derived output_bits with a fixed per-candidate output width and use that fixed value when incrementing comparisons. Concretely, remove the call to effective_output_bits(...), obtain a fixed width from the candidate (e.g. a candidate.output_bit_width() accessor or, if no accessor exists, the hash output size in bits such as sizeof(return_type_of(candidate.hash))*8), use that fixed value in place of output_bits when incrementing comparisons, and keep changed_bits += std::popcount(original ^ hashed) as-is so the ratio is normalized against the candidate's full output width. Ensure the change targets the avalanche_ratio function and updates comparisons and any related variable names accordingly.
🧹 Nitpick comments (2)
.clang-format (1)
1-3: Align the comments with the actual brace rules.
Line 2 still says “K&R braces” and Line 15 says “GNU style”, but the active settings define a custom next-line/indented brace profile instead. Updating the comments will keep this file self-describing.
📝 Suggested comment cleanup
-# Based on project conventions: 2 spaces, K&R braces, prefer and/or/not
+# Based on project conventions: 2 spaces, custom next-line braces, prefer and/or/not
@@
-# Braces (GNU style: nueva línea e indentadas)
+# Braces (custom next-line style with indented braces)
Also applies to: 15-16
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In @.clang-format around lines 1 - 3, Update the top-of-file descriptive comments to match the active brace configuration instead of claiming "K&R braces" or "GNU style": change the Line 2 comment that currently says “K&R braces” and the comments around lines 15-16 that say “GNU style” to describe the actual next-line/indented brace profile used by the file; reference the .clang-format brace-related settings (the custom next-line/indented profile) in the comment so the description is accurate and self-documenting.
compact-quotient-filter.H (1)
452-477: Consider refactoring goto to structured control flow.
The goto found pattern works but is unconventional. A lambda or restructured loop could improve readability:
// Alternative: return early from search, then handle removal
auto find_in_run = [...] { ... return optional<size_t>; };
if (auto pos = find_in_run()) { /* removal logic */ }
This is a stylistic suggestion; the current logic is functionally correct.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@compact-quotient-filter.H` around lines 452 - 477, The code uses a goto label "found" to jump from the search loop into removal logic; refactor into structured control flow by extracting the search into a block or small helper (e.g., a lambda or separate function) that returns an optional position (use the loop with get_rem(pos), get_cont(pos), incr(pos) and has_element(nxt) to locate the element), then if the optional is empty return false, otherwise perform the removal logic (use the same logic that references was_run_start, nxt, nxt_is_cont and calls set_cont(nxt,false), set_occ(fq,false), shift_elements_left(pos) and decrement num_elems_) in a normal if/then branch—this eliminates the goto while preserving exact behavior.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@hash-fct.C`:
- Around line 194-208: init_jsw currently repopulates tab[] unconditionally and
can race with reads in jsw_hash; change init_jsw(std::uint32_t) to be one-shot
by first doing an acquire load of init and returning if true, then lock jsw_mtx
and re-check init (double-checked locking) and only populate tab[] if still
false, finally store true with release; alternatively implement a table-swapping
approach (allocate new table, fill it, then atomically swap pointer and set
init) if you need to support reseed without blocking readers — update references
to tab, init, init_jsw, jsw_mtx and jsw_hash accordingly.
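The double-checked locking shape described above can be sketched as follows. The table fill is a toy LCG standing in for the GSL-based loop; the synchronization structure (acquire fast path, re-check under the mutex, release publish) is the actual pattern being recommended:

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <mutex>

namespace jsw_sketch
{
  std::uint32_t tab[256];
  std::atomic<bool> init{false};
  std::mutex jsw_mtx;

  void init_jsw(std::uint32_t seed)
  {
    if (init.load(std::memory_order_acquire))  // fast path: already initialized
      return;
    std::lock_guard<std::mutex> lock(jsw_mtx);
    if (init.load(std::memory_order_relaxed))  // re-check under the lock
      return;
    std::uint32_t s = seed | 1;  // toy LCG fill, stand-in for the gsl_rng loop
    for (auto& cell : tab)
      cell = (s = s * 1664525u + 1013904223u);
    init.store(true, std::memory_order_release);  // publish the filled table
  }
}
```

Only one thread ever populates `tab[]`, and the release store guarantees readers that observe `init == true` also see the completed table, so later calls (even with a different seed) are no-ops.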
- Around line 128-149: The non-__int128 path in mul_xor_fold64 loses carries
when computing low = ll + (lh << 32) + (hl << 32); fix by performing the two
64-bit additions with carry tracking: first add ll and (lh << 32) while
detecting if that addition produced a carry (increment a carry counter), then
add (hl << 32) to the intermediate low and detect/increment any additional
carry; finally compute high = hh + (lh >> 32) + (hl >> 32) + carry_counter and
return low ^ high. Use the existing local names (ll, lh, hl, low, high) and
ensure every addition that can overflow increments high so no carries are lost.
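The carry-tracking fix above amounts to a portable 64x64-to-128-bit multiply. A sketch of the corrected non-__int128 path, adding the two cross terms into `low` one at a time and feeding each wraparound into `high`:

```cpp
#include <cassert>
#include <cstdint>

// Portable 64x64 -> 128-bit multiply, folded to 64 bits by XOR of the
// halves (the mul_xor_fold64 shape). No carries are lost: each addition
// into `low` is checked for wraparound and the carry goes into `high`.
inline std::uint64_t mul_xor_fold64(std::uint64_t a, std::uint64_t b)
{
  const std::uint64_t al = a & 0xffffffffull, ah = a >> 32;
  const std::uint64_t bl = b & 0xffffffffull, bh = b >> 32;

  const std::uint64_t ll = al * bl;  // contributes to bits 0..63
  const std::uint64_t lh = al * bh;  // contributes to bits 32..95
  const std::uint64_t hl = ah * bl;  // contributes to bits 32..95
  const std::uint64_t hh = ah * bh;  // contributes to bits 64..127

  std::uint64_t low = ll;
  std::uint64_t carry = 0;

  std::uint64_t t = low + (lh << 32);
  carry += (t < low);  // first cross term wrapped?
  low = t;

  t = low + (hl << 32);
  carry += (t < low);  // second cross term wrapped?
  low = t;

  const std::uint64_t high = hh + (lh >> 32) + (hl >> 32) + carry;
  return low ^ high;
}
```

The all-ones case exercises both carries: `(2^64 - 1)^2` has high half `0xfffffffffffffffe` and low half `1`, which the lossy version miscomputes.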
In `@hash-fct.H`:
- Around line 34-38: The code uses std::convertible_to<size_t> in the
HashableByADL concept but doesn't include the <concepts> header; add a direct
include for <concepts> at the top of the file (alongside the existing includes)
so the definition of std::convertible_to is available and compilation succeeds
when instantiating HashableByADL.
- Around line 491-505: The doc comments currently claim siphash24_hash is a
keyed, adversary-resistant hash while its default overloads use public hardcoded
keys (key0=0x0706050403020100ULL, key1=0x0f0e0d0c0b0a0908ULL), so update the
siphash24_hash documentation to explicitly state that only calls supplying
secret, per-process keys provide protection against hash-flooding and that the
default/no-arg overloads are deterministic and do not provide adversarial
resistance; also update the module-level description (the "Cryptographically
strong (keyed)" text referenced at the top of the file) to reflect this
limitation and recommend callers generate and pass secret keys instead of
relying on the defaults.
- Around line 875-905: The trivial-copy fallback is too permissive for types
like floats; update the Aleph_Hash<T> constraint to exclude floating-point types
(and optionally other multi-representation-equality types). Change the
static_assert condition from checking std::is_trivially_copyable_v<T> and not
std::is_pointer_v<T> to also require not std::is_floating_point_v<T> (i.e.
(std::is_trivially_copyable_v<T> && !std::is_pointer_v<T> &&
!std::is_floating_point_v<T>)) and update the diagnostic text to instruct
implementing aleph_hash_value() or specializing Aleph_Hash<T> for floats/other
problematic types; keep the operator()(const T&), operator()(const T&, uint32_t)
and the HashableByADL/xxhash64_hash branches the same.
In `@Tests/hash_statistical_test.cc`:
- Around line 165-170: make_prefix_samples() performs shifts of i (a size_t) by
up to 56 bits which is undefined on 32-bit builds; to fix, introduce a 64-bit
copy of i (e.g., std::uint64_t v = static_cast<std::uint64_t>(i)) before the
inner loop and use v in the shift expression inside the loop (replace uses of i
in the shift with v) so shifts are well-defined across platforms while
preserving behavior for variables like suffix_len and sample.
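The widen-before-shift fix above in isolation (function name is illustrative): shifting a 32-bit `size_t` by 56 is undefined behavior, so the index is copied into a `std::uint64_t` first and the shift is applied to the wide copy.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Well-defined for shift counts up to 63 on every platform, because the
// shifted operand is always 64 bits wide regardless of sizeof(size_t).
inline std::uint64_t prefix_bits(std::size_t i, unsigned shift)
{
  const std::uint64_t v = static_cast<std::uint64_t>(i);  // widen, then shift
  return v << shift;
}
```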
- Around line 637-712: The test wrongly assumes wyhash_hash returns 64 bits; on
32-bit builds it returns a truncated size_t, so h64/coll64 are invalid. Fix by
adding a platform-size guard around the 64-bit portion of the test (the call to
wyhash_hash assigned to v, the h64 set, calculation of coll64, and the
EXPECT_EQ(coll64, 0u) assertion) so it is skipped on 32-bit targets (e.g., check
sizeof(size_t) < 8 and call GTEST_SKIP or conditionally compile out the 64-bit
checks). Alternatively, replace the call with a true 64-bit wyhash entry point
if one is available and use that for h64/coll64; ensure the guard references
wyhash_hash, h64, coll64, and the EXPECT_EQ assertion so only valid 64-bit
behavior is tested.
In `@tpl_b_tree.H`:
- Around line 597-600: Remove the explicit noexcept on the move operations so
the compiler can infer exception-safety from member types: change the move
constructor declaration Gen_B_Tree(Gen_B_Tree&& other) noexcept = default; to
Gen_B_Tree(Gen_B_Tree&& other) = default; and likewise remove noexcept from the
move assignment (e.g., Gen_B_Tree& operator=(Gen_B_Tree&&) noexcept = default;)
if present (the comment notes the related occurrence around the symbol at line
~624) so moves are not forced noexcept when member types (such as Compare) can
throw.
In `@tpl_bplus_tree.H`:
- Around line 794-828: The non-const accessors key_comp() and get_compare()
expose the mutable comparator cmp_, allowing external mutation that can break
tree invariants; change the API so callers cannot obtain a mutable comparator:
remove or delete the non-const overloads Compare &key_comp() and Compare
&get_compare(), and keep (or add) only const accessors that return const
Compare& (e.g., const Compare &key_comp() const noexcept and const Compare
&get_compare() const noexcept) while leaving the internal cmp_ field unchanged
and private; update any internal callers to use the const accessor or access
cmp_ directly inside class methods.
- Around line 323-329: The find_leaf function is marked noexcept but calls
child_index -> upper_bound_index which uses the user-provided comparator cmp_
and may throw; remove noexcept from the find_leaf declaration and definition to
allow exceptions to propagate normally (so callers like contains, lower_bound,
upper_bound, and the Iterator constructor can handle them). Locate the
find_leaf(const Key &key) method and drop the noexcept specifier, keeping the
implementation otherwise unchanged; confirm related callers (contains,
lower_bound, upper_bound, Iterator) rely on the non-noexcept behavior.
- Around line 765-768: The move operations for the class are incorrectly forced
noexcept; remove the explicit `noexcept` qualifier from the defaulted move
constructor `Gen_BPlus_Tree(Gen_BPlus_Tree &&other) noexcept = default;` and the
defaulted move assignment operator (the `operator=(Gen_BPlus_Tree&&) noexcept =
default;` declaration) so they are not unconditionally noexcept and will instead
inherit conditional noexcept status based on member types (e.g., `Compare`),
preventing undefined behavior when `Compare`'s move can throw.
In `@tpl_file_b_map.H`:
- Around line 486-497: insert_or_assign currently performs tree_.remove(probe)
then tree_.insert(record) and with auto_sync=true a crash between these makes
the delete durable; modify insert_or_assign (or Gen_File_B_Tree usage) to avoid
a two-step durable state: either perform the replace under a single atomic
checkpoint/transaction provided by the B-tree (use a single replace/assign API
if available) or temporarily disable auto_sync around the remove+insert and
issue an explicit single sync/checkpoint after the reinsert; alternatively,
mirror the behavior in tpl_file_bplus_map.H by adding the same crash-recovery
caveat and documentation if atomic batching isn't possible. Ensure you change
the code paths referencing insert_or_assign, tree_.remove, and tree_.insert
accordingly so the operation is not left durable as a delete on crash.
In `@tpl_file_b_tree.H`:
- Around line 1302-1323: The code currently treats the WAL as stale and deletes
wal_path_ when recovered_records is empty (dirty_count==0) without verifying the
on-disk header has reached wal_checkpoint_sequence; update the logic around
storage_format_version_on_disk()/FormatVersion and stale_wal so that you only
set stale_wal=true after confirming the persisted header metadata reflects
wal_checkpoint_sequence (in addition to the per-page checks). Concretely, in the
block that uses storage_format_version_on_disk(), call the routine that reads
the on-disk header/state (or extend read_storage_page_if_current to inspect the
header checkpoint sequence) and require persisted_header.checkpoint_sequence >=
wal_checkpoint_sequence before allowing deletion of wal_path_; keep using
recovered_records, read_storage_page_if_current(page_id),
wal_checkpoint_sequence, write_current_header_to_storage(), wal_path_ and
tree_kind() to locate where to add this header verification.
- Around line 2188-2213: The comparator stored in cmp_ must be immutable once
the tree is persisted; remove or disable all non-const accessors that expose
mutable access to cmp_ (specifically the non-const Compare & key_comp() noexcept
and Compare & get_compare() noexcept) so external code (e.g. File_B_Map and
File_BPlus_Map which currently use tree_.key_comp().cmp) cannot change the
comparator; replace them with only const accessors (const Compare & key_comp()
const noexcept / get_compare() const noexcept) or make cmp_ itself const, and
update File_B_Map/File_BPlus_Map call sites to use the const accessors (and stop
writing to .cmp) so the comparator remains immutable after page splits/merges.
---
Outside diff comments:
In `@Examples/set_structures_benchmark.cc`:
- Around line 715-737: The help text is inconsistent with the actual default for
n (variable n set to 2000000); update the usage/help message built in the
argument parsing block (the cout block that prints "Set Data Structures
Benchmark" and "Options:") so the displayed default for -n/--count matches n's
actual default (2,000,000); locate the help string near the argument parsing
code that references argv[0] and change "default: 1000000" to "default: 2000000"
(or dynamically print n) so users see the correct default.
---
Duplicate comments:
In `@Tests/hash_statistical_test.cc`:
- Around line 273-301: The avalanche_ratio function currently calls
effective_output_bits(candidate, samples) which derives an output mask from the
sample set and can undercount the budget when a flipped input activates
previously unseen high bits; replace that sample-derived output_bits with a
fixed per-candidate output width and use that fixed value when incrementing
comparisons. Concretely, remove the call to effective_output_bits(...), obtain a
fixed width from the candidate (e.g. a candidate.output_bit_width() accessor or,
if no accessor exists, the hash output size in bits such as
sizeof(return_type_of(candidate.hash))*8), use that fixed value in place of
output_bits when incrementing comparisons, and keep changed_bits +=
std::popcount(original ^ hashed) as-is so the ratio is normalized against the
candidate's full output width. Ensure the change targets the avalanche_ratio
function and updates comparisons and any related variable names accordingly.
In `@tpl_b_tree.H`:
- Around line 626-648: The comparator must be immutable after construction: stop
exposing non-const accessors that allow callers to modify cmp_; change the
mutable overloads key_comp() and get_compare() to return const Compare& (or
remove/delete the non-const overloads) and make the backing member cmp_
effectively const by initializing it in the constructor (e.g., declare cmp_ as
const Compare or otherwise ensure it cannot be changed after construction) so
all external access is read-only and ordering cannot be altered at runtime.
---
Nitpick comments:
In @.clang-format:
- Around line 1-3: Update the top-of-file descriptive comments to match the
active brace configuration instead of claiming "K&R braces" or "GNU style":
change the Line 2 comment that currently says “K&R braces” and the comments
around lines 15-16 that say “GNU style” to describe the actual
next-line/indented brace profile used by the file; reference the .clang-format
brace-related settings (the custom next-line/indented profile) in the comment so
the description is accurate and self-documenting.
In `@compact-quotient-filter.H`:
- Around line 452-477: The code uses a goto label "found" to jump from the
search loop into removal logic; refactor into structured control flow by
extracting the search into a block or small helper (e.g., a lambda or separate
function) that returns an optional position (use the loop with get_rem(pos),
get_cont(pos), incr(pos) and has_element(nxt) to locate the element), then if
the optional is empty return false, otherwise perform the removal logic (use the
same logic that references was_run_start, nxt, nxt_is_cont and calls
set_cont(nxt,false), set_occ(fq,false), shift_elements_left(pos) and decrement
num_elems_) in a normal if/then branch—this eliminates the goto while preserving
exact behavior.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 8d320098-44dc-4542-a777-8dd23f198590
📒 Files selected for processing (22)
- .clang-format
- Examples/compact_cuckoo_filter_example.C
- Examples/compact_quotient_filter_example.C
- Examples/set_structures_benchmark.cc
- Tests/compact-cuckoo-filter.cc
- Tests/compact-quotient-filter.cc
- Tests/cuckoo-filter.cc
- Tests/hash-it.cc
- Tests/hash_fct_test.cc
- Tests/hash_statistical_test.cc
- compact-cuckoo-filter.H
- compact-quotient-filter.H
- cuckoo-filter.H
- hash-fct.C
- hash-fct.H
- quotient-filter.H
- tpl_b_tree.H
- tpl_bplus_tree.H
- tpl_file_b_map.H
- tpl_file_b_tree.H
- tpl_file_bplus_map.H
- tpl_file_bplus_tree.H
🚧 Files skipped from review as they are similar to previous changes (2)
- Tests/cuckoo-filter.cc
- cuckoo-filter.H
Pull request overview
This PR adds new multiway search tree containers (B-Tree / B+ Tree, including file-backed variants), probabilistic filters (Cuckoo / Quotient and compact variants), and expands/modernizes the hashing API, alongside new docs, examples, and tests.
Changes:
- Introduces in-memory and persistent B-Tree/B+ Tree containers plus file-backed map wrappers, with new docs and runnable examples.
- Adds Cuckoo/Quotient filters (and compact bit-packed versions) plus examples and unit tests.
- Refactors hashing defaults/APIs and adds modern hash implementations; updates CI to exclude an expensive/flaky statistical test.
Reviewed changes
Copilot reviewed 49 out of 58 changed files in this pull request and generated 5 comments.
Show a summary per file
| File | Description |
|---|---|
| tpl_dynSetHash.H | Switches default hash function pointer to new pointer-wrapper API. |
| tpl_dynMapOhash.H | Updates default primary/secondary hash function pointers to pointer-wrapper API. |
| tpl_dynLhash.H | Updates default hash function pointer to pointer-wrapper API. |
| scripts/check_examples_run | Adds a Ruby harness to build/run examples with timeouts and logging. |
| hash-fct.C | Reworks JSW init for thread-safety; adds xxHash64/wyhash/siphash and helper primitives. |
| docs/b_tree_family.md | Adds dedicated documentation for the B-Tree/B+ Tree family and file-backed variants. |
| cuckoo-filter.H | Adds a Cuckoo filter implementation with optional deterministic kick RNG. |
| compact-quotient-filter.H | Adds a bit-packed compact quotient filter implementation. |
| compact-cuckoo-filter.H | Adds a bit-packed fingerprint Cuckoo filter variant. |
| bloom-filter.H | Refines Bloom filter API/docs and updates default hash function to seeded pointer-wrapper. |
| al-matrix.H | Updates pair-hash function pointers to pointer-wrapper API. |
| Tests/quotient-filter.cc | Adds unit tests for Quotient_Filter. |
| Tests/hash_fct_test.cc | Extends hash tests for new hashes and pointer-wrapper behavior; deterministic JSW init. |
| Tests/hash-it.cc | Adjusts a hash-table test to use an explicit constant-hash functor. |
| Tests/file_bplus_map_test.cc | Adds tests for file-backed B+ map behavior (range, reopen, iterators). |
| Tests/file_b_map_test.cc | Adds tests for file-backed B map behavior (find/assign/reopen/bounds). |
| Tests/cuckoo-filter.cc | Adds unit tests for Cuckoo_Filter. |
| Tests/compact-cuckoo-filter.cc | Adds comprehensive tests for Compact_Cuckoo_Filter. |
| Tests/bplus_tree_test.cc | Adds tests for in-memory BPlus_Tree behavior and iterators. |
| Tests/b_tree_test.cc | Adds tests for in-memory B_Tree behavior. |
| Tests/CMakeLists.txt | Registers new tests (trees/maps/filters/hashes). |
| README.md | Documents new tree/filter families and updates library statistics/feature matrix. |
| README.es.md | Updates Spanish README to reflect new probabilistic structures listing. |
| Examples/test-odhash.cc | Fixes example to use pair hash combiners and avoid unused-return warnings. |
| Examples/set_structures_benchmark.cc | Changes default benchmark size. |
| Examples/quotient_filter_example.C | Adds a comprehensive quotient filter example program. |
| Examples/file_bplus_tree_example.cc | Adds a runnable example for File_BPlus_Tree. |
| Examples/file_bplus_map_example.cc | Adds a runnable example for File_BPlus_Map. |
| Examples/file_b_tree_example.cc | Adds a runnable example for File_B_Tree. |
| Examples/file_b_map_example.cc | Adds a runnable example for File_B_Map. |
| Examples/cuckoo_filter_example.cc | Adds a runnable example for Cuckoo_Filter. |
| Examples/compact_quotient_filter_example.C | Adds comprehensive examples for Compact_Quotient_Filter. |
| Examples/compact_cuckoo_filter_example.C | Adds comprehensive examples for Compact_Cuckoo_Filter. |
| Examples/bplus_tree_example.cc | Adds a runnable example for in-memory BPlus_Tree. |
| Examples/b_tree_example.cc | Adds a runnable example for in-memory B_Tree. |
| CMakeLists.txt | Registers new headers in the exported/header list. |
| .github/workflows/ci.yml | Excludes hash_statistical_test from CI/coverage/asan/tsan runs. |
| .clang-format | Replaces CLion-generated style with a clearer project-oriented clang-format config. |
Actionable comments posted: 4
♻️ Duplicate comments (8)
tpl_file_b_map.H (1)
508-518: ⚠️ Potential issue | 🟠 Major: Batch the replace path into one checkpoint.
The new note documents the crash window, but the implementation still executes `tree_.remove(probe)` and `tree_.insert(record)` as two separate durable operations when auto-sync is enabled. A crash between them leaves the key durably deleted, even though this wrapper already exposes enough API (`auto_sync_enabled()`, `set_auto_sync()`, `sync()`) to batch both mutations into one checkpoint.
Temporarily disable auto-sync for the replace path, perform both mutations, then issue a single `sync()` if auto-sync was originally enabled.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tpl_file_b_map.H` around lines 508 - 518, The replace path in File_B_Map::insert_or_assign currently does tree_.remove(probe) and tree_.insert(record) as separate durable ops; wrap those two calls by checking auto_sync_enabled(), save its state, call set_auto_sync(false) before calling tree_.remove(probe) and tree_.insert(record), then restore the original auto-sync state and if auto_sync was originally true call sync() once; use the existing methods auto_sync_enabled(), set_auto_sync(), and sync() around the remove/insert and keep the same error checks (ah_runtime_error_unless) for remove and insert.tpl_b_tree.H (1)
596-630: ⚠️ Potential issue | 🟠 Major: Remove the mutable comparator accessors.
Returning `Compare &` here lets callers mutate the ordering after inserts and splits have already shaped the tree. That breaks the assumptions behind `contains()`, `lower_bound()`, `insert()`, and `remove()`.
💡 Minimal fix

```diff
-  Compare &key_comp() noexcept
-  {
-    return cmp_;
-  }
-
   /** @brief Return the comparison object.
    *  @return Constant reference to the comparison functor.
    *  @throws Nothing. */
   [[nodiscard]] const Compare &key_comp() const noexcept
   {
     return cmp_;
   }
-
-  Compare &get_compare() noexcept
-  {
-    return key_comp();
-  }

   /** @brief Return the comparison object.
    *  @return Constant reference to the comparison functor.
    *  @throws Nothing. */
```
Verify each finding against the current code and only fix it if needed. In `@tpl_b_tree.H` around lines 596 - 630, Remove or disable the mutable comparator accessors so callers cannot change ordering after tree mutations: delete or change the non-const functions Compare &key_comp() noexcept and Compare &get_compare() noexcept to return const Compare& (or mark them =delete) and keep only the const overloads ([[nodiscard]] const Compare &key_comp() const noexcept and [[nodiscard]] const Compare &get_compare() const noexcept); update any internal uses/call-sites to use the const reference if needed.tpl_bplus_tree.H (1)
794-828: ⚠️ Potential issue | 🟠 Major: Remove the mutable comparator accessors.
`key_comp()` and `get_compare()` returning `Compare &` let callers change the ordering after the tree is populated. Once `cmp_` changes, separator keys and leaf ordering no longer match the navigation logic used by `find_leaf()`, `remove()`, bounds queries, and range iterators.
💡 Minimal fix

```diff
-  Compare &key_comp() noexcept
-  {
-    return cmp_;
-  }
-
   /** @brief Return the comparison object.
    *  @return Constant reference to the comparison functor.
    *  @throws Nothing. */
   [[nodiscard]] const Compare &key_comp() const noexcept
   {
     return cmp_;
   }
-
-  Compare &get_compare() noexcept
-  {
-    return key_comp();
-  }

   /** @brief Return the comparison object.
    *  @return Constant reference to the comparison functor.
    *  @throws Nothing. */
```
Verify each finding against the current code and only fix it if needed. In `@tpl_bplus_tree.H` around lines 794 - 828, Remove the mutable comparator accessors so callers cannot mutate the ordering after construction: delete the non-const overloads Compare &key_comp() and Compare &get_compare() and keep only the const [[nodiscard]] const Compare &key_comp() const noexcept (and the const get_compare() const overload if present); update any call sites that relied on the non-const overloads to use the const accessor or store/modify a comparator copy before inserting data (do not change cmp_ directly), and ensure tree operations (find_leaf, remove, bounds/range iterators) continue to use cmp_ immutably.Tests/hash_statistical_test.cc (1)
279-304: ⚠️ Potential issue | 🟠 Major: Must fix: avalanche is still normalized against an unstable denominator.
`output_bits` comes from the unmutated corpus only. If a mutated sample flips a previously unseen high bit, `changed_bits` counts it but `comparisons` does not, so the reported ratio is inflated. Normalize against a fixed width (`sizeof(size_t) * 8` in this harness) or the union of original and mutated outputs.
As per coding guidelines "Tests must verify algorithmic correctness, not just absence of crashes."
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@Tests/hash_statistical_test.cc` around lines 279 - 304, The avalanche ratio is being normalized by output_bits computed from the unmutated corpus (effective_output_bits), which misses bits introduced by mutations; update the normalization to use a stable fixed width (e.g., sizeof(size_t) * 8) instead of output_bits when accumulating comparisons so every possible output bit is counted; adjust any reference to output_bits in the loop that increments comparisons (and, if present, in the final division) to use the fixed bit width, keeping candidate.hash, samples, changed_bits and comparisons logic otherwise unchanged.Tests/cuckoo-filter.cc (1)
31-31: ⚠️ Potential issue | 🟠 Major: Must fix: don't assert exact negatives on a probabilistic filter.
`contains()` can legitimately return `true` for absent keys once the filter holds other fingerprints, so these `EXPECT_FALSE` checks are flaky even when the implementation is correct. Replace them with a bounded false-positive budget over a fixed negative corpus.
As per coding guidelines "Tests must verify algorithmic correctness, not just absence of crashes" and "Tests must be deterministic and repeatable."
Also applies to: 52-54, 95-95
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@Tests/cuckoo-filter.cc` at line 31, The test is asserting exact negatives on the probabilistic CuckooFilter via EXPECT_FALSE(filter.contains("date")), which is flaky because contains() may legitimately return true for absent keys; instead, replace per-key EXPECT_FALSE checks with a deterministic false-positive budget over a fixed negative corpus: call filter.contains() across a predefined set of negative keys and assert the total number of positives is <= an acceptable threshold (e.g., expected false-positive rate * corpus size) using the same filter instance (named filter) and the contains() method; apply this change to the occurrences that use EXPECT_FALSE (including the checks at lines referencing "date" and the other noted blocks) so the test verifies probabilistic behavior deterministically rather than exact negatives.hash-fct.C (1)
223-230: ⚠️ Potential issue | 🔴 Critical: Must fix: `init_jsw(seed)` can still race with live `jsw_hash()` readers.
Once `init` is true, `jsw_hash()` reads `tab[]` without locking, but `init_jsw(seed)` rewrites that same table under `jsw_mtx`. That leaves a reader/writer data race during reseed, and because the seeded path bypasses `jsw_init_flag`, a later `init_jsw()` can also silently replace the custom table. Make initialization one-shot across both overloads, or publish a new immutable table atomically before exposing it to readers.
Based on learnings "Avoid data races; verify with ThreadSanitizer (TSan)".
Also applies to: 232-259
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@hash-fct.C` around lines 223 - 230, The reseed path in init_jsw(seed) currently rewrites the shared tab[] while jsw_hash() readers access it unlocked, causing a data race; fix by making initialization one-shot across both overloads or by allocating a new immutable table and publishing it atomically: either (A) have init_jsw(seed) respect the existing jsw_init_flag/once semantics so the table is only set once and never mutated, or (B) change the shared tab[] to be referenced via an atomic pointer (e.g., jsw_tab_ptr) and modify init_jsw to allocate jsw_fill_table into a new table and atomically swap the pointer under jsw_mtx so readers in jsw_hash() can read the stable table without locks; run ThreadSanitizer to verify the race is gone.cuckoo-filter.H (1)
214-227: ⚠️ Potential issue | 🟠 Major: Must fix: `Cuckoo_Filter(capacity)` still over-promises what it can hold.
This constructor sizes the table for the exact number of fingerprint slots, but bounded kick insertion cannot reach 100% occupancy reliably. That means the documented "minimum number of items to support" can be missed long before `size() == capacity`.
Verify each finding against the current code and only fix it if needed. In `@cuckoo-filter.H` around lines 214 - 227, The constructor Cuckoo_Filter(size_t capacity, ...) currently sizes num_buckets to exactly capacity / EntriesPerBucket which over-promises occupancy; change the sizing to account for a safe maximum load factor (e.g. 0.9 or configurable LOAD_FACTOR) so the table is allocated with extra slack for bounded kick insertion failures. Specifically, in the Cuckoo_Filter constructor replace the needed_buckets calculation with needed_buckets = ceil(capacity / (EntriesPerBucket * LOAD_FACTOR)), then continue to round up via next_power_of_2, set bucket_mask and call buckets.putn(num_buckets) as before; reference the Cuckoo_Filter constructor, EntriesPerBucket, next_power_of_2, bucket_mask and buckets.putn when making the change.compact-cuckoo-filter.H (1)
231-249: ⚠️ Potential issue | 🟠 Major: Must fix: the constructor still sizes for 100% occupancy.
`needed_buckets = ceil(capacity / EntriesPerBucket)` only allocates the exact slot count. Bounded cuckoo insertion starts failing well before that, so the documented "minimum number of items to support" is not actually guaranteed. Add a target load factor before the power-of-two rounding.
Verify each finding against the current code and only fix it if needed. In `@compact-cuckoo-filter.H` around lines 231 - 249, The constructor Compact_Cuckoo_Filter currently computes needed_buckets as ceil(capacity / EntriesPerBucket) which assumes 100% load and breaks insertion; change the calculation to account for a target load factor (e.g., constexpr double kTargetLoad = 0.9 or make it a constructor parameter) so that needed_buckets = ceil(capacity / (EntriesPerBucket * kTargetLoad)) before calling next_power_of_2; preserve the power-of-two rounding, the min-1 bucket guard (if num_buckets == 0 num_buckets = 1), and the seed/rng logic, and then call buckets.putn(num_buckets) as before.
🧹 Nitpick comments (1)
hash-fct.H (1)
933-969: Consider documenting NaN behavior in float/double specializations.
The `-0.0` normalization is correct, but the comment at lines 891-894 mentions NaN as a concern while the specializations don't address it. Although technically the hash invariant isn't violated (since `NaN != NaN`, two NaN keys are never "equal"), the behavior may be surprising: different NaN encodings will produce different hashes.
Consider adding a brief note in the specialization comments about NaN behavior, e.g., "NaN values with different bit patterns hash differently; this is consistent with IEEE 754 where NaN is not equal to itself."
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@hash-fct.H` around lines 933 - 969, Update the Aleph_Hash<float> and Aleph_Hash<double> specializations to document NaN behavior: add a concise comment near the operator()(float) / operator()(double) (and their seeded overloads) noting that different NaN bit-patterns will produce different hashes (i.e., NaN != NaN so multiple NaNs are not required to collide) and that only -0.0 is normalized; reference the specialization names Aleph_Hash<float> and Aleph_Hash<double> and the operator() methods so readers can find the exact spots.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@Tests/hash_validation_test.cc`:
- Around line 183-196: RankMetrics currently only has is_strong which collapses
weak and general profiles; add a distinct flag or enum (e.g., bool is_weak or
enum Profile {STRONG, WEAK, GENERAL}) to RankMetrics and update all places that
use RankMetrics (table label generation, weak_candidates(), best_general
selection logic, and the EXPECT_LT assertion) so that general-only candidates
exclude weak ones and table labels print [Genrl] only for true general profiles;
ensure weak_candidates() keeps returning weak items and best_general
specifically ignores items with is_weak set.
- Around line 51-64: Add the missing include for <string_view> to the top of
Tests/hash_validation_test.cc so uses of std::string_view (notably around the
code referenced at lines where std::string_view is used, e.g., the block at
~1305-1312) are portable; insert a line '#include <string_view>' alongside the
other standard headers (near the existing includes such as <string> and
<vector>) to ensure compilation on toolchains that don't provide it
transitively.
In `@tpl_b_tree.H`:
- Around line 496-497: verify() currently uses Aleph::is_sorted(node->keys,
cmp_) which allows equal adjacent keys; change the check so each node's keys are
strictly increasing: replace the Aleph::is_sorted call with an explicit loop (or
std::adjacent_find) over node->keys that returns false if any adjacent pair
fails strict order (i.e., for every i ensure cmp_(keys[i], keys[i+1]) is true).
Apply this change where verify() references node->keys and cmp_ so nodes with
duplicate adjacent keys fail verification.
In `@tpl_file_b_tree.H`:
- Around line 2026-2077: The verifier in verify_node trusts page kinds and can
cycle; add explicit validation that the page kind is either a leaf or internal
(use is_leaf(p) or is_internal(p)) and treat any other kind (including Free) as
invalid and return false, and introduce a visited-page guard (e.g., a visited
set/bitmap keyed by page_id stored in the recursive state) to detect and reject
revisits before recursing into children (check pages_.size() and mark page_id
visited, return false on revisit) so verify_node(page_id, ...) returns false for
malformed kinds or cycles instead of following invalid links or recursing
indefinitely.
---
Duplicate comments:
In `@compact-cuckoo-filter.H`:
- Around line 231-249: The constructor Compact_Cuckoo_Filter currently computes
needed_buckets as ceil(capacity / EntriesPerBucket) which assumes 100% load and
breaks insertion; change the calculation to account for a target load factor
(e.g., constexpr double kTargetLoad = 0.9 or make it a constructor parameter) so
that needed_buckets = ceil(capacity / (EntriesPerBucket * kTargetLoad)) before
calling next_power_of_2; preserve the power-of-two rounding, the min-1 bucket
guard (if num_buckets == 0 num_buckets = 1), and the seed/rng logic, and then
call buckets.putn(num_buckets) as before.
In `@cuckoo-filter.H`:
- Around line 214-227: The constructor Cuckoo_Filter(size_t capacity, ...)
currently sizes num_buckets to exactly capacity / EntriesPerBucket which
over-promises occupancy; change the sizing to account for a safe maximum load
factor (e.g. 0.9 or configurable LOAD_FACTOR) so the table is allocated with
extra slack for bounded kick insertion failures. Specifically, in the
Cuckoo_Filter constructor replace the needed_buckets calculation with
needed_buckets = ceil(capacity / (EntriesPerBucket * LOAD_FACTOR)), then
continue to round up via next_power_of_2, set bucket_mask and call
buckets.putn(num_buckets) as before; reference the Cuckoo_Filter constructor,
EntriesPerBucket, next_power_of_2, bucket_mask and buckets.putn when making the
change.
In `@hash-fct.C`:
- Around line 223-230: The reseed path in init_jsw(seed) currently rewrites the
shared tab[] while jsw_hash() readers access it unlocked, causing a data race;
fix by making initialization one-shot across both overloads or by allocating a
new immutable table and publishing it atomically: either (A) have init_jsw(seed)
respect the existing jsw_init_flag/once semantics so the table is only set once
and never mutated, or (B) change the shared tab[] to be referenced via an atomic
pointer (e.g., jsw_tab_ptr) and modify init_jsw to allocate jsw_fill_table into
a new table and atomically swap the pointer under jsw_mtx so readers in
jsw_hash() can read the stable table without locks; run ThreadSanitizer to
verify the race is gone.
In `@Tests/cuckoo-filter.cc`:
- Line 31: The test is asserting exact negatives on the probabilistic
CuckooFilter via EXPECT_FALSE(filter.contains("date")), which is flaky because
contains() may legitimately return true for absent keys; instead, replace
per-key EXPECT_FALSE checks with a deterministic false-positive budget over a
fixed negative corpus: call filter.contains() across a predefined set of
negative keys and assert the total number of positives is <= an acceptable
threshold (e.g., expected false-positive rate * corpus size) using the same
filter instance (named filter) and the contains() method; apply this change to
the occurrences that use EXPECT_FALSE (including the checks at lines referencing
"date" and the other noted blocks) so the test verifies probabilistic behavior
deterministically rather than exact negatives.
In `@Tests/hash_statistical_test.cc`:
- Around line 279-304: The avalanche ratio is being normalized by output_bits
computed from the unmutated corpus (effective_output_bits), which misses bits
introduced by mutations; update the normalization to use a stable fixed width
(e.g., sizeof(size_t) * 8) instead of output_bits when accumulating comparisons
so every possible output bit is counted; adjust any reference to output_bits in
the loop that increments comparisons (and, if present, in the final division) to
use the fixed bit width, keeping candidate.hash, samples, changed_bits and
comparisons logic otherwise unchanged.
In `@tpl_b_tree.H`:
- Around line 596-630: Remove or disable the mutable comparator accessors so
callers cannot change ordering after tree mutations: delete or change the
non-const functions Compare &key_comp() noexcept and Compare &get_compare()
noexcept to return const Compare& (or mark them =delete) and keep only the const
overloads ([[nodiscard]] const Compare &key_comp() const noexcept and
[[nodiscard]] const Compare &get_compare() const noexcept); update any internal
uses/call-sites to use the const reference if needed.
In `@tpl_bplus_tree.H`:
- Around line 794-828: Remove the mutable comparator accessors so callers cannot
mutate the ordering after construction: delete the non-const overloads Compare
&key_comp() and Compare &get_compare() and keep only the const [[nodiscard]]
const Compare &key_comp() const noexcept (and the const get_compare() const
overload if present); update any call sites that relied on the non-const
overloads to use the const accessor or store/modify a comparator copy before
inserting data (do not change cmp_ directly), and ensure tree operations
(find_leaf, remove, bounds/range iterators) continue to use cmp_ immutably.
In `@tpl_file_b_map.H`:
- Around line 508-518: The replace path in File_B_Map::insert_or_assign
currently does tree_.remove(probe) and tree_.insert(record) as separate durable
ops; wrap those two calls by checking auto_sync_enabled(), save its state, call
set_auto_sync(false) before calling tree_.remove(probe) and
tree_.insert(record), then restore the original auto-sync state and if auto_sync
was originally true call sync() once; use the existing methods
auto_sync_enabled(), set_auto_sync(), and sync() around the remove/insert and
keep the same error checks (ah_runtime_error_unless) for remove and insert.
---
Nitpick comments:
In `@hash-fct.H`:
- Around line 933-969: Update the Aleph_Hash<float> and Aleph_Hash<double>
specializations to document NaN behavior: add a concise comment near the
operator()(float) / operator()(double) (and their seeded overloads) noting that
different NaN bit-patterns will produce different hashes (i.e., NaN != NaN so
multiple NaNs are not required to collide) and that only -0.0 is normalized;
reference the specialization names Aleph_Hash<float> and Aleph_Hash<double> and
the operator() methods so readers can find the exact spots.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: f629a7e7-1a69-42fc-bde8-6ba9c1c2b2b1
📒 Files selected for processing (17)
- .clang-format
- Tests/CMakeLists.txt
- Tests/cuckoo-filter.cc
- Tests/hash_fct_test.cc
- Tests/hash_statistical_test.cc
- Tests/hash_validation_test.cc
- compact-cuckoo-filter.H
- cuckoo-filter.H
- hash-fct.C
- hash-fct.H
- scripts/check_examples_run
- tpl_b_tree.H
- tpl_bplus_tree.H
- tpl_file_b_map.H
- tpl_file_b_tree.H
- tpl_file_bplus_map.H
- tpl_file_bplus_tree.H
🚧 Files skipped from review as they are similar to previous changes (2)
- Tests/CMakeLists.txt
- .clang-format
Refactor cuckoo filter tests to:
- Use probabilistic assertions for negative lookups in `CuckooFilterTest.Insertion`.
- Add a false positive rate budget for removals in `CuckooFilterTest.Deletion`.
- Consolidate common initialization into `HashStatisticalTest` fixture.
- Add `std::string_view` for weak hash function names in `HashValidation` tests.
Actionable comments posted: 8
🧹 Nitpick comments (3)
tpl_bplus_tree.H (1)
174-183: Minor redundancy: the `is_sorted` check is subsumed by the strict ordering loop.
The explicit loop on lines 179-181 already verifies `cmp_(keys[i-1], keys[i])` for all adjacent pairs, which implies sorted order. The `Aleph::is_sorted` call on line 176 is therefore redundant and adds an extra O(n) pass.
♻️ Suggested simplification

```diff
   [[nodiscard]] bool strictly_sorted(const Array<Key> &keys) const
   {
-    if (not Aleph::is_sorted(keys, cmp_))
-      return false;
-
     for (size_t i = 1; i < keys.size(); ++i)
       if (not cmp_(keys[i - 1], keys[i]))
         return false;
     return true;
   }
```
Verify each finding against the current code and only fix it if needed. In `@tpl_bplus_tree.H` around lines 174 - 183, The strictly_sorted function does an unnecessary O(n) pass by calling Aleph::is_sorted(keys, cmp_) before the adjacent-pair loop; remove the Aleph::is_sorted call and rely solely on the existing loop that checks cmp_(keys[i-1], keys[i]) for all i to determine strict ordering (update the body of strictly_sorted to drop the initial check while keeping the loop and return logic intact).tpl_b_tree.H (1)
437-452: Unchecked optional dereference relies on tree invariants.
Lines 439 and 446 dereference `*max_in(...)` and `*min_in(...)` without checking if the optional has a value. This is safe if tree invariants hold (the preceding checks ensure the child has `>= MinDegree` keys), but a corrupted tree would cause undefined behavior.
Consider adding an assertion for defensive programming, or document that callers must ensure tree validity via `verify()`.
🛡️ Defensive alternative

```diff
   if (node->children[idx]->keys.size() >= MinDegree)
     {
-      const Key pred = *max_in(node->children[idx].get());
+      auto pred_opt = max_in(node->children[idx].get());
+      ah_runtime_error_unless(pred_opt.has_value())
+        << "B_Tree invariant violated: no predecessor";
+      const Key pred = *pred_opt;
       node->keys[idx] = pred;
       return remove_from(node->children[idx].get(), pred);
     }
   if (node->children[idx + 1]->keys.size() >= MinDegree)
     {
-      const Key succ = *min_in(node->children[idx + 1].get());
+      auto succ_opt = min_in(node->children[idx + 1].get());
+      ah_runtime_error_unless(succ_opt.has_value())
+        << "B_Tree invariant violated: no successor";
+      const Key succ = *succ_opt;
       node->keys[idx] = succ;
       return remove_from(node->children[idx + 1].get(), succ);
     }
```
Verify each finding against the current code and only fix it if needed. In `@tpl_b_tree.H` around lines 437 - 452, The code dereferences optionals returned by max_in(...) and min_in(...) in the block inside remove_from (lines using node->children[idx] and node->children[idx + 1]) without verifying they contain values; add a defensive check or assertion to avoid UB on a corrupted tree: either assert the optional has_value() (e.g., assert(max_in(...).has_value())/assert(min_in(...).has_value())) immediately before dereferencing, or explicitly test the optional and handle the failure (log/throw or call verify()) and then use the value; reference max_in, min_in, remove_from and merge_children when making the change.

tpl_file_b_tree.H (1)
51-52: Nice to have: replace `std::unordered_set` visited tracking with an Aleph container/bitmap.

`verify_node()` can use a page-id bitmap (`Array<unsigned char>`) keyed by `pages_.size()` to avoid STL hash container usage in this core header and reduce allocation overhead.

As per coding guidelines: "MUST use Aleph containers when an equivalent exists. Standard library containers are ONLY permitted when no Aleph equivalent is available."
Also applies to: 2023-2024, 2649-2650
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tpl_file_b_tree.H` around lines 51 - 52, The verify_node() implementation currently uses std::unordered_set for visited tracking; replace that with an Aleph bitmap container by allocating an Aleph Array<unsigned char> sized to pages_.size() and using it as a page-id bitmap (set byte flags for visited pages) instead of hashing; update any logic that inserts/tests membership in the unordered_set to read/write the corresponding byte in the Array<unsigned char> (use the same index/key semantics as pages_.size()), remove the unordered_set include and add the Aleph container include, and ensure the bitmap is cleared/reset between verify_node() runs to preserve the original semantics.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@hash-fct.C`:
- Around line 219-231: The lazy-init path and the seeded re-seed path race
because init_jsw(std::uint32_t) calls jsw_fill_table() directly while jsw_hash()
may invoke std::call_once(jsw_init_flag, jsw_fill_table, Default_Hash_Seed)
without holding jsw_mtx; fix by ensuring all writes to tab[] are serialized
through the same mutex+once logic: in jsw_hash() acquire the jsw_mtx
(unique_lock) around the std::call_once(jsw_init_flag, jsw_fill_table,
Default_Hash_Seed) call so the one-time initialization runs under the exclusive
lock, and change init_jsw(std::uint32_t seed) to use
std::call_once(jsw_init_flag, jsw_fill_table, seed) while holding the
unique_lock instead of calling jsw_fill_table() directly so both paths go
through the once-flag under the mutex.
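The two-path serialization described above can be sketched as follows. All names here (`table`, `fill_table`, `table_mtx`, `init_flag`, `kDefaultSeed`) are hypothetical stand-ins for the real `tab[]`, `jsw_fill_table`, `jsw_mtx`, `jsw_init_flag`, and `Default_Hash_Seed`, and the mixing loop is a placeholder, not the actual JSW table generation:

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <mutex>

namespace sketch
{
  std::array<std::uint32_t, 256> table; // stands in for tab[]
  std::once_flag init_flag;             // stands in for jsw_init_flag
  std::mutex table_mtx;                 // stands in for jsw_mtx
  constexpr std::uint32_t kDefaultSeed = 0x9E3779B9u;

  void fill_table(std::uint32_t seed)
  {
    std::uint32_t x = seed;
    for (auto & e : table) // placeholder LCG mixing, not the real JSW fill
      e = x = x * 1664525u + 1013904223u;
  }

  // Lazy path: the one-time fill runs under the SAME mutex as explicit
  // initialization, so all writes to table are serialized.
  std::uint32_t lookup(unsigned char byte)
  {
    std::unique_lock<std::mutex> lock(table_mtx);
    std::call_once(init_flag, fill_table, kDefaultSeed);
    return table[byte];
  }

  // Explicit path: also routed through the once-flag while holding the lock,
  // so it cannot race the lazy fill.
  void init(std::uint32_t seed)
  {
    std::unique_lock<std::mutex> lock(table_mtx);
    std::call_once(init_flag, fill_table, seed);
  }
}
```

Note that routing `init` through the same once-flag makes any re-seeding after first use a no-op; if explicit re-seeding is meant to win, the once-flag would have to be replaced by a plain `initialized` boolean checked and reset under the mutex.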
- Around line 572-581: The functions xxhash64_hash and siphash24_hash perform
pointer arithmetic on key before checking len; add an early guard that checks if
len == 0 immediately after casting key to const uint8_t* and before computing
end to avoid nullptr + 0 UB, and return the appropriate empty-buffer hash (for
xxhash64_hash return the seed + prime5 behavior already used for empty input;
for siphash24_hash return its defined empty-input result) so all later code can
assume a non-null pointer when computing end.
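The guard pattern being requested looks like the sketch below. Only the `len == 0` check placed before the `p + len` arithmetic and the xxHash64-style empty-input result (`seed + PRIME64_5`) reflect the fix; the byte-mixing loop is a placeholder, not the real algorithm:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

inline std::uint64_t hash_with_empty_guard(const void * key, std::size_t len,
                                           std::uint64_t seed) noexcept
{
  constexpr std::uint64_t prime5 = 0x27D4EB2F165667C5ULL; // xxHash64 PRIME64_5
  const auto * p = static_cast<const std::uint8_t *>(key);
  if (len == 0)           // guard BEFORE computing p + len: a null key with
    return seed + prime5; // len == 0 must never reach pointer arithmetic
  const std::uint8_t * const end = p + len; // p is known non-null here
  std::uint64_t h = seed + prime5 + len;
  while (p != end)        // placeholder byte mixing, not the real algorithm
    h = (h ^ *p++) * 0x9E3779B185EBCA87ULL;
  return h;
}
```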
In `@hash-fct.H`:
- Around line 47-49: Update the public documentation in the header to reflect
the new defaults: change descriptions that claim dft_hash_fct() and
snd_hash_fct() use xxHash64 to state they now route through wyhash_hash (and
that snd_hash_fct() uses the alternate deterministic seed), and replace any
statement that JSW lazy-init is time-seeded with the fact that JSW is
deterministically seeded via Default_Hash_Seed; update all occurrences
referenced (the overview around dft_hash_fct()/snd_hash_fct(), the JSW init text
in the 62-79 block, and the later JSW references at 254-255) so callers know
about wyhash_hash and Default_Hash_Seed for reproducibility and compatibility.
In `@Tests/cuckoo-filter.cc`:
- Around line 128-142: The test is non-deterministic because Cuckoo_Filter's
internal RNG uses std::random_device; change the test to construct the filter
with a fixed seed and update the filter to accept and use that seed: add an
overload/parameter to the Cuckoo_Filter constructor (the class Cuckoo_Filter and
its constructor) that takes a uint32_t seed and initializes the filter's
internal RNG from it, then in the test call that constructor (e.g., pass 42) so
the kick-chain bucket selection and slot swaps are deterministic across runs
while leaving the existing behavior intact when no seed is provided.
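The seed-overload pattern can be illustrated with a minimal sketch; `Toy_Filter` is a hypothetical stand-in for `Cuckoo_Filter` that shows only the RNG plumbing, not any filter logic:

```cpp
#include <cassert>
#include <cstdint>
#include <random>

class Toy_Filter
{
  std::mt19937 rng_;

public:
  // Existing behavior: nondeterministic seeding via std::random_device.
  Toy_Filter() : rng_(std::random_device{}()) {}

  // New overload: a fixed seed makes kick-chain choices repeatable in tests.
  explicit Toy_Filter(std::uint32_t seed) : rng_(seed) {}

  // Stand-in for the random bucket/slot selection done during kicks.
  std::uint32_t pick_slot(std::uint32_t n_slots)
  {
    return std::uniform_int_distribution<std::uint32_t>(0, n_slots - 1)(rng_);
  }
};
```

Two filters built with the same seed then produce identical kick sequences, which is exactly what a deterministic test needs.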
In `@tpl_b_tree.H`:
- Around line 83-137: The file contains two consecutive Doxygen blocks
documenting Gen_B_Tree (both using `@brief`) which will confuse Doxygen; remove
the duplicate block and consolidate documentation into a single Doxygen comment
for Gen_B_Tree (keep the most complete description and examples), ensuring only
one `@brief` and one class-level comment remains for Gen_B_Tree so the class
documentation is not duplicated.
In `@tpl_file_b_map.H`:
- Around line 497-506: Update the stale crash-recovery note to reflect the
auto-sync batching change: explain that when replacing an existing key
Gen_File_B_Tree now suppresses auto_sync for the remove/insert pair (so they are
batched into one WAL write), and therefore the previous claim that a crash
between tree_.remove(probe) and tree_.insert(record) can leave a durable delete
is no longer accurate; modify the wording around Gen_File_B_Tree,
tree_.remove(probe), tree_.insert(record), and auto_sync to state the new
atomicity/behavior (or explicitly document any remaining edge-cases) so the
comment matches the current implementation.
- Around line 516-524: The code in File_B_Map::insert_or_assign disables
auto_sync with set_auto_sync(false) but does not restore it if
tree_.remove(probe) or tree_.insert(record) throws; wrap the critical section in
an RAII guard or a try/finally pattern so set_auto_sync(was_sync) is always
executed (even on exceptions) and, if was_sync was true, ensure sync() is called
after restoration when appropriate; specifically place the restore call in a
destructor of a small scope_guard or in a catch/rethrow block around the calls
to tree_.remove and tree_.insert so that set_auto_sync(was_sync) and conditional
sync() always run before propagating the exception.
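One way to guarantee the restore is a tiny RAII guard as suggested above; `scope_guard` and `Fake_Map` below are hypothetical sketches, not Aleph APIs:

```cpp
#include <cassert>
#include <utility>

template <typename F>
class scope_guard
{
  F on_exit_;

public:
  explicit scope_guard(F f) : on_exit_(std::move(f)) {}
  ~scope_guard() { on_exit_(); } // runs on normal exit AND on exceptions
  scope_guard(const scope_guard &) = delete;
  scope_guard & operator=(const scope_guard &) = delete;
};

struct Fake_Map // stand-in for File_B_Map's auto_sync handling
{
  bool auto_sync = true;
  int restores = 0;

  void risky_replace(bool do_throw)
  {
    const bool was_sync = auto_sync;
    auto_sync = false; // batch the remove/insert pair into one WAL write
    scope_guard guard([&] { auto_sync = was_sync; ++restores; });
    if (do_throw)
      throw 1; // guard still restores auto_sync before propagating
    // ... tree_.remove(probe); tree_.insert(record); would go here ...
  }
};
```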
In `@tpl_file_b_tree.H`:
- Around line 2299-2302: The setter set_auto_sync(const bool enabled) noexcept
currently assigns auto_sync_ directly and can enable sync on a Read_Only handle;
change it to check the read-only state (e.g. using the class's read_only() or
Read_Only flag) and clamp the effective value to false when read-only, i.e. set
auto_sync_ = enabled && !read_only() (or equivalent) so auto_sync_enabled() can
never return true for read-only handles; update set_auto_sync to enforce this
invariant and keep noexcept.
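The clamped setter amounts to one line; `Handle` and its members below are hypothetical stand-ins for the real file-backed tree class:

```cpp
#include <cassert>

struct Handle // stand-in for the persistent tree handle
{
  bool read_only_ = false;
  bool auto_sync_ = false;

  void set_auto_sync(bool enabled) noexcept
  {
    // Clamp: auto_sync_ can never become true on a read-only handle.
    auto_sync_ = enabled && !read_only_;
  }

  bool auto_sync_enabled() const noexcept { return auto_sync_; }
};
```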
---
Nitpick comments:
In `@tpl_b_tree.H`:
- Around line 437-452: The code dereferences optionals returned by max_in(...)
and min_in(...) in the block inside remove_from (lines using node->children[idx]
and node->children[idx + 1]) without verifying they contain values; add a
defensive check or assertion to avoid UB on a corrupted tree: either assert the
optional has_value() (e.g.,
assert(max_in(...).has_value())/assert(min_in(...).has_value())) immediately
before dereferencing, or explicitly test the optional and handle the failure
(log/throw or call verify()) and then use the value; reference max_in, min_in,
remove_from and merge_children when making the change.
In `@tpl_bplus_tree.H`:
- Around line 174-183: The strictly_sorted function does an unnecessary O(n)
pass by calling Aleph::is_sorted(keys, cmp_) before the adjacent-pair loop;
remove the Aleph::is_sorted call and rely solely on the existing loop that
checks cmp_(keys[i-1], keys[i]) for all i to determine strict ordering (update
the body of strictly_sorted to drop the initial check while keeping the loop and
return logic intact).
In `@tpl_file_b_tree.H`:
- Around line 51-52: The verify_node() implementation currently uses
std::unordered_set for visited tracking; replace that with an Aleph bitmap
container by allocating an Aleph Array<unsigned char> sized to pages_.size() and
using it as a page-id bitmap (set byte flags for visited pages) instead of
hashing; update any logic that inserts/tests membership in the unordered_set to
read/write the corresponding byte in the Array<unsigned char> (use the same
index/key semantics as pages_.size()), remove the unordered_set include and add
the Aleph container include, and ensure the bitmap is cleared/reset between
verify_node() runs to preserve the original semantics.
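The bitmap replacement is a byte-per-page flag array; `std::vector<unsigned char>` below stands in for the Aleph `Array<unsigned char>` the guideline asks for:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

class Visited_Bitmap // stand-in for the page-id visited tracker
{
  std::vector<unsigned char> bits_;

public:
  explicit Visited_Bitmap(std::size_t n_pages) : bits_(n_pages, 0) {}

  // Marks a page; returns false if it was already visited
  // (i.e. a duplicate child link or a cycle in the page graph).
  bool mark(std::size_t page_id)
  {
    if (bits_[page_id])
      return false;
    bits_[page_id] = 1;
    return true;
  }

  // Clear between verify_node() runs so each pass starts fresh.
  void reset() { bits_.assign(bits_.size(), 0); }
};
```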
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 0842bc61-2859-49a5-ba1f-625548d6630d
📒 Files selected for processing (11)
- Tests/cuckoo-filter.cc
- Tests/hash_statistical_test.cc
- Tests/hash_validation_test.cc
- compact-cuckoo-filter.H
- cuckoo-filter.H
- hash-fct.C
- hash-fct.H
- tpl_b_tree.H
- tpl_bplus_tree.H
- tpl_file_b_map.H
- tpl_file_b_tree.H
Refactor hash functions with improved noexcept correctness and deterministic lazy initialization. Enhance B-tree verification and correctness logic for key operations, including copy/move range operations and child node management. This commit also updates example and test files to reflect these changes and ensure consistent behavior across different components.
There was a problem hiding this comment.
Pull request overview
Adds new multiway tree containers (B-Tree/B+Tree, including file-backed variants) and new probabilistic filters (Cuckoo/Quotient + compact variants), plus supporting hashing updates, docs/examples, and expanded test coverage.
Changes:
- Introduces new ordered multiway tree APIs (in-memory and persistent) and adds documentation + runnable examples.
- Adds Cuckoo/Quotient filters (plus compact bit-packed variants) and extends hashing backends/tests (xxHash64/wyHash/sipHash + JSW init changes).
- Updates CI/test plumbing (new tests, excludes an expensive statistical test) and formatting/tooling (clang-format + example runner script).
Reviewed changes
Copilot reviewed 61 out of 71 changed files in this pull request and generated 4 comments.
Show a summary per file
| File | Description |
|---|---|
| tpl_dynSetHash.H | Switches default hash function pointer wrapper used by dynamic hash table. |
| tpl_dynMapOhash.H | Updates default hash function pointers for open addressing map. |
| tpl_dynLhash.H | Updates default hash function pointer wrapper for linear hash table. |
| scripts/check_examples_run | Adds a Ruby helper to configure/build and run Examples with per-example logging/timeouts. |
| hash-fct.C | Refactors JSW init for thread-safety; adds modern hashes (xxHash64/wyHash/sipHash) helpers + implementations. |
| docs/b_tree_family.md | New documentation page describing B-Tree/B+Tree and persistent variants. |
| cuckoo-filter.H | Adds a Cuckoo filter implementation with stash support and public API. |
| compact-quotient-filter.H | Adds a bit-packed (memory-optimized) quotient filter implementation. |
| compact-cuckoo-filter.H | Adds a bit-packed (memory-optimized) cuckoo filter implementation. |
| bloom-filter.H | Cleans up Bloom filter API/docs and switches default hash function pointer wrapper; adds [[nodiscard]] annotations. |
| al-matrix.H | Updates matrix entry hash functions to pointer-wrapper variants. |
| Tests/quotient-filter.cc | Adds comprehensive tests for Quotient_Filter behavior and properties. |
| Tests/hash_fct_test.cc | Updates JSW init usage and expands tests for new hash backends + ADL hashing concept. |
| Tests/hash-it.cc | Adjusts a test hash table case to use a constant-hash lambda. |
| Tests/file_bplus_map_test.cc | Adds tests for persistent B+ map: range, reopen, iterators, codecs. |
| Tests/file_b_map_test.cc | Adds tests for persistent B map: assign/find/bounds/reopen, codecs. |
| Tests/cuckoo-filter.cc | Adds tests for Cuckoo_Filter including probabilistic FP budgets. |
| Tests/compact-cuckoo-filter.cc | Adds extensive tests for Compact_Cuckoo_Filter, including statistical FP checks. |
| Tests/bplus_tree_test.cc | Adds comprehensive tests for BPlus_Tree correctness, iterators, randomized oracle checks. |
| Tests/b_tree_test.cc | Adds comprehensive tests for B_Tree correctness and randomized oracle checks. |
| Tests/CMakeLists.txt | Registers new test targets for trees, filters, and hash tooling. |
| README.md | Documents new trees/filters and expands overview + API tables. |
| README.es.md | Updates Spanish README tables to reflect probabilistic structures. |
| Examples/test-odhash.cc | Fixes example hashing for pair keys and silences unused insert return values. |
| Examples/set_structures_benchmark.cc | Changes default benchmark key count from 1M to 2M. |
| Examples/quotient_filter_example.C | Adds a full-feature quotient filter example (sections + CLI). |
| Examples/file_bplus_tree_example.cc | Adds an example for persistent B+ tree usage and sidecars. |
| Examples/file_bplus_map_example.cc | Adds an example for persistent B+ map usage and range iterator scans. |
| Examples/file_b_tree_example.cc | Adds an example for persistent B-tree usage and reopen behavior. |
| Examples/file_b_map_example.cc | Adds an example for persistent B-map usage and bounds queries. |
| Examples/cuckoo_filter_example.cc | Adds a basic Cuckoo filter example program. |
| Examples/compact_quotient_filter_example.C | Adds comprehensive examples for Compact_Quotient_Filter. |
| Examples/compact_cuckoo_filter_example.C | Adds comprehensive examples for Compact_Cuckoo_Filter. |
| Examples/bplus_tree_example.cc | Adds an example for in-memory B+ tree usage. |
| Examples/b_tree_example.cc | Adds an example for in-memory B-tree usage. |
| CMakeLists.txt | Registers new headers in the project header list. |
| .github/workflows/ci.yml | Excludes hash_statistical_test in CI/asan/tsan/coverage runs. |
| .clang-format | Replaces/updates clang-format configuration to project conventions (C++20, 2-space indent, brace style). |
You can also share your feedback on Copilot code review. Take the survey.
There was a problem hiding this comment.
Actionable comments posted: 8
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
Examples/testQueue.C (1)
183-195: ⚠️ Potential issue | 🟠 Major

Fix validation to reject values that exceed `INT_MAX`, since `create_queue` expects `int`.

Line 189 validates `n_long > SIZE_MAX`, but `n` is then passed to `create_queue(int)` at lines 286 and 290. On 64-bit systems, values larger than `INT_MAX` silently truncate on conversion to `int`, causing the example to misbehave with oversized inputs.

Change validation to:

```diff
-  if (errno != 0 or *endptr1 != '\0' or n_long < 0 or n_long > SIZE_MAX)
+  if (errno != 0 or *endptr1 != '\0' or n_long < 0 or n_long > INT_MAX)
     {
-      cerr << "Error: Invalid queue size argument. Must be a non-negative integer fitting in size_t." << endl;
+      cerr << "Error: Invalid queue size argument. Must be a non-negative integer fitting in int." << endl;
       return 1;
     }
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@Examples/testQueue.C` around lines 183 - 195, The validation for the parsed queue size should reject values larger than INT_MAX because create_queue(int) expects an int; update the checks around n_long (parsed via strtol) to ensure errno==0, *endptr1=='\0', n_long >= 0 and n_long <= INT_MAX (instead of SIZE_MAX), then cast to size_t/int for passing to create_queue; reference variables/functions: n_long, endptr1, n, argv[1], strtol, and create_queue to locate the change.
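The corrected bound check can be sketched as below; `parse_queue_size` is a hypothetical helper that isolates the parsing logic, not a function in the example:

```cpp
#include <cassert>
#include <cerrno>
#include <climits>
#include <cstdlib>

// Parse a queue size with strtol, rejecting anything outside [0, INT_MAX]
// BEFORE narrowing to int, so no silent truncation can occur.
bool parse_queue_size(const char * arg, int & out)
{
  errno = 0;
  char * end = nullptr;
  const long n_long = std::strtol(arg, &end, 10);
  if (errno != 0 || end == arg || *end != '\0'
      || n_long < 0 || n_long > INT_MAX)
    return false;
  out = static_cast<int>(n_long); // safe: already bounded by INT_MAX
  return true;
}
```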
🧹 Nitpick comments (4)
tpl_file_b_map.H (1)
60-91: Consolidate the duplicated class-level Doxygen blocks.

There are two adjacent `@brief` class descriptions for `Gen_File_B_Map`; merging them into one block will reduce drift and keep docs easier to maintain.

As per coding guidelines, "Keep documentation descriptions concise but complete".
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tpl_file_b_map.H` around lines 60 - 91, There are two adjacent class-level Doxygen blocks describing Gen_File_B_Map; merge them into a single consolidated Doxygen comment for the Gen_File_B_Map class by removing the duplicate block and combining the unique sentences (brief description, template parameter summaries, and warning) into one coherent `@brief/`@tparam set; update the remaining comment to include all necessary details (Key, Value, Compare, MinDegree, KeyCodec, ValueCodec, and the warning about fixed-size codecs) so only the unified Doxygen block remains above the Gen_File_B_Map declaration.

tpl_b_tree.H (1)
721-725: Consider: The move-insert overload performs an extra copy.

The current implementation moves into a local `copy`, then calls `insert(copy)`, which copies again into the tree. For types where move is cheap but copy is expensive, this negates the benefit of the move overload.

A more optimal approach would be to forward the key through the insertion path and only copy at the final storage point.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tpl_b_tree.H` around lines 721 - 725, The move overload insert(Key&& key) currently moves into a local Key copy and then calls insert(copy), causing an extra copy; change the insertion path so the forwarded rvalue is moved only once into final storage: introduce/use an insertion helper (e.g., insert_impl or the existing internal node-insert routine) that accepts Key&& and performs std::move when placing into the node, and have insert(Key&&) call that helper with std::move(key) instead of creating a local copy or routing through insert(const Key&). Update or add overloads so the core storage point (the node-store routine used by insert_impl) is the single place that copies/moves into the tree.

Tests/jsw_init_test.cc (1)
24-38: Consider adding a comment about seed collision probability.

The assertion `EXPECT_NE(h1, h3)` at line 37 assumes different seeds produce different hashes for the same key. While this is overwhelmingly likely for "test_key" with seeds 12345 and 54321, it's not mathematically guaranteed. Adding a brief comment would clarify this is a probabilistic expectation, consistent with the test's deterministic nature for these specific values.

Optional documentation improvement

```diff
   init_jsw(54321); // Different seed
   size_t h3 = jsw_hash(key);
+  // Different seeds should produce different hashes for the same key
+  // (not mathematically guaranteed, but extremely likely for these specific values)
   EXPECT_NE(h1, h3);
 }
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@Tests/jsw_init_test.cc` around lines 24 - 38, Add a brief comment above the EXPECT_NE(h1, h3) assertion in the JSWInitTest::Reinitialization test clarifying that although different seeds (init_jsw(12345) vs init_jsw(54321)) are expected to produce different hashes for the same key, this is a probabilistic guarantee and not mathematically absolute; reference the specific symbols init_jsw and jsw_hash and note that the test uses fixed seeds and example key "test_key" where collision is extremely unlikely so the assertion is valid for this test.

hash-fct.H (1)
595-618: Missing `inline` keyword on template function definitions.

The template overloads for `xxhash64_hash`, `wyhash_hash`, and `siphash24_hash` (lines 595-618) are missing the `inline` keyword, unlike other template overloads in this file (e.g., `add_hash`, `xor_hash` at lines 517, 523). While templates have implicit inline linkage, adding `inline` would be consistent with the rest of the file.

Suggested fix for consistency

```diff
 template <typename Key>
 requires std::is_trivially_copyable_v<Key>
-size_t xxhash64_hash(const Key & key,
+inline size_t xxhash64_hash(const Key & key,
                      std::uint64_t seed = Default_Hash_Seed) noexcept
 {
   return xxhash64_hash(&key, sizeof(key), seed);
 }

 template <typename Key>
 requires std::is_trivially_copyable_v<Key>
-size_t wyhash_hash(const Key & key,
+inline size_t wyhash_hash(const Key & key,
                    std::uint64_t seed = Default_Hash_Seed) noexcept
 {
   return wyhash_hash(&key, sizeof(key), seed);
 }

 template <typename Key>
 requires std::is_trivially_copyable_v<Key>
-size_t siphash24_hash(const Key & key,
+inline size_t siphash24_hash(const Key & key,
                       std::uint64_t key0 = 0x0706050403020100ULL,
                       std::uint64_t key1 = 0x0f0e0d0c0b0a0908ULL) noexcept
 {
   return siphash24_hash(&key, sizeof(key), key0, key1);
 }
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@hash-fct.H` around lines 595 - 618, The three template overloads xxhash64_hash(const Key&), wyhash_hash(const Key&), and siphash24_hash(const Key&) are missing the inline keyword for consistency with other template overloads (e.g., add_hash, xor_hash); update each template definition to include inline before the template declaration so their linkage matches the rest of the file while keeping the existing signatures and calls to the pointer-based overloads.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@Examples/graphpic.C`:
- Around line 1569-1570: The help text and the parser disagree: update the help
strings to accurately reflect what the parser accepts by changing the NODE help
to "NODE <id> <name> <x> <y>" (remove the optional [label]) and change the ARC
help to "ARC <src> <dst>" (remove the optional [weight]); this keeps the
documentation in sync with parse_node_definition() and parse_arc_definition()
without changing parsing logic.
In `@Examples/ntreepic.C`:
- Line 2187: The help string saying " -b Generate binary tree
layout" is misleading; update the option description to accurately reflect that
generate_bin_tree() emits a textual start-prefix / start-key representation
rather than an eepic layout—find the literal containing "-b" and replace its
text with a concise description such as " -b Emit textual
start-prefix/start-key representation of binary tree" referencing
generate_bin_tree() so users understand the actual output format.
In `@Examples/test-rvalues.C`:
- Around line 81-92: The count check (if (count != V) { ... abort(); })
currently runs before the seen-scan, making the "Falta el nodo ..." branch dead;
move the count validation to after the for loop that calls seen(i) (or remove
the count check entirely) so that missing-node diagnostics from the for-loop
using seen(i) can execute; ensure you reference and preserve variables count, V
and the for-loop scanning seen(i), and re-run tests for empty inputs,
single-element and duplicate cases to verify correctness.
In `@Examples/testAllTree.C`:
- Around line 225-231: The banner claims "exactly one" tree flag but
parser_opt() currently overwrites parsPtr->type, allowing multiple selectors;
fix by enforcing exclusivity in parser_opt(): when handling tree-selection
options (the cases that set parsPtr->type), check if parsPtr->type is already
set and if so return/emit a parse error (or call the existing usage/error path)
instead of silently overwriting; alternatively, if you prefer relaxed wording,
update the printed banner text in Examples/testAllTree.C to "one or more" or "at
least one" and leave parser_opt()/parsPtr->type behavior as-is—choose one
approach and keep parser_opt() and the banner consistent.
In `@net_utils.H`:
- Around line 394-427: The code assumes intermediate layers are non-empty when
repairing in-degrees (using reached and the later if (not layers[l].empty())
block), but it does not validate that interior layers are non-empty, causing
inputs like {1,0,1} to produce extra sources; add an upfront precondition check
in the function that builds/parses layers to reject any interior empty layer
(for each layer index l from 1 to layers.size()-2 ensure layers[l] is not empty)
and call ah_invalid_argument_if(layers[l].empty()) (or equivalent) when any
intermediate layer is empty so the function fails fast rather than silently
returning a graph that violates is_single_source(); update/add a regression test
for the {1,0,1} case.
In `@Tests/compact-cuckoo-filter.cc`:
- Around line 278-293: The test
CompactCuckooFilterCapacity::high_load_factor_still_works constructs
Compact_Cuckoo_Filter<int> cf(1000) with the default RNG making the outcome
flaky; change the construction to pass a fixed deterministic seed (e.g.
Compact_Cuckoo_Filter<int> cf(1000, fixed_seed)) so inserts that trigger kicks
are repeatable, and apply the same pattern to other high-load/stress tests that
create Compact_Cuckoo_Filter (use the constructor overload that accepts a seed
or equivalent RNG parameter to ensure deterministic behavior).
In `@tpl_file_b_map.H`:
- Around line 235-639: Add explicit time/space complexity and thread-safety
guarantees to the public API docs for the Gen_File_B_Map class and its public
methods (e.g., constructors Gen_File_B_Map(...), file_path(), open_mode(),
is_read_only(), auto_sync_enabled(), set_auto_sync(), key_comp(), empty(),
size(), height(), page_size_bytes(), page_count(), checkpoint_sequence(),
sync(), checkpoint(), reload(), clear(), contains()/search(), find(), at(),
insert(), insert_or_assign(), remove(), min_item(), max_item(), lower_bound(),
upper_bound(), keys(), values(), items(), verify()). For each method add a short
`@note` stating algorithmic complexity (time and where applicable space) and a
one-line thread-safety guarantee (for example: "Thread-safety: const observers
are safe for concurrent access; mutating methods require external
synchronization; opening in Read_Only allows concurrent readers", or specify any
finer guarantees), keeping wording consistent across methods and matching
existing exception/docs style; place these notes in the corresponding Doxygen
comment blocks immediately after the `@throws/`@return sections. Ensure
constructors document concurrency implications for the opened handle and that
methods which perform I/O (sync/reload/clear/insert/insert_or_assign/remove)
explicitly state whether they are atomic/durable or require external sync calls.
- Around line 516-527: The current insert_or_assign() removes the old record via
tree_.remove(probe) before tree_.insert(record) so if insert throws the old
entry is lost; modify insert_or_assign() to be exception-safe by either
inserting the new record first (and only removing the old on success) or by
saving the removed record (old_record) and, if tree_.insert(record) throws,
reinserting old_record in the catch before restoring auto_sync via
set_auto_sync(was_sync); reference the functions tree_.remove, tree_.insert,
insert_or_assign, set_auto_sync, was_sync, record and probe to implement the
rollback/rescue path so the map state is unchanged on failure.
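The save-then-rollback shape can be sketched over an in-memory stand-in; `Toy_Tree` (with its `fail_next_insert` test hook) is hypothetical and only models the durable tree's insert/remove contract:

```cpp
#include <cassert>
#include <map>
#include <stdexcept>
#include <string>

struct Toy_Tree // stand-in for Gen_File_B_Tree
{
  std::map<int, std::string> data;
  bool fail_next_insert = false; // simulates an I/O failure on a real write

  bool insert(int k, const std::string & v)
  {
    if (data.count(k))
      return false; // duplicate key: no write attempted
    if (fail_next_insert)
      {
        fail_next_insert = false;
        throw std::runtime_error("io failure");
      }
    data.emplace(k, v);
    return true;
  }

  bool remove(int k) { return data.erase(k) == 1; }
};

bool insert_or_assign(Toy_Tree & t, int key, const std::string & value)
{
  if (t.insert(key, value))
    return true;                         // fresh key: done
  const auto old_value = t.data.at(key); // capture the FULL old record
  t.remove(key);
  try
    {
      t.insert(key, value);
    }
  catch (...)
    {
      t.insert(key, old_value); // rollback: restore key AND value
      throw;
    }
  return true;
}
```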
---
Outside diff comments:
In `@Examples/testQueue.C`:
- Around line 183-195: The validation for the parsed queue size should reject
values larger than INT_MAX because create_queue(int) expects an int; update the
checks around n_long (parsed via strtol) to ensure errno==0, *endptr1=='\0',
n_long >= 0 and n_long <= INT_MAX (instead of SIZE_MAX), then cast to size_t/int
for passing to create_queue; reference variables/functions: n_long, endptr1, n,
argv[1], strtol, and create_queue to locate the change.
---
Nitpick comments:
In `@hash-fct.H`:
- Around line 595-618: The three template overloads xxhash64_hash(const Key&),
wyhash_hash(const Key&), and siphash24_hash(const Key&) are missing the inline
keyword for consistency with other template overloads (e.g., add_hash,
xor_hash); update each template definition to include inline before the template
declaration so their linkage matches the rest of the file while keeping the
existing signatures and calls to the pointer-based overloads.
In `@Tests/jsw_init_test.cc`:
- Around line 24-38: Add a brief comment above the EXPECT_NE(h1, h3) assertion
in the JSWInitTest::Reinitialization test clarifying that although different
seeds (init_jsw(12345) vs init_jsw(54321)) are expected to produce different
hashes for the same key, this is a probabilistic guarantee and not
mathematically absolute; reference the specific symbols init_jsw and jsw_hash
and note that the test uses fixed seeds and example key "test_key" where
collision is extremely unlikely so the assertion is valid for this test.
In `@tpl_b_tree.H`:
- Around line 721-725: The move overload insert(Key&& key) currently moves into
a local Key copy and then calls insert(copy), causing an extra copy; change the
insertion path so the forwarded rvalue is moved only once into final storage:
introduce/use an insertion helper (e.g., insert_impl or the existing internal
node-insert routine) that accepts Key&& and performs std::move when placing into
the node, and have insert(Key&&) call that helper with std::move(key) instead of
creating a local copy or routing through insert(const Key&). Update or add
overloads so the core storage point (the node-store routine used by insert_impl)
is the single place that copies/moves into the tree.
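The single-move shape the comment describes can be sketched like this; `Toy_Store` and `insert_impl` are hypothetical stand-ins for the tree and its internal insertion helper:

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

struct Toy_Store // stand-in for the B-tree node storage
{
  std::vector<std::string> slot;

  // Forwarding reference: ONE place decides copy vs. move, at final storage.
  template <typename K>
  void insert_impl(K && key)
  {
    slot.push_back(std::forward<K>(key)); // single copy OR single move
  }

  void insert(const std::string & key) { insert_impl(key); }
  void insert(std::string && key) { insert_impl(std::move(key)); }
};
```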
In `@tpl_file_b_map.H`:
- Around line 60-91: There are two adjacent class-level Doxygen blocks
describing Gen_File_B_Map; merge them into a single consolidated Doxygen comment
for the Gen_File_B_Map class by removing the duplicate block and combining the
unique sentences (brief description, template parameter summaries, and warning)
into one coherent `@brief/`@tparam set; update the remaining comment to include
all necessary details (Key, Value, Compare, MinDegree, KeyCodec, ValueCodec, and
the warning about fixed-size codecs) so only the unified Doxygen block remains
above the Gen_File_B_Map declaration.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 57edcb2e-157c-4cee-862d-5fdf97b173c8
📒 Files selected for processing (22)
- Examples/btreepic.C
- Examples/graphpic.C
- Examples/hungarian_example.cc
- Examples/ntreepic.C
- Examples/test-comb.C
- Examples/test-con-queue.C
- Examples/test-rvalues.C
- Examples/testAllTree.C
- Examples/testQueue.C
- Examples/timeAllTree.C
- Tests/compact-cuckoo-filter.cc
- Tests/cuckoo-filter.cc
- Tests/hash_restriction_test.cc
- Tests/jsw_init_test.cc
- hash-fct.C
- hash-fct.H
- net_utils.H
- scripts/check_examples_run
- tpl_b_tree.H
- tpl_bplus_tree.H
- tpl_file_b_map.H
- tpl_file_b_tree.H
🚧 Files skipped from review as they are similar to previous changes (1)
- Tests/cuckoo-filter.cc
Refactor test utilities to use a per-process random generator for temporary file paths, avoiding collisions. Update file B-map and B+-map constructors for `auto_sync` and `open_mode` parameters. Add thread-safety notes to File_B_Map documentation.
There was a problem hiding this comment.
Actionable comments posted: 7
♻️ Duplicate comments (3)
Tests/compact-cuckoo-filter.cc (1)
55-57: ⚠️ Potential issue | 🟠 Major

Use a fixed seed consistently across the suite.
Only part of the tests pin RNG state today. For repeatable CI behavior, use the seeded constructor consistently, especially where insert paths may trigger kicks.
Based on learnings: Tests must be deterministic and repeatable.
Also applies to: 96-97, 109-110, 126-127, 183-184, 199-200, 209-210, 226-227, 239-240, 265-266, 297-298, 326-327, 342-343, 356-357, 374-375, 388-389, 410-410, 431-431, 520-523, 567-570, 610-615
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@Tests/compact-cuckoo-filter.cc` around lines 55 - 57, The test uses an unseeded Compact_Cuckoo_Filter<int> instance which makes insert/kick behavior non-deterministic; update the construction of Compact_Cuckoo_Filter<int> in the CompactCuckooFilterCtor::default_construction test (and the other listed tests) to use the seeded constructor variant with a shared constant seed (e.g. kTestSeed) so all tests use the same fixed RNG state; search for occurrences of Compact_Cuckoo_Filter<int> cf(1000) (and similar two-argument constructions) and replace them to pass the fixed seed parameter to the filter constructor so CI is repeatable.

tpl_file_b_map.H (2)
541-552: ⚠️ Potential issue | 🔴 Critical — Must fix: rollback reinserts a probe record, not the previous key/value record.

At Line 541, `old_record` is copied from `probe`, which only has the encoded key. If Line 546/547 fails and Line 551 runs, the rollback can restore the key with an invalid/default mapped value instead of the original value.

💡 Proposed fix
```diff
 bool insert_or_assign(const Key & key, const Value & value)
 {
   const auto probe = make_probe(key);
   const auto record = make_record(key, value);
   if (tree_.insert(record))
     return true;

+  const auto previous = find_record(key);
+  ah_runtime_error_unless(previous.has_value())
+    << "File_B_Map::insert_or_assign(): expected existing key before replace";
+
   // Suppress intermediate syncs so the remove+insert pair is written as a
   // single WAL-flushed unit; restore and sync once at the end.
   const bool was_sync = auto_sync_enabled();
   set_auto_sync(false);
   try
     {
-      auto old_record = probe;
+      const auto old_record = *previous;
       ah_runtime_error_unless(tree_.remove(probe))
         << "File_B_Map::insert_or_assign(): internal replace failed";
       try
         {
           ah_runtime_error_unless(tree_.insert(record))
             << "File_B_Map::insert_or_assign(): internal reinsert failed";
         }
       catch (...)
         {
-          tree_.insert(old_record); // rollback: restore the removed entry
+          ah_runtime_error_unless(tree_.insert(old_record))
+            << "File_B_Map::insert_or_assign(): rollback failed";
           throw;
         }
     }
```
Verify each finding against the current code and only fix it if needed. In `@tpl_file_b_map.H` around lines 541 - 552, The rollback currently stores only the probe (key-only) in old_record, so File_B_Map::insert_or_assign must capture the full existing record (key+value) before calling tree_.remove; replace the initialization of old_record from probe with a copy of the actual record retrieved from the tree (e.g., via the lookup/find that produced probe or the entry returned by remove) so that on failure tree_.insert(old_record) restores the original mapped value as well as the key; ensure you reference the same record type used by tree_.insert/remove and keep the try/catch rollback logic unchanged.
231-237: ⚠️ Potential issue | 🟡 Minor — Should fix: constructor docs are missing complexity/thread-safety notes here.

The constructor at Lines 231-237 is the only constructor overload in this cluster without an explicit complexity/thread-safety `@note`, which makes the public API docs inconsistent.

As per coding guidelines: "All public member functions must have `@brief`, `@param`, `@return`, and `@throws` Doxygen tags", "Include time/space complexity in function documentation as `@note` or in description", and "Document thread-safety guarantees in class/function comments".
Verify each finding against the current code and only fix it if needed. In `@tpl_file_b_map.H` around lines 231 - 237, Add missing `@note` entries to the constructor documentation that currently reads "Open or create a persistent B-Tree map..." — include time and space complexity (e.g., average/worst-case for insert/lookup and space overhead for page cache) and a thread-safety statement (e.g., whether instances are thread-safe for concurrent readers, writers, or require external synchronization). Update the constructor's Doxygen block (the constructor symbol in this header) to contain these `@note` details alongside existing `@brief/`@param/@throws so it matches other overloads' documentation.
🧹 Nitpick comments (4)
Examples/timeAllTree.C (1)
560-595: Avoid hard-coding option help text separately from `options[]`.

The manual list can drift from the `argp_option` definitions over time. Consider delegating to `argp_help` (or generating from `options[]`) for a single source of truth.

As per coding guidelines: "Refactor duplicated code into reusable functions or templates."

Possible simplification
```diff
 int main(int argc, char *argv[])
 {
   if (argc == 1)
     {
-      cout << "timeAllTree -- benchmark Aleph-w binary search tree implementations\n"
-           << "\n"
-           << "Grows a tree to n keys (sampling at each power of 2) and reports\n"
-           << "height, internal path length (IPL), and insertion/removal timing\n"
-           << "statistics (min / avg / median / sigma / max).\n"
-           << "\n"
-           << "Usage:\n"
-           << "  " << argv[0] << " <tree-flags> [-n num_nodes] [-m seed]\n"
-           << "\n"
-           << "Tree type flags (at least one required, or use --all):\n"
-           << "  -b, --bin        Unbalanced BST (BinTree)\n"
-           << "  -a, --avl        AVL tree\n"
-           << "      --avlrk      AVL tree (ranked)\n"
-           << "  -s, --splay      Splay tree\n"
-           << "      --splayrk    Splay tree (ranked)\n"
-           << "  -r, --redblack   Red-black tree\n"
-           << "      --redblackrk Red-black tree (ranked)\n"
-           << "      --tdrb       Top-down red-black tree\n"
-           << "      --tdrbrk     Top-down red-black tree (ranked)\n"
-           << "  -p, --treap      Treap\n"
-           << "      --treaprk    Treap (ranked)\n"
-           << "  -d, --rand       Randomized tree\n"
-           << "  -l, --all        Benchmark all tree types\n"
-           << "\n"
-           << "Options:\n"
-           << "  -n num_nodes     Number of keys to insert (default: 1000)\n"
-           << "  -m seed          RNG seed (default: current time)\n"
-           << "\n"
-           << "Examples:\n"
-           << "  " << argv[0] << " --all -n 10000 -m 42\n"
-           << "  " << argv[0] << " -a -r -n 50000\n"
-           << "  " << argv[0] << " --avlrk --treaprk -n 20000 -m 456\n";
+      argp_help(&argDefs, stdout, ARGP_HELP_STD_USAGE | ARGP_HELP_LONG, argv[0]);
       return 0;
     }
```
Verify each finding against the current code and only fix it if needed. In `@Examples/timeAllTree.C` around lines 560 - 595, The long manual help string printed when argc == 1 is duplicated with the argp_option definitions in options[]; replace this hard-coded block in the main/usage path with a single call to argp_help (or a small helper that generates help from options[]) so the options[] array remains the single source of truth: remove the cout help block and invoke argp_help (&argp, stderr, ARGP_HELP_STD_USAGE | ARGP_HELP_LONG, program_name) or equivalent wrapper that references options[]; update any program-specific example lines via argp_program_version/argp_program_bug_address if needed so examples remain available from the centralized help generation.

Tests/compact-cuckoo-filter.cc (1)
43-46: Prefer Aleph containers over `std::set`/`std::vector` in this codebase.

Consider migrating these to Aleph equivalents for policy consistency in C++ sources.
As per coding guidelines: MUST use Aleph containers when an equivalent exists; standard containers are only permitted when no Aleph equivalent is available.
Also applies to: 282-282, 455-456, 572-572
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@Tests/compact-cuckoo-filter.cc` around lines 43 - 46, Replace uses of std::set and std::vector with the Aleph equivalents and update includes: remove `#include` <set> and `#include` <vector>, add the corresponding Aleph container headers (e.g., Aleph vector/set headers used in this codebase), and change all type occurrences of std::set and std::vector in this file (including the other instances noted around lines 282, 455, 572) to the Aleph container types (e.g., aleph::vector / aleph::set or the project’s Aleph namespace/type names), keeping other headers like <random> and <algorithm> as-is; ensure any construction, iteration, and algorithms compile with the Aleph API (adjust calls/iterators if necessary).

tpl_b_tree.H (1)
147-160: Nice to have: use logarithmic node-local search for bounds indices.
`lower_bound_index()` and `upper_bound_index()` currently scan linearly through node keys. For higher `MinDegree`, this makes hot-path comparisons more expensive than necessary.
Verify each finding against the current code and only fix it if needed. In `@tpl_b_tree.H` around lines 147 - 160, The current linear scans in lower_bound_index and upper_bound_index are O(n) per node; replace them with node-local binary searches to achieve O(log n): for lower_bound_index implement a binary-search using size_t lo=0, hi=keys.size() and while (lo<hi) mid=(lo+hi)/2 then if cmp_(keys[mid], key) lo=mid+1 else hi=mid; return lo; for upper_bound_index use the variant that advances on comparison or equality (i.e., while (lo<hi) mid=(lo+hi)/2 then if (cmp_(keys[mid], key) or equals(keys[mid], key)) lo=mid+1 else hi=mid; return lo); keep using cmp_, equals, and Array<Key> so behavior and signatures remain unchanged.

scripts/check_examples_run (1)
112-125: Stream configure/build logs directly to disk instead of buffering them in memory.
`run_command` accumulates the full CMake output and only writes it afterward. Tailing straight into the log file keeps memory bounded on noisy builds.

As per coding guidelines, "Prioritize: correctness (UB, data races, lifetime), security, regressions, then performance."

📉 Suggested refactor
```diff
-run_command = lambda do |cmd, chdir: repo_root|
-  output = +''
+run_command = lambda do |cmd, chdir: repo_root, log_path: nil|
   status = nil
-  Open3.popen2e(*cmd, chdir: chdir) do |_stdin, out, wait_thr|
-    out.each_line do |line|
-      output << line
-      print line if options[:verbose]
+  File.open(log_path, 'w') do |log|
+    Open3.popen2e(*cmd, chdir: chdir) do |_stdin, out, wait_thr|
+      out.each_line do |line|
+        log.write(line)
+        print line if options[:verbose]
+      end
+      status = wait_thr.value
     end
-    status = wait_thr.value
   end
-  [output, status]
+  status
 end
 ...
-output, status = run_command.call(configure_cmd)
-File.write(File.join(log_dir, 'configure.log'), output)
+status = run_command.call(configure_cmd, log_path: File.join(log_dir, 'configure.log'))
 ...
-output, status = run_command.call(build_cmd)
-File.write(File.join(log_dir, 'build.log'), output)
+status = run_command.call(build_cmd, log_path: File.join(log_dir, 'build.log'))
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@scripts/check_examples_run` around lines 112 - 125, The run_command function currently buffers the entire subprocess output in memory; change it to stream lines directly to a log file: add a log_file (or log_path) parameter, open the file with File.open(..., "a") before calling Open3.popen2e, and inside out.each_line write each line to the log file (and flush), while still printing when options[:verbose]; avoid accumulating a growing output string (or limit it if you must return a snippet) and still capture and return the process status. Apply the same streaming change to the related helper around lines 137-151 to ensure both configure/build invocations write directly to disk rather than buffering.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@Examples/timeAllTree.C`:
- Around line 560-595: Add a regression CLI test for the new no-arg behavior:
run the built timeAllTree executable with argc==1 (no argv arguments) and assert
it exits with code 0 and that stdout contains the help header string
"timeAllTree -- benchmark Aleph-w binary search tree implementations" (or, if
testing harness can't run binaries, add a unit/integration test that invokes
main-like entrypoint and checks the same). If adding a test is impossible, add a
short justification in the PR describing why tests are not applicable. Use the
program entrypoint (main / argc/argv) and the help text string to locate the
behavior to verify.
In `@scripts/check_examples_run`:
- Around line 24-80: Add a thin integration smoke test that exercises the new
CLI control flow: invoke the script (the OptionParser.new block) with flags for
--only, --exclude, and --timeout and assert that filtering is applied (examples
listed in options[:only]/options[:exclude] are honored) and that a per-example
timeout is reported when a run exceeds options[:timeout]; the test should cover
combinations like --only with --exclude and the --verbose flag to ensure output
paths (options[:log_dir]) are used, and it should fail fast if behavior diverges
so future changes to the OptionParser handling (the --only, --exclude,
--timeout, --log-dir, --verbose handlers) are caught.
- Around line 99-105: The script currently silently ignores typos by
intersecting options[:only] and options[:exclude] with source_basenames;
instead, validate the user-supplied names first: compute unknown_only =
options[:only] - source_basenames and unknown_exclude = options[:exclude] -
source_basenames, and if either is non-empty emit a clear error listing the
unknown names and exit non-zero; after that proceed with selected =
source_basenames & options[:only] (and subtract options[:exclude]) as before.
Ensure the checks reference the existing variables selected, source_basenames,
options[:only], and options[:exclude].
- Around line 84-110: When honoring options[:clean] before calling
FileUtils.rm_rf(build_dir), validate that the resolved build_dir is not root,
the repo_root, or any ancestor of repo_root; use Pathname.realpath on build_dir
and repo_root and abort (reject --clean) if build_path == Pathname.new('/') or
build_path == repo_root_path or repo_root_path.to_s.start_with?(build_path.to_s
+ File::SEPARATOR). Update the code around the FileUtils.rm_rf(build_dir) call
to perform this check and exit with a clear error when the guard fails.
- Around line 127-148: The script currently assumes single-config CMake
generators; update the configure/build and binary lookup to handle multi-config
generators by reading CMakeCache.txt for CMAKE_CONFIGURATION_TYPES (use
File.read and /CMAKE_CONFIGURATION_TYPES/ or similar) and treating it as
multi-config when present; when building (where build_cmd is constructed) append
"--config #{options[:build_type]}" to build_cmd if multi-config; when locating
executables (where you currently search under File.join(build_dir, 'Examples',
name)) also check the config-specific path(s) such as File.join(build_dir,
'Examples', options[:build_type], name) and/or other config subdirs reported by
CMAKE_CONFIGURATION_TYPES so examples in Debug/Release folders are found;
preserve existing behavior for single-config generators.
In `@Tests/file_b_map_test.cc`:
- Around line 84-167: Add a regression test that triggers the exceptional
rollback path of File_B_Map::insert_or_assign by using a value type or codec
that throws during the reinsert/replace step, call insert_or_assign on a
pre-existing key and assert the call propagates the exception, then verify the
map state is unchanged using File_B_Map::find, File_B_Map::at, File_B_Map::items
(and File_B_Map::verify) to ensure the original value remains and no partial
updates occurred; place this alongside existing tests (e.g. near
SupportsBoundedStringCodecs) to cover the failure path for insert_or_assign.
In `@tpl_b_tree.H`:
- Around line 537-852: The public methods lack per-method Doxygen entries for
time/space complexity and thread-safety; update each public method's docblock
(e.g., Gen_B_Tree constructors, Gen_B_Tree(const Gen_B_Tree&), operator=,
key_comp, get_compare, empty, is_empty, size, height, clear, contains, search,
insert(const Key&), insert(Key&&), remove, min_key, max_key, lower_bound,
upper_bound, keys, verify) to include a `@note` (or inline text) that states the
time complexity and any relevant space complexity and a short thread-safety
guarantee (for example: "Time: O(log n) amortized / Space: O(1) extra" or
"Thread-safety: concurrent const methods safe; modifying methods are not
thread-safe"). Ensure each method retains existing `@brief/`@param/@return/@throws
tags and add these complexity and thread-safety notes in the same comment block.
---
Duplicate comments:
In `@Tests/compact-cuckoo-filter.cc`:
- Around line 55-57: The test uses an unseeded Compact_Cuckoo_Filter<int>
instance which makes insert/kick behavior non-deterministic; update the
construction of Compact_Cuckoo_Filter<int> in the
CompactCuckooFilterCtor::default_construction test (and the other listed tests)
to use the seeded constructor variant with a shared constant seed (e.g.
kTestSeed) so all tests use the same fixed RNG state; search for occurrences of
Compact_Cuckoo_Filter<int> cf(1000) (and similar two-argument constructions) and
replace them to pass the fixed seed parameter to the filter constructor so CI is
repeatable.
In `@tpl_file_b_map.H`:
- Around line 541-552: The rollback currently stores only the probe (key-only)
in old_record, so File_B_Map::insert_or_assign must capture the full existing
record (key+value) before calling tree_.remove; replace the initialization of
old_record from probe with a copy of the actual record retrieved from the tree
(e.g., via the lookup/find that produced probe or the entry returned by remove)
so that on failure tree_.insert(old_record) restores the original mapped value
as well as the key; ensure you reference the same record type used by
tree_.insert/remove and keep the try/catch rollback logic unchanged.
- Around line 231-237: Add missing `@note` entries to the constructor
documentation that currently reads "Open or create a persistent B-Tree map..." —
include time and space complexity (e.g., average/worst-case for insert/lookup
and space overhead for page cache) and a thread-safety statement (e.g., whether
instances are thread-safe for concurrent readers, writers, or require external
synchronization). Update the constructor's Doxygen block (the constructor symbol
in this header) to contain these `@note` details alongside existing
`@brief/`@param/@throws so it matches other overloads' documentation.
---
Nitpick comments:
In `@Examples/timeAllTree.C`:
- Around line 560-595: The long manual help string printed when argc == 1 is
duplicated with the argp_option definitions in options[]; replace this
hard-coded block in the main/usage path with a single call to argp_help (or a
small helper that generates help from options[]) so the options[] array remains
the single source of truth: remove the cout help block and invoke argp_help
(&argp, stderr, ARGP_HELP_STD_USAGE | ARGP_HELP_LONG, program_name) or
equivalent wrapper that references options[]; update any program-specific
example lines via argp_program_version/argp_program_bug_address if needed so
examples remain available from the centralized help generation.
In `@scripts/check_examples_run`:
- Around line 112-125: The run_command function currently buffers the entire
subprocess output in memory; change it to stream lines directly to a log file:
add a log_file (or log_path) parameter, open the file with File.open(..., "a")
before calling Open3.popen2e, and inside out.each_line write each line to the
log file (and flush), while still printing when options[:verbose]; avoid
accumulating a growing output string (or limit it if you must return a snippet)
and still capture and return the process status. Apply the same streaming change
to the related helper around lines 137-151 to ensure both configure/build
invocations write directly to disk rather than buffering.
In `@Tests/compact-cuckoo-filter.cc`:
- Around line 43-46: Replace uses of std::set and std::vector with the Aleph
equivalents and update includes: remove `#include` <set> and `#include` <vector>,
add the corresponding Aleph container headers (e.g., Aleph vector/set headers
used in this codebase), and change all type occurrences of std::set and
std::vector in this file (including the other instances noted around lines 282,
455, 572) to the Aleph container types (e.g., aleph::vector / aleph::set or the
project’s Aleph namespace/type names), keeping other headers like <random> and
<algorithm> as-is; ensure any construction, iteration, and algorithms compile
with the Aleph API (adjust calls/iterators if necessary).
In `@tpl_b_tree.H`:
- Around line 147-160: The current linear scans in lower_bound_index and
upper_bound_index are O(n) per node; replace them with node-local binary
searches to achieve O(log n): for lower_bound_index implement a binary-search
using size_t lo=0, hi=keys.size() and while (lo<hi) mid=(lo+hi)/2 then if
cmp_(keys[mid], key) lo=mid+1 else hi=mid; return lo; for upper_bound_index use
the variant that advances on comparison or equality (i.e., while (lo<hi)
mid=(lo+hi)/2 then if (cmp_(keys[mid], key) or equals(keys[mid], key)) lo=mid+1
else hi=mid; return lo); keep using cmp_, equals, and Array<Key> so behavior and
signatures remain unchanged.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 98641ca1-089d-4b6d-886a-89d985ae0ea9
📒 Files selected for processing (25)
- Examples/btreepic.C
- Examples/graphpic.C
- Examples/hungarian_example.cc
- Examples/ntreepic.C
- Examples/test-comb.C
- Examples/test-con-queue.C
- Examples/test-rvalues.C
- Examples/testAllTree.C
- Examples/testQueue.C
- Examples/timeAllTree.C
- Tests/compact-cuckoo-filter.cc
- Tests/cuckoo-filter.cc
- Tests/file_b_map_test.cc
- Tests/file_bplus_map_test.cc
- Tests/hash_restriction_test.cc
- Tests/jsw_init_test.cc
- Tests/net_utils_test.cc
- hash-fct.C
- hash-fct.H
- net_utils.H
- scripts/check_examples_run
- tpl_b_tree.H
- tpl_bplus_tree.H
- tpl_file_b_map.H
- tpl_file_b_tree.H
🚧 Files skipped from review as they are similar to previous changes (7)
- Tests/file_bplus_map_test.cc
- net_utils.H
- Examples/test-rvalues.C
- Examples/testAllTree.C
- Examples/test-con-queue.C
- Tests/cuckoo-filter.cc
- tpl_bplus_tree.H
Add `O(N)` complexity notes to `tpl_b_tree.H` and `tpl_file_b_map.H` and enhance the rollback logic in `File_B_Map::insert_or_assign()`. Also, make the `check_examples_run` script safer.
Pull request overview
This PR adds new multiway search tree containers (in-memory and persistent B-Tree / B+ Tree families) and expands the library’s probabilistic filter offerings, alongside docs, examples, tests, and a few supporting infrastructure tweaks.
Changes:
- Introduces B-Tree/B+Tree containers (including file-backed map variants) plus new probabilistic filters (Cuckoo / Quotient and compact variants).
- Adds/updates docs and examples to demonstrate usage and persistence/recovery behavior.
- Expands and adjusts the test suite + CI configuration (including filtering a heavy statistical hash test).
Reviewed changes
Copilot reviewed 62 out of 72 changed files in this pull request and generated 6 comments.
Show a summary per file
| File | Description |
|---|---|
| tpl_dynSetHash.H | Updates default hash function pointers to the new pointer-wrapper defaults. |
| tpl_dynMapOhash.H | Updates default hash function pointers for the open-hash map. |
| tpl_dynLhash.H | Updates default hash function pointer used by DynLhashTable. |
| scripts/check_examples_run | Adds a Ruby helper to configure/build/run examples with logging and timeouts. |
| net_utils.H | Strengthens layered network generation invariants to preserve single-source semantics. |
| docs/b_tree_family.md | New documentation for B-Tree/B+Tree family APIs and file-backed behavior. |
| cuckoo-filter.H | Adds a new Cuckoo filter implementation with optional deterministic kick RNG seed. |
| compact-quotient-filter.H | Adds a bit-packed quotient filter variant focused on memory reduction. |
| compact-cuckoo-filter.H | Adds a bit-packed Cuckoo filter variant focused on memory reduction. |
| bloom-filter.H | Modernizes the Bloom filter API/docs and adjusts hashing defaults. |
| al-matrix.H | Updates pair-hash defaults to pointer-wrapper variants. |
| Tests/quotient-filter.cc | Adds comprehensive Quotient_Filter tests. |
| Tests/net_utils_test.cc | Adds regression test for rejecting empty intermediate layers. |
| Tests/jsw_init_test.cc | Adds tests for lazy jsw initialization/reinitialization behavior. |
| Tests/hash_restriction_test.cc | Adds compile-time trait checks for hashing restrictions/overloads. |
| Tests/hash_fct_test.cc | Extends hash consistency/quality tests and seeds jsw deterministically. |
| Tests/hash-it.cc | Fixes hash-table test to use a deterministic constant hash via callable wrapper. |
| Tests/file_bplus_map_test.cc | Adds tests for File_BPlus_Map range/iterators/reopen and codecs. |
| Tests/file_b_map_test.cc | Adds tests for File_B_Map CRUD/bounds/reopen and codecs. |
| Tests/cuckoo-filter.cc | Adds basic Cuckoo_Filter tests and probabilistic assertions. |
| Tests/compact-cuckoo-filter.cc | Adds comprehensive Compact_Cuckoo_Filter tests and comparisons. |
| Tests/bplus_tree_test.cc | Adds BPlus_Tree tests including lazy iterators and randomized oracle checks. |
| Tests/b_tree_test.cc | Adds B_Tree tests including randomized oracle checks. |
| Tests/CMakeLists.txt | Registers new test executables in CMake. |
| README.md | Documents new tree/filter families and expands usage guidance. |
| README.es.md | Updates Spanish README stream algorithms table and probabilistic section. |
| Examples/timeAllTree.C | Adds a no-args help/usage banner for the benchmark. |
| Examples/testQueue.C | Adds help banner and tightens argument validation to int range. |
| Examples/testAllTree.C | Adds a no-args help/usage banner. |
| Examples/test-rvalues.C | Reworks graph consistency checks (node uniqueness/completeness). |
| Examples/test-odhash.cc | Fixes hashing for pair keys and suppresses unused insert results safely. |
| Examples/test-con-queue.C | Adds a no-args help/usage banner. |
| Examples/test-comb.C | Adjusts example list length to match expected combination behavior. |
| Examples/set_structures_benchmark.cc | Changes default benchmark size from 1M to 2M keys. |
| Examples/quotient_filter_example.C | Adds a detailed quotient filter example with selectable sections. |
| Examples/ntreepic.C | Adds a no-args help/usage banner. |
| Examples/graphpic.C | Adds a no-args help/usage banner. |
| Examples/file_bplus_tree_example.cc | Adds a persistent File_BPlus_Tree example including sidecar cleanup. |
| Examples/file_bplus_map_example.cc | Adds a persistent File_BPlus_Map example including read-only reopen. |
| Examples/file_b_tree_example.cc | Adds a persistent File_B_Tree example including sidecar cleanup. |
| Examples/file_b_map_example.cc | Adds a persistent File_B_Map example including read-only reopen. |
| Examples/cuckoo_filter_example.cc | Adds a small Cuckoo filter usage example. |
| Examples/compact_quotient_filter_example.C | Adds comprehensive Compact_Quotient_Filter examples/benchmarks. |
| Examples/compact_cuckoo_filter_example.C | Adds comprehensive Compact_Cuckoo_Filter examples/benchmarks. |
| Examples/btreepic.C | Adds a no-args help/usage banner. |
| Examples/bplus_tree_example.cc | Adds an in-memory BPlus_Tree example. |
| Examples/b_tree_example.cc | Adds an in-memory B_Tree example. |
| CMakeLists.txt | Exports new public headers for the new tree family. |
| .github/workflows/ci.yml | Excludes a heavy statistical hash test from CI/coverage/asan/tsan runs. |
| .clang-format | Replaces CLion-generated config with a project-specific C++20 style. |
Actionable comments posted: 2
♻️ Duplicate comments (2)
scripts/check_examples_run (2)
25-81: ⚠️ Potential issue | 🟡 Minor — Should fix: add a smoke test for the CLI option-control flow.

This script introduces multiple behavior branches (`--only`, `--exclude`, `--timeout`, `--log-dir`, `--verbose`) but no regression test is shown for them. A thin integration smoke test would prevent silent drift.

If you want, I can draft a compact test script that validates filter handling and timeout reporting end-to-end.
As per coding guidelines, "If the PR changes behavior, require tests or explain why they don't apply."🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@scripts/check_examples_run` around lines 25 - 81, Add a compact integration smoke test that invokes the CLI (using OptionParser.new) with combinations of flags and asserts the script honors options[:only], options[:exclude], options[:timeout], options[:log_dir], and options[:verbose]; specifically, run the script with a small set of example binaries, verify that options[:only] runs only selected examples and options[:exclude] skips them, that a too-short options[:timeout] causes a timeout reporting path to be exercised, that logs are written into the given options[:log_dir], and that verbose mode streams output instead of only saving logs. Implement the test as a small executable test script (or RSpec/mini-test case) that spawns the script, captures exit status and stdout/stderr, checks presence/absence of example run markers and log files, and cleans up artifacts to prevent flakiness.
150-170: ⚠️ Potential issue | 🟠 Major — Must fix: handle multi-config CMake generators when building and locating executables.

Line 155 sets `CMAKE_BUILD_TYPE`, but on multi-config generators that is not the config selector. Line 170 also misses `--config`, and Line 186 only checks a single-config output path, so built executables can be falsely reported as missing.

💡 Proposed patch
```diff
@@
 require 'optparse'
 require 'pathname'
+require 'rbconfig'
 require 'timeout'
@@
 if options[:configure] || !File.exist?(File.join(build_dir, 'CMakeCache.txt'))
@@
 end
+
+cache_file = File.join(build_dir, 'CMakeCache.txt')
+multi_config = File.exist?(cache_file) &&
+               File.read(cache_file).match?(/^CMAKE_CONFIGURATION_TYPES:/)
@@
 if options[:rebuild]
   build_cmd = ['cmake', '--build', build_dir, '--target', *selected]
+  build_cmd += ['--config', options[:build_type]] if multi_config
   build_cmd += ['--parallel', options[:jobs].to_s] if options[:jobs]
@@
 selected.each do |name|
-  exe = File.join(examples_build_dir, name)
+  exeext = RbConfig::CONFIG['EXEEXT']
+  candidates = [File.join(examples_build_dir, name)]
+  candidates << File.join(examples_build_dir, options[:build_type], name) if multi_config
+  candidates = candidates.flat_map { |p| [p, "#{p}#{exeext}"] }.uniq
+  exe = candidates.find { |p| File.exist?(p) }
@@
-  unless File.exist?(exe)
+  unless exe
```

```bash
#!/bin/bash
# Verify current script behavior around multi-config support.
set -euo pipefail
rg -n "CMAKE_BUILD_TYPE|--build|--config|CMAKE_CONFIGURATION_TYPES|File.join\(examples_build_dir" scripts/check_examples_run
sed -n '150,205p' scripts/check_examples_run
```
Also applies to: 186-193
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@scripts/check_examples_run` around lines 150 - 170, The script currently forces a single-config CMake flow by setting CMAKE_BUILD_TYPE in configure_cmd and by calling cmake --build without a --config, and later only checks single-config output paths (affecting configure_cmd, build_cmd and the executable lookup around the examples_build_dir checks). Update configure logic to set CMAKE_BUILD_TYPE only for single-config generators (or omit it when a multi-config generator is detected via CMAKE_GENERATOR or user options), add --config #{options[:build_type]} to build_cmd when options[:build_type] is provided (or when using a multi-config generator), and modify the executable discovery code (the checks around examples_build_dir lines ~186-193) to look for artifacts in both single-config locations and multi-config subdirectories (e.g., Debug/Release) or use the --config-built output path; ensure run_command usage and logging (run_command.call and File.write to configure.log) remain consistent.
🧹 Nitpick comments (2)
tpl_file_b_map.H (2)
195-203: Minor: `reserve(1)` for empty input is unnecessary.

The ternary `records.size() == 0 ? 1 : records.size()` reserves 1 element when the input is empty, which is harmless but slightly inconsistent. This pattern is repeated in `keys()` and `values()`. Consider simplifying to just `reserve(records.size())` since reserving 0 is valid and more idiomatic.

♻️ Suggested simplification
```diff
-  out.reserve(records.size() == 0 ? 1 : records.size());
+  out.reserve(records.size());
```
Verify each finding against the current code and only fix it if needed. In `@tpl_file_b_map.H` around lines 195 - 203, The reserve call in to_pairs currently uses a ternary to reserve 1 when records is empty; change it to reserve(records.size()) to allow reserving 0 (valid and more idiomatic). Update the same pattern in the keys() and values() functions to call reserve(records.size()) instead of the ternary expression so all three functions (to_pairs, keys, values) consistently reserve exactly records.size().
231-241: Missing `@note` complexity in first constructor.

The first constructor overload (lines 231-241) lacks the `@note O(page_count)` complexity annotation that the other constructors have. For consistency with the API documentation style used elsewhere in this file, consider adding it.

📝 Add complexity note

   * @param cmp Comparison functor used to order keys.
   * @throws std::runtime_error If the backing file cannot be used.
   * @throws std::bad_alloc If the page cache cannot be allocated.
+  * @note O(page_count) I/O to read the file header and root page.
   */
  explicit Gen_File_B_Map(std::string file_path,

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tpl_file_b_map.H` around lines 231 - 241, The Gen_File_B_Map constructor overload (explicit Gen_File_B_Map(std::string file_path, const bool auto_sync = true, const Compare & cmp = Compare())) is missing the `@note` O(page_count) complexity annotation; update its Doxygen comment block to include a line like "@note Complexity: O(page_count)" (matching the style used for the other constructors) so the documentation is consistent with the other Gen_File_B_Map constructors and make sure the tag appears alongside the `@throws` and `@param` tags above the member initializer that invokes tree_.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@tpl_b_tree.H`:
- Around line 485-491: verify() currently allows a non-leaf root with zero keys;
update the root validation to also enforce a lower bound when the root is not a
leaf. In the verify() function replace the current root branch (the else that
only checks node->keys.size() > MaxKeys) with a check that if node ==
root_.get() and not node->is_leaf() (or equivalent leaf test) then ensure
node->keys.size() >= 1 and <= MaxKeys (otherwise return false); continue to use
MinKeys/MaxKeys for non-root nodes as before and reference node->keys,
root_.get(), MinKeys, MaxKeys and the node->is_leaf() predicate to locate where
to change the logic.
In `@tpl_file_b_map.H`:
- Around line 553-557: In the catch(...) block, preserve and rethrow the
original exception while defensively handling potential failures from
tree_.insert(*old_record): capture the current exception with std::exception_ptr
ep = std::current_exception(); then perform the rollback inside a nested try {
if (!tree_.insert(*old_record)) { /* optional log */ } } catch(...) { /*
optional log swallow */ } finally rethrow the preserved original with
std::rethrow_exception(ep); this ensures tree_.insert and old_record are
referenced and any nested failure is swallowed or logged but the original
exception is propagated.
---
Duplicate comments:
In `@scripts/check_examples_run`:
- Around line 25-81: Add a compact integration smoke test that invokes the CLI
(using OptionParser.new) with combinations of flags and asserts the script
honors options[:only], options[:exclude], options[:timeout], options[:log_dir],
and options[:verbose]; specifically, run the script with a small set of example
binaries, verify that options[:only] runs only selected examples and
options[:exclude] skips them, that a too-short options[:timeout] causes a
timeout reporting path to be exercised, that logs are written into the given
options[:log_dir], and that verbose mode streams output instead of only saving
logs. Implement the test as a small executable test script (or RSpec/mini-test
case) that spawns the script, captures exit status and stdout/stderr, checks
presence/absence of example run markers and log files, and cleans up artifacts
to prevent flakiness.
- Around line 150-170: The script currently forces a single-config CMake flow by
setting CMAKE_BUILD_TYPE in configure_cmd and by calling cmake --build without a
--config, and later only checks single-config output paths (affecting
configure_cmd, build_cmd and the executable lookup around the examples_build_dir
checks). Update configure logic to set CMAKE_BUILD_TYPE only for single-config
generators (or omit it when a multi-config generator is detected via
CMAKE_GENERATOR or user options), add --config #{options[:build_type]} to
build_cmd when options[:build_type] is provided (or when using a multi-config
generator), and modify the executable discovery code (the checks around
examples_build_dir lines ~186-193) to look for artifacts in both single-config
locations and multi-config subdirectories (e.g., Debug/Release) or use the
--config-built output path; ensure run_command usage and logging
(run_command.call and File.write to configure.log) remain consistent.
---
Nitpick comments:
In `@tpl_file_b_map.H`:
- Around line 195-203: The reserve call in to_pairs currently uses a ternary to
reserve 1 when records is empty; change it to reserve(records.size()) to allow
reserving 0 (valid and more idiomatic). Update the same pattern in the keys()
and values() functions to call reserve(records.size()) instead of the ternary
expression so all three functions (to_pairs, keys, values) consistently reserve
exactly records.size().
- Around line 231-241: The Gen_File_B_Map constructor overload (explicit
Gen_File_B_Map(std::string file_path, const bool auto_sync = true, const Compare
& cmp = Compare())) is missing the `@note` O(page_count) complexity annotation;
update its Doxygen comment block to include a line like "@note Complexity:
O(page_count)" (matching the style used for the other constructors) so the
documentation is consistent with the other Gen_File_B_Map constructors and make
sure the tag appears alongside the `@throws` and `@param` tags above the member
initializer that invokes tree_.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 7ebdf32f-7e8c-4c2d-8344-0daf299fcd12
📒 Files selected for processing (3)
- scripts/check_examples_run
- tpl_b_tree.H
- tpl_file_b_map.H
hash_restriction_test.cc was updated to replace the <vector> include with <string> and <utility>. The core test logic is unchanged; only the headers are adjusted for correctness. This commit also lightly reformats tpl_b_tree.H, bloom-filter.H, compact-cuckoo-filter.H, and cuckoo-filter.H, fixing some docstrings and formatting. No functional changes.
Actionable comments posted: 2
🧹 Nitpick comments (8)
bloom-filter.H (3)
454-455: Simplify unsigned comparison.

Since `m` and `k` are derived from `size_t` (unsigned), `m <= 0` and `k <= 0` can only be true when they equal zero. The `< 0` part is dead code.

Suggested change

- if (m <= 0 or k <= 0)
+ if (m == 0 or k == 0)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@bloom-filter.H` around lines 454 - 455, The condition "m <= 0 or k <= 0" uses signed comparisons on unsigned values and the "< 0" branch is unreachable; update the check that guards against zero-sized inputs (variables m and k) to test for equality to zero (e.g., m == 0 || k == 0 or use !m || !k) so the intent is clear and the dead code is removed; locate the if block that returns 0 when m or k are non-positive and replace the condition accordingly.
170-171: Consider using Aleph error macros for precondition validation.

The guidelines recommend using `ah_invalid_argument_if` for input validation instead of `assert`, which only fires in debug builds. Since `n == 0` or an invalid `p` would cause division issues or nonsensical results, runtime validation may be more appropriate.

Suggested change

- assert(n > 0);
- assert(p > 0 and p < 1);
+ ah_invalid_argument_if(n == 0) << "Expected number of items must be > 0";
+ ah_invalid_argument_if(p <= 0 or p >= 1) << "False positive probability must be in (0, 1)";

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@bloom-filter.H` around lines 170 - 171, Replace the debug-only asserts for input validation with Aleph's runtime error macro: change "assert(n > 0);" and "assert(p > 0 and p < 1);" to calls like ah_invalid_argument_if(n <= 0, "n must be > 0") and ah_invalid_argument_if(!(p > 0 && p < 1), "p must be in (0,1)"); keep the same variable names (n, p) so the check is local to the constructor/function and add the appropriate aleph error header if not already included.
273-282: Avoid calling `estimate()` twice in delegating constructor.

`estimate(n, p)` is computed twice to extract `m` and `k` separately. Per coding guidelines favoring structured bindings, consider computing once.

Suggested refactor using a static helper

+ private:
+   static Bloom_Filter create_from_estimate(const size_t n, const double p,
+                                            unsigned long seed, Hash_Fct hash_fn)
+   {
+     const auto [m, k] = estimate(n, p);
+     return Bloom_Filter(m, k, hash_fn, seed);
+   }
+
+ public:
  Bloom_Filter(const size_t n, const double p,
               unsigned long seed = static_cast<unsigned long>(std::time(nullptr)),
               Hash_Fct hash_fn = dft_seed_hash_ptr_fct<T>)
-   : Bloom_Filter(std::get<0>(estimate(n, p)),
-                  std::get<1>(estimate(n, p)),
-                  hash_fn,
-                  seed)
-   {
-     // empty
-   }
+   : Bloom_Filter(create_from_estimate(n, p, seed, hash_fn))
+   { }

Alternatively, use an immediately invoked lambda that returns a fully built filter and delegate to the move constructor:

  Bloom_Filter(const size_t n, const double p,
               unsigned long seed = static_cast<unsigned long>(std::time(nullptr)),
               Hash_Fct hash_fn = dft_seed_hash_ptr_fct<T>)
    : Bloom_Filter([&] {
        const auto [m, k] = estimate(n, p);
        return Bloom_Filter(m, k, hash_fn, seed);
      }())
    { }

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@bloom-filter.H` around lines 273 - 282, The delegating Bloom_Filter constructor calls estimate(n, p) twice to extract m and k; compute estimate once and forward both values to the delegating constructor. Modify the Bloom_Filter(const size_t n, const double p, unsigned long seed, Hash_Fct hash_fn) constructor to call estimate(n, p) a single time (e.g., auto [m,k] = estimate(n,p)) and then delegate to Bloom_Filter(m, k, hash_fn, seed) (or use an immediately-invoked lambda returning a tuple of m,k,hash_fn,seed), referencing the Bloom_Filter constructor and the estimate function to locate the code.

tpl_file_b_map.H (2)
643-666: Consider using range-based for loops in materialization methods.

Similar to `to_pairs`, these loops could benefit from range-based for syntax.

♻️ Optional refactor

  [[nodiscard]] Array<Key> keys() const
  {
    Array<Key> out;
    const auto records = tree_.keys();
    out.reserve(records.size());
-   for (size_t i = 0; i < records.size(); ++i)
-     out.append(records[i].get_key());
+   for (const auto & rec : records)
+     out.append(rec.get_key());
    return out;
  }

  [[nodiscard]] Array<Value> values() const
  {
    Array<Value> out;
    const auto records = tree_.keys();
    out.reserve(records.size());
-   for (size_t i = 0; i < records.size(); ++i)
-     out.append(records[i].get_value());
+   for (const auto & rec : records)
+     out.append(rec.get_value());
    return out;
  }

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tpl_file_b_map.H` around lines 643 - 666, The keys() and values() methods use index-based loops over the local variable records (from tree_.keys()); replace those for (size_t i = 0; i < records.size(); ++i) loops with range-based for loops (e.g., for (const auto& rec : records)) and use rec.get_key() / rec.get_value() when appending to out, mirroring the style used in to_pairs() to simplify iteration and avoid repeated size()/operator[] access.
195-203: Consider using a range-based for loop for better C++20 style.

The index-based loop works correctly, but a range-based for loop would be more idiomatic and clearer.

♻️ Optional refactor

  [[nodiscard]] static Array<std::pair<Key, Value>>
  to_pairs(const Array<Record> & records)
  {
    Array<std::pair<Key, Value>> out;
    out.reserve(records.size());
-   for (size_t i = 0; i < records.size(); ++i)
-     out.append(to_pair(records[i]));
+   for (const auto & rec : records)
+     out.append(to_pair(rec));
    return out;
  }

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tpl_file_b_map.H` around lines 195 - 203, The to_pairs function uses an index-based loop; change it to a range-based for loop to be more idiomatic C++20: iterate directly over records (const Record& rec) and call to_pair(rec) for each element, appending to out; keep the same return type Array<std::pair<Key, Value>> and preserve out.reserve(records.size()) and out.append(...) behavior to retain performance and semantics.

tpl_b_tree.H (2)
147-161: Consider binary search for larger node degrees.

`lower_bound_index` and `upper_bound_index` use linear scans, which are O(k) per node. With the default `MinDegree = 16`, nodes can hold up to 31 keys. While linear search benefits from cache locality at small sizes, binary search (O(log k)) may be preferable if users configure larger degrees.

This is a low-priority optimization, since the current approach is correct and performs well for typical configurations.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tpl_b_tree.H` around lines 147 - 161, The current linear scans in lower_bound_index and upper_bound_index (which iterate over Array<Key> keys using cmp_ and equals) are O(k) per node and should be replaced with binary search to improve performance for larger node degrees; implement binary-search loops (or use std::lower_bound/upper_bound with a comparator that calls cmp_ and equals) to locate the insertion/upper positions in lower_bound_index and upper_bound_index respectively while preserving the same ordering semantics and return type (size_t).
742-746: Move-insert doesn't actually move the key into the tree.

`insert(Key&&)` moves the key into a local copy, then passes that copy by const-ref to `insert(const Key&)`, which copies it again. The original move semantics are lost.

💡 Suggested fix

Either make `insert_nonfull` accept `Key&&` and propagate moves, or consolidate both overloads into a single template:

  template <typename K>
    requires std::convertible_to<K, Key>
  bool insert(K && key)
  {
    if (contains(key))
      return false;
    // ... rest with std::forward<K>(key) propagated through
  }

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tpl_b_tree.H` around lines 742 - 746, The move overload insert(Key&&) currently moves into a local Key copy then calls insert(const Key&), losing move semantics; fix by propagating the rvalue into the tree: either change the internal insertion path (e.g., make insert_nonfull accept Key&& and propagate moves through node insertion functions) or replace both overloads with a single forwarding template insert<K>(K&& key) that checks contains(key) then forwards std::forward<K>(key) into the existing insertion logic so the original rvalue is moved into the tree rather than copied.

cuckoo-filter.H (1)
222-224: Nice to have: centralize `kTargetLoad` as a class-level constant shared with the compact variant.

This reduces policy drift between `Cuckoo_Filter` and `Compact_Cuckoo_Filter` and avoids repeating a tuning constant in constructors.

As per coding guidelines: "Avoid magic numbers; use named constants or enums."
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@cuckoo-filter.H` around lines 222 - 224, Centralize the magic constant kTargetLoad by moving it from the local constructor scope into a shared class-level constant (e.g., a static constexpr double) so both Cuckoo_Filter and Compact_Cuckoo_Filter use the same value; update the calculation that computes needed_buckets (which references EntriesPerBucket and capacity) to use the new Cuckoo_Filter::kTargetLoad (or a common parent/namespace constant) and remove the local constexpr, ensuring both classes reference the single symbol to avoid drift.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@compact-cuckoo-filter.H`:
- Around line 230-423: Add the missing Doxygen `@throws` tags (explicitly mention
"none" when no exceptions are thrown) and a clear thread-safety guarantee to the
public API comments: update the class-level comment for Compact_Cuckoo_Filter to
state the thread-safety model (e.g., "not thread-safe" or "concurrent reads
safe, concurrent writes not safe") and add an `@throws` entry to every public
member comment (constructor Compact_Cuckoo_Filter, size, empty, is_empty,
capacity, load_factor, memory_usage, insert, append, contains, remove, clear) —
keep the existing `@brief/`@param/@return tags and append an `@throws` line for
exceptions or "none" plus a short thread-safety note where relevant in method
docs.
- Around line 208-214: get_fingerprint currently calls murmur3hash which
bypasses the Aleph hash policy; replace that call with the seeded dft_hash_fct
to respect ADL/aleph_hash_value and byte-hash fallbacks. Specifically, in
get_fingerprint(const T &item) use auto fp =
static_cast<Fingerprint>(dft_hash_fct<T>(item, 0x12345678) & fp_mask); keep the
existing non-zero normalization (if fp==0 set fp=1) and return fp; apply the
identical change to the get_fingerprint implementation in cuckoo-filter.H (the
one at ~line 187).
---
Nitpick comments:
In `@bloom-filter.H`:
- Around line 454-455: The condition "m <= 0 or k <= 0" uses signed comparisons
on unsigned values and the "< 0" branch is unreachable; update the check that
guards against zero-sized inputs (variables m and k) to test for equality to
zero (e.g., m == 0 || k == 0 or use !m || !k) so the intent is clear and the
dead code is removed; locate the if block that returns 0 when m or k are
non-positive and replace the condition accordingly.
- Around line 170-171: Replace the debug-only asserts for input validation with
Aleph's runtime error macro: change "assert(n > 0);" and "assert(p > 0 and p <
1);" to calls like ah_invalid_argument_if(n <= 0, "n must be > 0") and
ah_invalid_argument_if(!(p > 0 && p < 1), "p must be in (0,1)"); keep the same
variable names (n, p) so the check is local to the constructor/function and add
the appropriate aleph error header if not already included.
- Around line 273-282: The delegating Bloom_Filter constructor calls estimate(n,
p) twice to extract m and k; compute estimate once and forward both values to
the delegating constructor. Modify the Bloom_Filter(const size_t n, const double
p, unsigned long seed, Hash_Fct hash_fn) constructor to call estimate(n, p) a
single time (e.g., auto [m,k] = estimate(n,p)) and then delegate to
Bloom_Filter(m, k, hash_fn, seed) (or use an immediately-invoked lambda
returning a tuple of m,k,hash_fn,seed), referencing the Bloom_Filter constructor
and the estimate function to locate the code.
In `@cuckoo-filter.H`:
- Around line 222-224: Centralize the magic constant kTargetLoad by moving it
from the local constructor scope into a shared class-level constant (e.g., a
static constexpr double) so both Cuckoo_Filter and Compact_Cuckoo_Filter use the
same value; update the calculation that computes needed_buckets (which
references EntriesPerBucket and capacity) to use the new
Cuckoo_Filter::kTargetLoad (or a common parent/namespace constant) and remove
the local constexpr, ensuring both classes reference the single symbol to avoid
drift.
In `@tpl_b_tree.H`:
- Around line 147-161: The current linear scans in lower_bound_index and
upper_bound_index (which iterate over Array<Key> keys using cmp_ and equals) are
O(k) per node and should be replaced with binary search to improve performance
for larger node degrees; implement binary-search loops (or use
std::lower_bound/upper_bound with a comparator that calls cmp_ and equals) to
locate the insertion/upper positions in lower_bound_index and upper_bound_index
respectively while preserving the same ordering semantics and return type
(size_t).
- Around line 742-746: The move overload insert(Key&&) currently moves into a
local Key copy then calls insert(const Key&), losing move semantics; fix by
propagating the rvalue into the tree: either change the internal insertion path
(e.g., make insert_nonfull accept Key&& and propagate moves through node
insertion functions) or replace both overloads with a single forwarding template
insert<K>(K&& key) that checks contains(key) then forwards std::forward<K>(key)
into the existing insertion logic so the original rvalue is moved into the tree
rather than copied.
In `@tpl_file_b_map.H`:
- Around line 643-666: The keys() and values() methods use index-based loops
over the local variable records (from tree_.keys()); replace those for (size_t i
= 0; i < records.size(); ++i) loops with range-based for loops (e.g., for (const
auto& rec : records)) and use rec.get_key() / rec.get_value() when appending to
out, mirroring the style used in to_pairs() to simplify iteration and avoid
repeated size()/operator[] access.
- Around line 195-203: The to_pairs function uses an index-based loop; change it
to a range-based for loop to be more idiomatic C++20: iterate directly over
records (const Record& rec) and call to_pair(rec) for each element, appending to
out; keep the same return type Array<std::pair<Key, Value>> and preserve
out.reserve(records.size()) and out.append(...) behavior to retain performance
and semantics.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 17bcb7e2-82fa-41f7-85b4-50b7b26221dd
📒 Files selected for processing (6)
- Tests/hash_restriction_test.cc
- bloom-filter.H
- compact-cuckoo-filter.H
- cuckoo-filter.H
- tpl_b_tree.H
- tpl_file_b_map.H
🚧 Files skipped from review as they are similar to previous changes (1)
- Tests/hash_restriction_test.cc
Refactor Bloom filter includes and constructors for clarity. Improve Cuckoo filter constructor comments and add `kTargetLoad` constant. Optimize B-Tree lower/upper bound index calculations. Modernize File B-Map iteration with range-based for loops.
Pull request overview
This PR adds multiway search tree containers (B-Tree/B+Tree, including file-backed variants) and new probabilistic filters (Cuckoo and Quotient families), plus documentation, examples, benchmarks, and expanded tests.
Changes:
- Introduces new ordered containers (in-memory + persistent B-Tree/B+Tree sets/maps) and probabilistic filters (Cuckoo + Quotient, plus compact variants).
- Adds extensive test coverage for new data structures and hashing behavior; updates CI to exclude an expensive statistical test.
- Expands docs/README and adds example/benchmark tooling (including an examples runner script and improved CLI help text).
Reviewed changes
Copilot reviewed 62 out of 72 changed files in this pull request and generated 3 comments.
Show a summary per file
| File | Description |
|---|---|
| tpl_dynSetHash.H | Switches default hash function pointer helper to the new *_ptr_* variant. |
| tpl_dynMapOhash.H | Updates default open-hash function pointers to *_ptr_* helpers. |
| tpl_dynLhash.H | Aligns default hash function pointer helper with the new *_ptr_* helper. |
| scripts/check_examples_run | Adds a Ruby helper to configure/build and run all Examples with per-example logs/timeouts. |
| net_utils.H | Strengthens layered network generation to reject empty interior layers and ensure intermediate nodes get in-degree. |
| docs/b_tree_family.md | Adds a new guide describing B-Tree/B+Tree and file-backed variants and their APIs. |
| cuckoo-filter.H | Adds a full Cuckoo filter implementation with stash support. |
| compact-quotient-filter.H | Adds a bit-packed quotient filter implementation to reduce memory use. |
| compact-cuckoo-filter.H | Adds a bit-packed cuckoo filter variant. |
| bloom-filter.H | Modernizes Bloom filter API (nodiscard, hash fn defaults) and adjusts intersection sizing behavior. |
| al-matrix.H | Updates pair-hash function pointers to *_ptr_* helpers. |
| Tests/quotient-filter.cc | Adds a comprehensive Quotient_Filter test suite. |
| Tests/net_utils_test.cc | Adds a regression test for rejecting empty interior layers. |
| Tests/jsw_init_test.cc | Adds tests for JSW hash lazy init and reinit behavior. |
| Tests/hash_restriction_test.cc | Adds compile-time hashing restriction tests. |
| Tests/hash_fct_test.cc | Expands hash consistency/quality tests and updates JSW init usage. |
| Tests/hash-it.cc | Fixes test to use a callable consistent with expected hash functor types. |
| Tests/file_bplus_map_test.cc | Adds tests for file-backed B+Tree map behaviors (range, iterators, reopen, codecs). |
| Tests/file_b_map_test.cc | Adds tests for file-backed B-Tree map behaviors (find/assign/bounds/reopen, codecs). |
| Tests/cuckoo-filter.cc | Adds a Cuckoo_Filter test suite with FP-budgeted assertions. |
| Tests/compact-cuckoo-filter.cc | Adds Compact_Cuckoo_Filter tests including behavior, statistics, and comparisons. |
| Tests/bplus_tree_test.cc | Adds BPlus_Tree correctness and randomized oracle tests. |
| Tests/b_tree_test.cc | Adds B_Tree correctness and randomized oracle tests. |
| Tests/CMakeLists.txt | Registers new tests in the CMake test program list. |
| README.md | Documents new multiway trees and probabilistic filters; updates stats/tables. |
| README.es.md | Updates Spanish README probabilistic structures table/section. |
| Examples/timeAllTree.C | Adds “no-args” help text for usability. |
| Examples/testQueue.C | Adds “no-args” help text and tightens CLI argument validation. |
| Examples/testAllTree.C | Adds “no-args” help text for usability. |
| Examples/test-rvalues.C | Changes graph node verification logic to detect duplicates/missing ids. |
| Examples/test-odhash.cc | Fixes hashing of pair keys and suppresses unused insert return values. |
| Examples/test-con-queue.C | Adds “no-args” help text for usability. |
| Examples/test-comb.C | Fixes test data length to match expected combination behavior. |
| Examples/set_structures_benchmark.cc | Changes default benchmark size from 1M to 2M keys. |
| Examples/quotient_filter_example.C | Adds a large, sectioned example program for Quotient_Filter. |
| Examples/ntreepic.C | Adds “no-args” usage output for the visualizer CLI. |
| Examples/hungarian_example.cc | Ensures matrix allocation before filling data. |
| Examples/graphpic.C | Adds “no-args” usage output for the visualizer CLI. |
| Examples/file_bplus_tree_example.cc | Adds a persistent B+Tree example (create/update/sync/reopen + iterator scan). |
| Examples/file_bplus_map_example.cc | Adds a persistent B+Tree map example (range + iterator scan + read-only reopen). |
| Examples/file_b_tree_example.cc | Adds a persistent B-Tree example (mutations + sync/reopen). |
| Examples/file_b_map_example.cc | Adds a persistent B-Tree map example (mutations + read-only reopen). |
| Examples/cuckoo_filter_example.cc | Adds a small example demonstrating Cuckoo_Filter usage. |
| Examples/compact_quotient_filter_example.C | Adds a comprehensive example program for Compact_Quotient_Filter. |
| Examples/compact_cuckoo_filter_example.C | Adds a comprehensive example program for Compact_Cuckoo_Filter. |
| Examples/btreepic.C | Adds “no-args” usage output for the visualizer CLI. |
| Examples/bplus_tree_example.cc | Adds an in-memory B+Tree example. |
| Examples/b_tree_example.cc | Adds an in-memory B-Tree example. |
| CMakeLists.txt | Exports new headers in the public header list. |
| .github/workflows/ci.yml | Excludes an expensive/flaky statistical hash test from CI/coverage/asan/tsan runs. |
| .clang-format | Replaces generated config with a project-specific C++20 style config. |
Actionable comments posted: 3
♻️ Duplicate comments (1)
tpl_b_tree.H (1)
554-896: ⚠️ Potential issue | 🟡 Minor

Should fix: Public method docs still miss per-method thread-safety notes.
There is a class-level thread-safety paragraph, but method-level contracts still omit explicit thread-safety statements required by repo policy.
As per coding guidelines, "Document thread-safety guarantees in class/function comments".
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tpl_b_tree.H` around lines 554 - 896, Add explicit per-method thread-safety sentences to each public API doc comment (e.g., the constructors Gen_B_Tree(...), copy/move/assignment operators, key_comp(), get_compare(), empty(), is_empty(), size(), height(), clear(), contains(), search(), insert(const Key&), insert(Key&&), remove(), min_key(), max_key(), lower_bound(), upper_bound(), keys(), verify()) so they comply with repo policy: state whether the method is safe for concurrent reads (immutable/const methods like key_comp, get_compare, empty, size, height, contains, search, min_key, max_key, lower_bound, upper_bound, keys, verify are thread-safe for concurrent reads), and mark mutating/non-const methods (insert, remove, clear, move/copy assignment) as not thread-safe for concurrent mutations and require external synchronization; mirror the class-level thread-safety paragraph wording and keep the phrasing concise and consistent across all method docs.
🧹 Nitpick comments (6)
cuckoo-filter.H (4)
260-387: Add `@throws` documentation to mutation and query methods.

For consistency with `Compact_Cuckoo_Filter`, add `@throws none.` to the `insert()`, `append()`, `contains()`, `remove()`, and `clear()` methods.

Example for `insert()`:

  /**
   * @brief Insert an item into the filter.
   * @param item Item to insert.
   * @return `true` if insertion succeeded, `false` if filter is full.
+  * @throws none.
+  * @note Amortised O(1).
   */
  bool insert(const T &item)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@cuckoo-filter.H` around lines 260 - 387, Add a "@throws none." tag to the Doxygen header of the mutation and query methods for consistency: update the comments for insert(const T &item), append(const T &item), contains(const T &item) const, remove(const T &item), and clear() to include "@throws none." Ensure each method's brief block includes that line exactly to match Compact_Cuckoo_Filter's style.
208-228: Add `@throws` tag to constructor documentation.

The constructor can throw `std::bad_alloc` on memory allocation failure. The `Compact_Cuckoo_Filter` constructor documents this (line 242), but this constructor doesn't.

  /**
   * @brief Construct a Cuckoo Filter with a given capacity.
   * @param capacity Minimum number of items to support.
   * @param seed Optional RNG seed for deterministic kick sequences.
   *             Pass a fixed value in tests to eliminate flakiness.
   *             Defaults to a non-deterministic device seed.
+  * @throws std::bad_alloc if memory allocation fails.
+  * @note O(capacity / EntriesPerBucket).
   */
  explicit Cuckoo_Filter(size_t capacity,

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@cuckoo-filter.H` around lines 208 - 228, Update the Cuckoo_Filter constructor documentation comment to include an `@throws` tag for std::bad_alloc, noting that the constructor (explicit Cuckoo_Filter(size_t capacity, std::optional<std::uint32_t> seed)) may throw std::bad_alloc on allocation failures (e.g., when buckets.putn(num_buckets) or any internal allocation occurs); mirror the style used in Compact_Cuckoo_Filter's constructor docs for consistency.
230-258: Add `@throws` tags to public method documentation for consistency.

The `Compact_Cuckoo_Filter` version includes `@throws none.` on all accessor methods, but this file omits them. For API documentation consistency:

- /** @brief Return the number of items currently in the filter. */
+ /** @brief Return the number of items currently in the filter.
+  * @return Current item count.
+  * @throws none.
+  */
  [[nodiscard]] size_t size() const noexcept

Apply similar updates to `empty()`, `is_empty()`, `capacity()`, and `load_factor()`.

As per coding guidelines: "All public member functions must have `@brief`, `@param`, `@return`, and `@throws` Doxygen tags."
Verify each finding against the current code and only fix it if needed. In `@cuckoo-filter.H` around lines 230 - 258, Add missing Doxygen `@throws` tags to the public accessors to match the Compact_Cuckoo_Filter style: update the docblocks for size(), empty(), is_empty(), capacity(), and load_factor() to include "@throws none." (or the exact phrasing used elsewhere) in addition to their existing `@brief/`@return tags so all public member functions follow the project's Doxygen convention.
92-93: Add thread-safety documentation to match `Compact_Cuckoo_Filter`.

The `Compact_Cuckoo_Filter` header includes a thread-safety note at the class level (lines 90-92), but this class lacks the same documentation. For API consistency and per coding guidelines, add the thread-safety guarantee.

```diff
 /** @brief Industrial-grade Cuckoo Filter implementation. */
+/**
+ * @note Not thread-safe. Concurrent reads (contains / size / etc.)
+ * are safe; any concurrent write (insert / remove / clear) requires
+ * external synchronisation.
+ */
 template <typename T, size_t FingerprintBits = 8, size_t EntriesPerBucket = 4>
 class Cuckoo_Filter
```

As per coding guidelines: "Document thread-safety guarantees in class/function comments."
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@cuckoo-filter.H` around lines 92 - 93, Add the same class-level thread-safety note to the Cuckoo_Filter template declaration as exists on Compact_Cuckoo_Filter: update the comment block immediately above the Cuckoo_Filter class template (template<typename T, size_t FingerprintBits = 8, size_t EntriesPerBucket = 4> class Cuckoo_Filter) to include the documented thread-safety guarantee (copy the wording from Compact_Cuckoo_Filter) so the API and coding-guideline documentation are consistent.

compact-cuckoo-filter.H (1)
345-381: Consider the stack allocation size for kick history.

The `history[max_kicks]` array allocates ~8KB on the stack (500 × ~16 bytes). While this is within typical stack limits, it's worth noting for deeply recursive call chains or constrained environments.

The rollback logic correctly restores displaced fingerprints when insertion fails after exhausting both the kick chain and stash.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@compact-cuckoo-filter.H` around lines 345 - 381, The Kick_Record history is allocated on the stack as history[max_kicks], which can be large; change it to a heap-allocated container to avoid large stack usage by replacing the fixed array with a dynamic container (e.g., std::vector<Kick_Record> history; history.reserve(max_kicks) or a std::unique_ptr<Kick_Record[]>), keep using history_size and the same push/restore logic, and ensure the vector/heap allocation is created before the kick loop and used in place of history[...] so rollback via buckets(rec.bucket_idx).restore(rec.original_fp, rec.slot_idx) still works correctly.

bloom-filter.H (1)
144-144: Consider using `DynList` instead of `std::vector` for `seeds`.

Per coding guidelines, `DynList<T>` is preferred over `std::vector<T>`. The `seeds` member is only iterated sequentially and compared for equality; neither requires contiguous memory or random access. If there's a specific reason to use `std::vector` here (e.g., interfacing with external APIs), a brief comment explaining the choice would be helpful.

♻️ Suggested refactor

```diff
- std::vector<unsigned long> seeds;
+ DynList<unsigned long> seeds;
```

Note: This would also require updating the initialization at line 253 from `seeds(num_hashes)` to using a loop or helper, and updating the equality comparison at line 150.

As per coding guidelines: "Use `DynList<T>` instead of `std::list<T>` or `std::vector<T>`"

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@bloom-filter.H` at line 144, Replace the member declaration std::vector<unsigned long> seeds with DynList<unsigned long> seeds and update all usages: change the constructor/init currently doing seeds(num_hashes) to fill seeds with a loop or helper that pushes num_hashes values (e.g., repeated push_back) and update the equality check that compares seeds (the operator== or equality logic) to perform sequential iteration/element-wise comparison compatible with DynList rather than relying on vector-specific semantics; if std::vector was chosen intentionally for an external API, add a short comment explaining that rationale instead of changing the type.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@tpl_b_tree.H`:
- Around line 526-534: verify_node() returns early when node->leaf is true, so a
leaf with non-empty children can bypass the children/keys consistency check;
update verify_node() (the function performing the node checks) to first validate
that if node->leaf is true then node->children must be empty (or conversely, if
node->children is non-empty then node->leaf must be false) before returning, and
only then set/compare leaf_depth and depth; ensure the existing children.size()
vs keys.size()+1 check still runs for internal nodes (using node->children,
node->keys, node->leaf, leaf_depth and depth to locate logic).
In `@tpl_file_b_map.H`:
- Around line 43-52: The header list in tpl_file_b_map.H is missing <exception>,
causing use of std::current_exception and std::rethrow_exception to be
ill-formed; add `#include` <exception> alongside the other includes at the top of
the file so calls to std::current_exception and std::rethrow_exception compile
correctly.
- Around line 267-285: Add an explicit null check for the C-string API boundary
by creating a small inline helper (e.g., validate_file_path(const char* p)) that
calls ah_invalid_argument_if(p == nullptr, "Gen_File_B_Map: file_path must not
be null") and returns p, then use validate_file_path(file_path) in the
initializer lists that currently call tree_(file_path, ...) and tree_(file_path,
open_mode, ...) so the check runs before forwarding to Gen_File_B_Tree; update
the two Gen_File_B_Map constructors (the ones taking const char* and
initializing tree_) to call the helper instead of using file_path directly.
---
Duplicate comments:
In `@tpl_b_tree.H`:
- Around line 554-896: Add explicit per-method thread-safety sentences to each
public API doc comment (e.g., the constructors Gen_B_Tree(...),
copy/move/assignment operators, key_comp(), get_compare(), empty(), is_empty(),
size(), height(), clear(), contains(), search(), insert(const Key&),
insert(Key&&), remove(), min_key(), max_key(), lower_bound(), upper_bound(),
keys(), verify()) so they comply with repo policy: state whether the method is
safe for concurrent reads (immutable/const methods like key_comp, get_compare,
empty, size, height, contains, search, min_key, max_key, lower_bound,
upper_bound, keys, verify are thread-safe for concurrent reads), and mark
mutating/non-const methods (insert, remove, clear, move/copy assignment) as not
thread-safe for concurrent mutations and require external synchronization;
mirror the class-level thread-safety paragraph wording and keep the phrasing
concise and consistent across all method docs.
---
Nitpick comments:
In `@bloom-filter.H`:
- Line 144: Replace the member declaration std::vector<unsigned long> seeds with
DynList<unsigned long> seeds and update all usages: change the constructor/init
currently doing seeds(num_hashes) to fill seeds with a loop or helper that
pushes num_hashes values (e.g., repeated push_back) and update the equality
check that compares seeds (the operator== or equality logic) to perform
sequential iteration/element-wise comparison compatible with DynList rather than
relying on vector-specific semantics; if std::vector was chosen intentionally
for an external API, add a short comment explaining that rationale instead of
changing the type.
In `@compact-cuckoo-filter.H`:
- Around line 345-381: The Kick_Record history is allocated on the stack as
history[max_kicks], which can be large; change it to a heap-allocated container
to avoid large stack usage by replacing the fixed array with a dynamic container
(e.g., std::vector<Kick_Record> history; history.reserve(max_kicks) or a
std::unique_ptr<Kick_Record[]>), keep using history_size and the same
push/restore logic, and ensure the vector/heap allocation is created before the
kick loop and used in place of history[...] so rollback via
buckets(rec.bucket_idx).restore(rec.original_fp, rec.slot_idx) still works
correctly.
In `@cuckoo-filter.H`:
- Around line 260-387: Add a "@throws none." tag to the Doxygen header of the
mutation and query methods for consistency: update the comments for insert(const
T &item), append(const T &item), contains(const T &item) const, remove(const T
&item), and clear() to include "@throws none." Ensure each method's brief block
includes that line exactly to match Compact_Cuckoo_Filter's style.
- Around line 208-228: Update the Cuckoo_Filter constructor documentation
comment to include an `@throws` tag for std::bad_alloc, noting that the
constructor (explicit Cuckoo_Filter(size_t capacity,
std::optional<std::uint32_t> seed)) may throw std::bad_alloc on allocation
failures (e.g., when buckets.putn(num_buckets) or any internal allocation
occurs); mirror the style used in Compact_Cuckoo_Filter's constructor docs for
consistency.
- Around line 230-258: Add missing Doxygen `@throws` tags to the public accessors
to match the Compact_Cuckoo_Filter style: update the docblocks for size(),
empty(), is_empty(), capacity(), and load_factor() to include "@throws none."
(or the exact phrasing used elsewhere) in addition to their existing
`@brief/`@return tags so all public member functions follow the project's Doxygen
convention.
- Around line 92-93: Add the same class-level thread-safety note to the
Cuckoo_Filter template declaration as exists on Compact_Cuckoo_Filter: update
the comment block immediately above the Cuckoo_Filter class template
(template<typename T, size_t FingerprintBits = 8, size_t EntriesPerBucket = 4>
class Cuckoo_Filter) to include the documented thread-safety guarantee (copy the
wording from Compact_Cuckoo_Filter) so the API and coding-guideline
documentation are consistent.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 88a4b474-9ecb-4d6d-910e-7f2190bca16f
📒 Files selected for processing (5)
bloom-filter.H, compact-cuckoo-filter.H, cuckoo-filter.H, tpl_b_tree.H, tpl_file_b_map.H
Add detailed docstrings to Cuckoo Filter, improve B-Tree verification logic, and include error handling in File B-Map constructors. These changes enhance code clarity, robustness, and maintainability.
Actionable comments posted: 2
🧹 Nitpick comments (3)
tpl_b_tree.H (1)
756-760: Move overload doesn't preserve move semantics into the tree.

`insert(Key&&)` moves the key into a local copy, then calls `insert(const Key&)`, which copies into the tree. The move is effectively wasted. Consider accepting the key by value in the const-ref overload and moving internally, or implement a proper forwarding path in `insert_nonfull`.

💡 Alternative: accept by value

```diff
- bool insert(const Key &key)
+ bool insert(Key key)
  {
-   if (contains(key))
+   if (contains(key)) // key not modified, comparison is fine
      return false;
    // ... rest uses std::move(key) when storing
  }
-
- bool insert(Key &&key)
- {
-   Key copy = std::move(key);
-   return insert(copy);
- }
```

This would require updating `insert_nonfull` to forward the key appropriately.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tpl_b_tree.H` around lines 756 - 760, The move overload insert(Key&&) currently moves into a local copy and calls insert(const Key&), losing move semantics; fix by either (A) changing the const-ref overload to accept Key by value (insert(Key key)) and then move key into the tree/internal helper, and update insert_nonfull to accept/forward that Key (e.g., take Key&& or Key by value and std::move into node storage), or (B) implement a proper forwarding path: have insert(Key&&) call the internal insert_nonfull that accepts Key&& (forwarding the original key with std::move) instead of creating a local copy; update signatures of insert_nonfull and any helpers to accept/move the Key so the move is preserved. Ensure changes reference the existing symbols insert(Key&&), insert(const Key&)/insert(Key), and insert_nonfull so callers forward/move appropriately.

cuckoo-filter.H (2)
149-153: Remove unused `Bucket::swap_random()` to avoid divergence in kick logic.

Line 149 introduces `swap_random()`, but insertion uses `swap_random_slot()` (Line 322) for rollback bookkeeping. Keeping both swap paths increases maintenance risk without clear benefit.
Verify each finding against the current code and only fix it if needed. In `@cuckoo-filter.H` around lines 149 - 153, Remove the unused Bucket::swap_random(Fingerprint &fp, std::mt19937 &rng) implementation to avoid divergence from the existing kick logic and rollback bookkeeping; ensure all insertion/kick code continues to call the canonical Bucket::swap_random_slot(...) (used at insertion line referencing swap_random_slot) and that there are no remaining references to swap_random or duplicated swap behavior, keeping the swap logic centralized in swap_random_slot and deleting the swap_random definition and its declaration.
227-230: Harden extreme-capacity sizing against float-to-integer edge cases.

The `std::ceil(...double...)` to `size_t` cast works for normal sizes, but a guard for very large `capacity` values would make this constructor more robust and fail-fast (using Aleph error macros).

As per coding guidelines "Validate external inputs before processing" and "Fail fast with clear error messages".
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@cuckoo-filter.H`:
- Around line 349-355: The Doxygen for the public method append(const T &item)
and clear() is incomplete; update both comment blocks to include the required
tags: add `@param` for append's item parameter, add `@return` describing the boolean
result for append, and ensure both have `@brief` and `@throws` entries (use "@throws
none" if they do not throw). Locate the comment above append(const T &item) and
the comment above clear() and expand them to include `@brief`, `@param` (where
applicable), `@return` (for append), and `@throws` to satisfy the project's Doxygen
contract.
In `@tpl_file_b_map.H`:
- Around line 551-553: The rollback snapshot retrieval uses
tree_.lower_bound(probe) to set old_record but only checks has_value(); update
File_B_Map::insert_or_assign to also verify that old_record->first (the found
key) equals the target key before using it for rollback—if it does not match,
treat as not-found and fail/handle accordingly; specifically, after obtaining
old_record from tree_.lower_bound(probe) add an equality check against key and
throw or abort the rollback path when mismatched to prevent rolling back the
wrong entry.
---
Nitpick comments:
In `@cuckoo-filter.H`:
- Around line 149-153: Remove the unused Bucket::swap_random(Fingerprint &fp,
std::mt19937 &rng) implementation to avoid divergence from the existing kick
logic and rollback bookkeeping; ensure all insertion/kick code continues to call
the canonical Bucket::swap_random_slot(...) (used at insertion line referencing
swap_random_slot) and that there are no remaining references to swap_random or
duplicated swap behavior, keeping the swap logic centralized in swap_random_slot
and deleting the swap_random definition and its declaration.
In `@tpl_b_tree.H`:
- Around line 756-760: The move overload insert(Key&&) currently moves into a
local copy and calls insert(const Key&), losing move semantics; fix by either
(A) changing the const-ref overload to accept Key by value (insert(Key key)) and
then move key into the tree/internal helper, and update insert_nonfull to
accept/forward that Key (e.g., take Key&& or Key by value and std::move into
node storage), or (B) implement a proper forwarding path: have insert(Key&&)
call the internal insert_nonfull that accepts Key&& (forwarding the original key
with std::move) instead of creating a local copy; update signatures of
insert_nonfull and any helpers to accept/move the Key so the move is preserved.
Ensure changes reference the existing symbols insert(Key&&), insert(const
Key&)/insert(Key), and insert_nonfull so callers forward/move appropriately.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 6d1cf07c-5f62-4ada-b615-13bdcbdb8e60
📒 Files selected for processing (3)
cuckoo-filter.H, tpl_b_tree.H, tpl_file_b_map.H
Rework `swap_random_slot` and `restore` functions for clarity and correctness. Update `tpl_file_b_map.H` with a more robust key equality check in `insert_or_assign`. These changes refactor existing code, improving maintainability and reliability of the filter and map components.
Pull request overview
This PR adds B-Tree/B+Tree families (in-memory + persistent), introduces new probabilistic filters (Cuckoo/Quotient + compact variants), and updates hashing/default-hash pointer plumbing plus docs/tests/examples to support the new additions.
Changes:
- Added new probabilistic filters (Cuckoo + Quotient, plus compact/bit-packed variants) with accompanying tests and examples.
- Added documentation and README sections describing multiway trees and probabilistic filters.
- Updated several hash-table/tree utilities and examples/scripts to align with new hash pointer APIs and improved usability.
Reviewed changes
Copilot reviewed 62 out of 72 changed files in this pull request and generated 4 comments.
Show a summary per file
| File | Description |
|---|---|
| tpl_dynSetHash.H | Switch default hash function argument to pointer-friendly default. |
| tpl_dynMapOhash.H | Switch default open-hash functions to pointer-friendly defaults. |
| tpl_dynLhash.H | Switch default hash function to pointer-friendly default. |
| scripts/check_examples_run | New Ruby helper to build and execute examples with logging/timeouts. |
| net_utils.H | Strengthen layered network generation invariants (non-empty interior layers, no zero in-degree intermediates). |
| docs/b_tree_family.md | New docs for B-Tree/B+Tree and file-backed variants. |
| cuckoo-filter.H | New Cuckoo filter implementation (probabilistic set membership with deletion). |
| compact-quotient-filter.H | New bit-packed quotient filter implementation. |
| compact-cuckoo-filter.H | New bit-packed cuckoo filter implementation. |
| bloom-filter.H | Improve Bloom filter API contracts/diagnostics and update default hashing. |
| al-matrix.H | Update pair hash defaults to pointer-friendly variants. |
| Tests/quotient-filter.cc | New test suite for Quotient_Filter. |
| Tests/net_utils_test.cc | Add test for rejecting empty intermediate layers. |
| Tests/jsw_init_test.cc | New tests for JSW hash lazy init/reinit behavior. |
| Tests/hash_restriction_test.cc | New compile-time trait tests for hashing restrictions/overloads. |
| Tests/hash_fct_test.cc | Expand hash consistency/distribution tests and update JSW init seed usage. |
| Tests/hash-it.cc | Adjust OLhashTable test to use an explicit constant hash functor. |
| Tests/file_bplus_map_test.cc | New tests for persistent B+Tree map wrapper. |
| Tests/file_b_map_test.cc | New tests for persistent B-Tree map wrapper. |
| Tests/cuckoo-filter.cc | New tests for Cuckoo_Filter behavior and probabilistic expectations. |
| Tests/bplus_tree_test.cc | New tests for BPlus_Tree semantics, iterators, range queries, and invariants. |
| Tests/b_tree_test.cc | New tests for B_Tree semantics, bounds, and invariants. |
| Tests/CMakeLists.txt | Register new filter/tree/hash-related test targets. |
| README.md | Document multiway search trees and probabilistic filters; update tables/sections. |
| README.es.md | Update Spanish README probabilistic/streaming tables for new filters. |
| Examples/timeAllTree.C | Add no-args usage/help output. |
| Examples/testQueue.C | Add no-args help and tighten argument parsing to int. |
| Examples/testAllTree.C | Add no-args usage/help output. |
| Examples/test-rvalues.C | Change node consistency check logic (now tracks seen node infos). |
| Examples/test-odhash.cc | Fix hashing for pair keys; avoid unused-result warnings. |
| Examples/test-con-queue.C | Add no-args usage/help output. |
| Examples/test-comb.C | Adjust example input list size (combination example). |
| Examples/set_structures_benchmark.cc | Increase default benchmark size to 2,000,000. |
| Examples/quotient_filter_example.C | New Quotient filter example program. |
| Examples/ntreepic.C | Add no-args usage/help output. |
| Examples/hungarian_example.cc | Ensure matrix is allocated before populating. |
| Examples/graphpic.C | Add no-args usage/help output. |
| Examples/file_bplus_tree_example.cc | New example for persistent B+Tree. |
| Examples/file_bplus_map_example.cc | New example for persistent B+Tree map. |
| Examples/file_b_tree_example.cc | New example for persistent B-Tree. |
| Examples/file_b_map_example.cc | New example for persistent B-Tree map. |
| Examples/cuckoo_filter_example.cc | New Cuckoo filter example program. |
| Examples/compact_quotient_filter_example.C | New comprehensive compact quotient filter examples. |
| Examples/compact_cuckoo_filter_example.C | New comprehensive compact cuckoo filter examples. |
| Examples/btreepic.C | Add no-args usage/help output. |
| Examples/bplus_tree_example.cc | New example for in-memory B+Tree. |
| Examples/b_tree_example.cc | New example for in-memory B-Tree. |
| CMakeLists.txt | Export new public headers in build/install lists. |
| .github/workflows/ci.yml | Exclude computationally intensive hash statistical test from CI/coverage/asan/tsan runs. |
| .clang-format | Replace IDE-generated config with project-targeted C++20 style rules. |
Actionable comments posted: 1
♻️ Duplicate comments (2)
compact-cuckoo-filter.H (1)
374-439: ⚠️ Potential issue | 🟠 Major

Should fix: Doxygen contract tags are incomplete on public methods.

At Line 374 (`append`) and Line 436 (`clear`), required public API tags are incomplete (`@param`/`@return` for `append`, `@return` for `clear`).

Proposed doc patch

```diff
- /** @brief Alias for insert().
-  * @throws none.
-  */
+ /** @brief Alias for insert().
+  * @param item Item to insert.
+  * @return `true` if insertion succeeded, `false` if filter is full.
+  * @throws none.
+  */
  bool append(const T &item) { return insert(item); }

- /** @brief Clear the filter.
-  * @throws none.
-  * @note O(num_buckets * EntriesPerBucket).
-  */
+ /** @brief Clear the filter.
+  * @return none.
+  * @throws none.
+  * @note O(num_buckets * EntriesPerBucket).
+  */
  void clear()
```

As per coding guidelines: "All public member functions must have `@brief`, `@param`, `@return`, and `@throws` Doxygen tags".

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@compact-cuckoo-filter.H` around lines 374 - 439, Update the Doxygen comments for the public methods to include the required tags: for append(const T &item) add `@param` describing the item parameter, add `@return` describing the boolean result (e.g., true if inserted / false otherwise), and keep `@throws` none; for clear() add an `@return` tag (even if void, state what callers should expect, e.g., void or always returns void) and ensure `@throws` is present; locate the comment blocks for the methods named append and clear and modify their Doxygen blocks to include `@brief`, `@param`, `@return`, and `@throws` per the coding guidelines.

cuckoo-filter.H (1)
406-409: ⚠️ Potential issue | 🟠 Major

Should fix: add missing `@return` tag to `clear()` Doxygen.

At Line 406, `clear()` is public and still misses the required `@return` contract tag.

Proposed doc patch

```diff
- /** @brief Clear the filter.
-  * @throws none.
-  * @note O(num_buckets * EntriesPerBucket).
-  */
+ /** @brief Clear the filter.
+  * @return none.
+  * @throws none.
+  * @note O(num_buckets * EntriesPerBucket).
+  */
  void clear()
```

As per coding guidelines: "All public member functions must have `@brief`, `@param`, `@return`, and `@throws` Doxygen tags".

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@cuckoo-filter.H` around lines 406 - 409, The Doxygen for the public method clear() is missing the required `@return` tag; update the comment block that documents clear() to include an `@return` entry (e.g., "@return void" or "@return Nothing.") describing that the function returns nothing, keeping the existing `@brief` and `@throws` lines intact to satisfy the public API documentation rules.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@tpl_file_b_map.H`:
- Around line 266-293: Add a precondition and matching throws clause to the two
C-string constructors of Gen_File_B_Map that call validated_path: document "@pre
file_path != nullptr" and add "@throws std::invalid_argument If file_path is
null or otherwise invalid (propagated from validated_path())" in both
constructor docblocks so the contract reflects validated_path()'s possible
errors; update the constructors declared as Gen_File_B_Map(const char *
file_path, const bool auto_sync = true, ...) and Gen_File_B_Map(const char *
file_path, const open_mode_type open_mode, const bool auto_sync = true, ...)
accordingly.
---
Duplicate comments:
In `@compact-cuckoo-filter.H`:
- Around line 374-439: Update the Doxygen comments for the public methods to
include the required tags: for append(const T &item) add `@param` describing the
item parameter, add `@return` describing the boolean result (e.g., true if
inserted / false otherwise), and keep `@throws` none; for clear() add an `@return`
tag (even if void, state what callers should expect, e.g., void or always
returns void) and ensure `@throws` is present; locate the comment blocks for the
methods named append and clear and modify their Doxygen blocks to include
`@brief`, `@param`, `@return`, and `@throws` per the coding guidelines.
In `@cuckoo-filter.H`:
- Around line 406-409: The Doxygen for the public method clear() is missing the
required `@return` tag; update the comment block that documents clear() to include
an `@return` entry (e.g., "@return void" or "@return Nothing.") describing that
the function returns nothing, keeping the existing `@brief` and `@throws` lines
intact to satisfy the public API documentation rules.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: aafe1715-2d04-476a-9b50-28ae27b73a32
📒 Files selected for processing (3)
compact-cuckoo-filter.H, cuckoo-filter.H, tpl_file_b_map.H
Refine documentation for append, contains, and remove in cuckoo filter. Add argument validation and improve documentation in quotient filter. Clarify documentation in tpl_file_b_map for constructors.
Summary by CodeRabbit
New Features
Documentation
Tests
Chores