Add comprehensive string algorithms (KMP, Z-algorithm, Aho-Corasick, suffix structures, Manacher, edit distance) by lrleon · Pull Request #37 · lrleon/Aleph-w

lrleon · 2026-02-24T03:18:16Z

Summary by CodeRabbit

New Features
- Added a comprehensive string-algorithms suite (multi-pattern Aho‑Corasick, KMP/Z/Horspool/Rabin‑Karp, suffix structures & automaton, Manacher palindromes, Levenshtein/Damerau/LCS), an umbrella header, stable counting/radix sorts, and randomized tree backends.
Documentation
- Expanded README (EN/ES) with a new String Algorithms section and usage examples.
Examples
- Added many example programs demonstrating the new algorithms.
Tests
- Added extensive unit and integration tests covering string algorithms, DP, palindromes, suffix structures, and sorts.

…drome tests

coderabbitai · 2026-02-24T03:18:41Z

Important

Review skipped

This PR was authored by the user configured for CodeRabbit reviews. By default, CodeRabbit skips reviewing PRs authored by this user. It's recommended to use a dedicated user account to post CodeRabbit review feedback.

To trigger a single review, invoke the @coderabbitai review command.

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Use the checkbox below for a quick retry:

🔍 Trigger review

Walkthrough

Adds a comprehensive string-algorithms suite (single- and multi-pattern search, suffix structures, palindromes, edit-distance), an umbrella header, many examples and tests, new counting/radix sorting APIs and concepts (including a duplicated concept), test helpers, minor utility tweaks, and CMake/install list updates.

Changes

Cohort / File(s)	Summary
Aho–Corasick `Aleph/Aho_Corasick.H`	New `Aleph::Aho_Corasick` class with pattern insertion, build (failure links, output propagation), linear-time search over 256-byte alphabet, overlapping-match reporting, `Match` struct, and API error checks.
Single-pattern Search `Aleph/String_Search.H`	Header-only implementations: KMP, Z, Boyer–Moore–Horspool, Rabin–Karp; helper utilities; APIs return all match positions and define empty-pattern semantics.
Suffix Structures `Aleph/Suffix_Structures.H`	Suffix array (doubling), Kasai LCP, naive suffix tree, suffix automaton (256-alphabet), LCS helpers, and public classes/accessors.
String DP & Palindromes `Aleph/String_DP.H`, `Aleph/String_Palindromes.H`	Levenshtein/edit and Damerau–Levenshtein, LCS and LCSubstring result types; Manacher implementation and longest-palindrome helper.
Umbrella & Build `Aleph/String_Algorithms.H`, `CMakeLists.txt`, `Examples/CMakeLists.txt`	New umbrella header aggregating string modules; added string headers to public install/export list; registered many new example programs in Examples build.
Examples `Examples/*_example.cc` (e.g., `Examples/aho_corasick_example.cc`, `Examples/kmp_example.cc`, `Examples/z_algorithm_example.cc`, `Examples/horspool_example.cc`, `Examples/rabin_karp_example.cc`, `Examples/suffix_array_lcp_example.cc`, `Examples/suffix_tree_example.cc`, `Examples/suffix_automaton_example.cc`, `Examples/manacher_example.cc`, `Examples/edit_distance_example.cc`, `Examples/damerau_levenshtein_example.cc`, `Examples/lcs_longest_common_substring_example.cc`)	Twelve+ new example programs demonstrating the new string algorithms and structures.
Tests (String) `Tests/aho_corasick_test.cc`, `Tests/string_search_test.cc`, `Tests/string_dp_test.cc`, `Tests/string_palindromes_test.cc`, `Tests/suffix_structures_test.cc`, `Tests/string_algorithms_test.cc`	Extensive unit and integration tests covering correctness, edge cases, stress scenarios, cross-algorithm consistency, and error handling for the new string modules.
Sorting & Concepts `tpl_sort_utils.H`, `ah-concepts.H`	Added stable counting_sort_indices and broad counting/radix_sort overloads (containers, raw/fixed arrays); introduced `CountingSortable` and `RadixSortable` concepts; added `IntegerSortableValue` concept — duplicate declaration present in `ah-concepts.H`.
Sort Tests & Helpers `Tests/sort_utils.cc`, `Tests/test_helpers.H`	Large test additions for counting/radix sorts across containers and pointer variants; new test helper `expect_array_eq`.
String Utils Tests `Tests/ah-string-utils.cc`	Expanded string-utils tests and helpers (split/format/justify/numeric parsing edge cases and stress scenarios).
Misc. Code & Docs `ah-dry.H`, `ah-string-utils.H`, `README.md`, `README.es.md`	Defaulted `SpecialCtors` copy-assignment, tightened numeric parsing (uses std::isfinite), improved `concat` implementation, README updates adding String Algorithms and `Rand_Tree` entries, expanded Spanish README.
Performance Hints & Traversal Simplification `tpl_avl.H`, `tpl_avlRk.H`, `tpl_binNodeUtils.H`	Added prefetch intrinsics to warm child pointers and simplified traversal loops in tree/bin-node utilities.
Test Infrastructure `Tests/*` (many)	New and extended test suites across sorting, string algorithms, palindromes, suffix structures and utilities; many stress/regression cases added.

Sequence Diagram(s)

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

Add Cartesian Tree RMQ/LCA implementation with concepts #32 — Modifies ah-concepts.H; directly related to the IntegerSortableValue concept addition/duplication in this change.

🚥 Pre-merge checks | ✅ 3

✅ Passed checks (3 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title accurately and comprehensively summarizes the main additions: KMP, Z-algorithm, Aho-Corasick, suffix structures (suffix array/tree/automaton), Manacher's algorithm, and edit distance functions. It clearly reflects the primary intent of this substantial feature addition.
Header Docstring Coverage	✅ Passed	Inspection confirms 97.1% Doxygen documentation coverage (33/34 declarations), significantly exceeding the required 80% threshold across all six string algorithm headers.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

Copilot

Pull request overview

This pull request introduces comprehensive string algorithm functionality to the Aleph-w library, including pattern matching, suffix structures, palindrome detection, and edit distance algorithms.

Changes:

Added new string algorithm headers: String_Search.H, Aho_Corasick.H, Suffix_Structures.H, String_Palindromes.H, String_DP.H, and umbrella header String_Algorithms.H
Enhanced sorting utilities with counting sort and radix sort implementations for integer types
Added integer sorting concept IntegerSortableValue to ah-concepts.H to enforce valid types for linear-time integer sorting
Fixed spelling and grammatical errors in comments across multiple files

Reviewed changes

Copilot reviewed 34 out of 34 changed files in this pull request and generated no comments.

Show a summary per file

File	Description
tpl_sort_utils.H	Added counting_sort_indices, counting_sort, and radix_sort implementations with support for multiple container types
ah-string-utils.H	Fixed typo in comment ("sringficated" → "stringified")
ah-dry.H	Corrected multiple grammatical errors in comments and changed copy assignment operator to default
ah-concepts.H	Added IntegerSortableValue concept for type-safe integer sorting
Tests/suffix_structures_test.cc	New test suite for suffix array, LCP, suffix tree, and suffix automaton
Tests/string_search_test.cc	New test suite for KMP, Z-algorithm, Horspool, and Rabin-Karp
Tests/string_palindromes_test.cc	New test suite for Manacher's algorithm
Tests/string_dp_test.cc	New test suite for edit distance and LCS algorithms
Tests/string_algorithms_test.cc	Integration tests for string algorithm interoperability
Tests/sort_utils.cc	Added comprehensive tests for counting sort and radix sort
Tests/aho_corasick_test.cc	New test suite for Aho-Corasick multi-pattern matching
Suffix_Structures.H	New header with suffix array, LCP, suffix tree, and suffix automaton implementations
String_Search.H	New header with KMP, Z-algorithm, Horspool, and Rabin-Karp implementations
String_Palindromes.H	New header with Manacher's algorithm
String_DP.H	New header with Levenshtein, Damerau-Levenshtein, LCS, and longest common substring
String_Algorithms.H	Umbrella header including all string algorithm headers
README.md	Updated documentation with string algorithms section and examples
README.es.md	Updated Spanish documentation with string algorithms references
Examples/*.cc	Added 12 new example files demonstrating string algorithms
Examples/min_cost_matching_example.cc	Added const qualifier to loop variable
Examples/CMakeLists.txt	Registered new string algorithm examples
CMakeLists.txt	Added new string algorithm headers to build system
Aho_Corasick.H	New header with Aho-Corasick multi-pattern matching implementation

Comments suppressed due to low confidence (2)

ah-string-utils.H:1

Corrected spelling of 'sringficated' to 'stringified'.
ah-dry.H:1
The copy assignment operator should return *this when not defaulted. Since it's now defaulted, this is correct, but if it were custom, it should follow the pattern of returning *this.

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: aec96eee49

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Aho_Corasick.H

Suffix_Structures.H

coderabbitai

Actionable comments posted: 5

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

Tests/sort_utils.cc (1)
40-122: ⚠️ Potential issue | 🟡 Minor

Use std::ranges::sort instead of std::sort for consistency with coding guidelines.

The file uses std::sort at multiple locations (lines 110, 120, 273, 274, 303, 304, 313, 324, 460, etc.). Per the coding guidelines, prefer std::ranges:: algorithms over classic std:: algorithms. Replace these calls with std::ranges::sort.

Note: The std::vector usage with reserve() is compliant with guidelines (e.g., lines 254–275 use reserve() when output size is known a priori, which is the documented exception). However, the helper functions at lines 105–122 that return std::vector<uintptr_t> without reserve() could be migrated to DynArray<uintptr_t> for consistency, if feasible.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@Tests/sort_utils.cc` around lines 40 - 122, The PR uses std::sort in multiple
places; update all calls to use std::ranges::sort instead (e.g., replace
std::sort(ret.begin(), ret.end()) in dynlist_node_addresses and
dyndlist_node_addresses and every other occurrence). Ensure the file includes
the proper headers for ranges-algorithms if needed and keep the same iterator
arguments (std::ranges::sort accepts the same iterator ranges). Optionally, for
dynlist_node_addresses and dyndlist_node_addresses consider returning
DynArray<uintptr_t> (or reserve the std::vector before push_back) to match
project conventions; if you switch to DynArray, adapt push_back logic to
indexing and reserve appropriately.

🧹 Nitpick comments (7)

Tests/aho_corasick_test.cc (1)
45-216: Solid test suite with good coverage of the Aho-Corasick API.

The tests cover the classic dictionary example, overlapping patterns, error handling (pre-build search, empty pattern), lifecycle (clear/rebuild), large pattern sets, long texts, single-char patterns, and binary data.

Consider adding two small edge-case tests for completeness:

Empty text — ac.search("") should return no matches.

No matches found — search a text that contains none of the registered patterns.

These would round out the boundary coverage.

,
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@Tests/aho_corasick_test.cc` around lines 45 - 216, Add two small unit tests
to Tests/aho_corasick_test.cc: implement TEST(AhoCorasick, EmptyText) that
creates Aho_Corasick, adds a couple patterns via add_pattern, calls build(),
calls search("") and asserts no matches (matches.size()==0 or matches.empty()),
and implement TEST(AhoCorasick, NoMatchesFound) that adds patterns, builds,
searches a text with none of the patterns (e.g., "xyzxyz"), and asserts no
matches; reference Aho_Corasick, add_pattern, build, and search in the new tests
and use the existing EXPECT_EQ/EXPECT_TRUE assertions style.
Examples/z_algorithm_example.cc (1)
52-62: Prefer std::format and a ranges-based printer helper.

Line 52-62 uses chained iostreams and two index loops. The guidelines prefer std::format and algorithmic/range-based iteration, and a small helper also removes duplicate printing logic.
Proposed refactor
-# include <iostream>
-# include <string>
+# include <algorithm>
+# include <format>
+# include <iostream>
+# include <string>
@@
-  std::cout << "Z-Algorithm Example\n";
-  std::cout << "Text   : " << text << "\n";
-  std::cout << "Pattern: " << pattern << "\n\n";
+  std::cout << std::format("Z-Algorithm Example\nText   : {}\nPattern: {}\n\n",
+                           text, pattern);
+
+  const auto print_seq = [](const auto& seq) {
+    bool first = true;
+    std::ranges::for_each(seq, [&](const auto& v) {
+      if (!first) std::cout << ' ';
+      std::cout << v;
+      first = false;
+    });
+    std::cout << '\n';
+  };
@@
-  std::cout << "Z-array: ";
-  for (size_t i = 0; i < z.size(); ++i)
-    std::cout << z[i] << (i + 1 == z.size() ? '\n' : ' ');
+  std::cout << "Z-array: ";
+  print_seq(z);
@@
-  std::cout << "Matches at positions: ";
-  for (size_t i = 0; i < matches.size(); ++i)
-    std::cout << matches[i] << (i + 1 == matches.size() ? '\n' : ' ');
+  std::cout << "Matches at positions: ";
+  print_seq(matches);
As per coding guidelines “Use std::format for type-safe string formatting (prefer over printf/streams)”, “Use range-based for loops for simple iterations”, and “Use algorithms over hand-rolled loops for readability and correctness”.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@Examples/z_algorithm_example.cc` around lines 52 - 62, Replace the
iostream-based printing with std::format and a small ranges-based printer
helper: create a helper function (e.g., print_range) that takes a range (used
for z and matches) and a separator/terminator, then use std::format to produce
the header lines for "Z-Algorithm Example", "Text", and "Pattern" (referencing
variables text and pattern) and call print_range(z, ' ') and
print_range(matches, ' ') to print the sequences; also use range-based iteration
or std::ranges algorithms inside the helper to avoid manual index loops and
ensure consistent spacing/termination.
Examples/rabin_karp_example.cc (1)
51-57: Prefer std::format + ranges-based iteration for output.

Line 51-57 uses chained iostreams and an index loop. The C++ guidelines prefer std::format and algorithmic/range-based loops for simple iterations. Also, the current loop does not emit a newline when matches is empty.
Proposed refactor
-# include <iostream>
-# include <string>
+# include <algorithm>
+# include <format>
+# include <iostream>
+# include <string>
@@
-  std::cout << "Rabin-Karp Example\n";
-  std::cout << "Text   : " << text << "\n";
-  std::cout << "Pattern: " << pattern << "\n\n";
+  std::cout << std::format("Rabin-Karp Example\nText   : {}\nPattern: {}\n\n",
+                           text, pattern);
@@
-  std::cout << "Matches at positions: ";
-  for (size_t i = 0; i < matches.size(); ++i)
-    std::cout << matches[i] << (i + 1 == matches.size() ? '\n' : ' ');
+  std::cout << "Matches at positions: ";
+  bool first = true;
+  std::ranges::for_each(matches, [&](const auto& pos) {
+    if (!first) std::cout << ' ';
+    std::cout << pos;
+    first = false;
+  });
+  std::cout << '\n';
As per coding guidelines “Use std::format for type-safe string formatting (prefer over printf/streams)”, “Use range-based for loops for simple iterations”, and “Use algorithms over hand-rolled loops for readability and correctness”.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@Examples/rabin_karp_example.cc` around lines 51 - 57, Replace the chained
iostream output with type-safe std::format-based messages for the header and the
text/pattern lines (use std::format and include <format>), and replace the index
loop over matches with a range-based iteration (for (auto idx : matches)) or use
a join-style/algorithmic approach to print space-separated positions; ensure you
emit a newline even when matches is empty by printing a terminating '\n' after
the sequence or handling the empty case explicitly; reference variables: text,
pattern, matches.
Tests/string_search_test.cc (2)
168-198: Consider strengthening the planted-pattern assertion from GE to EQ.

Since the text is all 'a' characters and "XYZW" contains none of {a}, there cannot be any spurious matches. Using EXPECT_EQ(kmp.size(), 6u) would make the test stricter and catch false positives from the algorithms.
Proposed fix
-  // Must find at least the planted positions
-  EXPECT_GE(kmp.size(), 6u);
+  // Exactly the planted positions (no character in "XYZW" matches the 'a' background)
+  EXPECT_EQ(kmp.size(), 6u);
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@Tests/string_search_test.cc` around lines 168 - 198, The test uses
EXPECT_GE(kmp.size(), 6u) which allows extra (incorrect) matches; change that
assertion to EXPECT_EQ(kmp.size(), 6u) in the TEST(StringSearch,
StressLongRandomStringAllAlgorithmsAgree) so the number of matches reported by
kmp_search (and by earlier cross-checks with z_search,
boyer_moore_horspool_search, rabin_karp_search) must equal the six planted
positions; update the single EXPECT_GE line to EXPECT_EQ and keep the rest of
the equality checks as-is to ensure strict agreement.
43-54: expect_array_eq helper is duplicated across test files.

This identical helper also appears in Tests/suffix_structures_test.cc. Consider extracting it into a shared test utility header (e.g., Tests/test_helpers.H) to reduce duplication across the new string test suites.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@Tests/string_search_test.cc` around lines 43 - 54, Extract the duplicated
expect_array_eq helper into a shared header (e.g., Tests/test_helpers.H): move
the implementation of expect_array_eq(const Array<size_t>&,
std::initializer_list<size_t>) out of the anonymous namespace and into that
header, ensure the header includes the necessary headers (gtest, Array
definition, <initializer_list>), and make the function either inline or placed
in a named namespace (e.g., test_helpers::expect_array_eq) to avoid ODR/multiple
definition issues; then include Tests/test_helpers.H from both
Tests/string_search_test.cc and Tests/suffix_structures_test.cc and remove the
local duplicate definitions.
Tests/suffix_structures_test.cc (1)
184-203: Use EXPECT_LT instead of EXPECT_LE for suffix-array ordering verification.

All suffixes of a string are distinct (they have different lengths), so the suffix array must be strictly ordered. EXPECT_LE masks a potential bug where the SA contains duplicate indices or compares equal erroneously. The same applies to the SuffixArrayLongerText test at Line 127, which correctly uses EXPECT_LT.
Proposed fix
-      EXPECT_LE(prev_suffix, curr_suffix);
+      EXPECT_LT(prev_suffix, curr_suffix);
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@Tests/suffix_structures_test.cc` around lines 184 - 203, In the TEST named
StressLongerSuffixArray replace the non-strict comparison EXPECT_LE(prev_suffix,
curr_suffix) with EXPECT_LT(prev_suffix, curr_suffix) to assert strict ordering
of distinct suffixes; locate the assertion in the test that builds text,
computes sa via suffix_array(text), and compares prev_suffix and curr_suffix
(variables prev_suffix, curr_suffix, and sa) and change to EXPECT_LT, mirroring
the existing behavior used in the SuffixArrayLongerText test.
Tests/string_palindromes_test.cc (1)
107-112: Test name ManacherPalindromeInMiddle is slightly misleading.

"xxxracecarxxx" is itself a palindrome (symmetric around center), so the test validates whole-string detection rather than a palindrome embedded in non-palindromic surroundings. Consider renaming or using a string like "abcracecarxyz" if the intent is to test a palindrome strictly in the middle.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@Tests/string_palindromes_test.cc` around lines 107 - 112, The test named
ManacherPalindromeInMiddle uses "xxxracecarxxx" which is a full-string
palindrome; either rename the TEST to reflect whole-string detection (e.g.,
ManacherWholeStringPalindrome) or change the input to a true embedded case like
"abcracecarxyz" and update expectations accordingly (call
manacher("abcracecarxyz") and assert r.longest_palindrome == "racecar" and
r.longest_length == 7u); update the TEST identifier or the string and expected
values in Tests/string_palindromes_test.cc, referencing the TEST declaration and
the manacher(...) result fields r.longest_palindrome and r.longest_length.

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@Aho_Corasick.H`:
- Around line 120-129: Update the Doxygen blocks for the public methods
add_pattern, search, and pattern to include `@throws` tags documenting the
exceptions raised by the Aleph error macros; for example, add_pattern should
include `@throws` std::domain_error (or the specific Aleph exception) for
empty-pattern insertion, search should document exceptions it can raise when the
automaton is not built or on invalid input, and pattern (accessor) should list
exceptions for out-of-range or invalid id access—ensure each public API has
`@brief`, `@param`, `@return`, and a clear `@throws` line describing the exception type
and the condition that triggers it.

In `@README.es.md`:
- Around line 505-513: Reemplaza los términos en inglés por equivalentes en
español para mantener consistencia: cambia "Headers principales:" por "Cabeceras
principales:" (o "Encabezados principales:") y reemplaza "include paraguas" por
una expresión en español como "archivo paraguas" o "cabecera paraguas" en la
línea que lista `String_Algorithms.H`; conserva los nombres de archivos
(`String_Search.H`, `Aho_Corasick.H`, `Suffix_Structures.H`,
`String_Palindromes.H`, `String_DP.H`, `String_Algorithms.H`) tal cual y
actualiza solo las etiquetas descriptivas alrededor de esos símbolos.

In `@Suffix_Structures.H`:
- Around line 401-739: Add complete Doxygen for each public API by augmenting
existing brief comments for Naive_Suffix_Tree methods (contains, find_all,
node_count, text_size, nodes, build, constructor) and Suffix_Automaton methods
(extend, build, contains, distinct_substring_count, longest_common_substring,
state_count, states, clear, mark_terminals if made public) to include `@param` for
each parameter (e.g., text, pattern, other), `@return` describing return values
(including empty/edge cases), `@throws` for any precondition failures or
exceptions (or state that none are thrown), and `@par` Complexity with Big-O
notation; keep descriptions concise and reference semantics in the existing
comments (e.g., mention original_size_ for text_size, that find_all returns
sorted suffix positions and behavior on empty pattern), ensuring each public
method has `@brief`, `@param`, `@return`, `@throws`, and complexity entries added to its
docblock.

In `@tpl_sort_utils.H`:
- Around line 4539-4576: Remove the std::vector public overloads: delete or move
counting_sort(std::vector<T>&) and its helper
counting_sort_impl(std::vector<IntT>&) out of the core public header so the core
API only exposes Aleph containers; instead implement the sorting logic as (or
reuse) counting_sort_impl that accepts Aleph containers (e.g., DynArray<IntT>&
or Array<IntT>&) and update callers to use the Aleph adapter (or provide a
separate adapter-layer function converting std::vector to DynArray outside the
core header) so the core header no longer exposes std::vector in its public API.
- Around line 4358-4364: The template parameter KeyFn on counting_sort_indices
must be constrained to ensure it is callable with a size_t and returns something
convertible to int; update the template declaration for counting_sort_indices to
add a C++20 requires clause or concept that enforces std::invocable<KeyFn,
size_t> and that std::invoke_result_t<KeyFn, size_t> is std::convertible_to<int>
(or create a small concept alias like KeyFnForIndex). This change targets the
counting_sort_indices template signature so the compiler enforces the callable
input type and return convertibility at compile time.

---

Outside diff comments:
In `@Tests/sort_utils.cc`:
- Around line 40-122: The PR uses std::sort in multiple places; update all calls
to use std::ranges::sort instead (e.g., replace std::sort(ret.begin(),
ret.end()) in dynlist_node_addresses and dyndlist_node_addresses and every other
occurrence). Ensure the file includes the proper headers for ranges-algorithms
if needed and keep the same iterator arguments (std::ranges::sort accepts the
same iterator ranges). Optionally, for dynlist_node_addresses and
dyndlist_node_addresses consider returning DynArray<uintptr_t> (or reserve the
std::vector before push_back) to match project conventions; if you switch to
DynArray, adapt push_back logic to indexing and reserve appropriately.

---

Nitpick comments:
In `@Examples/rabin_karp_example.cc`:
- Around line 51-57: Replace the chained iostream output with type-safe
std::format-based messages for the header and the text/pattern lines (use
std::format and include <format>), and replace the index loop over matches with
a range-based iteration (for (auto idx : matches)) or use a
join-style/algorithmic approach to print space-separated positions; ensure you
emit a newline even when matches is empty by printing a terminating '\n' after
the sequence or handling the empty case explicitly; reference variables: text,
pattern, matches.

In `@Examples/z_algorithm_example.cc`:
- Around line 52-62: Replace the iostream-based printing with std::format and a
small ranges-based printer helper: create a helper function (e.g., print_range)
that takes a range (used for z and matches) and a separator/terminator, then use
std::format to produce the header lines for "Z-Algorithm Example", "Text", and
"Pattern" (referencing variables text and pattern) and call print_range(z, ' ')
and print_range(matches, ' ') to print the sequences; also use range-based
iteration or std::ranges algorithms inside the helper to avoid manual index
loops and ensure consistent spacing/termination.

In `@Tests/aho_corasick_test.cc`:
- Around line 45-216: Add two small unit tests to Tests/aho_corasick_test.cc:
implement TEST(AhoCorasick, EmptyText) that creates Aho_Corasick, adds a couple
patterns via add_pattern, calls build(), calls search("") and asserts no matches
(matches.size()==0 or matches.empty()), and implement TEST(AhoCorasick,
NoMatchesFound) that adds patterns, builds, searches a text with none of the
patterns (e.g., "xyzxyz"), and asserts no matches; reference Aho_Corasick,
add_pattern, build, and search in the new tests and use the existing
EXPECT_EQ/EXPECT_TRUE assertions style.

In `@Tests/string_palindromes_test.cc`:
- Around line 107-112: The test named ManacherPalindromeInMiddle uses
"xxxracecarxxx" which is a full-string palindrome; either rename the TEST to
reflect whole-string detection (e.g., ManacherWholeStringPalindrome) or change
the input to a true embedded case like "abcracecarxyz" and update expectations
accordingly (call manacher("abcracecarxyz") and assert r.longest_palindrome ==
"racecar" and r.longest_length == 7u); update the TEST identifier or the string
and expected values in Tests/string_palindromes_test.cc, referencing the TEST
declaration and the manacher(...) result fields r.longest_palindrome and
r.longest_length.

In `@Tests/string_search_test.cc`:
- Around line 168-198: The test uses EXPECT_GE(kmp.size(), 6u) which allows
extra (incorrect) matches; change that assertion to EXPECT_EQ(kmp.size(), 6u) in
the TEST(StringSearch, StressLongRandomStringAllAlgorithmsAgree) so the number
of matches reported by kmp_search (and by earlier cross-checks with z_search,
boyer_moore_horspool_search, rabin_karp_search) must equal the six planted
positions; update the single EXPECT_GE line to EXPECT_EQ and keep the rest of
the equality checks as-is to ensure strict agreement.
- Around line 43-54: Extract the duplicated expect_array_eq helper into a shared
header (e.g., Tests/test_helpers.H): move the implementation of
expect_array_eq(const Array<size_t>&, std::initializer_list<size_t>) out of the
anonymous namespace and into that header, ensure the header includes the
necessary headers (gtest, Array definition, <initializer_list>), and make the
function either inline or placed in a named namespace (e.g.,
test_helpers::expect_array_eq) to avoid ODR/multiple definition issues; then
include Tests/test_helpers.H from both Tests/string_search_test.cc and
Tests/suffix_structures_test.cc and remove the local duplicate definitions.

In `@Tests/suffix_structures_test.cc`:
- Around line 184-203: In the TEST named StressLongerSuffixArray replace the
non-strict comparison EXPECT_LE(prev_suffix, curr_suffix) with
EXPECT_LT(prev_suffix, curr_suffix) to assert strict ordering of distinct
suffixes; locate the assertion in the test that builds text, computes sa via
suffix_array(text), and compares prev_suffix and curr_suffix (variables
prev_suffix, curr_suffix, and sa) and change to EXPECT_LT, mirroring the
existing behavior used in the SuffixArrayLongerText test.

ℹ️ Review info

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 76a766e and aec96ee.

📒 Files selected for processing (34)

Aho_Corasick.H
CMakeLists.txt
Examples/CMakeLists.txt
Examples/aho_corasick_example.cc
Examples/damerau_levenshtein_example.cc
Examples/edit_distance_example.cc
Examples/horspool_example.cc
Examples/kmp_example.cc
Examples/lcs_longest_common_substring_example.cc
Examples/manacher_example.cc
Examples/min_cost_matching_example.cc
Examples/rabin_karp_example.cc
Examples/suffix_array_lcp_example.cc
Examples/suffix_automaton_example.cc
Examples/suffix_tree_example.cc
Examples/z_algorithm_example.cc
README.es.md
README.md
String_Algorithms.H
String_DP.H
String_Palindromes.H
String_Search.H
Suffix_Structures.H
Tests/aho_corasick_test.cc
Tests/sort_utils.cc
Tests/string_algorithms_test.cc
Tests/string_dp_test.cc
Tests/string_palindromes_test.cc
Tests/string_search_test.cc
Tests/suffix_structures_test.cc
ah-concepts.H
ah-dry.H
ah-string-utils.H
tpl_sort_utils.H

Aho_Corasick.H

README.es.md

Suffix_Structures.H

tpl_sort_utils.H

Refactor string utility functions, improve `Aho_Corasick` and `Suffix_Structures` documentation and robustness. Introduce `test_helpers.H` for better test readability. Add new stress tests for `Manacher` and `Rabin-Karp` algorithms to ensure correctness with complex and adversarial inputs. Update examples to use modern C++ formatting.

coderabbitai · 2026-02-24T17:06:41Z

Note

Docstrings generation - SUCCESS
Generated docstrings for this pull request at #38

@lrleon

Docstrings generation was requested by @lrleon. * #37 (comment) The following files were modified: * `Examples/damerau_levenshtein_example.cc` * `Examples/edit_distance_example.cc` * `Examples/horspool_example.cc` * `Examples/kmp_example.cc` * `Examples/lcs_longest_common_substring_example.cc` * `Examples/manacher_example.cc` * `Examples/rabin_karp_example.cc` * `Examples/suffix_array_lcp_example.cc` * `Examples/suffix_automaton_example.cc` * `Examples/suffix_tree_example.cc` * `Examples/z_algorithm_example.cc` * `Tests/ah-string-utils.cc` * `Tests/sort_utils.cc`

coderabbitai

Actionable comments posted: 8

♻️ Duplicate comments (2)

README.es.md (1)

505-513: Past review suggestions addressed — "Cabeceras principales" and "cabecera paraguas" are now in place.

The Spanish terminology is consistent. The LanguageTool punctuation warnings on the colon-prefixed markdown list items are false positives for this formatting style.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@README.es.md` around lines 505 - 513, The review indicates the Spanish README
update already addresses previous suggestions and that the terminology and
punctuation are acceptable; no code changes are necessary—keep the section
headings and items as written (String_Search.H, Aho_Corasick.H,
Suffix_Structures.H, String_Palindromes.H, String_DP.H, String_Algorithms.H) and
proceed to merge/approve the PR.

tpl_sort_utils.H (1)

4358-4364: Constrain KeyFn with C++20 concepts (std::invocable) for clearer compile-time guarantees.

counting_sort_indices relies on key_of(size_t) returning an int-like key. Constraining KeyFn improves diagnostics and prevents accidental misuse.

Proposed change

-template <class KeyFn>
+template <std::invocable<size_t> KeyFn>
+  requires std::convertible_to<std::invoke_result_t<KeyFn, size_t>, int>
   void counting_sort_indices(Array<size_t> & sa,
                              Array<size_t> & tmp,
                              const size_t n,
                              const int min_key,
                              const int max_key,
                              const KeyFn & key_of)

#!/bin/bash
set -euo pipefail

# Verify whether <concepts> / std::invocable helpers are available via current includes/toolchain
fd -a '^ah-concepts\.H$' . -x sed -n '1,120p' {}
fd -a '^tpl_sort_utils\.H$' . -x rg -n --no-heading -C2 'template <class KeyFn>|counting_sort_indices' {}

As per coding guidelines: Use concepts (C++20) to constrain template parameters; avoid SFINAE when possible

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@tpl_sort_utils.H` around lines 4358 - 4364, Constrain the KeyFn template
using C++20 concepts: add the necessary includes (<concepts> and <type_traits>
or <utility>) and change the template declaration for counting_sort_indices to
require that KeyFn is invocable with a size_t and that its result is convertible
to int (e.g., use requires std::invocable<KeyFn, size_t> &&
std::convertible_to<std::invoke_result_t<KeyFn, size_t>, int>). This keeps the
function signature and body (counting_sort_indices, parameters
sa/tmp/n/min_key/max_key/key_of) intact but enforces at compile time the
expected callable shape and return type.

🧹 Nitpick comments (6)

Aho_Corasick.H (2)

80-92: Consider using std::array<int, 256> and fill instead of C-style array with manual loop.

Per coding guidelines, prefer std::array when size is known at compile-time.

♻️ Suggested change

+    `#include` <array>
+
     struct Node
     {
-      int next[256];
+      std::array<int, 256> next;
       int fail = 0;
       Array<size_t> output;
 
-      Node()
-      {
-        for (size_t i = 0; i < 256; ++i)
-          next[i] = -1;
-      }
+      Node() : next()
+      {
+        next.fill(-1);
+      }
     };

As per coding guidelines, "Use std::array instead of C-style arrays when size is known at compile-time."

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@Aho_Corasick.H` around lines 80 - 92, The Node struct currently uses a
C-style int next[256] and manual loop in Node() to initialize it; change next to
a std::array<int, 256> (keep the name Node::next) and replace the manual
initialization loop with std::fill or std::array's fill method in the
constructor (or initialize inline with {-1} if preferred) so the fixed-size
buffer uses std::array and is zero/ -1-initialized per guidelines; update any
access to next[i] to continue working unchanged.

255-258: contains_any performs a full search instead of short-circuiting on first match.

The current implementation collects all matches via search() just to check if any exist. For large texts with many matches this is wasteful. A dedicated early-exit loop would be more efficient.

♻️ Proposed optimization

     [[nodiscard]] bool contains_any(const std::string_view text) const
     {
-      return not search(text).is_empty();
+      ah_runtime_error_unless(built_)
+        << "Aho_Corasick::contains_any(): call build() before contains_any()";
+
+      size_t state = 0;
+      for (size_t i = 0; i < text.size(); ++i)
+        {
+          const unsigned char c = static_cast<unsigned char>(text[i]);
+          state = static_cast<size_t>(nodes_[state].next[c]);
+          if (not nodes_[state].output.is_empty())
+            return true;
+        }
+      return false;
     }

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@Aho_Corasick.H` around lines 255 - 258, contains_any currently calls
search(text).is_empty(), which forces a full search; change it to iterate the
search results and return true on the first match to enable early exit. Replace
the body of contains_any to obtain the search result (via search(text)) and loop
over its matches (using the result's iterator/next/for_each style API) returning
true immediately when a match is seen and false if the iterator finishes;
reference contains_any and search so you locate and modify the existing
implementation.

Tests/string_search_test.cc (1)

45-53: Duplicated expect_array_eq — prefer the shared helper from test_helpers.H.

This local definition is identical to the one in Tests/test_helpers.H. Using the shared version avoids divergence if the helper is updated later.

♻️ Suggested change

Replace the local anonymous namespace helper with:

 # include <String_Search.H>
+# include "test_helpers.H"
 
 using namespace Aleph;
+using namespace Aleph::test_helpers;
 
-namespace
-{
-  void expect_array_eq(const Array<size_t> & a,
-                       std::initializer_list<size_t> expected)
-  {
-    ASSERT_EQ(a.size(), expected.size());
-    size_t i = 0;
-    for (const size_t value : expected)
-      EXPECT_EQ(a[i++], value);
-  }
-}

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@Tests/string_search_test.cc` around lines 45 - 53, The test defines a local
duplicate helper expect_array_eq(const Array<size_t>&,
std::initializer_list<size_t>) that is identical to the shared one in
Tests/test_helpers.H; remove the local anonymous-namespace definition of
expect_array_eq from Tests/string_search_test.cc and instead include or import
the shared helper by adding the appropriate include for test_helpers.H (or using
the existing test helper header) so the test uses the single canonical
expect_array_eq implementation.

Suffix_Structures.H (1)

268-283: Consider allowing '\0' as a terminal marker (or clarify why it’s excluded).

choose_terminal() only considers byte values [1..255]. For binary inputs containing all bytes 1..255 but not 0, construction currently throws even though 0 would be available. If excluding 0 is intentional (e.g., to avoid C-string pitfalls elsewhere), it’d be good to document that rationale explicitly.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@Suffix_Structures.H` around lines 268 - 283, The choose_terminal function
currently skips byte value 0 and throws if 1..255 are present; either allow '\0'
by including byte 0 in the search or explicitly document/guard the decision. To
fix, update choose_terminal (the function choose_terminal in Naive_Suffix_Tree /
Suffix_Structures.H) to consider used[0] as a valid terminal (start the for-loop
at i = 0) when '\0' is acceptable, or if '\0' must be excluded keep the current
loop but add a clear comment and/or an assertion explaining why '\0' is
forbidden and how callers must sanitize inputs; adjust error message accordingly
to reflect whether only 1..255 were considered or all 256 bytes.

Tests/aho_corasick_test.cc (1)

127-187: Consider a larger deterministic stress case as a performance-regression guard.

LargePatternSet/LongTextSearch are a good start, but a “stress” test (e.g., a few thousand patterns + a few hundred KB of text) can better catch accidental O(n·m) regressions while staying deterministic (avoid wall-clock timing asserts).
Based on learnings: Applies to Tests/**/*_test.cc : Include performance regression tests for critical algorithms
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@Tests/aho_corasick_test.cc` around lines 127 - 187, Add a deterministic
stress test (e.g., TEST(AhoCorasick, DeterministicStress)) in
Tests/aho_corasick_test.cc that programmatically generates a few thousand
patterns (use a fixed algorithm/seed to produce unique strings via
Aho_Corasick::add_pattern), calls Aho_Corasick::build(), constructs a
few-hundred-KB deterministic text that contains known inserted patterns at fixed
offsets, runs Aho_Corasick::search(), and asserts correctness by checking match
counts and verifying each match via Aho_Corasick::pattern(match.pattern_id) and
text.substr(match.position, pat.size()) == pat; avoid any wall-clock timing
asserts so this serves as a deterministic performance-regression guard.

tpl_sort_utils.H (1)

4642-4685: Add requires radix_integer<IntT> to both radix_sort_impl overloads for consistency.

The radix_sort_impl(DynList<IntT>&) and radix_sort_impl(DynDlist<IntT>&) overloads already have requires radix_integer<IntT>, but the generic radix_sort_impl(C<IntT>&) and radix_sort_impl(IntT*, size_t) overloads do not. This mirrors the pattern in counting_sort_impl, where all overloads are constrained. Adding the constraint provides better compile-time error messages if these impl functions are reused elsewhere and aligns with the coding guideline to use concepts for template parameter constraints.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tpl_sort_utils.H` around lines 4642 - 4685, The two unconstrained template
overloads radix_sort_impl(C<IntT>&) and radix_sort_impl(IntT*, size_t) must be
constrained with requires radix_integer<IntT> to match the other overloads (the
DynList/DynDlist variants) and provide clearer compile-time errors; update the
template declarations for the generic container overload (template <template
<typename> class C, typename IntT>) and the raw-pointer overload (template
<typename IntT>) to add requires radix_integer<IntT>.

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@ah-string-utils.H`:
- Around line 518-523: The boolean expression initializing const bool invalid
mixes C-style tokens (||, &&, !) with alternative tokens (or, and, not); update
the expression in the invalid initializer (the const bool invalid in this file)
to use the alternative tokens consistently (replace || with or, && with and, and
! with not) so all logical operations use and/or/not and preserve the same
grouping and semantics.

In `@Suffix_Structures.H`:
- Around line 76-89: The current insertion_sort(Array<size_t>& a) in
Suffix_Structures.H is O(k²) and is used by Naive_Suffix_Tree::find_all() via
insertion_sort(matches), conflicting with the documented O(k log k); replace the
hand-rolled insertion_sort implementation with the existing O(k log k) sorter
(e.g., call quicksort_op(matches) from tpl_sort_utils.H or std::sort on the
underlying container) so find_all() uses an O(k log k) sort; update the function
body of insertion_sort (and any direct callers like Naive_Suffix_Tree::find_all)
to invoke the chosen efficient sort to preserve API but ensure O(k log k)
complexity.

In `@Tests/aho_corasick_test.cc`:
- Around line 169-174: The comment above the test in LongTextSearch is
inaccurate: it says the text is built with "hay" repeated but the loop appends
"stack" 2000 times; update the comment to accurately describe the test data
(e.g., "Build a 10K text with 'stack' repeated and a few 'needle' insertions")
so readers aren’t misled by the mismatch between the comment and the code in
Tests/aho_corasick_test.cc within the LongTextSearch test.
- Around line 45-62: In the TEST named ClassicDictionary, change the size
assertion on the search result from EXPECT_EQ(matches.size(), 4u) to
ASSERT_EQ(matches.size(), 4u) so the test aborts immediately if matches has the
wrong length and prevents out-of-bounds indexing of matches[0]..matches[3];
locate the Aho_Corasick ac build and const auto matches = ac.search("ahishers")
lines and update that single assertion accordingly.

In `@Tests/sort_utils.cc`:
- Around line 1637-1647: The test named
SortUtilsCountingSortGeneric.std_vector_support is misleading because it uses
make_dynarray (DynArray) instead of std::vector; either rename the test to
reflect DynArray usage (e.g., change the TEST name from std_vector_support to
dynarray_support) or add a separate test that constructs a std::vector (e.g.,
std::vector<int> v = {9, -1, 3, 0, -1, 2}; counting_sort(v); ...) to exercise
the std::vector overload of counting_sort; update assertions to target the new
variable names accordingly and keep the existing assertions for ordering and
size.

In `@Tests/test_helpers.H`:
- Around line 1-22: Add the standard MIT license header and a Doxygen file-level
comment with an `@file` directive and brief description to the top of
test_helpers.H (above the include guard), matching the format used in other .H
headers; keep the existing include guard and namespace/expect_array_eq
implementation intact. Then remove the duplicate expect_array_eq definition from
string_search_test.cc and replace it with an `#include` "test_helpers.H" so the
test uses the shared expect_array_eq helper.

In `@tpl_sort_utils.H`:
- Around line 4381-4397: In counting_sort_indices() the expressions
static_cast<size_t>(key - min_key) can trigger signed int overflow before the
cast; compute the difference in a widened signed or unsigned type first (e.g.,
long or ptrdiff_t) then cast to size_t, and apply this to both uses (the
increment ++count[...] loop and the idx computation in the final loop where
tmp[--count[idx]] = sa[...] is used). Keep the existing ah_out_of_range_error_if
check, but replace key - min_key with a safe widened subtraction (e.g., auto
diff = static_cast<long>(key) - static_cast<long>(min_key); size_t idx =
static_cast<size_t>(diff);) to avoid UB and ensure correct bucket indexing.
- Around line 4795-4967: Several public overloads of counting_sort and
radix_sort (e.g., counting_sort(DynArray<T>&), counting_sort(Array<T>&),
counting_sort(DynList<T>&), counting_sort(DynDlist<T>&), counting_sort(T* a,
size_t n), counting_sort(T (&a)[N]) and the analogous radix_sort overloads) lack
required Doxygen tags; update each public template docblock to include `@tparam`
T, `@param` descriptions (e.g., "a" or "list" or "n" for array length), and for
pointer overloads add `@exception` or `@throws` runtime_error when a is null and
n>0, and keep the existing `@par` Complexity/@brief text — ensure the tag names
match the rest of the file’s style and apply these same additions to all listed
overloads so Doxygen warnings are resolved.

---

Duplicate comments:
In `@README.es.md`:
- Around line 505-513: The review indicates the Spanish README update already
addresses previous suggestions and that the terminology and punctuation are
acceptable; no code changes are necessary—keep the section headings and items as
written (String_Search.H, Aho_Corasick.H, Suffix_Structures.H,
String_Palindromes.H, String_DP.H, String_Algorithms.H) and proceed to
merge/approve the PR.

In `@tpl_sort_utils.H`:
- Around line 4358-4364: Constrain the KeyFn template using C++20 concepts: add
the necessary includes (<concepts> and <type_traits> or <utility>) and change
the template declaration for counting_sort_indices to require that KeyFn is
invocable with a size_t and that its result is convertible to int (e.g., use
requires std::invocable<KeyFn, size_t> &&
std::convertible_to<std::invoke_result_t<KeyFn, size_t>, int>). This keeps the
function signature and body (counting_sort_indices, parameters
sa/tmp/n/min_key/max_key/key_of) intact but enforces at compile time the
expected callable shape and return type.

---

Nitpick comments:
In `@Aho_Corasick.H`:
- Around line 80-92: The Node struct currently uses a C-style int next[256] and
manual loop in Node() to initialize it; change next to a std::array<int, 256>
(keep the name Node::next) and replace the manual initialization loop with
std::fill or std::array's fill method in the constructor (or initialize inline
with {-1} if preferred) so the fixed-size buffer uses std::array and is zero/
-1-initialized per guidelines; update any access to next[i] to continue working
unchanged.
- Around line 255-258: contains_any currently calls search(text).is_empty(),
which forces a full search; change it to iterate the search results and return
true on the first match to enable early exit. Replace the body of contains_any
to obtain the search result (via search(text)) and loop over its matches (using
the result's iterator/next/for_each style API) returning true immediately when a
match is seen and false if the iterator finishes; reference contains_any and
search so you locate and modify the existing implementation.

In `@Suffix_Structures.H`:
- Around line 268-283: The choose_terminal function currently skips byte value 0
and throws if 1..255 are present; either allow '\0' by including byte 0 in the
search or explicitly document/guard the decision. To fix, update choose_terminal
(the function choose_terminal in Naive_Suffix_Tree / Suffix_Structures.H) to
consider used[0] as a valid terminal (start the for-loop at i = 0) when '\0' is
acceptable, or if '\0' must be excluded keep the current loop but add a clear
comment and/or an assertion explaining why '\0' is forbidden and how callers
must sanitize inputs; adjust error message accordingly to reflect whether only
1..255 were considered or all 256 bytes.

In `@Tests/aho_corasick_test.cc`:
- Around line 127-187: Add a deterministic stress test (e.g., TEST(AhoCorasick,
DeterministicStress)) in Tests/aho_corasick_test.cc that programmatically
generates a few thousand patterns (use a fixed algorithm/seed to produce unique
strings via Aho_Corasick::add_pattern), calls Aho_Corasick::build(), constructs
a few-hundred-KB deterministic text that contains known inserted patterns at
fixed offsets, runs Aho_Corasick::search(), and asserts correctness by checking
match counts and verifying each match via
Aho_Corasick::pattern(match.pattern_id) and text.substr(match.position,
pat.size()) == pat; avoid any wall-clock timing asserts so this serves as a
deterministic performance-regression guard.

In `@Tests/string_search_test.cc`:
- Around line 45-53: The test defines a local duplicate helper
expect_array_eq(const Array<size_t>&, std::initializer_list<size_t>) that is
identical to the shared one in Tests/test_helpers.H; remove the local
anonymous-namespace definition of expect_array_eq from
Tests/string_search_test.cc and instead include or import the shared helper by
adding the appropriate include for test_helpers.H (or using the existing test
helper header) so the test uses the single canonical expect_array_eq
implementation.

In `@tpl_sort_utils.H`:
- Around line 4642-4685: The two unconstrained template overloads
radix_sort_impl(C<IntT>&) and radix_sort_impl(IntT*, size_t) must be constrained
with requires radix_integer<IntT> to match the other overloads (the
DynList/DynDlist variants) and provide clearer compile-time errors; update the
template declarations for the generic container overload (template <template
<typename> class C, typename IntT>) and the raw-pointer overload (template
<typename IntT>) to add requires radix_integer<IntT>.

ℹ️ Review info

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between aec96ee and 470d894.

📒 Files selected for processing (14)

Aho_Corasick.H
Examples/rabin_karp_example.cc
Examples/z_algorithm_example.cc
README.es.md
Suffix_Structures.H
Tests/ah-string-utils.cc
Tests/aho_corasick_test.cc
Tests/sort_utils.cc
Tests/string_palindromes_test.cc
Tests/string_search_test.cc
Tests/suffix_structures_test.cc
Tests/test_helpers.H
ah-string-utils.H
tpl_sort_utils.H

🚧 Files skipped from review as they are similar to previous changes (2)

Examples/z_algorithm_example.cc
Tests/string_palindromes_test.cc

ah-string-utils.H

Suffix_Structures.H

Tests/aho_corasick_test.cc

Tests/sort_utils.cc

Tests/test_helpers.H

tpl_sort_utils.H

Refactored sort_utils.cc with modern C++ features and expanded test coverage for various sorting algorithms, including introsort, counting sort, and radix sort. Also includes a fix for Suffix Automaton cloning link invariant.

@lrleon

Docstrings generation was requested by @lrleon. * #37 (comment) The following files were modified: * `Examples/damerau_levenshtein_example.cc` * `Examples/edit_distance_example.cc` * `Examples/horspool_example.cc` * `Examples/kmp_example.cc` * `Examples/lcs_longest_common_substring_example.cc` * `Examples/manacher_example.cc` * `Examples/rabin_karp_example.cc` * `Examples/suffix_array_lcp_example.cc` * `Examples/suffix_automaton_example.cc` * `Examples/suffix_tree_example.cc` * `Examples/z_algorithm_example.cc` * `Tests/ah-string-utils.cc` * `Tests/sort_utils.cc` Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>

Copilot

Pull request overview

Copilot reviewed 36 out of 36 changed files in this pull request and generated 3 comments.

tpl_sort_utils.H

Suffix_Structures.H

Tests/sort_utils.cc

Copilot

Pull request overview

Copilot reviewed 39 out of 39 changed files in this pull request and generated no new comments.

coderabbitai

Actionable comments posted: 2

🧹 Nitpick comments (1)

Tests/suffix_structures_test.cc (1)
197-216: Nice to have: add a deterministic performance regression check for critical string primitives.

StressLongerSuffixArray validates correctness at scale, but it does not guard runtime regressions over time.

Based on learnings, "Applies to Tests/**/*_test.cc : Include performance regression tests for critical algorithms".
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@Tests/suffix_structures_test.cc` around lines 197 - 216, Augment the existing
StressLongerSuffixArray test to include a deterministic performance regression
check: before calling suffix_array(text) record a high-resolution timestamp
(std::chrono::steady_clock), call suffix_array(text), record the end time and
compute elapsed milliseconds, then assert the elapsed time is below a chosen
threshold with an EXPECT_LE/ASSERT_LT so the test fails on regressions; keep the
workload deterministic (reuse the current text generation) and include a brief
comment explaining the threshold and rationale so future reviewers see why the
timing assertion is present.

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@Suffix_Structures.H`:
- Around line 176-178: Update the Doxygen for lcp_array_kasai() to accurately
reflect runtime checks: add a `@throws` entry stating that a domain_error (or
appropriate exception) is thrown when sa contains duplicate indices (in addition
to the existing sa[i] >= text.size() case), and ensure the function's docblock
includes `@brief`, `@param`, `@return` and the expanded `@throws` tags so the
documentation matches the behavior performed in the function (lcp_array_kasai).

In `@Tests/suffix_structures_test.cc`:
- Around line 60-73: Add a unit test to verify lcp_array_kasai rejects suffix
array entries that are out-of-range: create a test (e.g., TEST(SuffixStructures,
LcpKasaiRejectsOutOfRange)) that builds an SA containing at least one index >=
text.size() (for example for "abc" use {0,1,3}) and assert that calling
lcp_array_kasai("abc", sa) throws std::out_of_range; ensure you reference
lcp_array_kasai and the SA construction (Array<size_t> / build_array<size_t>)
consistent with the other tests.

---

Nitpick comments:
In `@Tests/suffix_structures_test.cc`:
- Around line 197-216: Augment the existing StressLongerSuffixArray test to
include a deterministic performance regression check: before calling
suffix_array(text) record a high-resolution timestamp
(std::chrono::steady_clock), call suffix_array(text), record the end time and
compute elapsed milliseconds, then assert the elapsed time is below a chosen
threshold with an EXPECT_LE/ASSERT_LT so the test fails on regressions; keep the
workload deterministic (reuse the current text generation) and include a brief
comment explaining the threshold and rationale so future reviewers see why the
timing assertion is present.

ℹ️ Review info

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 96b74a7 and 6fb5157.

📒 Files selected for processing (2)

Suffix_Structures.H
Tests/suffix_structures_test.cc

Suffix_Structures.H

Tests/suffix_structures_test.cc

Rethink exception handling for lcp_array_kasai, adding a check for duplicate indices in the suffix array. Add new test cases to cover out-of-range and duplicate index errors in lcp_array_kasai.

coderabbitai

Actionable comments posted: 1

🧹 Nitpick comments (2)

Tests/suffix_structures_test.cc (2)
143-150: Nice to have: extract duplicated suffix-order assertion into a helper.

The lexicographic-order verification loop appears twice; factoring it into one helper will reduce drift and simplify future updates.

As per coding guidelines, "Refactor duplicated code into reusable functions or templates".

Also applies to: 225-232
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@Tests/suffix_structures_test.cc` around lines 143 - 150, Extract the
duplicated lexicographic-order check into a single helper (e.g.,
assert_suffixes_sorted or template assert_suffixes_sorted_view) that takes the
text and the suffix array (sa) and performs the loop comparing adjacent suffixes
using std::string_view and EXPECT_LT; replace both occurrences (the loop using
prev_suffix/curr_suffix around EXPECT_LT) with calls to that helper to remove
duplication and centralize behavior.
48-271: Should fix: add seeded randomized/property cross-checks for the core string structures.

For this complexity level, add deterministic randomized tests (fixed seed) that cross-check:

suffix_array ordering against naive suffix sorting,

lcp_array_kasai against naive LCP computation,

Suffix_Automaton::contains against std::string::find on sampled substrings.

Based on learnings: Applies to Tests/**/*_test.cc : Use property-based testing or randomized tests for complex algorithms.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@Tests/suffix_structures_test.cc` around lines 48 - 271, Add deterministic
randomized cross-check tests (use a fixed seed) that validate suffix_array
ordering against a naive sorted-suffix implementation, lcp_array_kasai against a
naive LCP computation, and Suffix_Automaton::contains against std::string::find
on sampled substrings; implement three new TESTs (e.g., RandomizedSuffixArray,
RandomizedLcpKasai, RandomizedSuffixAutomatonContains) which generate random
texts (bounded length), compute the library result via
suffix_array/lcp_array_kasai/Suffix_Automaton::build+contains, compute the
expected result via straightforward naive routines (sort all suffixes or compute
LCP by direct comparisons, and check substring presence with std::string::find),
assert equality for each sample, and ensure the RNG uses a fixed seed so tests
are deterministic and repeatable.

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@Tests/suffix_structures_test.cc`:
- Around line 39-41: The test contains a hard wall-clock assertion that enforces
duration < 500ms (the timing check around the duration measurement in
Tests/suffix_structures_test.cc); replace this flaky default-path check by
gating it behind an opt-in mechanism: read an env var (e.g. ENABLE_SLOW_TESTS or
RUN_PERF_TESTS) and only perform the <500ms assertion when that var is set,
otherwise skip the timing assertion (or convert it to a non-failing log). Update
the code around the duration measurement and the assertion (the test case that
measures duration and currently asserts < 500ms) to consult the env flag before
asserting, or move the check into a separate benchmark-only target.

---

Nitpick comments:
In `@Tests/suffix_structures_test.cc`:
- Around line 143-150: Extract the duplicated lexicographic-order check into a
single helper (e.g., assert_suffixes_sorted or template
assert_suffixes_sorted_view) that takes the text and the suffix array (sa) and
performs the loop comparing adjacent suffixes using std::string_view and
EXPECT_LT; replace both occurrences (the loop using prev_suffix/curr_suffix
around EXPECT_LT) with calls to that helper to remove duplication and centralize
behavior.
- Around line 48-271: Add deterministic randomized cross-check tests (use a
fixed seed) that validate suffix_array ordering against a naive sorted-suffix
implementation, lcp_array_kasai against a naive LCP computation, and
Suffix_Automaton::contains against std::string::find on sampled substrings;
implement three new TESTs (e.g., RandomizedSuffixArray, RandomizedLcpKasai,
RandomizedSuffixAutomatonContains) which generate random texts (bounded length),
compute the library result via
suffix_array/lcp_array_kasai/Suffix_Automaton::build+contains, compute the
expected result via straightforward naive routines (sort all suffixes or compute
LCP by direct comparisons, and check substring presence with std::string::find),
assert equality for each sample, and ensure the RNG uses a fixed seed so tests
are deterministic and repeatable.

ℹ️ Review info

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 6fb5157 and da400b8.

📒 Files selected for processing (2)

Suffix_Structures.H
Tests/suffix_structures_test.cc

Tests/suffix_structures_test.cc

Refactor sort utility error handling and update documentation for Aho-Corasick and String_Palindromes.

coderabbitai

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@tpl_sort_utils.H`:
- Around line 5063-5074: The Doxygen `@throws` entry for the radix_sort(T* a,
size_t n) pointer overload is incorrect: the code uses ah_invalid_argument_if
(which throws std::invalid_argument) but the comment lists std::runtime_error;
update the documentation to match the implementation by changing the `@throws`
line to std::invalid_argument for the null-pointer check in the radix_sort
template (referencing the radix_sort function and its ah_invalid_argument_if
call).
- Around line 4943-4955: The Doxygen `@throws` is incorrect: counting_sort uses
ah_invalid_argument_if (which throws std::invalid_argument) for the null-pointer
check; update the documentation for counting_sort to list std::invalid_argument
(not std::runtime_error) for the "a is null and n > 0" case, referencing the
counting_sort template and the ah_invalid_argument_if check so readers and docs
match the implementation.

ℹ️ Review info

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between da400b8 and 4674d2f.

📒 Files selected for processing (5)

Aho_Corasick.H
String_Palindromes.H
Tests/sort_utils.cc
Tests/suffix_structures_test.cc
tpl_sort_utils.H

🚧 Files skipped from review as they are similar to previous changes (1)

Tests/suffix_structures_test.cc

tpl_sort_utils.H

Copilot

Pull request overview

Copilot reviewed 39 out of 39 changed files in this pull request and generated 2 comments.

Comments suppressed due to low confidence (4)

tpl_avl.H:1

The do { ... } while (p != Node::NullPtr); structure executes the body once even when p == Node::NullPtr (e.g., empty tree). With the new unconditional __builtin_prefetch(LLINK(p)) / RLINK(p), this can dereference a null/sentinel pointer via the LLINK/RLINK macro expansion and trigger UB. Restructure the loop to only prefetch after verifying p != Node::NullPtr (e.g., use a while (p != Node::NullPtr) loop), or guard the prefetch calls so LLINK/RLINK are never evaluated for a null node.
tpl_avlRk.H:1
Same issue as in tpl_avl.H: this do { ... } while (p != Node::NullPtr); executes once even if p starts as Node::NullPtr, and the new unconditional prefetch evaluates LLINK(p) / RLINK(p) before any null check, which can be UB for an empty tree. Convert to a while (p != Node::NullPtr) descent (or guard prefetch) so child-link expressions are never evaluated when p is null.
Tests/suffix_structures_test.cc:1
This test file later uses std::getenv and std::cout (perf-gated logging), but the corresponding standard headers (<cstdlib> and <iostream>) are not included here. Relying on transitive includes can be toolchain-dependent; add explicit includes for the standard facilities used in this TU.
ah-dry.H:1
The inline code formatting is broken on this line (missing closing backtick after insert()), and the sentence reads ungrammatically ('If all on the container...'). Please fix the backticks and rephrase the sentence for clarity (e.g., 'insert() of the container. If all of the container and T...').

Suffix_Structures.H

README.md

coderabbitai

🧹 Nitpick comments (1)

Suffix_Structures.H (1)

711-718: Pre-reserve SAM state capacity in build() to avoid repeated reallocations.

At Line 711, extend() is called per character and can trigger repeated growth/copies. Reserving up front makes the linear build path tighter on large inputs.

♻️ Proposed change

 void build(const std::string_view text)
 {
   clear();
+  states_.reserve(2 * text.size() + 1);
   for (const char c: text)
     extend(c);

   mark_terminals();
 }

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@Suffix_Structures.H` around lines 711 - 718, The build(const std::string_view
text) path should pre-reserve SAM state capacity to avoid repeated
reallocations: after clear() call states.reserve(2 * text.size()) (and similarly
reserve any per-state transition container capacity if available) before the
for-loop that calls extend(c); update references to the container holding SAM
states (e.g., the vector named states or the class SuffixAutomaton's states
member) so extend() can push without reallocation during linear construction.

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Nitpick comments:
In `@Suffix_Structures.H`:
- Around line 711-718: The build(const std::string_view text) path should
pre-reserve SAM state capacity to avoid repeated reallocations: after clear()
call states.reserve(2 * text.size()) (and similarly reserve any per-state
transition container capacity if available) before the for-loop that calls
extend(c); update references to the container holding SAM states (e.g., the
vector named states or the class SuffixAutomaton's states member) so extend()
can push without reallocation during linear construction.

ℹ️ Review info

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 4674d2f and 1bbe644.

📒 Files selected for processing (3)

README.md
Suffix_Structures.H
tpl_sort_utils.H

Copilot

Pull request overview

Copilot reviewed 39 out of 39 changed files in this pull request and generated no new comments.

lrleon added 12 commits January 28, 2026 09:44

refactor(doxyfile): Refactor Doxyfile to standardize settings

dd0f1eb

Merge branch 'master' of github.com:lrleon/Aleph-w

4a4e122

Merge branch 'master' of github.com:lrleon/Aleph-w

8863f79

Merge branch 'master' of github.com:lrleon/Aleph-w

b3b795d

Merge branch 'master' of github.com:lrleon/Aleph-w

9e162e8

Merge branches 'master' and 'master' of github.com:lrleon/Aleph-w

95914a2

removed not needed loop

a0b3061

Merge branch 'master' of github.com:lrleon/Aleph-w

18097d2

Merge branch 'master' of github.com:lrleon/Aleph-w

fe7f669

Merge branch 'master' of github.com:lrleon/Aleph-w

4d0615d

Merge branch 'master' of github.com:lrleon/Aleph-w

82f7f17

Add string algorithms and examples, including edit distance and palin…

aec96ee

…drome tests

Copilot AI review requested due to automatic review settings February 24, 2026 03:18

Copilot AI reviewed Feb 24, 2026

View reviewed changes

chatgpt-codex-connector bot reviewed Feb 24, 2026

View reviewed changes

Aho_Corasick.H Outdated Show resolved Hide resolved

Suffix_Structures.H Outdated Show resolved Hide resolved

Copilot started reviewing on behalf of lrleon February 24, 2026 03:27 View session

coderabbitai bot requested changes Feb 24, 2026

View reviewed changes

Aho_Corasick.H Outdated Show resolved Hide resolved

README.es.md Outdated Show resolved Hide resolved

Suffix_Structures.H Show resolved Hide resolved

tpl_sort_utils.H Outdated Show resolved Hide resolved

tpl_sort_utils.H Outdated Show resolved Hide resolved

lrleon changed the title ~~Strings~~ Add comprehensive string algorithms (KMP, Z-algorithm, Aho-Corasick, suffix structures, Manacher, edit distance) Feb 24, 2026

coderabbitai bot mentioned this pull request Feb 24, 2026

📝 Add docstrings to strings #38

Merged

coderabbitai bot requested changes Feb 24, 2026

View reviewed changes

lrleon and others added 2 commits February 24, 2026 12:12

test(sort): Refactor and update sort_utils tests

5e8f45c

Refactored sort_utils.cc with modern C++ features and expanded test coverage for various sorting algorithms, including introsort, counting sort, and radix sort. Also includes a fix for Suffix Automaton cloning link invariant.

Copilot AI review requested due to automatic review settings February 24, 2026 17:13

Merge branch 'strings' of github.com:lrleon/Aleph-w into strings

2abedd9

Copilot started reviewing on behalf of lrleon February 24, 2026 17:14 View session

Copilot AI reviewed Feb 24, 2026

View reviewed changes

tpl_sort_utils.H Outdated Show resolved Hide resolved

Suffix_Structures.H Outdated Show resolved Hide resolved

Tests/sort_utils.cc Outdated Show resolved Hide resolved

Copilot AI review requested due to automatic review settings February 25, 2026 17:06

lrleon dismissed coderabbitai[bot]’s stale review via 6fb5157 February 25, 2026 17:06

Copilot started reviewing on behalf of lrleon February 25, 2026 17:06 View session

Copilot AI reviewed Feb 25, 2026

View reviewed changes

coderabbitai bot requested changes Feb 25, 2026

View reviewed changes

Suffix_Structures.H Outdated Show resolved Hide resolved

Tests/suffix_structures_test.cc Show resolved Hide resolved

coderabbitai bot previously approved these changes Feb 25, 2026

View reviewed changes

refactor(lcp): Improve LCP array and test error handling

da400b8

Rethink exception handling for lcp_array_kasai, adding a check for duplicate indices in the suffix array. Add new test cases to cover out-of-range and duplicate index errors in lcp_array_kasai.

lrleon dismissed coderabbitai[bot]’s stale review via da400b8 February 25, 2026 17:28

coderabbitai bot requested changes Feb 25, 2026

View reviewed changes

Tests/suffix_structures_test.cc Show resolved Hide resolved

coderabbitai bot previously approved these changes Feb 26, 2026

View reviewed changes

refactor(core): Improve error handling and docs

4674d2f

Refactor sort utility error handling and update documentation for Aho-Corasick and String_Palindromes.

Copilot AI review requested due to automatic review settings February 26, 2026 01:17

lrleon dismissed coderabbitai[bot]’s stale review via 4674d2f February 26, 2026 01:17

coderabbitai bot requested changes Feb 26, 2026

View reviewed changes

tpl_sort_utils.H Outdated Show resolved Hide resolved

tpl_sort_utils.H Show resolved Hide resolved

Copilot started reviewing on behalf of lrleon February 26, 2026 01:24 View session

Copilot AI reviewed Feb 26, 2026

View reviewed changes

Suffix_Structures.H Outdated Show resolved Hide resolved

README.md Outdated Show resolved Hide resolved

coderabbitai bot previously approved these changes Feb 26, 2026

View reviewed changes

refactor(docs, core): Update README and exception messages

09f8f67

lrleon dismissed coderabbitai[bot]’s stale review via 09f8f67 February 26, 2026 01:39

coderabbitai bot previously approved these changes Feb 26, 2026

View reviewed changes

refactor(docs, core): Update README and exception messages

1bbe644

Copilot AI review requested due to automatic review settings February 26, 2026 01:51

lrleon dismissed coderabbitai[bot]’s stale review via 1bbe644 February 26, 2026 01:51

Copilot started reviewing on behalf of lrleon February 26, 2026 01:54 View session

coderabbitai bot reviewed Feb 26, 2026

View reviewed changes

coderabbitai bot approved these changes Feb 26, 2026

View reviewed changes

Copilot AI reviewed Feb 26, 2026

View reviewed changes

lrleon merged commit f364851 into master Feb 26, 2026
35 checks passed

coderabbitai bot mentioned this pull request Feb 26, 2026

Add comprehensive dynamic programming toolkit with knapsack, LIS, matrix chain, subset sum, tree DP, and optimizations #39

Merged

coderabbitai bot mentioned this pull request Apr 13, 2026

Add compiler front-end support infrastructure #54

Open

Conversation

lrleon commented Feb 24, 2026 • edited by coderabbitai bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary by CodeRabbit

Uh oh!

coderabbitai bot commented Feb 24, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Review skipped

Walkthrough

Changes

Sequence Diagram(s)

Estimated code review effort

Possibly related PRs

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

Uh oh!

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

coderabbitai bot commented Feb 24, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

Uh oh!

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

lrleon commented Feb 24, 2026 •

edited by coderabbitai bot

Loading

coderabbitai bot commented Feb 24, 2026 •

edited

Loading

coderabbitai bot commented Feb 24, 2026 •

edited

Loading