Releases: coregx/coregex

v0.12.0: Rust-inspired optimizations

06 Feb 01:15
a30fd70

Performance

  • Anti-quadratic guard for reverse suffix, inner, and suffix-set searches: prevents O(n²) degradation on workloads with many false-positive suffix candidates, falls back to PikeVM when quadratic behavior is detected
  • Lazy DFA 4x loop unrolling: process 4 state transitions per inner-loop iteration, check for special states between batches
  • Prefilter IsFast() gate: skip reverse search optimizations when a fast SIMD-backed prefix prefilter already exists
  • DFA cache clear & continue: on cache overflow, clear the cache and fall back to PikeVM for the current search instead of permanently disabling the DFA
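
The 4x unrolling idea can be illustrated with a toy table-driven DFA. This is only a sketch under stated assumptions: the `toyDFA` type, the `newABC` constructor, and the sticky match state are hypothetical and are not taken from the coregex lazy DFA. The loop takes four transitions, then checks once for the special (match) state and replays the batch to recover the exact end offset.

```go
package main

import "fmt"

// toyDFA is a dense, table-driven DFA used only for illustration.
// trans[s][b] is the state reached from state s on input byte b.
type toyDFA struct {
	trans [][256]uint8
	start uint8
	match uint8 // sticky: once entered, every byte leads back to it
}

// newABC builds a DFA that searches for the literal "abc".
// States: 0 = no progress, 1 = saw "a", 2 = saw "ab", 3 = match.
func newABC() *toyDFA {
	d := &toyDFA{trans: make([][256]uint8, 4), start: 0, match: 3}
	for b := 0; b < 256; b++ {
		d.trans[3][b] = 3 // the match state is sticky
	}
	d.trans[0]['a'] = 1
	d.trans[1]['a'], d.trans[1]['b'] = 1, 2
	d.trans[2]['a'], d.trans[2]['c'] = 1, 3
	return d
}

// searchUnrolled returns the end offset of the first match, or -1.
func (d *toyDFA) searchUnrolled(hay []byte) int {
	s := d.start
	i := 0
	for i+4 <= len(hay) {
		prev := s
		// Four transitions with no branch in between.
		s = d.trans[s][hay[i]]
		s = d.trans[s][hay[i+1]]
		s = d.trans[s][hay[i+2]]
		s = d.trans[s][hay[i+3]]
		if s == d.match {
			// Special state seen: replay the batch byte by byte
			// to find where the match actually ended.
			s = prev
			for j := i; j < i+4; j++ {
				s = d.trans[s][hay[j]]
				if s == d.match {
					return j + 1
				}
			}
		}
		i += 4
	}
	// Tail: fewer than four bytes left, step one at a time.
	for ; i < len(hay); i++ {
		s = d.trans[s][hay[i]]
		if s == d.match {
			return i + 1
		}
	}
	return -1
}

func main() {
	d := newABC()
	fmt.Println(d.searchUnrolled([]byte("xxabcxx")))
}
```

The replay step is one possible way to keep exact offsets while still checking only once per batch; a real implementation may tag special states differently.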

Fixed

  • OnePass DFA capture limit: tightened from 17 to 16 capture groups (each group needs 2 slots, and the uint32 slot mask has 32 bits)
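
The arithmetic behind the limit can be sketched as follows, assuming each capture group occupies two slots (start and end offset) in a 32-bit mask. The names `slotsPerGroup`, `maskBits`, and `slotMask` are hypothetical and only illustrate why 17 groups cannot fit.

```go
package main

import "fmt"

const (
	slotsPerGroup = 2  // assumed: one start slot and one end slot per group
	maskBits      = 32 // width of a uint32 slot mask
)

// slotMask returns a mask with one bit set per slot for the given number
// of capture groups, or false if the slots do not fit in 32 bits.
func slotMask(groups int) (uint32, bool) {
	slots := groups * slotsPerGroup
	if slots > maskBits {
		return 0, false
	}
	if slots == maskBits {
		return ^uint32(0), true
	}
	return uint32(1)<<uint(slots) - 1, true
}

func main() {
	for _, g := range []int{1, 16, 17} {
		m, ok := slotMask(g)
		fmt.Printf("groups=%d mask=%#x fits=%v\n", g, m, ok)
	}
}
```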

Benchmark (AMD EPYC, regex-bench)

| Pattern | coregex | vs stdlib | vs Rust |
|---|---|---|---|
| suffix | 0.91ms | 257x | 1.4x faster |
| email | 0.70ms | 383x | 1.9x faster |
| ip | 2.19ms | 225x | 5.5x faster |
| uri | 0.76ms | 340x | 1.2x faster |
| multiline_php | 0.60ms | 171x | 1.2x faster |
| anchored_php | 0.03ms | ~1x | 12.0x faster |

v0.11.9: Fix missing first-byte prefilter in FindAll

01 Feb 21:33
8b528fa

Fixed

  • Missing first-byte prefilter in FindAll state-reusing path (#107)
    • findIndicesBoundedBacktrackerAtWithState was missing anchoredFirstBytes O(1) check
    • Pattern ^/.*[\w-]+\.php (without $) took 377ms instead of 40µs on 6MB input
    • Fix: 377ms → 40µs (9000x improvement for non-matching anchored patterns)
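
A first-byte check for this pattern might look like the sketch below. It uses the standard library `regexp` package as a stand-in for the slower engine, and the function name `matchWithFirstByteCheck` is hypothetical. Because the pattern is anchored at the start of the text and must begin with `/`, any input whose first byte differs can be rejected without scanning.

```go
package main

import (
	"fmt"
	"regexp"
)

// slowEngine stands in for the full matcher that the check guards.
var slowEngine = regexp.MustCompile(`^/.*[\w-]+\.php`)

// matchWithFirstByteCheck rejects in O(1) when the first byte cannot
// start a match, and only then runs the full engine.
func matchWithFirstByteCheck(hay []byte) bool {
	if len(hay) == 0 || hay[0] != '/' {
		return false // an anchored match must start with '/'
	}
	return slowEngine.Match(hay)
}

func main() {
	fmt.Println(matchWithFirstByteCheck([]byte("/index.php")))
	fmt.Println(matchWithFirstByteCheck([]byte("x/index.php")))
}
```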

Full Changelog

v0.11.8...v0.11.9

v0.11.8: Fix UseAnchoredLiteral regression

01 Feb 20:41
f0f527d

Fixed

  • Critical regression in UseAnchoredLiteral strategy (#107)
    • FindIndices* and findIndicesAtWithState were missing UseAnchoredLiteral case
    • Pattern ^/.*[\w-]+\.php$ fell through to slow NFA path
    • Regression: 0.01ms → 408ms (40,000x slower)
    • Fix: 408ms → 0.5ms (O(1) anchored literal matching restored)
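
For this one pattern, anchored literal matching can be sketched with a few byte checks. The code is a hand-written illustration, not the coregex strategy itself: `matchAnchoredLiteral` and `isWordOrDash` are hypothetical names, and the newline scan is an assumption added so the result agrees with the standard library (where `.` does not match `\n`).

```go
package main

import (
	"bytes"
	"fmt"
)

// isWordOrDash reports whether b is in the ASCII class [\w-].
func isWordOrDash(b byte) bool {
	return b == '_' || b == '-' ||
		(b >= '0' && b <= '9') ||
		(b >= 'a' && b <= 'z') ||
		(b >= 'A' && b <= 'Z')
}

// matchAnchoredLiteral decides ^/.*[\w-]+\.php$ without running an NFA.
func matchAnchoredLiteral(hay []byte) bool {
	suffix := []byte(".php")
	n := len(hay)
	if n < 1+1+len(suffix) { // "/" + one [\w-] byte + ".php"
		return false
	}
	if hay[0] != '/' { // anchored prefix literal
		return false
	}
	if !bytes.HasSuffix(hay, suffix) { // anchored suffix literal
		return false
	}
	if !isWordOrDash(hay[n-len(suffix)-1]) { // [\w-]+ needs at least one byte
		return false
	}
	// ".*" cannot cross a newline, so none may appear anywhere.
	return bytes.IndexByte(hay, '\n') < 0
}

func main() {
	fmt.Println(matchAnchoredLiteral([]byte("/a/b/index.php")))
	fmt.Println(matchAnchoredLiteral([]byte("/.php")))
}
```

The prefix, suffix, and class checks are constant time; only the newline scan touches the whole input, and it is a single `bytes.IndexByte` call.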

Full Changelog

v0.11.7...v0.11.8

v0.11.7: FindAll optimization - 1.08x faster than stdlib

01 Feb 19:50
1480f40

Fixed

FindAll now uses the optimized state-reusing path

  • FindAll was using a slow per-match loop instead of the optimized findAllIndicesStreaming
  • Results for (\w{2,8})+ on 6MB: 2179ms → 834ms (2.6x faster)
  • Now 1.08x faster than stdlib (was 2.4x slower in regex-bench)

Full Changelog

See CHANGELOG.md

v0.11.6: PikeVM 6MB optimization - 1.68x faster than stdlib

01 Feb 18:56
fce1691

Performance

Major PikeVM optimization achieving 1.68x speedup over stdlib for large inputs (was 2.2x slower).

Key Changes

  • Windowed BoundedBacktracker (V12): Search in 914KB windows before PikeVM fallback
  • SlotTable architecture: Rust-style per-state slot storage
  • Dynamic slot sizing: 0 (IsMatch), 2 (Find), full (Captures)
  • Lightweight searchThread: 16 bytes (was 40+ bytes)
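
The dynamic slot sizing rule can be sketched as a small function. The slot counts 0, 2, and "full" come from the list above; the `searchMode` type, its constants, and the assumption that "full" means two slots for every group including the implicit group 0 are illustrative only.

```go
package main

import "fmt"

type searchMode int

const (
	modeIsMatch  searchMode = iota // only a yes/no answer is needed
	modeFind                       // only the overall match bounds are needed
	modeCaptures                   // every capture group is needed
)

// slotsPerThread returns how many offset slots each thread must carry.
// numGroups counts explicit capture groups; group 0 is added for the
// overall match (an assumption of this sketch).
func slotsPerThread(mode searchMode, numGroups int) int {
	switch mode {
	case modeIsMatch:
		return 0
	case modeFind:
		return 2
	default:
		return 2 * (numGroups + 1)
	}
}

func main() {
	// (\w{2,8})+ has one explicit capture group.
	for _, m := range []searchMode{modeIsMatch, modeFind, modeCaptures} {
		fmt.Println(slotsPerThread(m, 1))
	}
}
```

Carrying fewer slots means less copying every time a thread is forked or advanced, which is presumably where part of the reported speedup comes from.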

Benchmark Results

Pattern (\w{2,8})+ vs stdlib:

| Size | Speedup |
|---|---|
| 10KB | 1.68x faster |
| 50KB | 1.88x faster |
| 100KB | 2.04x faster |
| 1MB | 1.67x faster |
| 6MB | 1.68x faster |

6MB improvement: 1900ms → 628ms (3x faster)

Full Changelog

See CHANGELOG.md

v0.11.5: Fix checkHasWordBoundary catastrophic slowdown

01 Feb 09:46
de173be

Summary

Fixes a catastrophic performance regression in patterns with \w{n,m} quantifiers (Issue #105).

Before: 3 minutes 22 seconds on 79KB input (7,000,000x slower than stdlib)
After: 3.6 µs on 79KB input (8.6x faster than stdlib)

Changes

Fixed

  • checkHasWordBoundary catastrophic slowdown (Issue #105)
    • Root cause: O(N*M) complexity from scanning all NFA states per byte
    • Fix: Use NewBuilderWithWordBoundary(), add hasWordBoundary guards, anchored prefilter verification

Performance

  • DFA state lookup: map → slice — 42% CPU time eliminated
  • Literal extraction from capture/repeat groups — better prefilters
    • =($\w...){2} now extracts =$ (2 bytes) instead of just =
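
The map → slice change for state lookup can be pictured with the sketch below. The types `dfaState`, `mapCache`, and `sliceCache` are hypothetical; the point is only that when state IDs are handed out densely from zero, a slice index replaces hashing on every lookup.

```go
package main

import "fmt"

type dfaState struct {
	accepting bool
}

// mapCache looks states up through a hash map (the slower layout).
type mapCache struct {
	byID map[uint32]*dfaState
}

func (c *mapCache) get(id uint32) *dfaState { return c.byID[id] }

// sliceCache stores states densely; the ID is the slice index.
type sliceCache struct {
	states []*dfaState
}

// add appends a state and returns its newly assigned ID.
func (c *sliceCache) add(s *dfaState) uint32 {
	c.states = append(c.states, s)
	return uint32(len(c.states) - 1)
}

// get is a bounds check plus an index, with no hashing involved.
func (c *sliceCache) get(id uint32) *dfaState {
	if int(id) >= len(c.states) {
		return nil
	}
	return c.states[id]
}

func main() {
	c := &sliceCache{}
	start := c.add(&dfaState{})
	match := c.add(&dfaState{accepting: true})
	fmt.Println(start, match, c.get(match).accepting)
}
```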

Benchmarks (79KB input)

| Stage | Time | vs stdlib |
|---|---|---|
| Before fix | 3m 22s | 7,000,000x slower |
| After fix | 3.6 µs | 8.6x faster |

Credits

@danslo for root cause analysis and fix suggestions

Full Changelog: v0.11.4...v0.11.5

v0.11.4: FindAll multiline optimization

16 Jan 15:59
8baa0ef

Fixed

  • FindAll/FindAllIndex now use UseMultilineReverseSuffix strategy (Issue #102)
    • FindIndicesAt() was missing dispatch for UseMultilineReverseSuffix
    • IsMatch/Find were fast (1µs), but FindAll was slow (78ms) — 100x gap vs Rust
    • After fix: FindAll on 6MB with 2000 matches: ~1ms (was 78ms)

Performance

| Operation | Before | After | Improvement |
|---|---|---|---|
| FindAll (6MB, 2000 matches) | 78ms | ~1ms | 78x faster |
| vs Rust gap | 100x slower | ~1.3x slower | Near parity! |

Changed

  • Updated golang.org/x/sys v0.39.0 → v0.40.0

Full Changelog: v0.11.3...v0.11.4

v0.11.3: Prefix fast path 319-552x speedup

16 Jan 14:45
43efbbd

Performance

Pattern (?m)^/.*\.php now 319-552x faster than stdlib (was 3.5-5.7x in v0.11.1)

| Operation | coregex | stdlib | Speedup |
|---|---|---|---|
| IsMatch | 182 ns | 100 µs | 552x |
| Find | 240 ns | 81 µs | 338x |
| CountAll | 58 µs | 18.7 ms | 319x |

Algorithm

  1. Suffix prefilter finds .php candidates (SIMD memmem)
  2. SIMD backward scan to find line start (bytes.LastIndexByte)
  3. O(1) prefix byte check (/ at line start)
  4. Skip-to-next-line on mismatch (avoids O(n²) worst case)
  5. DFA fallback for complex patterns without extractable prefix
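
Steps 1 to 4 can be sketched for the specific pattern (?m)^/.*\.php using only the standard `bytes` package. This is a hand-specialized illustration, not the coregex code: `findSlashPHP` is a hypothetical name, the DFA fallback of step 5 is left out, and the greedy end position is recovered with `bytes.LastIndex` so the bounds agree with leftmost-first matching.

```go
package main

import (
	"bytes"
	"fmt"
)

var phpSuffix = []byte(".php")

// findSlashPHP returns the bounds of the first match of (?m)^/.*\.php,
// or (-1, -1) if there is none.
func findSlashPHP(hay []byte) (int, int) {
	pos := 0
	for pos < len(hay) {
		// 1. Suffix prefilter: find the next ".php" candidate.
		rel := bytes.Index(hay[pos:], phpSuffix)
		if rel < 0 {
			return -1, -1
		}
		cand := pos + rel

		// 2. Backward scan to the start of the candidate's line.
		lineStart := bytes.LastIndexByte(hay[:cand], '\n') + 1

		// Locate the end of the same line.
		lineEnd := len(hay)
		if nl := bytes.IndexByte(hay[cand:], '\n'); nl >= 0 {
			lineEnd = cand + nl
		}

		// 3. O(1) prefix check: the line must begin with '/'.
		if hay[lineStart] == '/' {
			// ".*" is greedy, so the match ends at the last ".php" on the line.
			last := bytes.LastIndex(hay[lineStart:lineEnd], phpSuffix)
			return lineStart, lineStart + last + len(phpSuffix)
		}

		// 4. Prefix mismatch: no match can start on this line, so skip
		// it entirely instead of retrying every candidate on it.
		pos = lineEnd + 1
	}
	return -1, -1
}

func main() {
	hay := []byte("GET x.php\n/a/b.php c.php\n")
	fmt.Println(findSlashPHP(hay))
}
```

Skipping the whole line in step 4 is what keeps the work linear: a line with many `.php` occurrences but the wrong first byte is scanned once rather than once per candidate.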

Changes

  • MultilineReverseSuffixSearcher.prefixBytes for O(1) verification
  • SetPrefixLiterals() extracts prefix from pattern
  • findLineStart() uses SIMD bytes.LastIndexByte
  • Skip-to-next-line: on prefix mismatch, jump to next \n position

Fixes #99

v0.11.2: DFA verification for UseMultilineReverseSuffix

16 Jan 13:46
fe0bc83

Performance Improvement

Replace O(n*m) PikeVM verification with O(n) DFA verification for multiline suffix patterns.

Issue: #99 (Rust regex 84x faster on (?m)^/.*\.php)

Benchmark Results

| Case | Before | After | Speedup |
|---|---|---|---|
| No-match (2KB) | 1136 ns | 108 ns | 10.5x |
| Long no-match | 25937 ns | 197 ns | 131x |
| Large input (6MB) | 66 ms | ~5-10 ms | 10-30x (expected) |

Changes

  • MultilineReverseSuffixSearcher.forwardDFA replaces pikevm field
  • Uses lazy.DFA.SearchAtAnchored() for linear-time anchored matching
  • lazy.CompileWithConfig() creates forward DFA with proper config

Research Insight

Analysis of Rust regex-automata revealed that the hybrid (lazy) DFA does NOT use per-state acceleration — only the dense (pre-compiled) DFA does. The real performance difference comes from using DFA vs NFA/PikeVM for verification.

coregex already has partial state acceleration in dfa/lazy/. The main fix was switching from PikeVM to DFA verification.


Full Changelog: v0.11.1...v0.11.2

v0.11.1: UseMultilineReverseSuffix 3.5-5.7x speedup

15 Jan 21:55
6552443

What's New

Adds an 18th strategy, UseMultilineReverseSuffix, for multiline suffix patterns like (?m)^/.*\.php.

Performance (Issue #97)

Before: coregex was 24% slower than stdlib
After: coregex is 3.5-5.7x faster than stdlib

| Operation | coregex | stdlib | Speedup |
|---|---|---|---|
| IsMatch (0.5MB) | 20.6 µs | 72.2 µs | 3.5x |
| Find (0.5MB) | 15.3 µs | 68.7 µs | 4.5x |
| CountAll (200 matches) | 2.56 ms | 14.6 ms | 5.7x |
| No-match (small) | 90 ns | 1.1 µs | 12x |
| No-match (2KB) | 184 ns | 24 µs | 130x |

Algorithm

  1. Suffix prefilter finds .php candidates
  2. Backward scan to line start (\n or pos 0)
  3. Forward PikeVM verification
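
The three steps above can be sketched with the standard library, using an anchored `regexp` as a stand-in for the forward PikeVM verification. The function name `findViaSuffix` and the choice to verify the candidate's whole line are assumptions of this sketch, not details confirmed by the release notes.

```go
package main

import (
	"bytes"
	"fmt"
	"regexp"
)

// lineVerifier stands in for forward verification from the line start.
var lineVerifier = regexp.MustCompile(`^/.*\.php`)

// findViaSuffix returns [start, end] of the first match of
// (?m)^/.*\.php, or nil if there is none.
func findViaSuffix(hay []byte) []int {
	suffix := []byte(".php")
	pos := 0
	for pos <= len(hay) {
		// 1. Suffix prefilter finds the next ".php" candidate.
		rel := bytes.Index(hay[pos:], suffix)
		if rel < 0 {
			return nil
		}
		cand := pos + rel

		// 2. Backward scan to the line start ('\n' or position 0).
		lineStart := bytes.LastIndexByte(hay[:cand], '\n') + 1
		lineEnd := len(hay)
		if nl := bytes.IndexByte(hay[cand:], '\n'); nl >= 0 {
			lineEnd = cand + nl
		}

		// 3. Forward verification on the candidate's line.
		if loc := lineVerifier.FindIndex(hay[lineStart:lineEnd]); loc != nil {
			return []int{lineStart + loc[0], lineStart + loc[1]}
		}

		// Not a match: continue after this candidate.
		pos = cand + len(suffix)
	}
	return nil
}

func main() {
	fmt.Println(findViaSuffix([]byte("GET x.php\n/a/b.php c.php\n")))
}
```

Because every candidate re-verifies its line, a line with many false-positive suffixes is scanned repeatedly, which is the cost that later verification changes aim to reduce.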

Files

  • meta/reverse_suffix_multiline.go (NEW)
  • meta/reverse_suffix_multiline_test.go (NEW)

Full Changelog: v0.11.0...v0.11.1