Releases: coregx/coregex
v0.12.0: Rust-inspired optimizations
Performance
- Anti-quadratic guard for reverse suffix/inner/suffix-set searches: prevents O(n²) degradation on suffix workloads with many false positives and falls back to PikeVM when quadratic behavior is detected
- Lazy DFA 4x loop unrolling: process 4 state transitions per inner loop iteration, check special states between batches
- Prefilter `IsFast()` gate: skip reverse search optimizations when a fast SIMD-backed prefix prefilter already exists
- DFA cache clear & continue: on cache overflow, clear the cache and fall back to PikeVM for the current search instead of permanently disabling the DFA
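The 4x unrolling item can be illustrated with a toy automaton. This is only a sketch, not coregex's lazy DFA: the three-state machine for the literal `ab`, the `next` function, and the sticky match state are all assumptions made for illustration. Because the match state here is sticky, checking for it once per batch of four transitions cannot miss it.

```go
package main

import "fmt"

// Hypothetical toy DFA recognising inputs that contain "ab".
// State 0: start, state 1: just saw 'a', state 2: match (sticky).
const (
	stateStart = 0
	stateSawA  = 1
	stateMatch = 2
)

// next returns the successor state for one input byte.
func next(s int, b byte) int {
	switch {
	case s == stateMatch:
		return stateMatch // sticky: once matched, always matched
	case b == 'a':
		return stateSawA
	case s == stateSawA && b == 'b':
		return stateMatch
	default:
		return stateStart
	}
}

// containsAB runs the DFA, taking 4 transitions per loop iteration and
// checking for the special (match) state only between batches.
func containsAB(hay []byte) bool {
	s, i := stateStart, 0
	for ; i+4 <= len(hay); i += 4 {
		s = next(s, hay[i])
		s = next(s, hay[i+1])
		s = next(s, hay[i+2])
		s = next(s, hay[i+3])
		if s == stateMatch {
			return true
		}
	}
	// Tail: fewer than 4 bytes left, process one at a time.
	for ; i < len(hay); i++ {
		s = next(s, hay[i])
	}
	return s == stateMatch
}

func main() {
	fmt.Println(containsAB([]byte("xxxxxab")), containsAB([]byte("aaaa")))
}
```

A real engine would also have to handle special states that are not sticky, which this sketch deliberately avoids.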
Fixed
- OnePass DFA capture limit: tightened from 17 to 16 capture groups (`uint32` slot mask = 32 bits)
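The arithmetic behind the 16-group limit can be sketched as follows. The two-slots-per-group layout (start and end offset) is an assumption based on the "32 bits" note above, and the names `fitsSlotMask` and `slotBit` are hypothetical, not coregex identifiers.

```go
package main

import "fmt"

// Assumption: each capture group needs two slots (start and end offset),
// and the set of touched slots is tracked in a single uint32 bitmask.
const (
	slotsPerGroup = 2
	maskBits      = 32
	maxGroups     = maskBits / slotsPerGroup // 16
)

// fitsSlotMask reports whether every slot of numGroups capture groups
// can be represented as a distinct bit in a uint32.
func fitsSlotMask(numGroups int) bool {
	return numGroups >= 0 && numGroups*slotsPerGroup <= maskBits
}

// slotBit returns the mask bit for a slot; valid only for slot < 32.
func slotBit(slot int) uint32 {
	return uint32(1) << uint(slot)
}

func main() {
	fmt.Println(maxGroups, fitsSlotMask(16), fitsSlotMask(17))
	// With 17 groups the last end slot would be index 33, which has no bit.
	fmt.Println(slotBit(31))
}
```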
Benchmark (AMD EPYC, regex-bench)
| Pattern | coregex | vs stdlib | vs Rust |
|---|---|---|---|
| suffix | 0.91ms | 257x | 1.4x faster |
| | 0.70ms | 383x | 1.9x faster |
| ip | 2.19ms | 225x | 5.5x faster |
| uri | 0.76ms | 340x | 1.2x faster |
| multiline_php | 0.60ms | 171x | 1.2x faster |
| anchored_php | 0.03ms | ~1x | 12.0x faster |
v0.11.9: Fix missing first-byte prefilter in FindAll
Fixed
- Missing first-byte prefilter in FindAll state-reusing path (#107)
  - `findIndicesBoundedBacktrackerAtWithState` was missing the `anchoredFirstBytes` O(1) check
  - Pattern `^/.*[\w-]+\.php` (without `$`) took 377ms instead of 40µs on 6MB input
  - Fix: 377ms → 40µs (9000x improvement for non-matching anchored patterns)
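The idea of an O(1) first-byte check for a start-anchored pattern can be sketched as below. This is not coregex's code: `anchoredMatcher` and `newPHPMatcher` are hypothetical names, and the standard library `regexp` package stands in for the slower general engine.

```go
package main

import (
	"fmt"
	"regexp"
)

// anchoredMatcher is a hypothetical wrapper, not coregex's API. It pairs a
// start-anchored pattern with the single byte every match must begin with.
type anchoredMatcher struct {
	re        *regexp.Regexp // stand-in for the slower general engine
	firstByte byte
}

// newPHPMatcher builds a matcher for the pattern from the release note.
func newPHPMatcher() anchoredMatcher {
	return anchoredMatcher{re: regexp.MustCompile(`^/.*[\w-]+\.php`), firstByte: '/'}
}

// isMatch rejects in O(1) when the first byte cannot start a match and
// only then runs the full engine.
func (m anchoredMatcher) isMatch(hay []byte) bool {
	if len(hay) == 0 || hay[0] != m.firstByte {
		return false
	}
	return m.re.Match(hay)
}

func main() {
	m := newPHPMatcher()
	fmt.Println(m.isMatch([]byte("/index.php")), m.isMatch([]byte("GET /index.php")))
}
```

On a large non-matching input the early return skips the engine entirely, which is the kind of case the 377ms → 40µs figure above describes.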
Full Changelog
v0.11.8: Fix UseAnchoredLiteral regression
Fixed
- Critical regression in UseAnchoredLiteral strategy (#107)
  - `FindIndices*` and `findIndicesAtWithState` were missing the `UseAnchoredLiteral` case
  - Pattern `^/.*[\w-]+\.php$` fell through to slow NFA path
  - Regression: 0.01ms → 408ms (40,000x slower)
  - Fix: 408ms → 0.5ms (O(1) anchored literal matching restored)
Full Changelog
v0.11.7: FindAll optimization - 1.08x faster than stdlib
Fixed
FindAll now uses optimized state-reusing path
- FindAll was using a slow per-match loop instead of the optimized `findAllIndicesStreaming`
- Results for `(\w{2,8})+` on 6MB: 2179ms → 834ms (2.6x faster)
- Now 1.08x faster than stdlib (was 2.4x slower in regex-bench)
Full Changelog
See CHANGELOG.md
v0.11.6: PikeVM 6MB optimization - 1.68x faster than stdlib
Performance
Major PikeVM optimization achieving 1.68x speedup over stdlib for large inputs (was 2.2x slower).
Key Changes
- Windowed BoundedBacktracker (V12): Search in 914KB windows before PikeVM fallback
- SlotTable architecture: Rust-style per-state slot storage
- Dynamic slot sizing: 0 (IsMatch), 2 (Find), full (Captures)
- Lightweight searchThread: 16 bytes (was 40+ bytes)
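The dynamic slot sizing rule listed above can be written down as a small function. This is a sketch under stated assumptions: `searchMode` and `slotsFor` are hypothetical names, and "full" is taken to mean two slots per capture group, counting the implicit group 0.

```go
package main

import "fmt"

// searchMode is a hypothetical enum naming the three kinds of search.
type searchMode int

const (
	modeIsMatch  searchMode = iota // only a yes/no answer is needed
	modeFind                       // overall match bounds only
	modeCaptures                   // bounds of every capture group
)

// slotsFor returns how many offset slots each thread must carry.
// numGroups counts capture groups including the implicit group 0.
func slotsFor(mode searchMode, numGroups int) int {
	switch mode {
	case modeIsMatch:
		return 0
	case modeFind:
		return 2
	default:
		return 2 * numGroups
	}
}

func main() {
	// Pattern (\w{2,8})+ has group 0 plus one explicit group.
	for _, m := range []searchMode{modeIsMatch, modeFind, modeCaptures} {
		fmt.Println(slotsFor(m, 2))
	}
}
```

Carrying fewer slots per thread means less copying whenever a thread is forked or advanced, which is presumably where the saving for `IsMatch` and `Find` comes from.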
Benchmark Results
Pattern `(\w{2,8})+` vs stdlib:
| Size | Speedup |
|---|---|
| 10KB | 1.68x faster |
| 50KB | 1.88x faster |
| 100KB | 2.04x faster |
| 1MB | 1.67x faster |
| 6MB | 1.68x faster |
6MB improvement: 1900ms → 628ms (3x faster)
Full Changelog
See CHANGELOG.md
v0.11.5: Fix checkHasWordBoundary catastrophic slowdown
Summary
Fixes a catastrophic performance regression in patterns with `\w{n,m}` quantifiers (Issue #105).
Before: 3 minutes 22 seconds on 79KB input (7,000,000x slower than stdlib)
After: 3.6 µs on 79KB input (8.6x faster than stdlib)
Changes
Fixed
- checkHasWordBoundary catastrophic slowdown (Issue #105)
  - Root cause: O(N*M) complexity from scanning all NFA states per byte
  - Fix: Use `NewBuilderWithWordBoundary()`, add `hasWordBoundary` guards, anchored prefilter verification
Performance
- DFA state lookup: map → slice — 42% CPU time eliminated
- Literal extraction from capture/repeat groups — better prefilters
  - `=($\w...){2}` now extracts `=$` (2 bytes) instead of just `=`
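The map → slice change for DFA state lookup can be illustrated with a minimal sketch. The types below (`stateID`, `state`, `mapCache`, `sliceCache`) are hypothetical and only show the general technique: when state IDs are assigned densely, the ID can serve directly as a slice index, avoiding hashing on every lookup.

```go
package main

import "fmt"

type stateID uint32

type state struct{ accepting bool }

// mapCache looks states up through a hash map keyed by ID.
type mapCache struct{ states map[stateID]*state }

func (c *mapCache) get(id stateID) *state { return c.states[id] }

// sliceCache relies on IDs being dense (0, 1, 2, ...), so the ID is
// simply the index and a lookup is one bounds-checked load.
type sliceCache struct{ states []*state }

// add appends a state and returns its newly assigned ID.
func (c *sliceCache) add(s *state) stateID {
	c.states = append(c.states, s)
	return stateID(len(c.states) - 1)
}

// get returns the state for id, or nil if the ID was never assigned.
func (c *sliceCache) get(id stateID) *state {
	if int(id) >= len(c.states) {
		return nil
	}
	return c.states[id]
}

func main() {
	var c sliceCache
	id := c.add(&state{accepting: true})
	fmt.Println(id, c.get(id).accepting, c.get(99) == nil)
}
```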
Benchmarks (79KB input)
| Stage | Time | vs stdlib |
|---|---|---|
| Before fix | 3m 22s | 7,000,000x slower |
| After fix | 3.6 µs | 8.6x faster |
Credits
@danslo for root cause analysis and fix suggestions
Full Changelog: v0.11.4...v0.11.5
v0.11.4: FindAll multiline optimization
Fixed
- FindAll/FindAllIndex now use UseMultilineReverseSuffix strategy (Issue #102)
  - `FindIndicesAt()` was missing dispatch for `UseMultilineReverseSuffix`
  - `IsMatch`/`Find` were fast (1µs), but `FindAll` was slow (78ms), a 100x gap vs Rust
  - After fix: `FindAll` on 6MB with 2000 matches: ~1ms (was 78ms)
Performance
| Operation | Before | After | Improvement |
|---|---|---|---|
| FindAll (6MB, 2000 matches) | 78ms | ~1ms | 78x faster |
| vs Rust gap | 100x slower | ~1.3x slower | Near parity! |
Changed
- Updated `golang.org/x/sys` v0.39.0 → v0.40.0
Full Changelog: v0.11.3...v0.11.4
v0.11.3: Prefix fast path 319-552x speedup
Performance
Pattern `(?m)^/.*\.php` now 319-552x faster than stdlib (was 3.5-5.7x in v0.11.1)
| Operation | coregex | stdlib | Speedup |
|---|---|---|---|
| IsMatch | 182 ns | 100 µs | 552x |
| Find | 240 ns | 81 µs | 338x |
| CountAll | 58 µs | 18.7 ms | 319x |
Algorithm
- Suffix prefilter finds `.php` candidates (SIMD memmem)
- SIMD backward scan to find line start (`bytes.LastIndexByte`)
- O(1) prefix byte check (`/` at line start)
- Skip-to-next-line on mismatch (avoids O(n²) worst case)
- DFA fallback for complex patterns without extractable prefix
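The first four steps above can be sketched for the specific pattern `(?m)^/.*\.php`. This is a hand-written illustration, not the `MultilineReverseSuffixSearcher` implementation: the function name `isMatchSlashPHP` is hypothetical, the suffix and prefix byte are hard-coded, and the DFA fallback step is omitted.

```go
package main

import (
	"bytes"
	"fmt"
)

// isMatchSlashPHP is a hand-written stand-in for matching (?m)^/.*\.php.
func isMatchSlashPHP(hay []byte) bool {
	suffix := []byte(".php")
	pos := 0
	for pos < len(hay) {
		// 1. Suffix prefilter: next ".php" candidate.
		i := bytes.Index(hay[pos:], suffix)
		if i < 0 {
			return false
		}
		cand := pos + i
		// 2. Backward scan to the start of the candidate's line.
		lineStart := bytes.LastIndexByte(hay[:cand], '\n') + 1
		// 3. O(1) prefix byte check.
		if hay[lineStart] == '/' {
			return true
		}
		// 4. Mismatch: no match is possible on this line, so skip it.
		nl := bytes.IndexByte(hay[cand:], '\n')
		if nl < 0 {
			return false
		}
		pos = cand + nl + 1
	}
	return false
}

func main() {
	fmt.Println(isMatchSlashPHP([]byte("a.php\n/b.php")))
	fmt.Println(isMatchSlashPHP([]byte("x/a.php")))
}
```

Skipping the whole line on a mismatch means each line is scanned backward at most once, which is what keeps the total work linear even when a line contains many `.php` candidates.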
Changes
- `MultilineReverseSuffixSearcher.prefixBytes` for O(1) verification
- `SetPrefixLiterals()` extracts prefix from pattern
- `findLineStart()` uses SIMD `bytes.LastIndexByte`
- Skip-to-next-line: on prefix mismatch, jump to next `\n` position
Fixes #99
v0.11.2: DFA verification for UseMultilineReverseSuffix
Performance Improvement
Replace O(n*m) PikeVM verification with O(n) DFA verification for multiline suffix patterns.
Issue: #99 (Rust regex 84x faster on `(?m)^/.*\.php`)
Benchmark Results
| Case | Before | After | Speedup |
|---|---|---|---|
| No-match (2KB) | 1136 ns | 108 ns | 10.5x |
| Long no-match | 25937 ns | 197 ns | 131x |
| Large input (6MB) | 66 ms | ~5-10 ms | 10-30x (expected) |
Changes
- `MultilineReverseSuffixSearcher.forwardDFA` replaces `pikevm` field
- Uses `lazy.DFA.SearchAtAnchored()` for linear-time anchored matching
- `lazy.CompileWithConfig()` creates forward DFA with proper config
Research Insight
Analysis of Rust regex-automata revealed that the hybrid (lazy) DFA does NOT use per-state acceleration — only the dense (pre-compiled) DFA does. The real performance difference comes from using DFA vs NFA/PikeVM for verification.
coregex already has partial state acceleration in dfa/lazy/. The main fix was switching from PikeVM to DFA verification.
Full Changelog: v0.11.1...v0.11.2
v0.11.1: UseMultilineReverseSuffix 3.5-5.7x speedup
What's New
New 18th strategy `UseMultilineReverseSuffix` for multiline suffix patterns like `(?m)^/.*\.php`.
Performance (Issue #97)
Before: coregex was 24% slower than stdlib
After: coregex is 3.5-5.7x faster than stdlib
| Operation | coregex | stdlib | Speedup |
|---|---|---|---|
| IsMatch (0.5MB) | 20.6 µs | 72.2 µs | 3.5x |
| Find (0.5MB) | 15.3 µs | 68.7 µs | 4.5x |
| CountAll (200 matches) | 2.56 ms | 14.6 ms | 5.7x |
| No-match (small) | 90 ns | 1.1 µs | 12x |
| No-match (2KB) | 184 ns | 24 µs | 130x |
Algorithm
- Suffix prefilter finds `.php` candidates
- Backward scan to line start (`\n` or pos 0)
- Forward PikeVM verification
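A rough sketch of these three steps as a `Find`-style search follows. It is an illustration under assumptions, not the code in `meta/reverse_suffix_multiline.go`: `findSlashPHP` and `verifier` are hypothetical names, and the standard library `regexp` package stands in for the forward PikeVM verification pass.

```go
package main

import (
	"bytes"
	"fmt"
	"regexp"
)

// verifier stands in for the forward PikeVM pass; ^ anchors it at the
// start of the slice it is given, and .* cannot cross a newline.
var verifier = regexp.MustCompile(`^/.*\.php`)

// findSlashPHP returns the [start, end) of the first match of
// (?m)^/.*\.php in hay, or nil when there is none.
func findSlashPHP(hay []byte) []int {
	suffix := []byte(".php")
	for pos := 0; pos < len(hay); {
		// Suffix prefilter: locate the next ".php" candidate.
		i := bytes.Index(hay[pos:], suffix)
		if i < 0 {
			return nil
		}
		cand := pos + i
		// Backward scan to the line start ('\n' or position 0).
		lineStart := bytes.LastIndexByte(hay[:cand], '\n') + 1
		// Forward verification from the line start.
		if loc := verifier.FindIndex(hay[lineStart:]); loc != nil {
			return []int{lineStart + loc[0], lineStart + loc[1]}
		}
		pos = cand + len(suffix)
	}
	return nil
}

func main() {
	fmt.Println(findSlashPHP([]byte("a.php\n/x/y.php.php z")))
}
```

Unlike a skip-to-next-line variant, this version re-verifies the same line for every candidate on it, so a line with many false candidates can be rescanned repeatedly.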
Files
- `meta/reverse_suffix_multiline.go` (NEW)
- `meta/reverse_suffix_multiline_test.go` (NEW)
Full Changelog: v0.11.0...v0.11.1