Skip to content

Lower regex MaxUnrollSize from 16 to 7 #126092

Merged
stephentoub merged 1 commit into dotnet:main from stephentoub:regex-lower-maxunrollsize
Mar 26, 2026

Conversation

@stephentoub
Member

For fixed-count single-character repeaters (e.g. \d{N}), the regex source generator and compiler choose between unrolling individual character checks vs using vectorized operations like ContainsAnyExcept. MaxUnrollSize was the threshold controlling this decision — previously set to 16.
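As a rough illustration of the two shapes being compared, here is a minimal sketch. It is not the code the generator actually emits. It assumes .NET 8 or later and uses the ASCII class `[0-9]` in place of `\d` to keep the checks simple; the class and method names are hypothetical.

```csharp
using System;

// Hypothetical helper type, for illustration only.
static class RepeaterSketch
{
    // Unrolled shape for [0-9]{4}: one scalar check per position,
    // short-circuiting at the first mismatch.
    public static bool MatchesUnrolled(ReadOnlySpan<char> s) =>
        s.Length >= 4 &&
        char.IsAsciiDigit(s[0]) &&
        char.IsAsciiDigit(s[1]) &&
        char.IsAsciiDigit(s[2]) &&
        char.IsAsciiDigit(s[3]);

    // Vectorized shape for [0-9]{12}: a single span operation over the slice.
    // ContainsAnyExceptInRange returns true if any char falls outside '0'..'9'.
    public static bool MatchesVectorized(ReadOnlySpan<char> s) =>
        s.Length >= 12 &&
        !s.Slice(0, 12).ContainsAnyExceptInRange('0', '9');
}
```

The unrolled form pays a fixed cost per character but can stop early; the vectorized form has a higher setup cost that is amortized over longer slices.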

Benchmarking across multiple character class types shows the crossover point where vectorization wins is consistently between count 4 and 8:

| Count | Range (\d) | SingleCharNeg ([^x]) | SmallSet ([abc]) | SearchValues ([a-zA-Z]) |
|---|---|---|---|---|
| 2 | Loop 1.28x faster | Loop 1.11x | Loop 2.06x | Loop 1.72x |
| 4 | Loop 1.31x faster | ~tied | Loop 1.67x | Loop ~1.5x |
| 8 | Vec 0.75x | Vec 0.52x | Vec 0.86x | Vec 0.75x |
| 16 | Vec 0.42x | Vec 0.35x | Vec 0.53x | Vec 0.43x |

At count 8+, vectorized operations win across all character class types. At count ≤4, the unrolled loop wins due to lower overhead and early-exit on mismatch.

This PR lowers the threshold from 16 to 7 in both RegexGenerator.Emitter.cs (source generator) and RegexCompiler.cs (compiled engine), so that repeaters with counts 8–16 now use vectorized operations instead of unrolled scalar checks.
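The effect of the threshold can be sketched as a simple selection function. This is only an illustration: the value 7 is taken from the final PR title, the `<=` comparison is an assumption about how the constant is applied, and `UnrollThresholdSketch` and `ChooseStrategy` are hypothetical names rather than members of the regex implementation.

```csharp
using System;

// Hypothetical type, for illustration only.
static class UnrollThresholdSketch
{
    // Value from the final PR title (previously 16).
    public const int MaxUnrollSize = 7;

    // Assumption: counts at or below the threshold are unrolled,
    // and larger counts use a vectorized span operation.
    public static string ChooseStrategy(int count) =>
        count <= MaxUnrollSize ? "unrolled" : "vectorized";
}
```

Under that assumption, counts 2 and 4 from the benchmark table stay unrolled, while 8 and 16 take the vectorized path.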

Note

This PR was generated with the assistance of GitHub Copilot.

@dotnet-policy-service
Contributor

Tagging subscribers to this area: @dotnet/area-system-text-regularexpressions
See info in area-owners.md if you want to be subscribed.

@stephentoub stephentoub requested a review from danmoseley March 25, 2026 14:29
Contributor

Copilot AI left a comment

Pull request overview

This PR adjusts the unrolling threshold used by both the regex compiler and the regex source generator when emitting fixed-count single-character repeaters, favoring vectorized implementations sooner based on benchmarked crossover points.

Changes:

  • Lower MaxUnrollSize from 16 to 8 in the compiled regex engine (RegexCompiler).
  • Lower MaxUnrollSize from 16 to 8 in the regex source generator emitter (RegexGenerator.Emitter).

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated no comments.

| File | Description |
|---|---|
| src/libraries/System.Text.RegularExpressions/src/System/Text/RegularExpressions/RegexCompiler.cs | Lowers the unroll-vs-vectorize threshold for compiled regex repeater emission. |
| src/libraries/System.Text.RegularExpressions/gen/RegexGenerator.Emitter.cs | Lowers the same threshold for source-generated regex emission to keep behavior aligned. |

Comment thread src/libraries/System.Text.RegularExpressions/gen/RegexGenerator.Emitter.cs Outdated
Copilot AI review requested due to automatic review settings March 25, 2026 16:53
Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.

Comment thread src/libraries/System.Text.RegularExpressions/gen/RegexGenerator.Emitter.cs Outdated
@danmoseley
Member

Just curious, what would it take for IndexOf or whatever that you're calling for >= 8 to do the same fast scalar thing for all lengths, as an implementation detail? ie

 if (span.Length < 8) // .. scalar
 else { VectorizedPath(span); } // not inlined

and the JIT figure out that Slice(foo, 3) or whatever can be expanded to char by char comparison like the source gen is doing here. I guess there's a bunch of pieces, like it would have to propagate the constant through to elide the slice, figure out inlining is worthwhile, and somehow specialize it for constant argument (waving hands as this isn't my domain)
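For concreteness, the dispatch shape described above might look roughly like the following. This is a hand-written sketch, not how any BCL method is implemented: the cutoff of 8 comes from the comment, while the class name, the method names, and the choice of an ASCII-digit check are hypothetical. It assumes .NET 8 or later and says nothing about whether the JIT would actually fold the scalar branch for a constant-length slice.

```csharp
using System;
using System.Runtime.CompilerServices;

// Hypothetical helper type, for illustration only.
static class ShortSpanDispatchSketch
{
    // Returns true if the span contains any char outside '0'..'9'.
    public static bool ContainsNonAsciiDigit(ReadOnlySpan<char> span)
    {
        if (span.Length < 8)
        {
            // Scalar path: cheap for short inputs, exits at the first mismatch.
            foreach (char c in span)
            {
                if (!char.IsAsciiDigit(c)) return true;
            }
            return false;
        }

        return VectorizedPath(span);
    }

    // Kept out of line so the small scalar wrapper stays cheap to inline.
    [MethodImpl(MethodImplOptions.NoInlining)]
    private static bool VectorizedPath(ReadOnlySpan<char> span) =>
        span.ContainsAnyExceptInRange('0', '9');
}
```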

@stephentoub stephentoub force-pushed the regex-lower-maxunrollsize branch from ad17533 to 76f9012 Compare March 26, 2026 17:15
@stephentoub stephentoub changed the title Lower regex MaxUnrollSize from 16 to 8 Lower regex MaxUnrollSize from 16 to 7 Mar 26, 2026
@stephentoub stephentoub enabled auto-merge (squash) March 26, 2026 17:15
@stephentoub
Member Author

/ba-g unrelated mono BadExits

@stephentoub stephentoub merged commit 67495bc into dotnet:main Mar 26, 2026
88 of 91 checks passed
@stephentoub stephentoub deleted the regex-lower-maxunrollsize branch March 26, 2026 21:16
@github-actions github-actions Bot locked and limited conversation to collaborators Apr 26, 2026
4 participants