NonBacktracking Regex optimizations #102655
Merged
63 commits
34eba54 Regex automata optimizations (ieviev)
49607f4 off by one err (ieviev)
5ac29f3 wip reversal optimizations (ieviev)
e440dec removing unnecessary overhead (ieviev)
627fd90 handle final position correctly (ieviev)
7ae6440 edge case workarounds, tests should be ok again (ieviev)
383f3e5 optimizing lookup initialization (ieviev)
5a2636c more dfa overhead removed (ieviev)
57e5b8d removed potential rewrite (ieviev)
4d275db low memory variant (ieviev)
c35ed7e some kind of compromise between speed and memory (ieviev)
868e02d cheaper nullability checks (ieviev)
14afd18 nullability encoding (ieviev)
5f5ab55 nullability cached as bytes (ieviev)
dd121de reverting some changes (ieviev)
723c5b6 testing nfa fallback (ieviev)
6bf4095 refactoring, work in progress (ieviev)
b10e600 refactoring to struct interfaces (ieviev)
d68bd3c refactoring optimizations (ieviev)
153dfc3 fallback mode and bugfix (ieviev)
4aebe3e reenable warnings (ieviev)
1e6f55c anchor edge case (ieviev)
c6ad3ac anchor edge cases (ieviev)
e10b43f Apply suggestions from code review (ieviev)
f581755 Apply suggestions from code review (ieviev)
01a9684 rebased branch and some cleanup (ieviev)
341ce27 cleanup, removing unused features (ieviev)
1a28c69 cleanup (ieviev)
9bba84f timeout limit changes (ieviev)
a957781 lookup allocation threshold and timeout limits (ieviev)
7e86855 char mapping (ieviev)
99b5717 empty array mapping (ieviev)
47c6b04 adding timeout check to create-derivative (ieviev)
22d23fa some cleanup (ieviev)
761f897 comments and cleanup (ieviev)
53924eb cleanup and comments (ieviev)
e66d3d3 reflecting new limits in tests (ieviev)
65c0b8b rerunning tests (ieviev)
de085b4 retesting DFA timeout (ieviev)
5ef3b32 more precise regex memory limit for DFA mode (ieviev)
281446f reverting change (ieviev)
8f78046 reverting reversal refactor (ieviev)
7157520 Apply suggestions from code review (ieviev)
931552d variable naming (ieviev)
cc493f1 test for over 255 minterms (ieviev)
a0d2390 adding net directive around test (ieviev)
0691c58 all engines in minterms test (ieviev)
8ceb207 Apply suggestions from code review (ieviev)
379519b Apply suggestions from code review (ieviev)
57c8f6d simplifying code (ieviev)
2e57d42 state flag values down (ieviev)
60b1352 mintermclassifier changes (ieviev)
2900aad reversal (ieviev)
764ded8 getstateflags (ieviev)
81d0dca formatting (ieviev)
38f28b9 removing unused interface (ieviev)
cce1188 local function typo (ieviev)
8b946da temporarily removing minterms test (ieviev)
d3430b3 re-adding minterms test (ieviev)
388c256 reenabling test for all engines (ieviev)
2704641 test bugfix (ieviev)
0abaabe expected matches change (ieviev)
0a0f409 Review and clean up some code (stephentoub)
39 changes: 39 additions & 0 deletions
...stem.Text.RegularExpressions/src/System/Text/RegularExpressions/Symbolic/MatchReversal.cs
@@ -0,0 +1,39 @@
// Licensed to the .NET Foundation under one or more agreements.
// The .NET Foundation licenses this file to you under the MIT license.

using System.Diagnostics;

namespace System.Text.RegularExpressions.Symbolic
{
    /// <summary>Provides details on how a match may be processed in reverse to find the beginning of a match once a match's existence has been confirmed.</summary>
    internal readonly struct MatchReversalInfo<TSet> where TSet : IComparable<TSet>, IEquatable<TSet>
    {
        /// <summary>Initializes the match reversal details.</summary>
        internal MatchReversalInfo(MatchReversalKind kind, int fixedLength, MatchingState<TSet>? adjustedStartState = null)
        {
            Debug.Assert(kind is MatchReversalKind.MatchStart or MatchReversalKind.FixedLength or MatchReversalKind.PartialFixedLength);
            Debug.Assert(fixedLength >= 0);
            Debug.Assert((adjustedStartState is not null) == (kind is MatchReversalKind.PartialFixedLength));

            Kind = kind;
            FixedLength = fixedLength;
            AdjustedStartState = adjustedStartState;
        }

        /// <summary>Gets the kind of the match reversal processing required.</summary>
        internal MatchReversalKind Kind { get; }

        /// <summary>Gets the fixed length of the match, if one is known.</summary>
        /// <remarks>
        /// For <see cref="MatchReversalKind.MatchStart"/>, this is ignored.
        /// For <see cref="MatchReversalKind.FixedLength"/>, this is the full length of the match. The beginning may be found simply
        /// by subtracting this length from the end.
        /// For <see cref="MatchReversalKind.PartialFixedLength"/>, this is the length of the fixed portion of the match.
        /// </remarks>
        internal int FixedLength { get; }

        /// <summary>Gets the adjusted start state to use for partial fixed-length matches.</summary>
        /// <remarks>This will be non-null iff <see cref="Kind"/> is <see cref="MatchReversalKind.PartialFixedLength"/>.</remarks>
        internal MatchingState<TSet>? AdjustedStartState { get; }
    }
}
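The remarks on `FixedLength` describe three ways a consumer could recover the start of a match once its end is known. The following is a minimal sketch of that dispatch, not the code from this PR: `ReversalKind`, `ReversalInfo`, `ReversalSketch`, and the `reverseScan` callback are hypothetical stand-ins for the internal types, and the adjusted start state is left out entirely.

```csharp
using System;

// Hypothetical, simplified stand-ins for the internal types in the diff above.
public enum ReversalKind { MatchStart, PartialFixedLength, FixedLength }

public readonly struct ReversalInfo
{
    public ReversalInfo(ReversalKind kind, int fixedLength)
    {
        if (fixedLength < 0) throw new ArgumentOutOfRangeException(nameof(fixedLength));
        Kind = kind;
        FixedLength = fixedLength;
    }

    public ReversalKind Kind { get; }
    public int FixedLength { get; }
}

public static class ReversalSketch
{
    // reverseScan is an assumed callback: given the position to start walking
    // backward from, it returns the index where the match begins.
    public static int FindMatchStart(ReversalInfo info, int matchEnd, Func<int, int> reverseScan)
    {
        switch (info.Kind)
        {
            case ReversalKind.FixedLength:
                // Whole match has a known length: plain subtraction, no scan.
                return matchEnd - info.FixedLength;

            case ReversalKind.PartialFixedLength:
                // Skip the fixed-length tail, then scan backward for the rest.
                return reverseScan(matchEnd - info.FixedLength);

            default:
                // MatchStart: FixedLength is ignored and the full reverse scan runs.
                return reverseScan(matchEnd);
        }
    }
}
```

With a fixed length of 3 and a match ending at position 10, `FindMatchStart` returns 7 without ever invoking the callback, which is the saving the `FixedLength` kind is meant to capture.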
26 changes: 26 additions & 0 deletions
....Text.RegularExpressions/src/System/Text/RegularExpressions/Symbolic/MatchReversalKind.cs
@@ -0,0 +1,26 @@
// Licensed to the .NET Foundation under one or more agreements.
// The .NET Foundation licenses this file to you under the MIT license.

namespace System.Text.RegularExpressions.Symbolic
{
    /// <summary>Specifies the kind of a <see cref="MatchReversalInfo{TSet}"/>.</summary>
    internal enum MatchReversalKind
    {
        /// <summary>The regex should be run in reverse to find the beginning of the match.</summary>
        MatchStart,

        /// <summary>The end of the pattern is of a fixed length and can be skipped as part of running a regex in reverse to find the beginning of the match.</summary>
        /// <remarks>
        /// Reverse execution is not necessary for a subset of the match.
        /// <see cref="MatchReversalInfo{TSet}.FixedLength"/> will contain the length of the fixed portion.
        /// </remarks>
        PartialFixedLength,

        /// <summary>The entire pattern is of a fixed length.</summary>
        /// <remarks>
        /// Reverse execution is not necessary to find the beginning of the match.
        /// <see cref="MatchReversalInfo{TSet}.FixedLength"/> will contain the length of the match.
        /// </remarks>
        FixedLength
    }
}
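To make the three kinds concrete, here is a toy classifier over a simplified pattern model. It is an illustration only and says nothing about how the PR actually derives the kind: `SketchKind`, `Piece`, and `ReversalClassifier` are hypothetical names, and a pattern is assumed to be a plain concatenation of pieces that each consume between `Min` and `Max` characters.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical mirror of the enum above, for illustration only.
public enum SketchKind { MatchStart, PartialFixedLength, FixedLength }

// Toy model: one element of a concatenation and how many characters it may consume.
public readonly struct Piece
{
    public Piece(int min, int max) { Min = min; Max = max; }
    public int Min { get; }
    public int Max { get; }
    public bool IsFixed => Min == Max;
}

public static class ReversalClassifier
{
    public static (SketchKind Kind, int FixedLength) Classify(IReadOnlyList<Piece> pieces)
    {
        // Sum the lengths of the trailing run of fixed-length pieces.
        int fixedSuffix = 0;
        int i = pieces.Count - 1;
        for (; i >= 0 && pieces[i].IsFixed; i--)
        {
            fixedSuffix += pieces[i].Min;
        }

        if (i < 0)
        {
            // Every piece was fixed: the whole match length is known.
            return (SketchKind.FixedLength, fixedSuffix);
        }

        if (fixedSuffix > 0)
        {
            // Only the end of the pattern is fixed: that part can be skipped.
            return (SketchKind.PartialFixedLength, fixedSuffix);
        }

        // The pattern ends in a variable-length piece: a full reverse run is needed.
        return (SketchKind.MatchStart, 0);
    }
}
```

Under this model, `abc` is three fixed pieces and classifies as `FixedLength` with length 3, `a+bc` has a variable piece followed by two fixed ones and classifies as `PartialFixedLength` with length 2, and `ab*` ends in a variable piece and falls back to `MatchStart`.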