Skip to content

Conversation

Copy link
Contributor

Copilot AI commented Oct 17, 2025

Summary

Fixed a bug where balancing groups were incorrectly being removed from negative lookarounds during regex tree reduction, causing patterns like ()(?'-1')(?!(?'-1')) to incorrectly return 0 matches instead of matching at every position.

Root Cause

The issue was introduced by an optimization in the RemoveCaptures method within ReduceLookaround in RegexNode.cs. This method removes capture groups from negative lookarounds since captures inside negative lookarounds are undone after the lookaround completes. However, it was incorrectly removing ALL capture groups, including balancing groups.

Balancing groups (e.g., (?'-1')) have semantic meaning that affects matching behavior - they require a specific capture group to have been captured before they can succeed. Removing them changes the match semantics.

The Fix

Modified the RemoveCaptures method to preserve balancing groups by checking if N != -1 (where N stores the uncapture group number). The code now uses pattern matching for cleaner syntax: if (node is { Kind: RegexNodeKind.Capture, N: -1 }). Updated comments to clarify that captures that don't rely on or impact persisted state can be removed, which includes backreferences and balancing groups.

Changes Made

  1. RegexNode.cs: Modified RemoveCaptures to check node.N == -1 before removing captures, using pattern matching
  2. Regex.Match.Tests.cs: Added test case for balancing groups in negative lookarounds
  3. Regex.Count.Tests.cs: Added count test to verify correct match count

Test Results

  • ✅ All functional tests pass: 29,334 tests
  • ✅ All unit tests pass: 1,005 tests
  • ✅ Pattern ()(?'-1')(?!(?'-1')) now correctly matches at every position in the input string
Original prompt

This section details on the original issue you should resolve

<issue_title>Regex ()(?'-1')(?!(?'-1')) exhibit incorrect matching behavior in .NET10</issue_title>
<issue_description>### Description

Such situation occur in (?!) and (?<!) under Interpreter、Compiled and source GeneratedCode.

Reproduction Steps

using System.Text.RegularExpressions;

var pattern = @"()(?'-1')(?!(?'-1'))";
var input = "such situation occur in (?!) and (?<!) under Interpreter、Compiled and source GeneratedCode";
Regex regex = new Regex(pattern, RegexOptions.Compiled);
var mhes = regex.Matches(input);
Console.WriteLine(Regex.Matches(input, pattern).Count);
Console.WriteLine(mhes.Count);
Console.WriteLine(RegexTester.regexGenerate().Matches(input).Count);
internal partial class RegexTester
{
	[GeneratedRegex(@"()(?'-1')(?!(?'-1'))")]
	internal static partial Regex regexGenerate();
}

Here is part of source Generated Code:

partial class RegexTester
{
    /// <remarks>
    /// Pattern:<br/>
    /// <code>()(?'-1')(?!(?'-1'))</code><br/>
    /// Explanation:<br/>
    /// <code>
    /// ○ Fail to match.<br/>
    /// </code>
    /// </remarks>
    [global::System.CodeDom.Compiler.GeneratedCodeAttribute("System.Text.RegularExpressions.Generator", "10.0.13.307")]
    internal static partial global::System.Text.RegularExpressions.Regex regexGenerate() => global::System.Text.RegularExpressions.Generated.regexGenerate_0.Instance;
}

private sealed class Runner : RegexRunner
{
    /// <summary>Scan the <paramref name="inputSpan"/> starting from base.runtextstart for the next match.</summary>
    /// <param name="inputSpan">The text being scanned by the regular expression.</param>
    protected override void Scan(ReadOnlySpan<char> inputSpan)
    {
        // The pattern never matches anything.
    }
}

Expected behavior

91
91
91

Actual behavior

0
0
0

Regression?

No response

Known Workarounds

No response

Configuration

No response

Other information

No response</issue_description>

<agent_instructions>Please add tests for the problem and fix it. I expect the problem was introduced in #118084 and that the fix is simply to ensure that capture groups are only removed from negative lookarounds if they're not part of a balancing group.</agent_instructions>

Comments on the Issue (you are @copilot in this section)

@stephentoub Why do you believe the current behavior is incorrect? It's unclear from your description as the Regression section is empty... is your "Expected behavior" from a previous release?

Fixes #120849


✨ Let Copilot coding agent set things up for you — coding agent works faster and does higher quality work when set up for your repo.

@github-actions github-actions bot added the needs-area-label An area label is needed to ensure this gets routed to the appropriate area owners label Oct 17, 2025
Co-authored-by: stephentoub <2642209+stephentoub@users.noreply.github.com>
Copilot AI changed the title [WIP] Fix regex incorrect matching behavior in .NET10 Fix balancing groups incorrectly removed from negative lookarounds Oct 17, 2025
Copilot AI requested a review from stephentoub October 17, 2025 20:30
@stephentoub stephentoub marked this pull request as ready for review October 17, 2025 20:35
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR fixes a bug where balancing groups inside negative lookarounds were incorrectly removed during regex optimization, causing patterns like ()(?'-1')(?!(?'-1')) to fail to match when they should. The fix preserves balancing groups while still removing regular capture groups from negative lookarounds.

  • Modified RemoveCaptures method to check if a capture node is a balancing group before removing it
  • Added test cases to verify correct matching behavior for balancing groups in negative lookarounds
  • All existing tests continue to pass, confirming the optimization for regular captures still works

Reviewed Changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated no comments.

File Description
src/libraries/System.Text.RegularExpressions/src/System/Text/RegularExpressions/RegexNode.cs Added check to preserve balancing groups (N != -1) while removing regular captures (N == -1) from negative lookarounds
src/libraries/System.Text.RegularExpressions/tests/FunctionalTests/Regex.Match.Tests.cs Added test case verifying single match behavior with balancing groups in negative lookarounds
src/libraries/System.Text.RegularExpressions/tests/FunctionalTests/Regex.Count.Tests.cs Added test case verifying correct match count for pattern with balancing groups in negative lookarounds

@stephentoub stephentoub added area-System.Text.RegularExpressions and removed needs-area-label An area label is needed to ensure this gets routed to the appropriate area owners labels Oct 17, 2025
@dotnet-policy-service
Copy link
Contributor

Tagging subscribers to this area: @dotnet/area-system-text-regularexpressions
See info in area-owners.md if you want to be subscribed.

Co-authored-by: stephentoub <2642209+stephentoub@users.noreply.github.com>
Copy link
Member

@tarekgh tarekgh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@tarekgh tarekgh merged commit 9c700f7 into main Oct 18, 2025
82 of 84 checks passed
@stephentoub
Copy link
Member

/backport to release/10.0-staging

@stephentoub stephentoub deleted the copilot/fix-regex-matching-issue branch October 18, 2025 02:44
@github-actions
Copy link
Contributor

Started backporting to release/10.0-staging: https://github.com/dotnet/runtime/actions/runs/18609604713

@github-actions
Copy link
Contributor

@stephentoub an error occurred while backporting to "release/10.0-staging", please check the run log for details!

Error: The specified backport target branch "release/10.0-staging" wasn't found in the repo.

@hez2010
Copy link
Contributor

hez2010 commented Oct 18, 2025

/backport to release/10.0-staging

i believe there's no release/10.0-staging yet?

@stephentoub
Copy link
Member

stephentoub commented Oct 18, 2025

i believe there's no release/10.0-staging yet?

That's what the message above says, yes.
"Error: The specified backport target branch "release/10.0-staging" wasn't found in the repo."

@stephentoub
Copy link
Member

/backport to release/10.0

@github-actions
Copy link
Contributor

Started backporting to release/10.0: https://github.com/dotnet/runtime/actions/runs/18639594553

artl93 pushed a commit that referenced this pull request Oct 20, 2025
… lookarounds (#120888)

Backport of #120856 to release/10.0

/cc @stephentoub @Copilot

## Customer Impact

- [x] Customer reported
- [ ] Found internally

Balancing groups inside of negative lookarounds get removed erroneously,
erroneously changing the behavior of those (rare but possible) regexes.

## Regression

- [x] Yes
- [ ] No

Last .NET 10 preview.

## Testing

Additional unit tests added.

## Risk

Low. The fix is adding an extra condition to the if guarding an
optimization; it simply makes the optimization apply less often.

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: stephentoub <2642209+stephentoub@users.noreply.github.com>
stephentoub added a commit that referenced this pull request Oct 23, 2025
…120983)

Closes #117605

## Summary

This PR adds comprehensive regression tests to verify that captures made
within positive lookbehind assertions are preserved correctly across all
regex engines (Interpreter, Compiled, and SourceGenerated).

## Background

Issue #117605 reported that in Compiled mode, the .NET regex engine was
incorrectly removing captures made by capturing groups within positive
lookbehind assertions, while the Interpreted mode preserved them
correctly. For example:

```csharp
string pattern = "(?<=(abc)+?123)a";
string input = "abcabc123a";

var compiled = new Regex(pattern, RegexOptions.Compiled);
var interpreted = new Regex(pattern, RegexOptions.None);

var matchCompiled = compiled.Match(input);
var matchInterpreted = interpreted.Match(input);

// Expected: Both should have 1 capture in Group[1]
// Bug: Compiled showed 0 captures, Interpreted showed 1 capture
```

## Changes

Upon investigation, I found that **the issue has already been fixed** in
the current codebase. Both Compiled and Interpreted modes now produce
identical results for all test cases mentioned in the issue.

However, the existing tests in `Match_MemberData_Cases` only verified
match success and matched values, not the actual capture counts and
values. This PR adds comprehensive tests to `Match_Advanced_TestData`
that specifically verify:

- Capture counts match across all engines
- Capture values, indices, and lengths are correct
- Various quantifier patterns work correctly (`+?`, `{2,}?`, etc.)
- Multiple capturing groups in lookbehinds are handled properly

## Tests Added

1. **Main issue case**: `(?<=(abc)+?123)a` - Verifies lazy quantifier in
lookbehind
2. **Multiple captures**: `(?<=(abc){2,}?123)a` - Verifies bounded
quantifier with 2 captures
3. **Simple lookbehind**: `(?<=(abc)+?)a` - Verifies basic lazy
quantifier behavior
4. **Multiple groups**: `(?<=(?'1'abc)+?(?'2')123)a` - Verifies multiple
named groups in lookbehind

All tests properly handle the NonBacktracking engine (which doesn't
support lookbehinds) and run against all other available engines.

## Test Results

All 30,335 regex tests pass successfully with no regressions.

<!-- START COPILOT CODING AGENT SUFFIX -->



<details>

<summary>Original prompt</summary>

> 
> ----
> 
> *This section details on the original issue you should resolve*
> 
> <issue_title>Compiled regex incorrectly removes the captures which
captured in positive lookbehind assertions</issue_title>
> <issue_description>### Description
> 
> I saw u added a test `yield return (@"(?<=(abc)+?123)", "abcabc123",
RegexOptions.None, 0, 9, true, "");` on line 135 in [file
Regex.Match.Tests.cs](https://github.com/dotnet/runtime/pull/117629/files#diff-2bdc6a6839e8ce7675737db0067c7cf288d190e0af1559feed81d5be92c9b216).
> Based on this, I found a new bug:
> In Compiled mode, when leaving positive lookbehind assertions, the
.NET regex engine will remove all the captures which captured by the
last direct group construction(the last direct child node).
> 
> 
> --------------------------------
> By the way, I found that when leaving negative assertions, the .NET
regex engine would not delete captures which captured in the assertion
on .NET8, but this bug fixed on .NET9.
> For example:
> ```c#
> using System;
> using System.Text.RegularExpressions;
> 
> Console.WriteLine(new
Regex("(?<!x())a",RegexOptions.Compiled).Match("a").Groups[1].Captures.Count);
> Console.WriteLine(new
Regex("(?!()x)a",RegexOptions.Compiled).Match("a").Groups[1].Captures.Count);
> ```
> Output on .NET8:
> ```
> 1
> 1
> ```
> Maybe they are related?
> 
> ### Reproduction Steps
> 
> ```c#
> using System;
> using System.Text.RegularExpressions;
> 
> string input = "abcabc123a";
> string pattern="(?<=(abc)+?123)a";
> //pattern="(?'1')(?<=(?'1'abc)+?(?#<=just delete the left
capture)(?'1'abc)+?123)a";
> //pattern="(?'1')(?<=(?'1'abc)+?(?'2')123)a";//the capture of group#1
was removed, but the capture of group#2 wasn't
> //pattern="(?<=(abc){2,}?123)a";//two captures of group#1 which
captured in the positive lookbehind assertions(?<=) are both deleted
> 
> //pattern="(?<=(abc){2}?123)a";//not deleted
> //pattern=@"(?<=\1?(abc){2,}?123)a";//not deleted
> //pattern="(?<=(?(1))(abc){2,}?123)a";//not deleted
> //pattern="(?<=(?'-1')?(abc){2,}?123)a";//not deleted
> 
> //pattern="1(?=x(abc)+?)";input="1xabc";//not deleted in positinve
lookahead assertions
> //pattern="1(?=(abc)+?x)";input="1abcx";//not deleted in positinve
lookahead assertions
> 
> try
> {
> 	Console.WriteLine("----------Compiled----------");
> Match matchCompiled = new Regex(pattern,
RegexOptions.Compiled).Match(input);
> 	Console.WriteLine(matchCompiled.Value);
> 	Console.WriteLine(matchCompiled.Groups[1].Captures.Count);
> 	Console.WriteLine("Groups#1 Captures:");
> 	foreach(Capture c in matchCompiled.Groups[1].Captures)
> 		Console.WriteLine($"{c.Value},{c.Index},{c.Length}");
> 	
> 	Console.WriteLine("----------Interpreted----------");
> Match Interpreted= new Regex(pattern, RegexOptions.None).Match(input);
> 	Console.WriteLine(Interpreted.Value);
> 	Console.WriteLine(Interpreted.Groups[1].Captures.Count);
> 	Console.WriteLine("Groups#1 Captures:");
> 	foreach(Capture c in Interpreted.Groups[1].Captures)
> 		Console.WriteLine($"{c.Value},{c.Index},{c.Length}");
> 	
> 	Console.WriteLine("----------result comparion----------");
> Console.WriteLine($"Result
equal:{matchCompiled.Groups[1].Captures.Count ==
Interpreted.Groups[1].Captures.Count}");
> }
> catch(Exception ex)
> {
> 	Console.WriteLine(ex.Message);
> }
> ```
> 
> ### Expected behavior
> 
> Output:
> ```
> ----------Compiled----------
> a
> 1
> Groups#1 Captures:
> abc,3,3
> ----------Interpreted----------
> a
> 1
> Groups#1 Captures:
> abc,3,3
> ----------result comparion----------
> Result equal:True
> ```
> 
> ### Actual behavior
> 
> Output:
> ```
> ----------Compiled----------
> a
> 0
> Groups#1 Captures:
> ----------Interpreted----------
> a
> 1
> Groups#1 Captures:
> abc,3,3
> ----------result comparion----------
> Result equal:False
> ```
> 
> ### Regression?
> 
> _No response_
> 
> ### Known Workarounds
> 
> _No response_
> 
> ### Configuration
> 
> _No response_
> 
> ### Other information
> 
> _No response_</issue_description>
> 
> <agent_instructions>Please check whether this issue still repros. If
it does, please fix it and add a test. If it doesn't, please just ensure
there's one or more tests covering the cited
issues.</agent_instructions>
> 
> ## Comments on the Issue (you are @copilot in this section)
> 
> <comments>
> <comment_new><author>@stephentoub</author><body>
> > Did it fixed by #120856 yet?
> 
> No, that issue is unrelated.
> 
> That label issue still exists in the source generator. It's adding
braces in a way that's preventing the goto from seeing the target label,
which exists, just not in scope. I'd need to debug through to see why
the braces are being added.</body></comment_new>
> </comments>
> 


</details>

Fixes #117651

<!-- START COPILOT CODING AGENT TIPS -->
---

💡 You can make Copilot smarter by setting up custom instructions,
customizing its development environment and configuring Model Context
Protocol (MCP) servers. Learn more [Copilot coding agent
tips](https://gh.io/copilot-coding-agent-tips) in the docs.

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: stephentoub <2642209+stephentoub@users.noreply.github.com>
@github-actions github-actions bot locked and limited conversation to collaborators Nov 19, 2025
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Regex ()(?'-1')(?!(?'-1')) exhibit incorrect matching behavior in .NET10

4 participants