Compiled regex incorrectly removes captures made inside positive lookbehind assertions #117651

@longxya

Description

I saw that you added the test yield return (@"(?<=(abc)+?123)", "abcabc123", RegexOptions.None, 0, 9, true, ""); on line 135 of Regex.Match.Tests.cs.
Based on it, I found a new bug:
In Compiled mode, when leaving a positive lookbehind assertion, the .NET regex engine removes all captures made by the assertion's last direct group construct (its last direct child node).


By the way, I found that on .NET 8 the regex engine does not delete captures made inside a negative assertion when leaving it. This bug is fixed on .NET 9.
For example:

using System;
using System.Text.RegularExpressions;

Console.WriteLine(new Regex("(?<!x())a",RegexOptions.Compiled).Match("a").Groups[1].Captures.Count);
Console.WriteLine(new Regex("(?!()x)a",RegexOptions.Compiled).Match("a").Groups[1].Captures.Count);

Output on .NET 8:

1
1

Maybe they are related?

Reproduction Steps

using System;
using System.Text.RegularExpressions;

string input = "abcabc123a";
string pattern="(?<=(abc)+?123)a";
//pattern="(?'1')(?<=(?'1'abc)+?(?#<=just delete the left capture)(?'1'abc)+?123)a";
//pattern="(?'1')(?<=(?'1'abc)+?(?'2')123)a";//the capture of group #1 is removed, but the capture of group #2 is not
//pattern="(?<=(abc){2,}?123)a";//both captures of group #1 made inside the positive lookbehind assertion (?<=) are deleted

//pattern="(?<=(abc){2}?123)a";//not deleted
//pattern=@"(?<=\1?(abc){2,}?123)a";//not deleted
//pattern="(?<=(?(1))(abc){2,}?123)a";//not deleted
//pattern="(?<=(?'-1')?(abc){2,}?123)a";//not deleted

//pattern="1(?=x(abc)+?)";input="1xabc";//not deleted in positive lookahead assertions
//pattern="1(?=(abc)+?x)";input="1abcx";//not deleted in positive lookahead assertions

try
{
	Console.WriteLine("----------Compiled----------");
	Match matchCompiled = new Regex(pattern, RegexOptions.Compiled).Match(input);
	Console.WriteLine(matchCompiled.Value);
	Console.WriteLine(matchCompiled.Groups[1].Captures.Count);
	Console.WriteLine("Groups#1 Captures:");
	foreach(Capture c in matchCompiled.Groups[1].Captures)
		Console.WriteLine($"{c.Value},{c.Index},{c.Length}"); // value,index,length, as shown in the output below
	
	Console.WriteLine("----------Interpreted----------");
	Match matchInterpreted = new Regex(pattern, RegexOptions.None).Match(input);
	Console.WriteLine(matchInterpreted.Value);
	Console.WriteLine(matchInterpreted.Groups[1].Captures.Count);
	Console.WriteLine("Groups#1 Captures:");
	foreach(Capture c in matchInterpreted.Groups[1].Captures)
		Console.WriteLine($"{c.Value},{c.Index},{c.Length}");
	
	Console.WriteLine("----------result comparion----------");
	Console.WriteLine($"Result equal:{matchCompiled.Groups[1].Captures.Count == matchInterpreted.Groups[1].Captures.Count}");
}
catch(Exception ex)
{
	Console.WriteLine(ex.Message);
}

Expected behavior

Output:

----------Compiled----------
a
1
Groups#1 Captures:
abc,3,3
----------Interpreted----------
a
1
Groups#1 Captures:
abc,3,3
----------result comparion----------
Result equal:True

Actual behavior

Output:

----------Compiled----------
a
0
Groups#1 Captures:
----------Interpreted----------
a
1
Groups#1 Captures:
abc,3,3
----------result comparion----------
Result equal:False

Regression?

No response

Known Workarounds

No response

Configuration

No response

Other information

No response
