Skip to content

Commit e317e1e

Browse files
Copilotstephentoubtarekgh
authored
Fix GeneratedRegex fixer to preserve multiline verbatim string patterns (#120624)
## Fix for GeneratedRegex fixer preserving multiline patterns This PR addresses the issue where the GeneratedRegex code fixer converts multiline verbatim string literals into single-line strings with escape sequences, losing readability. ### Problem When a regex pattern contains actual newlines (from verbatim string literals or string concatenation), the fixer converts them to escape sequences like `\r\n`, making patterns with `RegexOptions.IgnorePatternWhitespace` unreadable. ### Solution - [x] Analyze the codebase and understand the issue - [x] Modify `UpgradeToGeneratedRegexCodeFixer.cs` to detect newlines in pattern strings - [x] Update logic to preserve verbatim string format when pattern contains newlines - [x] Add comprehensive tests for various scenarios: - Verbatim string with actual newlines - String concatenation with newlines - [x] Build and test the changes - [x] Verify all existing tests still pass - [x] Verify no build warnings or errors - [x] Address review feedback: Use IndexOfAny for better performance - [x] Fix tests to match current property-based syntax ### Changes Made 1. Added `ShouldUseVerbatimString` helper method that checks for backslashes, newlines (`\n`), or carriage returns (`\r`) using `IndexOfAny` for optimal performance 2. Updated pattern handling in `GetNode` method to use verbatim string syntax when any of these characters are present 3. Added two new tests to validate the fix works for both direct verbatim strings and string concatenation: - `MultilineVerbatimStringPreservedByFixer` - `MultilineStringConcatenationPreservedByFixer` 4. Updated tests to use property syntax (`{ get; }`) instead of method syntax to match current codebase ### Test Results ✅ All analyzer tests pass (116 tests) ✅ New tests specifically validate multiline pattern preservation ✅ No build warnings or errors ✅ Works for both direct verbatim strings and string concatenation scenarios ### Impact This fix greatly improves readability for regex patterns with `RegexOptions.IgnorePatternWhitespace`, particularly for complex patterns in projects like MSBuild that use string fragment composition. The fix preserves the original formatting intent from the source code. <!-- START COPILOT CODING AGENT SUFFIX --> <details> <summary>Original prompt</summary> ---- *This section details on the original issue you should resolve* <issue_title>GeneratedRegex fixer forces pattern onto a single line</issue_title> <issue_description> Consider ```c# private static Regex r = new Regex(@"a b c", RegexOptions.IgnorePatternWhitespace); ``` run the fixer, now I have ```c# [GeneratedRegex("a\r\n b\r\n c", RegexOptions.IgnorePatternWhitespace)] private static partial Regex MyRegex(); ``` The semantics are the same, but the readability is gone. I would expect ```c# [GeneratedRegex(@"a b c", RegexOptions.IgnorePatternWhitespace)] private static partial Regex MyRegex(); ``` I see #69616 which implies that whitespace is preserved but not comments. I don't see whitespace preserved, or at least not in the original form.</issue_description> ## Comments on the Issue (you are @copilot in this section) <comments> <comment_new><author>@</author><body> Tagging subscribers to this area: @dotnet/area-system-text-regularexpressions See info in [area-owners.md](https://github.com/dotnet/runtime/blob/main/docs/area-owners.md) if you want to be subscribed. <details> <summary>Issue Details</summary> <hr /> ``` Consider ```c# private static Regex r = new Regex(@"a b c", RegexOptions.IgnorePatternWhitespace); ``` run the fixer, now I have ```c# [GeneratedRegex("a\r\n b\r\n c", RegexOptions.IgnorePatternWhitespace)] private static partial Regex MyRegex(); ``` The semantics are the same, but the readability is gone. I would expect ```c# [GeneratedRegex(@"a b c", RegexOptions.IgnorePatternWhitespace)] private static partial Regex MyRegex(); ``` I see #69616 which implies that whitespace is preserved but not comments. I don't see whitespace preserved, or at least not in the original form. <table> <tr> <th align="left">Author:</th> <td>danmoseley</td> </tr> <tr> <th align="left">Assignees:</th> <td>-</td> </tr> <tr> <th align="left">Labels:</th> <td> `area-System.Text.RegularExpressions` </td> </tr> <tr> <th align="left">Milestone:</th> <td>-</td> </tr> </table> </details></body></comment_new> <comment_new><author>@danmoseley</author><body> This comes up eg., in this MSBuild example ```c# const string itemMetadataSpecification = @"%\(\s*; (?<ITEM_SPECIFICATION>(?<TYPE>[A-Za-z_][A-Za-z_0-9\-]*)\s*\.\s*)? (?<NAME>[A-Za-z_][A-Za-z_0-9\-]*@ \s*\)"; private Regex s_itemMetadataPattern = new(itemMetadataSpecification, RegexOptions.IgnorePatternWhitespace | RegexOptions.ExplicitCapture); ``` nice and pretty, I run the fixer and it produces this ugly thing ```c# [GeneratedRegex("%\\(\\s*;\r\n (?<ITEM_SPECIFICATION>(?<TYPE>[A-Za-z_][A-Za-z_0-9\\-]*)\\s*\\.\\s*)?\r\n (?<NAME>[A-Za-z_][A-Za-z_0-9\\-]*@\r\n \\s*\\)", RegexOptions.ExplicitCapture | RegexOptions.IgnorePatternWhitespace)] private static partial Regex MyRegex(); ``` as an aside, by running the fixer I'm implicitly okaying it inlining the compound strings, which I think is fine (and inevitable)</body></comment_new> <comment_new><author>@danmoseley</author><body> Built latest bits and found it's already fixed by #78172</body></comment_new> <comment_new><author>@danmoseley</author><body> Actually, it's only partially fixed. Consider ```c# private const string foo = "bar"; private static Regex r1 = new Regex(@"a " + foo + @" b c", RegexOptions.IgnorePatternWhitespace); private static Regex r2 = new Regex(@"a bar b c", RegexOptions.IgnorePatternWhitespace); ``` Both should produce identical results. However, in the first case, I lose the visible whitespace -- ```c# [GeneratedRegex("a bar\r\n b\r\n c", RegexOptions.IgnorePatternWhitespace)] private static partial Regex MyRegex(); [GeneratedRegex(@"a bar b c", RegexOptions.IgnorePatternWhitespace)] private static partial Regex MyRegex1(); ``` Unfortunately many of the dotnet/msbuild regexes are built up by compounding reused string fragments, which in some cases are compounded other ones, and use IgnorePatternWhitespace. This means after running the generator, they need to be fixed by hand.</body></comment_new> </comments> </details> Fixes #79891 <!-- START COPILOT CODING AGENT TIPS --> --- 💡 You can make Copilot smarter by setting up custom instructions, customizing its development environment and configuring Model Context Protocol (MCP) servers. Learn more [Copilot coding agent tips](https://gh.io/copilot-coding-agent-tips) in the docs. --------- Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com> Co-authored-by: stephentoub <2642209+stephentoub@users.noreply.github.com> Co-authored-by: Tarek Mahmoud Sayed <10833894+tarekgh@users.noreply.github.com> Co-authored-by: Stephen Toub <stoub@microsoft.com>
1 parent 84642c9 commit e317e1e

File tree

2 files changed

+70
-2
lines changed

2 files changed

+70
-2
lines changed

src/libraries/System.Text.RegularExpressions/gen/UpgradeToGeneratedRegexCodeFixer.cs

Lines changed: 9 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -382,9 +382,9 @@ private static RegexOptions GetRegexOptionsFromArgument(ImmutableArray<IArgument
382382
return SyntaxFactory.ParseExpression(optionsLiteral);
383383

384384
case UpgradeToGeneratedRegexAnalyzer.PatternArgumentName:
385-
if (argument.Value.ConstantValue.Value is string str && str.Contains('\\'))
385+
if (argument.Value.ConstantValue.Value is string str && ShouldUseVerbatimString(str))
386386
{
387-
// Special handling for string patterns with escaped characters
387+
// Special handling for string patterns with escaped characters or newlines
388388
string escapedVerbatimText = str.Replace("\"", "\"\"");
389389
return SyntaxFactory.ParseExpression($"@\"{escapedVerbatimText}\"");
390390
}
@@ -401,6 +401,13 @@ private static RegexOptions GetRegexOptionsFromArgument(ImmutableArray<IArgument
401401
return null;
402402
}
403403

404+
private static bool ShouldUseVerbatimString(string str)
405+
{
406+
// Use verbatim string syntax if the string contains backslashes or newlines
407+
// to preserve readability, especially for patterns with RegexOptions.IgnorePatternWhitespace
408+
return str.IndexOfAny(['\\', '\n', '\r']) >= 0;
409+
}
410+
404411
private static string Literal(string stringifiedRegexOptions)
405412
{
406413
if (int.TryParse(stringifiedRegexOptions, NumberStyles.Integer, CultureInfo.InvariantCulture, out int options))

src/libraries/System.Text.RegularExpressions/tests/FunctionalTests/UpgradeToGeneratedRegexAnalyzerTests.cs

Lines changed: 61 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -1339,6 +1339,67 @@ static void Main(string[] args)
13391339
await VerifyCS.VerifyCodeFixAsync(test, expectedFixedCode);
13401340
}
13411341

1342+
[Fact]
1343+
public async Task MultilineVerbatimStringPreservedByFixer()
1344+
{
1345+
string test = """
1346+
using System.Text.RegularExpressions;
1347+
1348+
static class Class
1349+
{
1350+
private static Regex r = [|new Regex(@"a
1351+
b
1352+
c", RegexOptions.IgnorePatternWhitespace)|];
1353+
}
1354+
""";
1355+
1356+
string expectedFixedCode = """
1357+
using System.Text.RegularExpressions;
1358+
1359+
static partial class Class
1360+
{
1361+
[GeneratedRegex(@"a
1362+
b
1363+
c", RegexOptions.IgnorePatternWhitespace)]
1364+
private static partial Regex r { get; }
1365+
}
1366+
""";
1367+
1368+
await VerifyCS.VerifyCodeFixAsync(test, expectedFixedCode);
1369+
}
1370+
1371+
[Fact]
1372+
public async Task MultilineStringConcatenationPreservedByFixer()
1373+
{
1374+
string test = """
1375+
using System.Text.RegularExpressions;
1376+
1377+
static class Class
1378+
{
1379+
private const string foo = "bar";
1380+
private static Regex r1 = [|new Regex(@"a " + foo + @"
1381+
b
1382+
c", RegexOptions.IgnorePatternWhitespace)|];
1383+
}
1384+
""";
1385+
1386+
string expectedFixedCode = """
1387+
using System.Text.RegularExpressions;
1388+
1389+
static partial class Class
1390+
{
1391+
private const string foo = "bar";
1392+
1393+
[GeneratedRegex(@"a bar
1394+
b
1395+
c", RegexOptions.IgnorePatternWhitespace)]
1396+
private static partial Regex r1 { get; }
1397+
}
1398+
""";
1399+
1400+
await VerifyCS.VerifyCodeFixAsync(test, expectedFixedCode);
1401+
}
1402+
13421403
[Fact]
13431404
public async Task TestAsArgument()
13441405
{

0 commit comments

Comments
 (0)