Skip to content

Update RegExps in documentation_comment.dart #3571

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 2 commits into from
Nov 5, 2023
Merged

Conversation

lrhn
Copy link
Member

@lrhn lrhn commented Nov 4, 2023

Some pure clean-up and simplification.
Some "linear backtracking"-protection.
One real change, from [^ ]+ to \S+, to not allow tabs or newlines in an unquoted argument.

lrhn added 2 commits November 4, 2023 21:14
Some pure clean-up and simplification.
Some "catastrophic backtracking"-protection.
One real change, from `[^ ]+` to `\S+`, to not allow tabs or newlines in an unquoted argument.
@@ -14,22 +14,19 @@ import 'package:meta/meta.dart';
import 'package:path/path.dart' as p show Context;

final _templatePattern = RegExp(
r'[ ]*{@template\s+(.+?)}([\s\S]+?){@endtemplate}[ ]*(\n?)',
multiLine: true);
r'[ ]*\{@template\s+(\S.*?)\}([^]+?)\{@endtemplate\}[ ]*(\n?)');
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added \ before { and } just be cause it makes it clear they're not semantic.
The [^] is the same as [\s\S] (match any character).
The multiLine: true isn't used, there is no ^ or $ in the regexp.

The \s+(.+?)\} pattern risks "linear backtracking" (not as bad as catastrophic backtracking, but still unnecessary).
An input of


-- next line has many spaces -- 
{@template                                                                                      a
}

will let all the space characters match \s+, then followed by an a, which must be matched by the ., and a newline which matches neither . nor }.
So it goes back and matches one less space in the \s+, lets . match that space instead, and continue to still not match. In the end, it tries all the combinations before giving up.

Changing to \s+([^\s}].*?) makes the transition-point determinisitic, and also prevents matching {@template }} as having name }

@@ -675,8 +672,8 @@ mixin DocumentationComment
/// Match group 4 is the unquoted arg, if any.
static final RegExp _argPattern = RegExp(r'([a-zA-Z\-_0-9]+=)?' // option name
r'(?:' // Start a new non-capture group for the two possibilities.
r'''(["'])((?:\\{2})*|(?:.*?[^\\](?:\\{2})*))\2|''' // with quotes.
r'([^ ]+))'); // without quotes.
r'''(["'])((?:[^\\\r\n]|\\.)*?)\2|''' // with quotes.
Copy link
Member Author

@lrhn lrhn Nov 4, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be equivalent, but simpler. The original RE stopped at the next quote not preceded by an odd number of backslashes.
The new one must match the character after a backslash as content, whether it's a backslash or not.
Effect should be the same, just much easier to read (IMO).

r'''(["'])((?:\\{2})*|(?:.*?[^\\](?:\\{2})*))\2|''' // with quotes.
r'([^ ]+))'); // without quotes.
r'''(["'])((?:[^\\\r\n]|\\.)*?)\2|''' // with quotes.
r'(\S+))'); // without quotes.
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a change. It didn't seem reasonable to accept newlines or tabs as part of an unquoted argument. Matching non-whitespace seemed better than just non-space-char.

@lrhn lrhn requested a review from srawlins November 4, 2023 20:34
@srawlins
Copy link
Member

srawlins commented Nov 5, 2023

Thanks much @lrhn ! We're replacing these regular expressions with parsing in the analyzer. See doc_comment_builder.dart and in particular, _parseDocDirectiveTag.

@srawlins srawlins merged commit f47324d into main Nov 5, 2023
@srawlins srawlins deleted the lrhn-regexp-patch-1 branch November 5, 2023 15:56
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants