-
Notifications
You must be signed in to change notification settings - Fork 125
Update RegExps in documentation_comment.dart #3571
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Some pure clean-up and simplification. Some "catastrophic backtracking"-protection. One real change, from `[^ ]+` to `\S+`, to not allow tabs or newlines in an unquoted argument.
Avoid `{@template }}` being parsed.
@@ -14,22 +14,19 @@ import 'package:meta/meta.dart'; | |||
import 'package:path/path.dart' as p show Context; | |||
|
|||
final _templatePattern = RegExp( | |||
r'[ ]*{@template\s+(.+?)}([\s\S]+?){@endtemplate}[ ]*(\n?)', | |||
multiLine: true); | |||
r'[ ]*\{@template\s+(\S.*?)\}([^]+?)\{@endtemplate\}[ ]*(\n?)'); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added \
before {
and }
just be cause it makes it clear they're not semantic.
The [^]
is the same as [\s\S]
(match any character).
The multiLine: true
isn't used, there is no ^
or $
in the regexp.
The \s+(.+?)\}
pattern risks "linear backtracking" (not as bad as catastrophic backtracking, but still unnecessary).
An input of
-- next line has many spaces --
{@template a
}
will let all the space characters match \s+
, then followed by an a
, which must be matched by the .
, and a newline which matches neither .
nor }
.
So it goes back and matches one less space in the \s+
, lets .
match that space instead, and continue to still not match. In the end, it tries all the combinations before giving up.
Changing to \s+([^\s}].*?)
makes the transition-point determinisitic, and also prevents matching {@template }}
as having name }
@@ -675,8 +672,8 @@ mixin DocumentationComment | |||
/// Match group 4 is the unquoted arg, if any. | |||
static final RegExp _argPattern = RegExp(r'([a-zA-Z\-_0-9]+=)?' // option name | |||
r'(?:' // Start a new non-capture group for the two possibilities. | |||
r'''(["'])((?:\\{2})*|(?:.*?[^\\](?:\\{2})*))\2|''' // with quotes. | |||
r'([^ ]+))'); // without quotes. | |||
r'''(["'])((?:[^\\\r\n]|\\.)*?)\2|''' // with quotes. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should be equivalent, but simpler. The original RE stopped at the next quote not preceded by an odd number of backslashes.
The new one must match the character after a backslash as content, whether it's a backslash or not.
Effect should be the same, just much easier to read (IMO).
r'''(["'])((?:\\{2})*|(?:.*?[^\\](?:\\{2})*))\2|''' // with quotes. | ||
r'([^ ]+))'); // without quotes. | ||
r'''(["'])((?:[^\\\r\n]|\\.)*?)\2|''' // with quotes. | ||
r'(\S+))'); // without quotes. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a change. It didn't seem reasonable to accept newlines or tabs as part of an unquoted argument. Matching non-whitespace seemed better than just non-space-char.
Thanks much @lrhn ! We're replacing these regular expressions with parsing in the analyzer. See doc_comment_builder.dart and in particular, |
Some pure clean-up and simplification.
Some "linear backtracking"-protection.
One real change, from
[^ ]+
to\S+
, to not allow tabs or newlines in an unquoted argument.