Whitespace is underspecified in punctuation #2735

Open
eernstg opened this issue Dec 19, 2022 · 0 comments
Labels
question Further information is requested specification technical-debt Dealing with a part of the language which needs clarification or adjustments

eernstg commented Dec 19, 2022

Cf. dart-lang/sdk#50776.

The Dart grammar specifies symbol literals in terms of smaller parts, i.e., symbols are not recognized at the lexical level, but rather at the context-free level. This implies that, according to the grammar, whitespace is allowed inside symbol literals, for example # foo . bar.

The specification says that the value of the symbol is based on the source code of the symbol literal with all whitespace removed (which again confirms that the whitespace is allowed).
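To make the rule above concrete, here is a minimal sketch of how a symbol's name could be derived from the source text of the literal by discarding whitespace. The helper name symbolName is hypothetical and is not part of any Dart tool or library; it only illustrates the wording of the specification as summarized above.

```dart
/// Hypothetical helper (not an SDK API): derives the name denoted by a
/// symbol literal from its source text by removing all whitespace and
/// then dropping the leading `#`.
String symbolName(String literalSource) {
  // Remove every run of whitespace, wherever it occurs in the literal.
  final compact = literalSource.replaceAll(RegExp(r'\s+'), '');
  if (!compact.startsWith('#')) {
    throw FormatException('Not a symbol literal', literalSource);
  }
  return compact.substring(1);
}

void main() {
  print(symbolName('#foo.bar')); // foo.bar
  print(symbolName('# foo . bar')); // foo.bar
  print(symbolName('#[ ] =')); // []=
}
```

Under this reading, the spaced and unspaced spellings denote the same symbol, which is why the grammar-level permission for whitespace has no effect on the resulting value.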

A similar situation exists with a number of punctuation symbols which are used in different ways in different contexts (in particular >, which can be the end of an actual type argument list or a part of an operator like >> or >>>=).

However, the tools (the analyzer and the common front end) report an error for many occurrences of whitespace, including #[ ], #[] =, x >>> = 1, etc.

A Dart parser needs to maintain the detailed structure during lexical analysis, because the same sequence of characters can have a different structure, depending on the context: x >>>= 1 makes >>>= a single compound-assignment operator, but List<List<List<int>>>==List<List<int>> makes each of the > an end-of-type-arguments marker, and == an equality operator. So we can't simply let a lexer swallow >>>= as one token.
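One way to picture this is a scanner that never merges > and = itself, but records whether whitespace preceded each token, so that a later, context-aware stage can decide how to group them. The sketch below is a simplified illustration under that assumption. The names Tok, scan, and operatorAt are hypothetical, and this is not how the analyzer or the common front end is implemented.

```dart
/// Hypothetical token: its text plus whether whitespace preceded it.
class Tok {
  final String text;
  final bool spaceBefore;
  Tok(this.text, this.spaceBefore);
}

/// Splits [source] into words (identifiers and numbers) and
/// single-character punctuation tokens. `>` and `=` are never merged here.
List<Tok> scan(String source) {
  final word = RegExp(r'[A-Za-z0-9_$]');
  final tokens = <Tok>[];
  var i = 0;
  var sawSpace = false;
  while (i < source.length) {
    final ch = source[i];
    if (ch.trim().isEmpty) {
      // Whitespace is not a token, but we remember that we saw it.
      sawSpace = true;
      i++;
      continue;
    }
    var end = i + 1;
    if (word.hasMatch(ch)) {
      while (end < source.length && word.hasMatch(source[end])) {
        end++;
      }
    }
    tokens.add(Tok(source.substring(i, end), sawSpace));
    sawSpace = false;
    i = end;
  }
  return tokens;
}

/// Operators built from `>` and `=`, longest first.
const operators = ['>>>=', '>>=', '>>>', '>>', '>=', '>'];

/// Called only where the parser expects an operator: joins the run of
/// adjacent `>`/`=` tokens starting at [start] (stopping at whitespace)
/// and returns the longest operator that the run begins with, or null.
String? operatorAt(List<Tok> tokens, int start) {
  final run = StringBuffer(tokens[start].text);
  for (var i = start + 1; i < tokens.length; i++) {
    final t = tokens[i];
    if (t.spaceBefore || (t.text != '>' && t.text != '=')) break;
    run.write(t.text);
  }
  final text = run.toString();
  for (final op in operators) {
    if (text.startsWith(op)) return op;
  }
  return null;
}

void main() {
  final compact = scan('x >>>= 1');
  print(compact.map((t) => t.text).toList()); // [x, >, >, >, =, 1]
  print(operatorAt(compact, 1)); // >>>=

  // With whitespace before `=`, only `>>>` is grouped; the `=` is left over.
  print(operatorAt(scan('x >>> = 1'), 1)); // >>>
}
```

In a type-argument context the parser would simply not call operatorAt and would consume each > individually, which is the reason the grouping cannot be decided by the scanner alone.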

This implies that any rules we might adopt in order to make # [ ] an error would have to be specified outside the grammar.

(Aside: As far as I know, the Dart parser used by the analyzer and the common front end does actually step out of the level of regular languages during the construction of tokens, e.g., by changing > > to >> if there are no matching <s in the tokens produced so far, but we can't assume that all Dart parsers will be able to match up parenthesis-like tokens during lexical analysis.)

It would make sense to specify such whitespace errors, at least in the cases where the tools currently report them.
