
[BUG] the grammar comments are not always correct #71

Closed
@neumannt


Out of curiosity I have implemented an alternative parser for cppfront / cpp2, which uses a PEG grammar as input for a parser generator. During that experiment I noticed that the grammar rules embedded as //G comments are not always correct. I list the errors I noticed below.

One preliminary note: the cppfront compiler has a rather relaxed concept of keywords. In most cases it accepts a keyword where an identifier is expected; for example, it will happily compile if: () -> void = { }. I don't think that is a good idea, and my grammar explicitly distinguishes between keywords and identifiers (modulo the few context-specific soft keywords such as in/out). For some grammar rules this requires changes where the parser previously worked by accident (i.e., by not recognizing a certain keyword).
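To illustrate the distinction, here is a minimal C++ sketch of an identifier check that rejects reserved keywords but still lets soft keywords through. The keyword set is a hypothetical, incomplete subset chosen for the example, not cppfront's actual keyword list.

```cpp
#include <cassert>
#include <cctype>
#include <string>
#include <unordered_set>

// Hypothetical, incomplete set of reserved keywords (not cppfront's real list).
bool is_reserved_keyword(const std::string& tok) {
    static const std::unordered_set<std::string> keywords = {
        "if", "else", "while", "for", "return", "inspect", "true", "false",
        "nullptr", "new", "typeid", "void", "int", "char", "bool", "double"};
    return keywords.count(tok) != 0;
}

// Soft keywords such as "in" and "out" are deliberately absent from the set
// above, so they are still accepted as identifiers.
bool is_identifier(const std::string& tok) {
    if (tok.empty()) return false;
    unsigned char first = static_cast<unsigned char>(tok[0]);
    if (!std::isalpha(first) && first != '_') return false;
    for (char c : tok) {
        unsigned char u = static_cast<unsigned char>(c);
        if (!std::isalnum(u) && u != '_') return false;
    }
    return !is_reserved_keyword(tok);
}
```

With such a check, if would be rejected as a function name, while in remains usable wherever the grammar expects an identifier.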

a) id_expression

    //G id-expression
    //G     unqualified-id
    //G     qualified-id
    //G

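Here the order is wrong. With ordered choice, unqualified-id would match only the first component of a qualified name such as std::vector. It should be

    //G id-expression
    //G     qualified-id
    //G     unqualified-id
    //G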
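The effect of the order can be seen in a small ordered-choice sketch in C++. Tokens are plain strings, and the simplified is_name, unqualified_id and qualified_id helpers are hypothetical stand-ins, not cppfront's implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string>
#include <vector>

using Tokens = std::vector<std::string>;

// Hypothetical stand-in for a name token: anything that is not "::".
bool is_name(const std::string& t) { return !t.empty() && t != "::"; }

// unqualified-id (simplified): a single name token.
std::optional<std::size_t> unqualified_id(const Tokens& ts, std::size_t pos) {
    if (pos < ts.size() && is_name(ts[pos])) return pos + 1;
    return std::nullopt;
}

// qualified-id (simplified): name ('::' name)+
std::optional<std::size_t> qualified_id(const Tokens& ts, std::size_t pos) {
    auto p = unqualified_id(ts, pos);
    if (!p) return std::nullopt;
    bool qualified = false;
    while (*p + 1 < ts.size() && ts[*p] == "::" && is_name(ts[*p + 1])) {
        *p += 2;
        qualified = true;
    }
    if (!qualified) return std::nullopt;
    return p;
}

// Ordered choice: the first alternative that succeeds wins.
std::optional<std::size_t> id_expression(const Tokens& ts, std::size_t pos) {
    if (auto p = qualified_id(ts, pos)) return p;
    return unqualified_id(ts, pos);
}

// The order from the current //G comment: unqualified-id first.
std::optional<std::size_t> id_expression_old_order(const Tokens& ts, std::size_t pos) {
    if (auto p = unqualified_id(ts, pos)) return p;
    return qualified_id(ts, pos);
}
```

For the tokens std :: vector, id_expression consumes all three tokens, whereas id_expression_old_order stops after std and leaves ::vector unconsumed.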

b) primary_expression

    //G primary-expression:
    //G     literal
    //G     ( expression-list )
    //G     id-expression
    //G     unnamed-declaration
    //G     inspect-expression
    //G

This does not correspond to the order in the source code. Furthermore, the expression-list is optional, so () must be accepted. And if we distinguish keywords from identifiers, we potentially need some extra rules to handle keywords that are currently silently eaten as identifiers. I would suggest

    //G primary-expression:
    //G     inspect-expression
    //G     id-expression
    //G     literal
    //G     '(' expression-list? ')'
    //G     unnamed-declaration
    //G     'nullptr'
    //G     'true'
    //G     'false'
    //G     'typeid' '(' expression ')'
    //G     'new' '<' id-expression '>' '(' expression-list? ')'
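A reduced C++ sketch of the suggested rule follows. It covers only identifiers, integer literals, the parenthesized optional expression-list and the three keyword literals; expression is a hypothetical one-token stand-in, and inspect-expression, unnamed-declaration, typeid and new are left out.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <optional>
#include <string>
#include <vector>

using Tokens = std::vector<std::string>;

bool is_keyword(const std::string& t) {
    return t == "nullptr" || t == "true" || t == "false" || t == "new" || t == "typeid";
}
// Simplified lexical checks: only the first character is inspected.
bool is_identifier(const std::string& t) {
    return !t.empty() && (std::isalpha(static_cast<unsigned char>(t[0])) || t[0] == '_') &&
           !is_keyword(t);
}
bool is_literal(const std::string& t) {
    return !t.empty() && std::isdigit(static_cast<unsigned char>(t[0]));
}

// Hypothetical stand-in: an expression is a single identifier or literal token.
std::optional<std::size_t> expression(const Tokens& ts, std::size_t pos) {
    if (pos < ts.size() && (is_identifier(ts[pos]) || is_literal(ts[pos]))) return pos + 1;
    return std::nullopt;
}

// expression-list: expression (',' expression)*
std::optional<std::size_t> expression_list(const Tokens& ts, std::size_t pos) {
    auto p = expression(ts, pos);
    if (!p) return std::nullopt;
    while (*p < ts.size() && ts[*p] == ",") {
        auto next = expression(ts, *p + 1);
        if (!next) break;  // leave a dangling ',' unconsumed
        p = next;
    }
    return p;
}

// '(' expression-list? ')'
std::optional<std::size_t> parenthesized(const Tokens& ts, std::size_t pos) {
    if (pos >= ts.size() || ts[pos] != "(") return std::nullopt;
    std::size_t p = expression_list(ts, pos + 1).value_or(pos + 1);  // list is optional
    if (p < ts.size() && ts[p] == ")") return p + 1;
    return std::nullopt;
}

// Subset of the suggested primary-expression, in the suggested order.
std::optional<std::size_t> primary_expression(const Tokens& ts, std::size_t pos) {
    if (pos >= ts.size()) return std::nullopt;
    const std::string& t = ts[pos];
    if (is_identifier(t)) return pos + 1;                               // id-expression
    if (is_literal(t)) return pos + 1;                                  // literal
    if (auto p = parenthesized(ts, pos)) return p;                      // '(' expression-list? ')'
    if (t == "nullptr" || t == "true" || t == "false") return pos + 1;  // keyword literals
    return std::nullopt;
}
```

Because true is no longer an identifier here, it reaches the primary expression only through its own keyword alternative.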

c) nested-name-specifier

    //G nested-name-specifier:
    //G     ::
    //G     unqualified-id ::

This has to support nested scopes such as N1::N2::. I would suggest

    //G nested-name-specifier:
    //G     :: (unqualified-id ::)*
    //G     (unqualified-id ::)+
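The repetitions in the suggested rule can be implemented with a simple loop: each iteration requires both a name and a following ::, so the final name of a qualified-id is left for the unqualified-id that follows. A C++ sketch, with is_name as a hypothetical stand-in for unqualified-id:

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string>
#include <vector>

using Tokens = std::vector<std::string>;

// Hypothetical stand-in for unqualified-id: any non-empty token other than "::".
bool is_name(const std::string& t) { return !t.empty() && t != "::"; }

// (unqualified-id '::')*
// Each iteration needs a name and a following '::', so a trailing name
// (the unqualified-id of a qualified-id) stops the loop unconsumed.
std::size_t scope_repetition(const Tokens& ts, std::size_t pos, std::size_t& count) {
    count = 0;
    while (pos + 1 < ts.size() && is_name(ts[pos]) && ts[pos + 1] == "::") {
        pos += 2;
        ++count;
    }
    return pos;
}

// nested-name-specifier:
//     '::' (unqualified-id '::')*
//     (unqualified-id '::')+
std::optional<std::size_t> nested_name_specifier(const Tokens& ts, std::size_t pos) {
    std::size_t count = 0;
    if (pos < ts.size() && ts[pos] == "::")
        return scope_repetition(ts, pos + 1, count);  // zero or more scopes
    std::size_t end = scope_repetition(ts, pos, count);
    if (count == 0) return std::nullopt;               // at least one scope required
    return end;
}
```

For N1 :: N2 :: f the specifier ends before f, and a lone leading :: is accepted as the global scope.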

d) template-argument

    //G template-argument:
    //G     expression
    //G     id-expression

There should be a comment here saying that '<', '>', '<<' and '>>' are disabled in these expressions until a new parenthesis is opened. In fact, this causes some of the expression rules to be cloned down to the level below these operators (in my implementation these are the rules with the suffix _no_cmp).
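A C++ sketch of the idea, using a hypothetical four-level expression grammar rather than the real cpp2 one: template-argument jumps directly to the additive level (the _no_cmp clone), so the first top-level '>' closes the argument list, while a parenthesized primary goes back to the full expression and re-enables the comparison and shift operators.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical mini-grammar illustrating the _no_cmp idea (not the real cpp2 grammar):
//   expression        : shift (('<' | '>') shift)*
//   shift             : additive (('<<' | '>>') additive)*
//   additive          : primary (('+' | '-') primary)*
//   primary           : name | '(' expression ')'
//   template-argument : additive   // "_no_cmp": no '<', '>', '<<' or '>>'
struct Parser {
    std::vector<std::string> ts;
    std::size_t pos = 0;

    bool accept(const std::string& s) {
        if (pos < ts.size() && ts[pos] == s) { ++pos; return true; }
        return false;
    }
    bool name() {
        if (pos < ts.size() && !ts[pos].empty() &&
            std::isalpha(static_cast<unsigned char>(ts[pos][0]))) { ++pos; return true; }
        return false;
    }
    // Inside parentheses the full expression, including comparisons, is allowed again.
    bool primary() {
        if (accept("(")) return expression() && accept(")");
        return name();
    }
    bool additive() {
        if (!primary()) return false;
        while (accept("+") || accept("-"))
            if (!primary()) return false;
        return true;
    }
    bool shift() {
        if (!additive()) return false;
        while (accept("<<") || accept(">>"))
            if (!additive()) return false;
        return true;
    }
    bool expression() {
        if (!shift()) return false;
        while (accept("<") || accept(">"))
            if (!shift()) return false;
        return true;
    }
    bool template_argument() { return additive(); }
    // '<' template-argument (',' template-argument)* '>'
    bool template_argument_list() {
        if (!accept("<") || !template_argument()) return false;
        while (accept(","))
            if (!template_argument()) return false;
        return accept(">");
    }
};
```

With the tokens < a > b >, the list closes at the first '>', whereas < ( a > b ) > parses the comparison inside the parentheses and closes at the last token.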

e) id-expression from fundamental types

We want to accept builtin types like int as type ids. Currently this works by accident, because the parser does not even recognize these as keywords. Once keywords are no longer accepted as identifiers, we need rules for these, too. I have added a fundamental-type alternative at the end of id-expression, and have defined it as follows:

    fundamental-type:
        'void'
        fundamental-type-modifier-list? 'char'
        'char8_t'
        'char16_t'
        'char32_t'
        'wchar_t'
        fundamental-type-modifier-list? 'int'
        'bool'
        'float'
        'double'
        'long' 'double'
        fundamental-type-modifier-list

    fundamental-type-modifier-list:
        fundamental-type-modifier+

    fundamental-type-modifier:
        'unsigned'
        'signed'
        'long'
        'short'
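A C++ sketch of the fundamental-type rule as an ordered choice over plain string tokens, trying the alternatives in the order listed above; the token representation and the helper names are assumptions for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <initializer_list>
#include <optional>
#include <string>
#include <vector>

using Tokens = std::vector<std::string>;

bool is_modifier(const std::string& t) {
    return t == "unsigned" || t == "signed" || t == "long" || t == "short";
}

// fundamental-type-modifier-list: fundamental-type-modifier+
// Returns the position after the modifiers (== pos when there are none).
std::size_t modifier_list(const Tokens& ts, std::size_t pos) {
    while (pos < ts.size() && is_modifier(ts[pos])) ++pos;
    return pos;
}

// Alternatives are tried in the order of the grammar above.
std::optional<std::size_t> fundamental_type(const Tokens& ts, std::size_t pos) {
    auto at = [&](std::size_t i, const char* s) { return i < ts.size() && ts[i] == s; };
    if (at(pos, "void")) return pos + 1;
    std::size_t m = modifier_list(ts, pos);  // optional modifier prefix
    if (at(m, "char")) return m + 1;
    for (const char* s : {"char8_t", "char16_t", "char32_t", "wchar_t"})
        if (at(pos, s)) return pos + 1;
    if (at(m, "int")) return m + 1;
    for (const char* s : {"bool", "float", "double"})
        if (at(pos, s)) return pos + 1;
    if (at(pos, "long") && at(pos + 1, "double")) return pos + 2;
    if (m > pos) return m;  // a bare modifier list such as 'unsigned long'
    return std::nullopt;
}
```

Because 'long' 'double' is tried before the bare modifier list, long double is consumed as a single type instead of stopping after long.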
