Invalid unicode character in identifier not rejected by Clang when concatenated with macros #101342

Open
@Michael137

The following code gets accepted by Clang top-of-tree (tested on top of commit bb064535bd071c1bddaf55ff7fe283fc8d23c1fc):

#define DOT •

#define CONCAT_IMPL(Left, Separator, Right) Left##Separator##Right
#define CONCAT(Left, Separator, Right) CONCAT_IMPL(Left, Separator, Right)

#define MAKE_CLASS_NAME(A, B) CONCAT(A, DOT, B)

int main() {
    // struct foo•bar {} x;
    struct MAKE_CLASS_NAME(foo, bar) { int val = 0; } x;

    return x.val;
}

$ ./bin/clang++ -std=c++2a unicode.cpp
$ nm a.out
0000000100003f5c t __ZZ4mainEN9foo•barC1Ev
0000000100003f88 t __ZZ4mainEN9foo•barC2Ev
0000000100000000 T __mh_execute_header
0000000100003f34 T _main

But when I take the preprocessed output of this file and run Clang on it, we error out as expected:

$ ./bin/clang++ -E -std=c++2a unicode.cpp > processed.cpp
$ ./bin/clang++ processed.cpp -std=c++2a
unicode.cpp:9:15: error: character <U+2022> not allowed in an identifier
    9 |     struct foo•bar {} x;
      |               ^
1 error generated.

It looks like we should reject the compilation in both cases.

NB: GCC rejects both cases: https://godbolt.org/z/8cfj6s3P8
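
For comparison, here is a minimal sketch of the same macro chain with a separator that is valid in an identifier. It only illustrates the mechanism: the extra CONCAT level lets the separator macro expand before ## pastes the tokens, so the final identifier is formed by pasting rather than lexed directly from the source. SEP, STR, STR_IMPL, pasted_name and check_paste are illustrative names that are not part of the report above; SEP stands in for DOT.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical stand-in for DOT: an underscore is valid inside an identifier.
#define SEP _

// Same two-level concatenation as in the report. The outer CONCAT expands
// its arguments (SEP becomes _) before CONCAT_IMPL pastes them with ##.
#define CONCAT_IMPL(Left, Separator, Right) Left##Separator##Right
#define CONCAT(Left, Separator, Right) CONCAT_IMPL(Left, Separator, Right)

#define MAKE_CLASS_NAME(A, B) CONCAT(A, SEP, B)

// Illustrative helpers: stringify the fully expanded token.
#define STR_IMPL(x) #x
#define STR(x) STR_IMPL(x)

// Declares a struct whose name is the pasted identifier foo_bar.
struct MAKE_CLASS_NAME(foo, bar) { int val = 0; };

// Returns the spelling of the identifier produced by the paste.
inline const char* pasted_name() { return STR(MAKE_CLASS_NAME(foo, bar)); }

// Sanity check: the pasted name is usable as an ordinary identifier.
inline void check_paste() {
    foo_bar x;  // refers to the struct declared through the macro
    assert(x.val == 0);
    assert(std::strcmp(pasted_name(), "foo_bar") == 0);
}
```

Calling check_paste() from main should pass with any conforming compiler, since foo_bar is a well-formed identifier. With DOT in place of SEP the pasted result contains U+2022, which is the case where the two Clang invocations above disagree.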
