-
Notifications
You must be signed in to change notification settings - Fork 13.3k
Disable -fdollars-in-identifiers by default #135407
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
Clang used to enable -fdollars-in-identifiers by default for compatibility with GCC. However, this is no longer a conforming extension after WG21 P2558R2 and WG14 N2701. So this disables the dialect by default, which is a potentially breaking change for users. Note: some inline assembly constructs may use dollars in identifiers. We cannot enable the dialect mode automatically based on the user passing -fasm-blocks because that flag is implied by -fms-extensions which is enabled by default on Windows, and thus would mean we'd be enabling a non-conforming language extension by default on that platform. Users impacted by the change should explicitly add -fdollars-in-identifiers to their build scripts. Partially addresses llvm#128939
@llvm/pr-subscribers-clang-format @llvm/pr-subscribers-clang Author: Aaron Ballman (AaronBallman) ChangesClang used to enable -fdollars-in-identifiers by default for compatibility with GCC. However, this is no longer a conforming extension after WG21 P2558R2 and WG14 N2701. So this disables the dialect by default, which is a potentially breaking change for users. Note: some inline assembly constructs may use dollars in identifiers. We cannot enable the dialect mode automatically based on the user passing -fasm-blocks because that flag is implied by -fms-extensions which is enabled by default on Windows, and thus would mean we'd be enabling a non-conforming language extension by default on that platform. Users impacted by the change should explicitly add -fdollars-in-identifiers to their build scripts. Partially addresses #128939 Full diff: https://github.com/llvm/llvm-project/pull/135407.diff 10 Files Affected:
diff --git a/clang/docs/ReleaseNotes.rst b/clang/docs/ReleaseNotes.rst
index 9c45965dc4d82..fe51de6fd5b7c 100644
--- a/clang/docs/ReleaseNotes.rst
+++ b/clang/docs/ReleaseNotes.rst
@@ -48,6 +48,11 @@ C/C++ Language Potentially Breaking Changes
ensure they are not caught by these optimizations. It is also possible to use
``-fwrapv-pointer`` or ``-fno-delete-null-pointer-checks`` to make pointer arithmetic
on null pointers well-defined. (#GH130734, #GH130742, #GH130952)
+- Use of the dollar sign (``$``) in an identifier is no longer a conforming
+ extension in either C or C++, so ``-fdollars-in-identifiers`` is no longer
+ enabled by default. Use of the dollar sign in an identifier will now be
+ diagnosed as an error unless ``-fdollars-in-identifiers`` is explicitly
+ enabled.
C++ Specific Potentially Breaking Changes
-----------------------------------------
diff --git a/clang/include/clang/Basic/LangOptions.def b/clang/include/clang/Basic/LangOptions.def
index 3879cc7942877..f08e179a38067 100644
--- a/clang/include/clang/Basic/LangOptions.def
+++ b/clang/include/clang/Basic/LangOptions.def
@@ -119,7 +119,7 @@ LANGOPT(WChar , 1, 0, "wchar_t keyword")
LANGOPT(Char8 , 1, 0, "char8_t keyword")
LANGOPT(IEEE128 , 1, 0, "__ieee128 keyword")
LANGOPT(DeclSpecKeyword , 1, 0, "__declspec keyword")
-BENIGN_LANGOPT(DollarIdents , 1, 1, "'$' in identifiers")
+BENIGN_LANGOPT(DollarIdents , 1, 0, "'$' in identifiers")
BENIGN_LANGOPT(AsmPreprocessor, 1, 0, "preprocessor in asm mode")
LANGOPT(GNUMode , 1, 1, "GNU extensions")
LANGOPT(GNUKeywords , 1, 1, "GNU keywords")
diff --git a/clang/include/clang/Driver/Options.td b/clang/include/clang/Driver/Options.td
index c1020b234b136..38eb332f40d27 100644
--- a/clang/include/clang/Driver/Options.td
+++ b/clang/include/clang/Driver/Options.td
@@ -2088,7 +2088,7 @@ def fno_discard_value_names : Flag<["-"], "fno-discard-value-names">,
Group<f_clang_Group>, Visibility<[ClangOption, DXCOption]>,
HelpText<"Do not discard value names in LLVM IR">;
defm dollars_in_identifiers : BoolFOption<"dollars-in-identifiers",
- LangOpts<"DollarIdents">, Default<!strconcat("!", asm_preprocessor.KeyPath)>,
+ LangOpts<"DollarIdents">, DefaultFalse,
PosFlag<SetTrue, [], [ClangOption], "Allow">,
NegFlag<SetFalse, [], [ClangOption], "Disallow">,
BothFlags<[], [ClangOption, CC1Option], " '$' in identifiers">>;
diff --git a/clang/test/AST/ByteCode/codegen.m b/clang/test/AST/ByteCode/codegen.m
index 6139596c6337a..a7b3a100165eb 100644
--- a/clang/test/AST/ByteCode/codegen.m
+++ b/clang/test/AST/ByteCode/codegen.m
@@ -3,7 +3,7 @@
/// See test/CodeGenObjC/constant-strings.m
/// Test that we let the APValue we create for ObjCStringLiterals point to the right expression.
-// RUN: %clang_cc1 -triple x86_64-macho -emit-llvm -o %t %s -fexperimental-new-constant-interpreter
+// RUN: %clang_cc1 -triple x86_64-macho -fdollars-in-identifiers -emit-llvm -o %t %s -fexperimental-new-constant-interpreter
// RUN: FileCheck --check-prefix=CHECK-NEXT < %t %s
// Check that we set alignment 1 on the string.
diff --git a/clang/test/C/drs/dr0xx.c b/clang/test/C/drs/dr0xx.c
index 5fe023deaece9..ffcd63b0cc9a7 100644
--- a/clang/test/C/drs/dr0xx.c
+++ b/clang/test/C/drs/dr0xx.c
@@ -1,9 +1,9 @@
-/* RUN: %clang_cc1 -std=c89 -fsyntax-only -verify=expected,c89only -pedantic -Wno-declaration-after-statement -Wno-c11-extensions %s
- RUN: %clang_cc1 -std=c89 -fsyntax-only -verify=expected,c89only -pedantic -Wno-declaration-after-statement -Wno-c11-extensions -fno-signed-char %s
- RUN: %clang_cc1 -std=c99 -fsyntax-only -verify=expected,c99untilc2x -pedantic -Wno-c11-extensions %s
- RUN: %clang_cc1 -std=c11 -fsyntax-only -verify=expected,c99untilc2x -pedantic %s
- RUN: %clang_cc1 -std=c17 -fsyntax-only -verify=expected,c99untilc2x -pedantic %s
- RUN: %clang_cc1 -std=c2x -fsyntax-only -verify=expected,c2xandup -pedantic %s
+/* RUN: %clang_cc1 -std=c89 -fdollars-in-identifiers -fsyntax-only -verify=expected,c89only -pedantic -Wno-declaration-after-statement -Wno-c11-extensions %s
+ RUN: %clang_cc1 -std=c89 -fdollars-in-identifiers -fsyntax-only -verify=expected,c89only -pedantic -Wno-declaration-after-statement -Wno-c11-extensions -fno-signed-char %s
+ RUN: %clang_cc1 -std=c99 -fdollars-in-identifiers -fsyntax-only -verify=expected,c99untilc2x -pedantic -Wno-c11-extensions %s
+ RUN: %clang_cc1 -std=c11 -fdollars-in-identifiers -fsyntax-only -verify=expected,c99untilc2x -pedantic %s
+ RUN: %clang_cc1 -std=c17 -fdollars-in-identifiers -fsyntax-only -verify=expected,c99untilc2x -pedantic %s
+ RUN: %clang_cc1 -std=c2x -fdollars-in-identifiers -fsyntax-only -verify=expected,c2xandup -pedantic %s
*/
/* The following are DRs which do not require tests to demonstrate
diff --git a/clang/test/CodeGen/ms-inline-asm.c b/clang/test/CodeGen/ms-inline-asm.c
index c3eef9a23e166..811e3ccd2a89a 100644
--- a/clang/test/CodeGen/ms-inline-asm.c
+++ b/clang/test/CodeGen/ms-inline-asm.c
@@ -1,5 +1,5 @@
// REQUIRES: x86-registered-target
-// RUN: %clang_cc1 %s -triple i386-apple-darwin10 -fasm-blocks -emit-llvm -o - | FileCheck %s
+// RUN: %clang_cc1 %s -triple i386-apple-darwin10 -fdollars-in-identifiers -fasm-blocks -emit-llvm -o - | FileCheck %s
void t1(void) {
// CHECK: @t1
diff --git a/clang/test/CodeGenObjC/extern-void-class-decl.m b/clang/test/CodeGenObjC/extern-void-class-decl.m
index 826622b94c1bb..55b2f9565439a 100644
--- a/clang/test/CodeGenObjC/extern-void-class-decl.m
+++ b/clang/test/CodeGenObjC/extern-void-class-decl.m
@@ -1,4 +1,4 @@
-// RUN: %clang_cc1 -triple x86_64-apple-macosx10.14.0 %s -emit-llvm -o - | FileCheck %s
+// RUN: %clang_cc1 -triple x86_64-apple-macosx10.14.0 -fdollars-in-identifiers %s -emit-llvm -o - | FileCheck %s
extern void OBJC_CLASS_$_f;
Class c = (Class)&OBJC_CLASS_$_f;
diff --git a/clang/test/Lexer/dollar-idents.c b/clang/test/Lexer/dollar-idents.c
index a1263b4e572cc..6fd32a13bc591 100644
--- a/clang/test/Lexer/dollar-idents.c
+++ b/clang/test/Lexer/dollar-idents.c
@@ -1,4 +1,4 @@
-// RUN: %clang_cc1 -dump-tokens %s 2>&1 | FileCheck %s
+// RUN: %clang_cc1 -dump-tokens -fdollars-in-identifiers %s 2>&1 | FileCheck %s
// RUN: %clang_cc1 -dump-tokens -x assembler-with-cpp %s 2>&1 | FileCheck %s --check-prefix=CHECK-ASM
// PR3808
diff --git a/clang/test/Lexer/gh128939.cpp b/clang/test/Lexer/gh128939.cpp
new file mode 100644
index 0000000000000..8abd8c8c6ec0a
--- /dev/null
+++ b/clang/test/Lexer/gh128939.cpp
@@ -0,0 +1,17 @@
+// RUN: %clang_cc1 -E -fdollars-in-identifiers %s 2>&1 | FileCheck %s --check-prefix=CHECK-DOLLARS
+// RUN: %clang_cc1 -E %s 2>&1 | FileCheck %s --check-prefix=CHECK-NO-DOLLARS
+// GH128939
+
+#define FOO$ 10
+#define STR(x) #x
+#define STR2(x) STR(x)
+const char *p = STR2(FOO$);
+
+// CHECK-NO-DOLLARS: const char *p = "$ 10$";
+// CHECK-DOLLARS: const char *p = "10";
+
+#define STR3 STR(
+const char *q = STR3$10);
+
+// CHECK-NO-DOLLARS: const char *q = "$10";
+// CHECK-DOLLARS: const char *q = STR3$10);
diff --git a/clang/test/Preprocessor/c90.c b/clang/test/Preprocessor/c90.c
index 3b9105fe6ee91..2d64c4be1234d 100644
--- a/clang/test/Preprocessor/c90.c
+++ b/clang/test/Preprocessor/c90.c
@@ -7,7 +7,7 @@
#define foo`bar /* expected-error {{whitespace required after macro name}} */
#define foo2!bar /* expected-warning {{whitespace recommended after macro name}} */
-#define foo3$bar /* expected-error {{'$' in identifier}} */
+#define foo3$bar /* expected-error {{whitespace required after macro name}} */
/* CHECK-NOT: this comment should be missing
* CHECK: {{^}}// this comment should be present{{$}}
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we're going to do this, I think we need better diagnostics. Just straight disabling this is going to give very confusing diagnostics to anyone actually using dollar-signs in identifiers.
Some ideas:
- We can give a warning if we see a "$" adjacent to an identifier without any whitespace separating it.
- Outside the preprocessor, we can parse a "$" adjacent to an identifier as part of the identifier, with some sort of diagnostic, since it's guaranteed to be an error anyway.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, thanks!
@@ -1,5 +1,5 @@ | |||
// REQUIRES: x86-registered-target | |||
// RUN: %clang_cc1 %s -triple i386-apple-darwin10 -fasm-blocks -emit-llvm -o - | FileCheck %s | |||
// RUN: %clang_cc1 %s -triple i386-apple-darwin10 -fdollars-in-identifiers -fasm-blocks -emit-llvm -o - | FileCheck %s |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just musing... if we wanted this to work automatically, we could implement detection of __asm
in the preprocessor, and enable dollars-in-identifiers when lexing inside an __asm
block. Detection of __asm
at that phase is also necessary to properly handle ;
comments, which right now we incorrectly discard after preprocessing. (I'm not sure if there are any other lexing differences for MS __asm
that we don't properly handle, but there might be.) But it's probably not worth it for a feature that only works (in MSVC at least) when targeting 32-bit x86. :)
Perhaps a better idea would be to enable -fdollars-in-identifiers
under -fms-compatibility
, given that MSVC still enables support for dollars in identifiers even in its conforming mode. Though perhaps we can leave that option as a fallback for if someone actually complains.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps a better idea would be to enable -fdollars-in-identifiers under -fms-compatibility
I thought about this but explicitly decided against it because that means dollars in identifiers is enabled by default for anyone using Clang built by MSVC and targeting Windows. I think we may want to see what folks run into; maybe we want to enable it only for clang-cl and not clang? Maybe we want to be more clever than that? I dunno.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe we want to enable it only for clang-cl and not clang?
This sounds like an even less expected corner case that would be very annoying.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
THANKS!
My concern here is with regressing performance of the lexer; testing those conditions when lexing any identifier seems like we'd be spending a lot of time trying to catch a very uncommon issue. I think this is reasonable as follow-up work, but I don't think it should hold up this PR because this is fixing a conformance issue. (FWIW, I think the behavior you get currently is something we could live with even if that follow-up work never happened.) |
Are you okay with me punting on this work and landing the changes as-os @efriedma-quic ? |
If you don't want to do anything fancy, can you just add a warning in Lexer::LexTokenInternal if we see a '$' (where we set the token type to tok::unknown)? That should be easy to implement, and have zero impact on lexer performance. |
clang/test/Preprocessor/c90.c
Outdated
@@ -7,7 +7,7 @@ | |||
#define foo`bar /* expected-error {{whitespace required after macro name}} */ | |||
#define foo2!bar /* expected-warning {{whitespace recommended after macro name}} */ | |||
|
|||
#define foo3$bar /* expected-error {{'$' in identifier}} */ | |||
#define foo3$bar /* expected-error {{whitespace required after macro name}} */ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why does this diagnostic change?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In -pedentic-errors
mode before the patch, dollar signs in identifiers are recognized and so we get #
define
<identifier>
and issue the pedantic diagnostic. With the patch, dollar signs in identifiers are not recognized at all and so we get #
define
foo
$
bar
which has no whitespace between the macro name and its replacement list.
Okay, that's pretty reasonable. Note, we only emit the diagnostic when finding |
It appears that gcc only disallows '$' in identifiers in C++26 mode. Not sure how relevant that is to clang. bugzilla Or am I misreading that thread? |
I get the same reading, but I am not certain we want to follow suit unless there's some significant fallout from this change (which I don't expect). Disabling dollars in identifiers in all language modes is a more consistent behavior for the extension instead of having it depend on the language mode. |
Doesn't the Linux kernel use it for syscall names? Might be wrong on this. |
CC @nathanchance @nickdesaulniers for more information, but even if the kernel does, is it a significant imposition to pass |
This will indeed break building the Linux kernel for x86_64 at least in my brief test, mainly because it uses the preprocessor for assembly files but there are some macros that are included in both C and assembly that trigger it as well. For example, all these warnings will get upgraded to errors with
I am sure that would not be too much of a problem (as long as I have sufficient justification, it is unfortunate if there is a break from GCC's behavior) but it will take some time to get it merged and backported to all supported releases. |
We should disable the warning if we're preprocessing asm, which should mostly solve the issue with warnings. For the Linux kernel, does anything break other than the warnings? |
Effectively checking for |
Yes although the warnings from
Does not look like it (aside from one file that does not respect the global Since I already tested it, I figured I might as well report it anyways. If I add
|
As far as I can tell, GCC has not disallowed '$' in identifiers at all (both by testing and by inspection of that patch). I'm not sure it really makes sense to do this change in Clang, either. Firstly, it seems entirely unclear if the C++ proposal really intended to require this. The proposal explicitly noted that the change would break using a UCN for dollar-sign in an identifier -- which is pretty suggestive that the authors did not expect it to forbid spelling the identifier with the actual dollar-sign character in implementations supporting that extension ("For extensions that allow, for example, $ in identifiers, no one outside of compiler test suites, is likely to use a UCN to spell that.") Also, I note that C was repaired to re-permit "$" as an implementation-defined identifier character extension, via "It is implementation-defined if a $ (U+0024, DOLLAR SIGN) may be used as a nondigit character." from https://www.open-std.org/jtc1/sc22/wg14/www/docs/n3145.pdf -- that document seems to indicate that they felt the removal of the permission for this implementation-extension was an unintended defect in the original change. So, I wonder if this may be C++ defect? That C++ should be re-harmonized with C, and permit implementations to allow $ as an identifier character. And, thus, that we should await such a DR fix, and make no change to Clang's behavior. |
Neither. DollarIdents is already off by default in AsmPreprocessor mode. We should just preserve the existing behavior, and not warn. (I'm not I really understand all the standards-committee stuff, but it seems dubious to have the semantics of $ in preprocessor macros implementation-defined.) |
Clang used to enable -fdollars-in-identifiers by default for compatibility with GCC. However, this is no longer a conforming extension after WG21 P2558R2 and WG14 N2701.
So this disables the dialect by default, which is a potentially breaking change for users.
Note: some inline assembly constructs may use dollars in identifiers. We cannot enable the dialect mode automatically based on the user passing -fasm-blocks because that flag is implied by -fms-extensions which is enabled by default on Windows, and thus would mean we'd be enabling a non-conforming language extension by default on that platform.
Users impacted by the change should explicitly add -fdollars-in-identifiers to their build scripts.
Partially addresses #128939