Skip to content

[C23] Fix typeof handling in enum declarations #146394

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 3 commits into
base: main
Choose a base branch
from

Conversation

AaronBallman
Copy link
Collaborator

We have a parsing helper function which parses either a parenthesized expression or a parenthesized type name. This is used when parsing a unary operator such as sizeof, for example.

The problem this solves is when that construct is ambiguous. Consider:

enum E : typeof(int) { F };

After we've parsed the 'typeof', what ParseParenExpression() is responsible for is '(int) { F }' which looks like a compound literal expression when it's actually the parens and operand for 'typeof' followed by the enumerator list for the enumeration declaration. Then consider:

sizeof (int){ 0 };

After we've parsed 'sizeof', ParseParenExpression is responsible for parsing something grammatically similar to the problematic case.

The solution is to recognize that the expression form of 'typeof' is required to have parentheses. So we know the open and close parens that ParseParenExpression handles must be part of the grammar production for the operator, not part of the operand expression itself.

Fixes #146351

We have a parsing helper function which parses either a parenthesized
expression or a parenthesized type name. This is used when parsing a
unary operator such as sizeof, for example.

The problem this solves is when that construct is ambiguous. Consider:

	enum E : typeof(int) { F };

After we've parsed the 'typeof', what ParseParenExpression() is
responsible for is '(int) { F }' which looks like a compound literal
expression when it's actually the parens and operand for 'typeof'
followed by the enumerator list for the enumeration declaration. Then
consider:

	sizeof (int){ 0 };

After we've parsed 'sizeof', ParseParenExpression is responsible for
parsing something grammatically similar to the problematic case.

The solution is to recognize that the expression form of 'typeof' is
required to have parentheses. So we know the open and close parens that
ParseParenExpression handles must be part of the grammar production for
the operator, not part of the operand expression itself.

Fixes llvm#146351
@AaronBallman AaronBallman added clang Clang issues not falling into any other category c23 clang:frontend Language frontend issues, e.g. anything involving "Sema" accepts-invalid rejects-valid labels Jun 30, 2025
@llvmbot
Copy link
Member

llvmbot commented Jun 30, 2025

@llvm/pr-subscribers-clang

Author: Aaron Ballman (AaronBallman)

Changes

We have a parsing helper function which parses either a parenthesized expression or a parenthesized type name. This is used when parsing a unary operator such as sizeof, for example.

The problem this solves is when that construct is ambiguous. Consider:

enum E : typeof(int) { F };

After we've parsed the 'typeof', what ParseParenExpression() is responsible for is '(int) { F }' which looks like a compound literal expression when it's actually the parens and operand for 'typeof' followed by the enumerator list for the enumeration declaration. Then consider:

sizeof (int){ 0 };

After we've parsed 'sizeof', ParseParenExpression is responsible for parsing something grammatically similar to the problematic case.

The solution is to recognize that the expression form of 'typeof' is required to have parentheses. So we know the open and close parens that ParseParenExpression handles must be part of the grammar production for the operator, not part of the operand expression itself.

Fixes #146351


Full diff: https://github.com/llvm/llvm-project/pull/146394.diff

5 Files Affected:

  • (modified) clang/docs/ReleaseNotes.rst (+2)
  • (modified) clang/include/clang/Parse/Parser.h (+7-1)
  • (modified) clang/lib/Parse/ParseExpr.cpp (+24-10)
  • (modified) clang/lib/Parse/Parser.cpp (+2-1)
  • (modified) clang/test/Parser/c2x-typeof.c (+10)
diff --git a/clang/docs/ReleaseNotes.rst b/clang/docs/ReleaseNotes.rst
index 45b5f77545361..1b12061dad85b 100644
--- a/clang/docs/ReleaseNotes.rst
+++ b/clang/docs/ReleaseNotes.rst
@@ -304,6 +304,8 @@ C23 Feature Support
   which clarified how Clang is handling underspecified object declarations.
 - Clang now accepts single variadic parameter in type-name. It's a part of
   `WG14 N2975 <https://open-std.org/JTC1/SC22/WG14/www/docs/n2975.pdf>`_
+- Fixed a bug with handling the type operand form of ``typeof`` when it is used
+  to specify a fixed underlying type for an enumeration. #GH146351
 
 C11 Feature Support
 ^^^^^^^^^^^^^^^^^^^
diff --git a/clang/include/clang/Parse/Parser.h b/clang/include/clang/Parse/Parser.h
index a47e23ffbd357..ced9613b9ab9b 100644
--- a/clang/include/clang/Parse/Parser.h
+++ b/clang/include/clang/Parse/Parser.h
@@ -4184,7 +4184,12 @@ class Parser : public CodeCompletionHandler {
   /// ParseParenExpression - This parses the unit that starts with a '(' token,
   /// based on what is allowed by ExprType.  The actual thing parsed is returned
   /// in ExprType. If stopIfCastExpr is true, it will only return the parsed
-  /// type, not the parsed cast-expression.
+  /// type, not the parsed cast-expression. If ParenKnownToBeNonCast is true,
+  /// the initial open paren and its matching close paren are known to be part
+  /// of another grammar production and not part of the operand. e.g., the
+  /// typeof and typeof_unqual operators in C. If it is false, then the function
+  /// has to parse the parens to determine whether they're part of a cast or
+  /// compound literal expression rather than a parenthesized type.
   ///
   /// \verbatim
   ///       primary-expression: [C99 6.5.1]
@@ -4210,6 +4215,7 @@ class Parser : public CodeCompletionHandler {
   /// \endverbatim
   ExprResult ParseParenExpression(ParenParseOption &ExprType,
                                   bool stopIfCastExpr, bool isTypeCast,
+                                  bool ParenKnownToBeNonCast,
                                   ParsedType &CastTy,
                                   SourceLocation &RParenLoc);
 
diff --git a/clang/lib/Parse/ParseExpr.cpp b/clang/lib/Parse/ParseExpr.cpp
index 3cf3d4ea7d705..c6731fca7bfa7 100644
--- a/clang/lib/Parse/ParseExpr.cpp
+++ b/clang/lib/Parse/ParseExpr.cpp
@@ -758,7 +758,8 @@ ExprResult Parser::ParseCastExpression(CastParseKind ParseKind,
     ParsedType CastTy;
     SourceLocation RParenLoc;
     Res = ParseParenExpression(ParenExprType, false /*stopIfCastExr*/,
-                               isTypeCast == TypeCastState::IsTypeCast, CastTy,
+                               isTypeCast == TypeCastState::IsTypeCast,
+                               false /*ParenKnownToBeNonCast*/, CastTy,
                                RParenLoc);
 
     // FIXME: What should we do if a vector literal is followed by a
@@ -2110,12 +2111,24 @@ Parser::ParseExprAfterUnaryExprOrTypeTrait(const Token &OpTok,
     // If it starts with a '(', we know that it is either a parenthesized
     // type-name, or it is a unary-expression that starts with a compound
     // literal, or starts with a primary-expression that is a parenthesized
-    // expression.
+    // expression. Most unary operators have an expression form without parens
+    // as part of the grammar for the operator, and a type form with the parens
+    // as part of the grammar for the operator. However, typeof and
+    // typeof_unqual require parens for both forms. This means that we *know*
+    // that the open and close parens cannot be part of a cast expression,
+    // which means we definitely are not parsing a compound literal expression.
+    // This disambiguates a case like enum E : typeof(int) { }; where we've
+    // parsed typeof and need to handle the (int){} tokens properly despite
+    // them looking like a compound literal, as in sizeof (int){}; where the
+    // parens could be part of a parenthesized type name or for a cast
+    // expression of some kind.
+    bool ParenKnownToBeNonCast =
+        OpTok.isOneOf(tok::kw_typeof, tok::kw_typeof_unqual);
     ParenParseOption ExprType = ParenParseOption::CastExpr;
     SourceLocation LParenLoc = Tok.getLocation(), RParenLoc;
 
-    Operand = ParseParenExpression(ExprType, true/*stopIfCastExpr*/,
-                                   false, CastTy, RParenLoc);
+    Operand = ParseParenExpression(ExprType, true /*stopIfCastExpr*/, false,
+                                   ParenKnownToBeNonCast, CastTy, RParenLoc);
     CastRange = SourceRange(LParenLoc, RParenLoc);
 
     // If ParseParenExpression parsed a '(typename)' sequence only, then this is
@@ -2589,10 +2602,11 @@ bool Parser::tryParseOpenMPArrayShapingCastPart() {
   return !ErrorFound;
 }
 
-ExprResult
-Parser::ParseParenExpression(ParenParseOption &ExprType, bool stopIfCastExpr,
-                             bool isTypeCast, ParsedType &CastTy,
-                             SourceLocation &RParenLoc) {
+ExprResult Parser::ParseParenExpression(ParenParseOption &ExprType,
+                                        bool stopIfCastExpr, bool isTypeCast,
+                                        bool ParenKnownToBeNonCast,
+                                        ParsedType &CastTy,
+                                        SourceLocation &RParenLoc) {
   assert(Tok.is(tok::l_paren) && "Not a paren expr!");
   ColonProtectionRAIIObject ColonProtection(*this, false);
   BalancedDelimiterTracker T(*this, tok::l_paren);
@@ -2747,7 +2761,7 @@ Parser::ParseParenExpression(ParenParseOption &ExprType, bool stopIfCastExpr,
       T.consumeClose();
       ColonProtection.restore();
       RParenLoc = T.getCloseLocation();
-      if (Tok.is(tok::l_brace)) {
+      if (!ParenKnownToBeNonCast && Tok.is(tok::l_brace)) {
         ExprType = ParenParseOption::CompoundLiteral;
         TypeResult Ty;
         {
@@ -2757,7 +2771,7 @@ Parser::ParseParenExpression(ParenParseOption &ExprType, bool stopIfCastExpr,
         return ParseCompoundLiteralExpression(Ty.get(), OpenLoc, RParenLoc);
       }
 
-      if (Tok.is(tok::l_paren)) {
+      if (!ParenKnownToBeNonCast && Tok.is(tok::l_paren)) {
         // This could be OpenCL vector Literals
         if (getLangOpts().OpenCL)
         {
diff --git a/clang/lib/Parse/Parser.cpp b/clang/lib/Parse/Parser.cpp
index 18f399aca59e8..2f201cc49cf0b 100644
--- a/clang/lib/Parse/Parser.cpp
+++ b/clang/lib/Parse/Parser.cpp
@@ -1607,7 +1607,8 @@ ExprResult Parser::ParseAsmStringLiteral(bool ForAsmLabel) {
     EnterExpressionEvaluationContext ConstantEvaluated(
         Actions, Sema::ExpressionEvaluationContext::ConstantEvaluated);
     AsmString = ParseParenExpression(ExprType, true /*stopIfCastExpr*/, false,
-                                     CastTy, RParenLoc);
+                                     false /*ParenKnownToBeNonCast*/, CastTy,
+                                     RParenLoc);
     if (!AsmString.isInvalid())
       AsmString = Actions.ActOnConstantExpression(AsmString);
 
diff --git a/clang/test/Parser/c2x-typeof.c b/clang/test/Parser/c2x-typeof.c
index 9c836dfa6d823..245c127d71e03 100644
--- a/clang/test/Parser/c2x-typeof.c
+++ b/clang/test/Parser/c2x-typeof.c
@@ -42,3 +42,13 @@ _Static_assert(__builtin_offsetof(typeof(s), i) == 0);
 _Static_assert(__builtin_offsetof(typeof_unqual(struct S), i) == 0);
 _Static_assert(__builtin_offsetof(typeof_unqual(s), i) == 0);
 
+// Show that typeof and typeof_unqual can be used in the underlying type of an
+// enumeration even when given the type form. Note, this can look like a
+// compound literal expression, which caused GH146351.
+enum E3 : typeof(int) { ThirdZero }; // (int) {}; is not a compound literal!
+enum E4 : typeof_unqual(int) { FourthZero }; // Same here
+
+// Ensure that this invalid construct is diagnosed instead of being treated
+// as typeof((int){ 0 }).
+typeof(int) { 0 } x; // expected-error {{a type specifier is required for all declarations}} \
+                        expected-error {{expected identifier or '('}}

@@ -4210,6 +4215,7 @@ class Parser : public CodeCompletionHandler {
/// \endverbatim
ExprResult ParseParenExpression(ParenParseOption &ExprType,
bool stopIfCastExpr, bool isTypeCast,
bool ParenKnownToBeNonCast,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel like it might be semantically simpler to invert the value of this and call it just MightBeCast? Reasoning about the fact that passing false here means that ‘it’s not known to be a non-cast’ is a bit headache-inducing at least for me...

Actually on that note, TypeCastState is a thing. Can we just merge this parameter with the isTypeCast one and make it a TypeCastState?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about introducing a new enumeration here?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually on that note, TypeCastState is a thing. Can we just merge this parameter with the isTypeCast one and make it a TypeCastState?

TypeCastState is cursed in that it doesn't mean what you might think it means: 77e21fc

So it doesn't mean "this is, is not, or maybe a cast expression", it means "if there's a typo, should we allow type names or not?" I verified this is not vestigial code after delayed typo correction was removed; it's still used for typo correction today. We probably should rename some stuff to make this far more clear though, because your suggestion is what I initially tried to implement and broke approximately every test in the suite.

How about introducing a new enumeration here?

Given that there's three bools in a row, yeah, that's not a bad idea.

How about this for a solution:

  1. Rename TypeCastState to TypoCorrectionTypeBehavior
  2. Rename NotTypeCast, IsTypeCast, and MaybeTypeCast to AllowNonTypes, AllowTypes, and AllowBoth`
  3. Add a new enumeration TypeCastState with members IsTypeCast, IsNotTypeCast, and Unknown
  4. Change ParseParenExpression to take both a TypoCorrectionTypeBehavior and a TypeCastState.

WDYT?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This solution looks good!

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So it doesn't mean "this is, is not, or maybe a cast expression", it means "if there's a typo, should we allow type names or not?"

Ok, I like to say I’m bad at naming, but that is actually awful. ;Þ

How about this for a solution:

  1. Rename TypeCastState to TypoCorrectionTypeBehavior
  2. Rename NotTypeCast, IsTypeCast, and MaybeTypeCast to AllowNonTypes, AllowTypes, and AllowBoth`
  3. Add a new enumeration TypeCastState with members IsTypeCast, IsNotTypeCast, and Unknown
  4. Change ParseParenExpression to take both a TypoCorrectionTypeBehavior and a TypeCastState.

I was about to suggest this so sgtm

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I ended up picking different names for TypeCastState because that was still confusing.

@llvmbot llvmbot added the clang:openmp OpenMP related changes to Clang label Jul 2, 2025
@@ -2850,7 +2870,8 @@ Parser::ParseParenExpression(ParenParseOption &ExprType, bool stopIfCastExpr,
isFoldOperator(NextToken().getKind())) {
ExprType = ParenParseOption::FoldExpr;
return ParseFoldExpression(ExprResult(), T);
} else if (isTypeCast) {
} else if (CorrectionBehavior == TypoCorrectionTypeBehavior::AllowTypes) {
// FIXME: This should not be predicated on typo correction behavior.
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is worth noting!

This was an unfortunate surprise. However, it's a preexisting defect that I don't have the bandwidth to correct right now.

The old TypeCastState enumeration was introduced to help with typo correction (77e21fc) and we accidentally ended up using it for parsing decisions:

isTypeCast == TypeCastState::IsTypeCast, CastTy,

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not surprised that that happened given the name of the enum...

Copy link

github-actions bot commented Jul 2, 2025

⚠️ C/C++ code formatter, clang-format found issues in your code. ⚠️

You can test this locally with the following command:
git-clang-format --diff HEAD~1 HEAD --extensions h,cpp,c -- clang/include/clang/Parse/Parser.h clang/lib/Parse/ParseExpr.cpp clang/lib/Parse/ParseExprCXX.cpp clang/lib/Parse/ParseOpenMP.cpp clang/lib/Parse/ParseTemplate.cpp clang/lib/Parse/Parser.cpp clang/test/Parser/c2x-typeof.c
View the diff from clang-format here.
diff --git a/clang/include/clang/Parse/Parser.h b/clang/include/clang/Parse/Parser.h
index 647e65aa7..683934321 100644
--- a/clang/include/clang/Parse/Parser.h
+++ b/clang/include/clang/Parse/Parser.h
@@ -127,7 +127,7 @@ enum class ParenParseOption {
 /// expression, etc.
 enum class ParenExprKind {
   PartOfOperator, // typeof(int)
-  Unknown, // sizeof(int) or sizeof (int)1.0f, or compound literal, etc
+  Unknown,        // sizeof(int) or sizeof (int)1.0f, or compound literal, etc
 };
 
 /// Describes the behavior that should be taken for an __if_exists

Copy link
Member

@Sirraide Sirraide left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One more thing I thought of: What about C++ edge cases, e.g. both Clang and GCC accept this... which doesn’t quite seem right to me (https://godbolt.org/z/K58eEvG9P):

typeof(int){} x; // Probably parsed as typeof(int{})

/*AllowTypes=*/CorrectionBehavior !=
TypoCorrectionTypeBehavior::AllowNonTypes,
/*AllowNonTypes=*/CorrectionBehavior !=
TypoCorrectionTypeBehavior::AllowTypes);
Validator.IsAddressOfOperand = isAddressOfOperand;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel like at this point the CastExpressionIdValidator should just store the TypoCorrectionTypeBehavior instead of splitting it into two flags—or am I missing something here?

@@ -2850,7 +2870,8 @@ Parser::ParseParenExpression(ParenParseOption &ExprType, bool stopIfCastExpr,
isFoldOperator(NextToken().getKind())) {
ExprType = ParenParseOption::FoldExpr;
return ParseFoldExpression(ExprResult(), T);
} else if (isTypeCast) {
} else if (CorrectionBehavior == TypoCorrectionTypeBehavior::AllowTypes) {
// FIXME: This should not be predicated on typo correction behavior.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not surprised that that happened given the name of the enum...

@Sirraide
Copy link
Member

Sirraide commented Jul 2, 2025

typeof(int){} x; // Probably parsed as typeof(int{})

Actually, I’m not sure what we think this is supposed to be; I haven’t checked.

@Sirraide
Copy link
Member

Sirraide commented Jul 2, 2025

typeof(int){} x; // Probably parsed as typeof(int{})

Actually, I’m not sure what we think this is supposed to be; I haven’t checked.

Actually, that’s just a compound literal; I forgot we supported those in C++.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
accepts-invalid c23 clang:frontend Language frontend issues, e.g. anything involving "Sema" clang:openmp OpenMP related changes to Clang clang Clang issues not falling into any other category rejects-valid
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[c23] typeof(type-name) in enumeration fixed underlying type is broken
4 participants