Skip to content

[clang][CompundLiteralExpr] Don't defer evaluation for CLEs #137163

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 4 commits into
base: main
Choose a base branch
from

Conversation

kadircet
Copy link
Member

@kadircet kadircet commented Apr 24, 2025

Previously we would defer evaluation of CLEs until LValue to RValue
conversions, which would result in creating values within wrong scope
and triggering use-after-frees.

This patch instead eagerly evaluates CLEs, within the scope requiring
them. This requires storing an extra pointer for CLE expressions with
static storage.

Fixes #137165

@llvmbot llvmbot added clang Clang issues not falling into any other category clang:frontend Language frontend issues, e.g. anything involving "Sema" labels Apr 24, 2025
@llvmbot
Copy link
Member

llvmbot commented Apr 24, 2025

@llvm/pr-subscribers-clang

Author: kadir çetinkaya (kadircet)

Changes

Previously we would defer evaluation of CLEs until LValue to RValue
conversions, which would result in creating values within wrong scope
and triggering use-after-frees.

This patch instead eagerly evaluates CLEs, within the scope requiring
them. This requires storing an extra pointer for CLE expressions with
static storage.


Full diff: https://github.com/llvm/llvm-project/pull/137163.diff

4 Files Affected:

  • (modified) clang/include/clang/AST/Expr.h (+12)
  • (modified) clang/lib/AST/Expr.cpp (+9)
  • (modified) clang/lib/AST/ExprConstant.cpp (+25-8)
  • (added) clang/test/AST/static-compound-literals.cpp (+12)
diff --git a/clang/include/clang/AST/Expr.h b/clang/include/clang/AST/Expr.h
index a83320a7ddec2..95c0f910c22f8 100644
--- a/clang/include/clang/AST/Expr.h
+++ b/clang/include/clang/AST/Expr.h
@@ -3489,6 +3489,11 @@ class CompoundLiteralExpr : public Expr {
   /// The int part of the pair stores whether this expr is file scope.
   llvm::PointerIntPair<TypeSourceInfo *, 1, bool> TInfoAndScope;
   Stmt *Init;
+
+  /// Value of constant literals with static storage duration. Used only for
+  /// constant folding as CompoundLiteralExpr is not an ICE.
+  mutable APValue *StaticValue = nullptr;
+
 public:
   CompoundLiteralExpr(SourceLocation lparenloc, TypeSourceInfo *tinfo,
                       QualType T, ExprValueKind VK, Expr *init, bool fileScope)
@@ -3518,6 +3523,13 @@ class CompoundLiteralExpr : public Expr {
     TInfoAndScope.setPointer(tinfo);
   }
 
+  bool hasStaticStorage() const { return isFileScope() && isGLValue(); }
+  APValue *getOrCreateStaticValue(ASTContext& Ctx) const;
+  APValue &getStaticValue() const {
+    assert(StaticValue);
+    return *StaticValue;
+  }
+
   SourceLocation getBeginLoc() const LLVM_READONLY {
     // FIXME: Init should never be null.
     if (!Init)
diff --git a/clang/lib/AST/Expr.cpp b/clang/lib/AST/Expr.cpp
index 59c0e47c7c195..442e85b892a51 100644
--- a/clang/lib/AST/Expr.cpp
+++ b/clang/lib/AST/Expr.cpp
@@ -5467,3 +5467,12 @@ ConvertVectorExpr *ConvertVectorExpr::Create(
   return new (Mem) ConvertVectorExpr(SrcExpr, TI, DstType, VK, OK, BuiltinLoc,
                                      RParenLoc, FPFeatures);
 }
+
+APValue *CompoundLiteralExpr::getOrCreateStaticValue(ASTContext &Ctx) const {
+  assert(hasStaticStorage());
+  if (!StaticValue) {
+    StaticValue = new (Ctx) APValue;
+    Ctx.addDestruction(StaticValue);
+  }
+  return StaticValue;
+}
diff --git a/clang/lib/AST/ExprConstant.cpp b/clang/lib/AST/ExprConstant.cpp
index 7c933f47bf7f0..2379e78c1631a 100644
--- a/clang/lib/AST/ExprConstant.cpp
+++ b/clang/lib/AST/ExprConstant.cpp
@@ -4596,10 +4596,6 @@ handleLValueToRValueConversion(EvalInfo &Info, const Expr *Conv, QualType Type,
         return false;
       }
 
-      APValue Lit;
-      if (!Evaluate(Lit, Info, CLE->getInitializer()))
-        return false;
-
       // According to GCC info page:
       //
       // 6.28 Compound Literals
@@ -4622,7 +4618,12 @@ handleLValueToRValueConversion(EvalInfo &Info, const Expr *Conv, QualType Type,
         }
       }
 
-      CompleteObject LitObj(LVal.Base, &Lit, Base->getType());
+      APValue *Lit =
+          CLE->hasStaticStorage()
+              ? &CLE->getStaticValue()
+              : Info.CurrentCall->getTemporary(Base, LVal.Base.getVersion());
+
+      CompleteObject LitObj(LVal.Base, Lit, Base->getType());
       return extractSubobject(Info, Conv, LitObj, LVal.Designator, RVal, AK);
     } else if (isa<StringLiteral>(Base) || isa<PredefinedExpr>(Base)) {
       // Special-case character extraction so we don't have to construct an
@@ -9125,9 +9126,25 @@ bool
 LValueExprEvaluator::VisitCompoundLiteralExpr(const CompoundLiteralExpr *E) {
   assert((!Info.getLangOpts().CPlusPlus || E->isFileScope()) &&
          "lvalue compound literal in c++?");
-  // Defer visiting the literal until the lvalue-to-rvalue conversion. We can
-  // only see this when folding in C, so there's no standard to follow here.
-  return Success(E);
+  APValue *Lit;
+  // If CompountLiteral has static storage, its value can be used outside
+  // this expression. So evaluate it once and store it in ASTContext.
+  if (E->hasStaticStorage()) {
+    Lit = E->getOrCreateStaticValue(Info.Ctx);
+    Result.set(E);
+    // Reset any previously evaluated state, otherwise evaluation below might
+    // fail.
+    // FIXME: Should we just re-use the previously evaluated value instead?
+    *Lit = APValue();
+  } else {
+    Lit = &Info.CurrentCall->createTemporary(E, E->getInitializer()->getType(),
+                                             ScopeKind::FullExpression, Result);
+  }
+  if (!EvaluateInPlace(*Lit, Info, Result, E->getInitializer())) {
+    *Lit = APValue();
+    return false;
+  }
+  return true;
 }
 
 bool LValueExprEvaluator::VisitCXXTypeidExpr(const CXXTypeidExpr *E) {
diff --git a/clang/test/AST/static-compound-literals.cpp b/clang/test/AST/static-compound-literals.cpp
new file mode 100644
index 0000000000000..ceb8b985ab9dc
--- /dev/null
+++ b/clang/test/AST/static-compound-literals.cpp
@@ -0,0 +1,12 @@
+// Test that we can successfully compile this code, especially under ASAN.
+// RUN: %clang_cc1 -verify -std=c++20 -fsyntax-only %s
+// expected-no-diagnostics
+struct Foo {
+  Foo* f;
+  operator bool() const { return true; }
+};
+constexpr Foo f((Foo[]){});
+int foo() {
+  if (Foo(*f.f)) return 1;
+  return 0;
+}

Copy link

github-actions bot commented Apr 24, 2025

✅ With the latest revision this PR passed the C/C++ code formatter.

@kadircet
Copy link
Member Author

Followup for #118480

@@ -3489,6 +3489,11 @@ class CompoundLiteralExpr : public Expr {
/// The int part of the pair stores whether this expr is file scope.
llvm::PointerIntPair<TypeSourceInfo *, 1, bool> TInfoAndScope;
Stmt *Init;

/// Value of constant literals with static storage duration. Used only for
/// constant folding as CompoundLiteralExpr is not an ICE.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A compound literal expression is an ICE in C though?

An integer constant expression116) shall have integer type and shall only have operands that are
integer constants, named and compound literal constants of integer type, character constants,
sizeof expressions whose results are integer constants, alignof expressions, and floating, named,
or compound literal constants of arithmetic type that are the immediate operands of casts. Cast
operators in an integer constant expression shall only convert arithmetic types to integer types,
except as part of an operand to the typeof operators, sizeof operator, or alignof operator.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, but that's probably from an old comment that confused us (at least me):
LValueExprEvaluator used to say:

  // only see this when folding in C, so there's no standard to follow here

I think we can just remove this comment.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i am happy to drop that comment, but I was inclined to preserve it as well since I came across logic aligned with that in couple of other places too, e.g.

case Expr::CompoundLiteralExprClass:
. There's likely more dragons lurking in here.

@@ -3489,6 +3489,11 @@ class CompoundLiteralExpr : public Expr {
/// The int part of the pair stores whether this expr is file scope.
llvm::PointerIntPair<TypeSourceInfo *, 1, bool> TInfoAndScope;
Stmt *Init;

/// Value of constant literals with static storage duration. Used only for
/// constant folding as CompoundLiteralExpr is not an ICE.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, but that's probably from an old comment that confused us (at least me):
LValueExprEvaluator used to say:

  // only see this when folding in C, so there's no standard to follow here

I think we can just remove this comment.

// Reset any previously evaluated state, otherwise evaluation below might
// fail.
// FIXME: Should we just re-use the previously evaluated value instead?
*Lit = APValue();
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The evaluation should produce the same result every time, so it doesn't really do any harm to do it twice. But if you do it right, it's also very easy to avoid evaluating it twice.

I suspect using EvaluateInPlace will cause state to leak into the evaluation when it shouldn't (like, we accidentally accept a use of a temporary inside the initializer).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since I don't fully understand the implications here, I'd rather keep the evaluation behavior similar to the previous version (e.g. evaluate every-time we need it).

Happy to evaluate once if you think that should be safe ~always and feel strongly about this.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm more concerned about the fact that you're using EvaluateInPlace, as opposed to using EvaluateAsInitializer. Which... I'm not sure it's actually possible to observe a difference at the moment due to the rule that static compound literals are required to have a constant initializer, but it's fragile.


On a sort of related note, I was experimenting with some testcases, and the following crashed:

struct A {int x[1]; };
A f();
typedef int *t[];
consteval int* f(int* x) { return x; }
int ** x = (t){f(f().x)};

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I actually think of the opposite. The code here mimics what MaterializeTemporaryExpr does and I think it's a good thing because MaterializeTemporaryExpr should be the most battle-tested code for similar cases.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also it seems to EvaluateAsInitializer requires a VarDecl, I am not sure if that makes sense in LValueExprEvaluator.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Dug into the EvalMode thing... I remember why we added it now. See https://reviews.llvm.org/D151587 for context. Roughly, the issue is that, if you constant-evaluate the expression in isolation, you get a different result from what you'd get evaluating the whole expression. So we can't redo the evaluation in ConstantFold mode: if you do, you'll corrupt the precomputed value, for cases like clang/test/CodeGenCXX/const-init-cxx1y.cpp.

I think compound literals run into similar issues if you allow ConstantFold mode to overwrite the evaluated value.

Maybe we can add some workaround specifically for cases where we immediately do an lvalue-to-rvalue conversion.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sorry for the late response here, it took me a while to wrap my head around it and then I got interrupted :(

Roughly, the issue is that, if you constant-evaluate the expression in isolation, you get a different result from what you'd get evaluating the whole expression. So we can't redo the evaluation in ConstantFold mode: if you do, you'll corrupt the precomputed value

I am not sure if this applies to CLEs and even if it does I don't think this patch is making it worse:

  • CLEs can only have constant expressions as initializers, as you also highlighted above. Hence (i think) evaluating them multiple times or evaluating sub-parts of them shouldn't matter.
  • As things stand clang is already evaluating CLEs from scratch every time it needs to (so if there's a problem here, it already exists).

I think compound literals run into similar issues if you allow ConstantFold mode to overwrite the evaluated value.

Sorry if I am being dense, but there's a good chance I didn't fully understand your point. So if you think what I said above doesn't make sense, I'd really appreciate if you can provide a code snippet that demonstrates such a case?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did some experiments. The !CLETy.isConstant(Info.Ctx) check mostly stops the bad interaction with examples like const-init-cxx1y.cpp. If it's not constant, we can't access it at all (unlike lifetime-extended temporaries), and if it is constant it can't be modified. Re-evaluating can change the value in some weird cases, though. For example:

struct RR { int r; };
struct Z { int x; const RR* y; int z; };
constinit Z z = { 10, (const RR[1]){__builtin_constant_p(z.x)}, z.y->r };

The field "r" in the compound literal is initialized to zero, but the read from the field returns 1.

But you're right that it isn't a regression. Maybe sufficient to add a FIXME explaining why it's wrong to evaluate in the current context.


On a related note, the following cases crash, but they're not regressions, I guess.

struct RR { int&& r; };
struct Z { RR* x; };
constinit Z z = { (RR[1]){1} };
struct RR { int r; };
struct Z { int x; const RR* y; int z; };
inline int f() { return 0; }
Z z2 = { 10, (const RR[1]){__builtin_constant_p(z2.x)}, z2.y->r+f() };

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But you're right that it isn't a regression. Maybe sufficient to add a FIXME explaining why it's wrong to evaluate in the current context.

I added the example as a test case and left a FIXME. but I am not really sure if it's about re-evaluation. I fiddled with the code a little to not re-evaluate CLE's initializer again but use the previous one, it still initializes field to 1 🤷
I'll also create an issue for that, in case someone wants to investigate (FWIW, that behavior is same without this patch).


As for the crashes, again they also happen at HEAD, still added test cases for those too.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not really about whether we re-evaluate it, it's about whether we evaluate it in the right context. If you implement something like VarDecl::evaluateValue(), it should evaluate to zero.

But I think I'm okay with this as an incremental improvement.

// Reset any previously evaluated state, otherwise evaluation below might
// fail.
// FIXME: Should we just re-use the previously evaluated value instead?
*Lit = APValue();
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm more concerned about the fact that you're using EvaluateInPlace, as opposed to using EvaluateAsInitializer. Which... I'm not sure it's actually possible to observe a difference at the moment due to the rule that static compound literals are required to have a constant initializer, but it's fragile.


On a sort of related note, I was experimenting with some testcases, and the following crashed:

struct A {int x[1]; };
A f();
typedef int *t[];
consteval int* f(int* x) { return x; }
int ** x = (t){f(f().x)};

@kadircet kadircet requested a review from efriedma-quic April 28, 2025 15:30
@kadircet
Copy link
Member Author

kadircet commented May 6, 2025

ping @AaronBallman @efriedma-quic if you don't have more concerns here, I'd like to move forward with this patch

kadircet added 4 commits June 26, 2025 10:00
Previously we would defer evaluation of CLEs until LValue to RValue
conversions, which would result in creating values within wrong scope
and triggering use-after-frees.

This patch instead eagerly evaluates CLEs, within the scope requiring
them. This requires storing an extra pointer for CLE expressions with
static storage.
@kadircet kadircet force-pushed the static_compound_lits branch from ee8e0f8 to 6ca1c58 Compare June 27, 2025 09:21
Copy link
Member Author

@kadircet kadircet left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks for the review @efriedma-quic! LMK if you have more concerns, otherwise i'd like to move forward with the change.

// Reset any previously evaluated state, otherwise evaluation below might
// fail.
// FIXME: Should we just re-use the previously evaluated value instead?
*Lit = APValue();
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But you're right that it isn't a regression. Maybe sufficient to add a FIXME explaining why it's wrong to evaluate in the current context.

I added the example as a test case and left a FIXME. but I am not really sure if it's about re-evaluation. I fiddled with the code a little to not re-evaluate CLE's initializer again but use the previous one, it still initializes field to 1 🤷
I'll also create an issue for that, in case someone wants to investigate (FWIW, that behavior is same without this patch).


As for the crashes, again they also happen at HEAD, still added test cases for those too.

Copy link
Collaborator

@efriedma-quic efriedma-quic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

// Reset any previously evaluated state, otherwise evaluation below might
// fail.
// FIXME: Should we just re-use the previously evaluated value instead?
*Lit = APValue();
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not really about whether we re-evaluate it, it's about whether we evaluate it in the right context. If you implement something like VarDecl::evaluateValue(), it should evaluate to zero.

But I think I'm okay with this as an incremental improvement.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
clang:frontend Language frontend issues, e.g. anything involving "Sema" clang Clang issues not falling into any other category
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Use-after-free involving CompoundLiteralExprs
5 participants