Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[clang] Fix underlying type of EmbedExpr #99050

Merged
merged 3 commits into from
Jul 19, 2024
Merged

Conversation

Fznamznon
Copy link
Contributor

This patch makes remaining cases of #embed to emit int type since there is an agreement to do that for C. C++ is being discussed, but in general we don't want to produce different types for C and C++.

This patch makes remaining cases of #embed to emit int type since there is an
agreement to do that for C. C++ is being discussed, but in general we
don't want to produce different types for C and C++.
@llvmbot llvmbot added clang Clang issues not falling into any other category clang:frontend Language frontend issues, e.g. anything involving "Sema" labels Jul 16, 2024
@llvmbot
Copy link
Collaborator

llvmbot commented Jul 16, 2024

@llvm/pr-subscribers-clang

Author: Mariya Podchishchaeva (Fznamznon)

Changes

This patch makes remaining cases of #embed to emit int type since there is an agreement to do that for C. C++ is being discussed, but in general we don't want to produce different types for C and C++.


Full diff: https://github.com/llvm/llvm-project/pull/99050.diff

3 Files Affected:

  • (modified) clang/lib/AST/Expr.cpp (+1-1)
  • (modified) clang/lib/Sema/SemaInit.cpp (+1-1)
  • (modified) clang/test/Preprocessor/embed_codegen.cpp (+4-4)
diff --git a/clang/lib/AST/Expr.cpp b/clang/lib/AST/Expr.cpp
index 6af1b5683e08b..9d5b8167d0ee6 100644
--- a/clang/lib/AST/Expr.cpp
+++ b/clang/lib/AST/Expr.cpp
@@ -2376,7 +2376,7 @@ APValue SourceLocExpr::EvaluateInContext(const ASTContext &Ctx,
 EmbedExpr::EmbedExpr(const ASTContext &Ctx, SourceLocation Loc,
                      EmbedDataStorage *Data, unsigned Begin,
                      unsigned NumOfElements)
-    : Expr(EmbedExprClass, Ctx.UnsignedCharTy, VK_PRValue, OK_Ordinary),
+    : Expr(EmbedExprClass, Ctx.IntTy, VK_PRValue, OK_Ordinary),
       EmbedKeywordLoc(Loc), Ctx(&Ctx), Data(Data), Begin(Begin),
       NumOfElements(NumOfElements) {
   setDependence(ExprDependence::None);
diff --git a/clang/lib/Sema/SemaInit.cpp b/clang/lib/Sema/SemaInit.cpp
index a27ed02fc73b8..2c963b28aeb2b 100644
--- a/clang/lib/Sema/SemaInit.cpp
+++ b/clang/lib/Sema/SemaInit.cpp
@@ -2007,7 +2007,7 @@ static bool canInitializeArrayWithEmbedDataString(ArrayRef<Expr *> ExprList,
   if (InitType->isArrayType()) {
     const ArrayType *InitArrayType = InitType->getAsArrayTypeUnsafe();
     QualType InitElementTy = InitArrayType->getElementType();
-    QualType EmbedExprElementTy = EE->getType();
+    QualType EmbedExprElementTy = EE->getDataStringLiteral()->getType();
     const bool TypesMatch =
         Context.typesAreCompatible(InitElementTy, EmbedExprElementTy) ||
         (InitElementTy->isCharType() && EmbedExprElementTy->isCharType());
diff --git a/clang/test/Preprocessor/embed_codegen.cpp b/clang/test/Preprocessor/embed_codegen.cpp
index 201bf300bc669..fdbf4f2aa400f 100644
--- a/clang/test/Preprocessor/embed_codegen.cpp
+++ b/clang/test/Preprocessor/embed_codegen.cpp
@@ -13,9 +13,9 @@ int ca[] = {
 };
 
 // CHECK: %arrayinit.element = getelementptr inbounds i32, ptr %notca, i64 1
-// CHECK: store i8 106, ptr %arrayinit.element, align 4
+// CHECK: store i32 106, ptr %arrayinit.element, align 4
 // CHECK: %arrayinit.element1 = getelementptr inbounds i32, ptr %notca, i64 2
-// CHECK: store i8 107, ptr %arrayinit.element1, align 4
+// CHECK: store i32 107, ptr %arrayinit.element1, align 4
 int notca[] = {
 a
 #embed <jk.txt> prefix(,)
@@ -74,9 +74,9 @@ constexpr struct T t[] = {
 // CHECK:  %arrayinit.element7 = getelementptr inbounds %struct.T, ptr %tnonc, i64 1
 // CHECK:  call void @llvm.memset.p0.i64(ptr align 4 %arrayinit.element7, i8 0, i64 20, i1 false)
 // CHECK:  %arr8 = getelementptr inbounds %struct.T, ptr %arrayinit.element7, i32 0, i32 0
-// CHECK:  store i8 106, ptr %arr8, align 4
+// CHECK:  store i32 106, ptr %arr8, align 4
 // CHECK:  %arrayinit.element9 = getelementptr inbounds i32, ptr %arr8, i64 1
-// CHECK:  store i8 107, ptr %arrayinit.element9, align 4
+// CHECK:  store i32 107, ptr %arrayinit.element9, align 4
 struct T tnonc[] = {
   a, 300, 1, 2, 3
 #embed <jk.txt> prefix(,)

@@ -13,9 +13,9 @@ int ca[] = {
};

// CHECK: %arrayinit.element = getelementptr inbounds i32, ptr %notca, i64 1
// CHECK: store i8 106, ptr %arrayinit.element, align 4
// CHECK: store i32 106, ptr %arrayinit.element, align 4
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is actually shows a bug in CodeGen for #embed, it should have always stored i32 because the target storage is i32. A cast emission is missing in CGExprAgg.cpp. Though I'm not sure it should be included into this PR.
Another problem is that I can't think of a good test case that shows that elements embedded in the initializer list are of int type and not of unsigned char type.

Copy link
Contributor

@cor3ntin cor3ntin Jul 16, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was trying to write a test case and found an unrelated embed crash https://compiler-explorer.com/z/bMWbGfh79

can we add the test here #97274 (comment) (with sizeof) ? Thanks

Copy link
Collaborator

@AaronBallman AaronBallman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changes LGTM but I agree with @cor3ntin that we should add the test case from #97274 (comment) explicitly. Some other tests to consider adding at the same time:

static_assert(_Generic(
#embed __FILE__ limit(1)
  , int : 1, default : 0));

static_assert(alignof(typeof(
#embed __FILE__ limit(1)
)) == alignof(int));

printf("%hhu", // Do we get a -Wformat diagnostic about the specifier mismatch?
#embed __FILE__ limit(1)
);

@Fznamznon
Copy link
Contributor Author

I can add test with sizeof for sure, the problem is that it is already working https://godbolt.org/z/K69v8KPYa and we have similar ones in codebase already.
The test I'm struggling with is to check that elements in initializer list that are expanded from embed also have int type.

@AaronBallman
Copy link
Collaborator

I can add test with sizeof for sure, the problem is that it is already working https://godbolt.org/z/K69v8KPYa and we have similar ones in codebase already.

Ah, if we have sufficient coverage already, that's fine. I was just thinking about coverage in general, didn't check to see if it was redundant. :-)

The test I'm struggling with is to check that elements in initializer list that are expanded from embed also have int type.

If you have an embed file with a single byte in it whose value is 0xFF, you could use a test like:

struct S {
  signed char ch;
};

constexpr struct S s = {
#embed "foo"
};

which should give the same behavior as: https://godbolt.org/z/Tno5YPexM

@Fznamznon
Copy link
Contributor Author

Fznamznon commented Jul 18, 2024

If you have an embed file with a single byte in it whose value is 0xFF, you could use a test like:
struct S {
signed char ch;
};
constexpr struct S s = {
#embed "foo"
};
which should give the same behavior as: https://godbolt.org/z/Tno5YPexM

Thanks! That worked.

I didn't add these though

static_assert(_Generic(
#embed __FILE__ limit(1)
  , int : 1, default : 0));

static_assert(alignof(typeof(
#embed __FILE__ limit(1)
)) == alignof(int));

printf("%hhu", // Do we get a -Wformat diagnostic about the specifier mismatch?
#embed __FILE__ limit(1)
);

for several reasons:

  • They don't test this patch as it affects only what happens in an initializer list
  • The one with alignof/typeof asserts for C++ (not for C) and I'd like to make a fix and add this test in a separate PR
  • we don't emit warnings even for printf("%hhu", 300);

@Fznamznon
Copy link
Contributor Author

The one with alignof/typeof asserts for C++ (not for C) and I'd like to make a fix and add this test in a separate PR

Posted #99624 to fix the assertion

Copy link
Collaborator

@AaronBallman AaronBallman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@Fznamznon Fznamznon merged commit 7122b70 into llvm:main Jul 19, 2024
7 checks passed
sgundapa pushed a commit to sgundapa/upstream_effort that referenced this pull request Jul 23, 2024
This patch makes remaining cases of #embed to emit int type since there
is an agreement to do that for C. C++ is being discussed, but in general
we don't want to produce different types for C and C++.
yuxuanchen1997 pushed a commit that referenced this pull request Jul 25, 2024
Summary:
This patch makes remaining cases of #embed to emit int type since there
is an agreement to do that for C. C++ is being discussed, but in general
we don't want to produce different types for C and C++.

Test Plan: 

Reviewers: 

Subscribers: 

Tasks: 

Tags: 


Differential Revision: https://phabricator.intern.facebook.com/D60251342
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
clang:frontend Language frontend issues, e.g. anything involving "Sema" clang Clang issues not falling into any other category
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants