[C23] #embed bytes with negative signed char values are wrapped around to unsigned int when output with -E #102798

Open
MitalAshok opened this issue Aug 11, 2024 · 4 comments
Labels
c23 clang:frontend Language frontend issues, e.g. anything involving "Sema" confirmed Verified by a second party

Comments

@MitalAshok
Contributor

For example:

https://godbolt.org/z/8Tf5ncYWf

//é
constexpr char t[] = {
#embed __FILE__
};

Compiled with -std=c23 -E outputs:

# 1 "/app/example.c"
# 1 "<built-in>" 1
# 1 "<built-in>" 3
# 432 "<built-in>" 3
# 1 "<command line>" 1
# 1 "<built-in>" 2
# 1 "/app/example.c" 2

constexpr char t[] = {47, 47, 4294967235, 4294967209, 10, 99, 111, 110, <<snipped>> };

This output doesn't compile when fed back into the compiler, because values such as 4294967235 are not representable in char.

@dtcxzyw dtcxzyw added clang:frontend Language frontend issues, e.g. anything involving "Sema" and removed new issue labels Aug 11, 2024

@zygoloid
Collaborator

Should -funsigned-char affect the preprocessed output for this case?

@Fznamznon
Contributor

Hmm, since the intention now is for #embed to yield int (not unsigned int) values, I think this is a bug.
I think this cast might be the culprit:

*Callbacks->OS << static_cast<unsigned>(*Iter);
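
To illustrate how a cast like the one above could produce the values in the report, here is a minimal C++ sketch. The helper names `widenSuspect` and `widenViaUnsignedChar` are hypothetical and not part of Clang. The sketch assumes the iterator yields a `char` that is signed on the host (modelled here with `signed char`) and that `unsigned` is 32 bits wide. The second helper is only one possible way to keep the value in 0..255, not necessarily the fix the maintainers would choose.

```cpp
#include <climits>

// Assumption of this sketch: unsigned int is 32 bits wide.
static_assert(sizeof(unsigned) * CHAR_BIT == 32, "sketch assumes 32-bit unsigned");

// Hypothetical helper modelling the suspected cast: a negative byte is
// converted straight to unsigned, so it wraps around modulo 2^32.
constexpr unsigned widenSuspect(signed char byte) {
    return static_cast<unsigned>(byte);
}

// Hypothetical alternative: reinterpret the byte as unsigned char first,
// so the result always stays in the range 0..255.
constexpr unsigned widenViaUnsignedChar(signed char byte) {
    return static_cast<unsigned>(static_cast<unsigned char>(byte));
}

// The bytes 0xC3 and 0xA9 are -61 and -87 when viewed as signed char.
static_assert(widenSuspect(-61) == 4294967235u, "wraps to the reported value");
static_assert(widenSuspect(-87) == 4294967209u, "wraps to the reported value");
static_assert(widenViaUnsignedChar(-61) == 195u, "stays within 0..255");
static_assert(widenViaUnsignedChar(-87) == 169u, "stays within 0..255");
```

The first helper reproduces 4294967235 and 4294967209 from the preprocessed output above, while ASCII bytes such as 47 are unaffected by either path.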

@AaronBallman AaronBallman added the confirmed Verified by a second party label Aug 12, 2024
@AaronBallman
Collaborator

AaronBallman commented Aug 12, 2024

> Should -funsigned-char affect the preprocessed output for this case?

The standard wording says:

The values of the integer constant expressions in the expanded sequence are determined by an
implementation-defined mapping of the resource’s data. Each integer constant expression’s value is
in the range from 0 to (2^embed element width) − 1, inclusive. If:
— the list of integer constant expressions is used to initialize an array of a type compatible with
unsigned char, or compatible with char if char cannot hold negative values; and,
— the embed element width is equal to CHAR_BIT (5.2.5.3.2),
then the contents of the initialized elements of the array are as-if the resource’s binary data is fread
(7.23.8.1) into the array at translation time.

With an attached footnote:

For example, an embed element width of 8 will yield a range of values from 0 to 255, inclusive.
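
As a rough check of the quoted range against the values in the report, the bound can be written out in a few lines of C++. The names `maxEmbedValue` and `inEmbedRange` are hypothetical, and the sketch assumes an embed element width below 64 so the shift is well defined.

```cpp
// Largest value allowed for a given embed element width: 2^width - 1.
// Assumption of this sketch: embedElementWidth < 64.
constexpr unsigned long long maxEmbedValue(unsigned embedElementWidth) {
    return (1ULL << embedElementWidth) - 1ULL;
}

// True if value lies in the range 0 .. 2^width - 1, inclusive.
constexpr bool inEmbedRange(unsigned long long value, unsigned embedElementWidth) {
    return value <= maxEmbedValue(embedElementWidth);
}

// Matches the footnote: a width of 8 gives the range 0..255.
static_assert(maxEmbedValue(8) == 255ULL, "width 8 allows 0..255");
// 195 (0xC3) is inside that range; the reported 4294967235 is not.
static_assert(inEmbedRange(195ULL, 8), "195 is a valid 8-bit element value");
static_assert(!inEmbedRange(4294967235ULL, 8), "reported value is out of range");
```

Under that reading, the values 4294967235 and 4294967209 in the `-E` output fall outside the range the quoted wording describes for an 8-bit element width.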
