Skip to content

[libc++][string] Fix unnecessary padding for basic_string's __short #135973

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Draft
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

spaits
Copy link
Contributor

@spaits spaits commented Apr 16, 2025

Originally, padding in the struct __short was done with arrays, but when no padding was required, we got undefined behavior due to a zero-length array. C++17 11.3.4 Arrays p1:

If the <<length of array>> is present, it shall be a converted constant
expression of type std::size_t and its value shall be greater than zero.

A fix was proposed in commit d95597d, which worked around the undefined behavior by wrapping the padding array in a struct if there was padding required and omitting the array altogether, with the help of a specialization, when the padding is zero.

This change doesn't just work around the problem but changes the semantics of the code, introducing an issue.

Contrary to C, in C++ empty structs and classes must have a size greater than zero. C++ 17 12 Classes p4:

Complete objects and member subobjects of class type shall have nonzero size.

Here are the examples on simplified examples:
Empty struct size: https://godbolt.org/z/f7rbz1n7a
Empty array size: https://godbolt.org/z/1PEsorh66

So whenever there is no padding required we insert an empty struct-sized padding, which is 1 byte on X86_64 (but we won't feel it there, since on that platform padding is required if I am correct. The original issue occurred on a downstream target).

A standard compliant solution, that keeps to the original behavior could be done using unnamed 0 width bit-fields.

12.2.4 Bit-fields p2:

As a special case, an unnamed bit-field with a width of zero
specifies alignment of the next bit-field at an allocation unit boundary. Only when declaring an unnamed
bit-field may the value of the constant-expression be equal to zero.

@spaits spaits requested a review from a team as a code owner April 16, 2025 15:03
@llvmbot llvmbot added the libc++ libc++ C++ Standard Library. Not GNU libstdc++. Not libc++abi. label Apr 16, 2025
@spaits spaits changed the title Fix unnecessary padding for basic_string's __short [libc++][string] Fix unnecessary padding for basic_string's __short Apr 16, 2025
@llvmbot
Copy link
Member

llvmbot commented Apr 16, 2025

@llvm/pr-subscribers-libcxx

Author: Gábor Spaits (spaits)

Changes

Originally, padding in the struct __short was done with arrays, but when no padding was required, we got undefined behavior due to a zero-length array. C++17 11.3.4 Arrays p1:

If the &lt;&lt;length of array&gt;&gt; is present, it shall be a converted constant
expression of type std::size_t and its value shall be greater than zero.

A fix was proposed in commit d95597d, which worked around the undefined behavior by wrapping the padding array in a struct if there was padding required and omitting the array altogether, with the help of a specialization, when the padding is zero.

This change doesn't just work around the problem but changes the semantics of the code, introducing an issue.

Contrary to C, in C++ empty structs and classes must have a size greater than zero. C++ 17 12 Classes p4:

Complete objects and member subobjects of class type shall have nonzero size.

So whenever there is no padding required we insert an empty struct-sized padding, which is 1 byte on X86_64 (but we won't feel it there, since on that platform padding is required if I am correct).

A standard compliant solution, that keeps to the original behavior could be done using unnamed 0 width bit-fields.

12.2.4 Bit-fields p2:

As a special case, an unnamed bit-field with a width of zero
specifies alignment of the next bit-field at an allocation unit boundary. Only when declaring an unnamed
bit-field may the value of the constant-expression be equal to zero.

Full diff: https://github.com/llvm/llvm-project/pull/135973.diff

1 Files Affected:

  • (modified) libcxx/include/string (+2-10)
diff --git a/libcxx/include/string b/libcxx/include/string
index e7e541e31432d..c82cb5074e5d9 100644
--- a/libcxx/include/string
+++ b/libcxx/include/string
@@ -715,14 +715,6 @@ struct __can_be_converted_to_string_view
 struct __uninitialized_size_tag {};
 struct __init_with_sentinel_tag {};
 
-template <size_t _PaddingSize>
-struct __padding {
-  char __padding_[_PaddingSize];
-};
-
-template <>
-struct __padding<0> {};
-
 template <class _CharT, class _Traits, class _Allocator>
 class basic_string {
 public:
@@ -827,7 +819,7 @@ private:
 
   struct __short {
     value_type __data_[__min_cap];
-    _LIBCPP_NO_UNIQUE_ADDRESS __padding<sizeof(value_type) - 1> __padding_;
+    value_type : 0;
     unsigned char __size_    : 7;
     unsigned char __is_long_ : 1;
   };
@@ -879,7 +871,7 @@ private:
       unsigned char __is_long_ : 1;
       unsigned char __size_    : 7;
     };
-    _LIBCPP_NO_UNIQUE_ADDRESS __padding<sizeof(value_type) - 1> __padding_;
+    value_type : 0;
     value_type __data_[__min_cap];
   };
 

Copy link

github-actions bot commented Apr 16, 2025

⚠️ C/C++ code formatter, clang-format found issues in your code. ⚠️

You can test this locally with the following command:
git-clang-format --diff HEAD~1 HEAD --extensions  -- libcxx/include/string
View the diff from clang-format here.
diff --git a/libcxx/include/string b/libcxx/include/string
index 8f168572d..f51f1680f 100644
--- a/libcxx/include/string
+++ b/libcxx/include/string
@@ -821,7 +821,7 @@ private:
     value_type __data_[__min_cap];
     unsigned char __size_    : 7;
     unsigned char __is_long_ : 1;
-    value_type : 0;
+    value_type               : 0;
   };
 
   // The __endian_factor is required because the field we use to store the size

@philnik777
Copy link
Contributor

Your explanation is incorrect. The empty struct has a siezeof 1, but the datasize is zero, so there is no space used up. Could you provide the case that broke for you?

@spaits
Copy link
Contributor Author

spaits commented Apr 16, 2025

@philnik777 The build failed on my downstream, target with exactly the same reason as the current CI:
https://buildkite.com/llvm-project/libcxx-ci/builds/42032#01963f22-4987-4bec-87d9-413732948d0a AIX (32 bit) target.

When I reverted the change d95597d the assertion failure has disappeared. Also I looked up how empty structs are handled and then I saw, that they can not be empty on C++, so I assumed, the reason for the failure was that change.

@spaits
Copy link
Contributor Author

spaits commented Apr 16, 2025

Also, when I put an empty struct into another struct, it still increment the size and has it's own address: https://godbolt.org/z/dYovsM71b . (Or I don't really get what you mean by datasize there. I assumed "as a member of other struct". I would be grateful if you could point me to an explanation if I am wrong.)

@philnik777
Copy link
Contributor

What is the target you're compiling for?

@spaits
Copy link
Contributor Author

spaits commented Apr 16, 2025

It is TriCore.

Originally, padding in the struct `__short` was done with arrays, but when no padding was required, we got undefined behavior due to a zero-length array.
C++17 11.3.4 Arrays p1:
```
If the <<length of array>> is present, it shall be a converted constant
expression of type std::size_t and its value shall be greater than zero.
```
A fix was proposed in commit d95597d, which worked around the undefined behavior by wrapping the padding array in a struct if there was padding required and omitting the array altogether, with the help of a specialization, when the padding is zero.

This change doesn't just work around the problem but changes the semantics of the code, introducing an issue.

Contrary to C, in C++ empty structs and classes must have a size greater than zero.
C++ 17 12 Classes p4:
```
Complete objects and member subobjects of class type shall have nonzero size.
```
So whenever there is no padding required we insert an empty struct-sized padding, which is 1 byte on X86_64 (but we won't feel it there, since on that platform padding is required if I am correct).

I would like to propose another standard compliant solution using unnamed 0 width bitfields.
12.2.4 Bit-fields p2:
```
As a special case, an unnamed bit-field with a width of zero
specifies alignment of the next bit-field at an allocation unit boundary. Only when declaring an unnamed
bit-field may the value of the constant-expression be equal to zero.
```
@spaits spaits force-pushed the libcxx_basic_string_padding branch from 810d575 to 7e348e5 Compare April 17, 2025 11:57
@spaits spaits marked this pull request as draft April 17, 2025 11:58
@philnik777
Copy link
Contributor

Also, when I put an empty struct into another struct, it still increment the size and has it's own address: https://godbolt.org/z/dYovsM71b . (Or I don't really get what you mean by datasize there. I assumed "as a member of other struct". I would be grateful if you could point me to an explanation if I am wrong.)

Sorry, I didn't see this before. The datasize is a bit more complicated than that. You need to use [[no_unique_address]] or the EBO to make use of the fact that the datasize is sometimes smaller than the sozeof the struct.

I guess you're using a proprietary compiler with libc++? Just FYI, that's not officially supported. What I think is going on is that your compiler doesn't support [[no_unique_address]] for some reason and that's causing the size change. I would recommend compiling with warnings in system headers enabled and check whether there are any warnings about that. If that's the case you can't upgrade libc++ until the attribute is supported, since we rely on the attribute in a lot of places now.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
libc++ libc++ C++ Standard Library. Not GNU libstdc++. Not libc++abi.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants