Skip to content

[libc++] Increase atomic_ref's required alignment for small types #99654

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Conversation

dalg24
Copy link
Member

@dalg24 dalg24 commented Jul 19, 2024

Background discussion here #99570 (comment)

Require types that are 1, 2, 4, 8, or 16 bytes in length to be aligned to at least their size to allow them to be used lock-free.

cc @jyknight @ldionne

@dalg24 dalg24 requested a review from a team as a code owner July 19, 2024 14:20
@llvmbot llvmbot added the libc++ libc++ C++ Standard Library. Not GNU libstdc++. Not libc++abi. label Jul 19, 2024
@llvmbot
Copy link
Member

llvmbot commented Jul 19, 2024

@llvm/pr-subscribers-libcxx

Author: Damien L-G (dalg24)

Changes

Background discussion here #99570 (comment)

Require types that are 1, 2, 4, 8, or 16 bytes in length to be aligned to at least their size to allow them to be used lock-free.

cc @jyknight @ldionne


Full diff: https://github.com/llvm/llvm-project/pull/99654.diff

1 Files Affected:

  • (modified) libcxx/include/__atomic/atomic_ref.h (+5-1)
diff --git a/libcxx/include/__atomic/atomic_ref.h b/libcxx/include/__atomic/atomic_ref.h
index 156f1961151c1..49f6982a27f1f 100644
--- a/libcxx/include/__atomic/atomic_ref.h
+++ b/libcxx/include/__atomic/atomic_ref.h
@@ -95,10 +95,14 @@ struct __atomic_ref_base {
 
   friend struct __atomic_waitable_traits<__atomic_ref_base<_Tp>>;
 
+  // require types that are 1, 2, 4, 8, or 16 bytes in length to be aligned to at least their size to allow them to be
+  // used lock-free
+  static constexpr bool __min_alignement = (sizeof(_Tp) & sizeof(_Tp - 1)) || (sizeof(_Tp) > 16) ? 0 : sizeof(_Tp);
+
 public:
   using value_type = _Tp;
 
-  static constexpr size_t required_alignment = alignof(_Tp);
+  static constexpr size_t required_alignment = alignof(_Tp) > __min_alignement ? alignof(_Tp) : __min_alignement;
 
   // The __atomic_always_lock_free builtin takes into account the alignment of the pointer if provided,
   // so we create a fake pointer with a suitable alignment when querying it. Note that we are guaranteed

@jyknight
Copy link
Member

Thanks! LGTM, but wait for ldionne to approve.

@ldionne ldionne added this to the LLVM 19.X Release milestone Jul 23, 2024
Co-authored-by: Louis Dionne <ldionne.2@gmail.com>
Copy link
Member

@ldionne ldionne left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM if CI is green. Whoever merges, please cherry-pick to release/19.x.

@ldionne
Copy link
Member

ldionne commented Jul 24, 2024

I'd like to better understand the failure mode here when passing a variable that isn't suitably aligned. It seems that with this patch it becomes pretty easy to violate the std::atomic_ref::required_alignment, as can be seen by the failing tests. For example:

struct LLIArr2 { long long int i[2]; }

Just naively creating a LLIArr2 x on the stack and then using it inside an atomic_ref can either work or fail the required_alignment check depending on the address at which the variable landed. We only catch that loudly in the Extensive and Debug hardening modes -- otherwise it's silent.

What happens when the alignment requirement is not satisfied? Is it only that we will incorrectly report that the atomic_ref is always lock free when in reality it uses a lock under the hood due to the incorrect alignment?

@dalg24
Copy link
Member Author

dalg24 commented Jul 24, 2024

What happens when the alignment requirement is not satisfied? Is it only that we will incorrectly report that the atomic_ref is always lock free when in reality it uses a lock under the hood due to the incorrect alignment?

This is also how I understand it.

@jyknight
Copy link
Member

What happens when the alignment requirement is not satisfied? Is it only that we will incorrectly report that the atomic_ref is always lock free when in reality it uses a lock under the hood due to the incorrect alignment?

On common platforms, you may get either silently-non-atomic accesses or a crash.

@jyknight
Copy link
Member

Ugh...actually...lemme take that back -- we aren't using that alignment info, currently. So I am wrong about the current behavior, and you are right.

What I said is what we should want to be the case, but it's not now.

This PR is still an improvement, and makes the ABI change that's required. But the purpose of requiring alignment is to generate efficient accesses, and this doesn't actually get there, yet. We instead generate a libcall (which, if the object is in fact aligned to required_alignment, will be lock-free), but it's still overhead of a should-be unnecessary libcall.

@jyknight
Copy link
Member

OK, to fix it, I think we just need:

  typedef _Tp _Aligned_Tp __attribute__((aligned(required_alignment)))`, 
  _Aligned_Tp* __ptr_;

(moving decls around as needed)

public:
using value_type = _Tp;

static constexpr size_t required_alignment = alignof(_Tp);
static constexpr size_t required_alignment = alignof(_Tp) > __min_alignment ? alignof(_Tp) : __min_alignment;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should calculate the alignment using something like this?

template <class T>
constexpr size_t calc_alignment() {
    size_t align = alignof(T);
    while (align <= alignof(max_align_t)) {
        if (__atomic_always_lock_free(sizeof(T), align))
            return align;
        align *= 2;
    }
    return alignof(T);
}

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I really like this, I think this directly translates to what we are actually trying to achieve (which is to calculate the smallest alignment that allows us to be lockfree).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That is not a valid implementation, because it causes the ABI of the type to change depending on the microarchitecture selection.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TIL

@ldionne
Copy link
Member

ldionne commented Jul 26, 2024

@dalg24 Can you please rebase this on top of #99570 I just merged and also apply @jyknight 's comment to use __attribute__((aligned(required_alignment)))? This should allow the compiler to assume that the pointer is properly aligned and generate more optimal code.

Copy link
Member

@ldionne ldionne left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This LGTM but now the tests are failing because we pass an insufficiently-aligned object as part of our tests. We might want to use alignas(atomic_ref<T>::required_alignment) on the objects we pass to atomic_ref in the test?

@ldionne ldionne merged commit 59ca618 into llvm:main Aug 1, 2024
52 of 54 checks passed
@ldionne
Copy link
Member

ldionne commented Aug 1, 2024

/cherry-pick 59ca618

llvmbot pushed a commit to llvmbot/llvm-project that referenced this pull request Aug 1, 2024
…vm#99654)

This patch increases the alignment requirement for std::atomic_ref
such that we can guarantee lockfree operations more often. Specifically,
we require types that are 1, 2, 4, 8, or 16 bytes in size to be aligned
to at least their size to be used with std::atomic_ref.

This is the case for most types, however a notable exception is
`long long` on x86, which is 8 bytes in length but has an alignment
of 4.

As a result of this patch, one has to be more careful about the
alignment of objects used with std::atomic_ref. Failure to provide
a properly-aligned object to std::atomic_ref is a precondition
violation and is technically UB. On the flipside, this allows us
to provide an atomic_ref that is actually lockfree more often,
which is an important QOI property.

More information in the discussion at llvm#99570 (comment).

Co-authored-by: Louis Dionne <ldionne.2@gmail.com>
(cherry picked from commit 59ca618)
@llvmbot
Copy link
Member

llvmbot commented Aug 1, 2024

/pull-request #101492

@llvm-ci
Copy link
Collaborator

llvm-ci commented Aug 1, 2024

LLVM Buildbot has detected a new failure on builder sanitizer-x86_64-linux running on sanitizer-buildbot1 while building libcxx at step 2 "annotate".

Full details are available at: https://lab.llvm.org/buildbot/#/builders/66/builds/2411

Here is the relevant piece of the build log for the reference:

Step 2 (annotate) failure: 'python ../sanitizer_buildbot/sanitizers/zorg/buildbot/builders/sanitizers/buildbot_selector.py' (failure)
...
[374/378] Generating MSAN_INST_TEST_OBJECTS.msan_test.cpp.x86_64-with-call.o
[375/378] Generating Msan-x86_64-with-call-Test
[376/378] Generating MSAN_INST_TEST_OBJECTS.msan_test.cpp.x86_64.o
[377/378] Generating Msan-x86_64-Test
[377/378] Running compiler_rt regression tests
llvm-lit: /b/sanitizer-x86_64-linux/build/llvm-project/llvm/utils/lit/lit/discovery.py:276: warning: input '/b/sanitizer-x86_64-linux/build/build_default/runtimes/runtimes-bins/compiler-rt/test/rtsan/X86_64LinuxConfig' contained no tests
llvm-lit: /b/sanitizer-x86_64-linux/build/llvm-project/llvm/utils/lit/lit/main.py:72: note: The test suite configuration requested an individual test timeout of 0 seconds but a timeout of 900 seconds was requested on the command line. Forcing timeout to be 900 seconds.
-- Testing: 10094 tests, 88 workers --
Testing:  0.. 10.. 20.. 30.. 40.. 50.. 60.. 70.. 80.. 90..
TIMEOUT: SanitizerCommon-hwasan-x86_64-Linux :: Posix/fork_threaded.c (10094 of 10094)
******************** TEST 'SanitizerCommon-hwasan-x86_64-Linux :: Posix/fork_threaded.c' FAILED ********************
Exit Code: -9
Timeout: Reached timeout of 900 seconds

Command Output (stderr):
--
RUN: at line 1: /b/sanitizer-x86_64-linux/build/build_default/./bin/clang  -gline-tables-only -fsanitize=hwaddress -fuse-ld=lld -fsanitize-hwaddress-experimental-aliasing  -m64 -funwind-tables  -I/b/sanitizer-x86_64-linux/build/llvm-project/compiler-rt/test -ldl -O0 /b/sanitizer-x86_64-linux/build/llvm-project/compiler-rt/test/sanitizer_common/TestCases/Posix/fork_threaded.c -o /b/sanitizer-x86_64-linux/build/build_default/runtimes/runtimes-bins/compiler-rt/test/sanitizer_common/hwasan-x86_64-Linux/Posix/Output/fork_threaded.c.tmp && env HWASAN_OPTIONS=die_after_fork=0  /b/sanitizer-x86_64-linux/build/build_default/runtimes/runtimes-bins/compiler-rt/test/sanitizer_common/hwasan-x86_64-Linux/Posix/Output/fork_threaded.c.tmp
+ /b/sanitizer-x86_64-linux/build/build_default/./bin/clang -gline-tables-only -fsanitize=hwaddress -fuse-ld=lld -fsanitize-hwaddress-experimental-aliasing -m64 -funwind-tables -I/b/sanitizer-x86_64-linux/build/llvm-project/compiler-rt/test -ldl -O0 /b/sanitizer-x86_64-linux/build/llvm-project/compiler-rt/test/sanitizer_common/TestCases/Posix/fork_threaded.c -o /b/sanitizer-x86_64-linux/build/build_default/runtimes/runtimes-bins/compiler-rt/test/sanitizer_common/hwasan-x86_64-Linux/Posix/Output/fork_threaded.c.tmp
+ env HWASAN_OPTIONS=die_after_fork=0 /b/sanitizer-x86_64-linux/build/build_default/runtimes/runtimes-bins/compiler-rt/test/sanitizer_common/hwasan-x86_64-Linux/Posix/Output/fork_threaded.c.tmp
HWAddressSanitizer:DEADLYSIGNAL
==3185787==ERROR: HWAddressSanitizer: SEGV on unknown address 0x000000000048 (pc 0x624962d971db bp 0x73408e3a0000 sp 0x73408bbfd250 T3185887)
==3185787==The signal is caused by a READ memory access.
==3185787==Hint: address points to the zero page.
==3185889==ERROR: HWAddressSanitizer: tag-mismatch on address 0x581400005140 at pc 0x624962d53e97
READ of size 155 at 0x581400005140 tags: 04/00 (ptr/mem) in thread T2

--

********************
Testing:  0.. 10.. 20.. 30.. 40.. 50.. 60.. 70.. 80.. 90.. 

1 warning(s) in tests
Slowest Tests:
--------------------------------------------------------------------------
900.06s: SanitizerCommon-hwasan-x86_64-Linux :: Posix/fork_threaded.c
132.62s: libFuzzer-x86_64-default-Linux :: out-of-process-fuzz.test
125.61s: libFuzzer-x86_64-libcxx-Linux :: out-of-process-fuzz.test
118.20s: libFuzzer-x86_64-static-libcxx-Linux :: out-of-process-fuzz.test
55.17s: ThreadSanitizer-x86_64 :: bench_threads.cpp
44.42s: libFuzzer-i386-libcxx-Linux :: fork_corpus_groups.test
43.39s: libFuzzer-x86_64-default-Linux :: fork_corpus_groups.test
43.22s: libFuzzer-i386-default-Linux :: fork.test
43.01s: libFuzzer-x86_64-default-Linux :: fork.test
42.96s: libFuzzer-i386-static-libcxx-Linux :: fork_corpus_groups.test
42.76s: libFuzzer-i386-libcxx-Linux :: fork.test
42.74s: libFuzzer-i386-default-Linux :: fork_corpus_groups.test
42.54s: libFuzzer-i386-default-Linux :: value-profile-switch.test
41.75s: libFuzzer-i386-static-libcxx-Linux :: fork.test
41.41s: libFuzzer-x86_64-libcxx-Linux :: fork_corpus_groups.test
Step 9 (test compiler-rt symbolizer) failure: test compiler-rt symbolizer (failure)
...
[374/378] Generating MSAN_INST_TEST_OBJECTS.msan_test.cpp.x86_64-with-call.o
[375/378] Generating Msan-x86_64-with-call-Test
[376/378] Generating MSAN_INST_TEST_OBJECTS.msan_test.cpp.x86_64.o
[377/378] Generating Msan-x86_64-Test
[377/378] Running compiler_rt regression tests
llvm-lit: /b/sanitizer-x86_64-linux/build/llvm-project/llvm/utils/lit/lit/discovery.py:276: warning: input '/b/sanitizer-x86_64-linux/build/build_default/runtimes/runtimes-bins/compiler-rt/test/rtsan/X86_64LinuxConfig' contained no tests
llvm-lit: /b/sanitizer-x86_64-linux/build/llvm-project/llvm/utils/lit/lit/main.py:72: note: The test suite configuration requested an individual test timeout of 0 seconds but a timeout of 900 seconds was requested on the command line. Forcing timeout to be 900 seconds.
-- Testing: 10094 tests, 88 workers --
Testing:  0.. 10.. 20.. 30.. 40.. 50.. 60.. 70.. 80.. 90..
TIMEOUT: SanitizerCommon-hwasan-x86_64-Linux :: Posix/fork_threaded.c (10094 of 10094)
******************** TEST 'SanitizerCommon-hwasan-x86_64-Linux :: Posix/fork_threaded.c' FAILED ********************
Exit Code: -9
Timeout: Reached timeout of 900 seconds

Command Output (stderr):
--
RUN: at line 1: /b/sanitizer-x86_64-linux/build/build_default/./bin/clang  -gline-tables-only -fsanitize=hwaddress -fuse-ld=lld -fsanitize-hwaddress-experimental-aliasing  -m64 -funwind-tables  -I/b/sanitizer-x86_64-linux/build/llvm-project/compiler-rt/test -ldl -O0 /b/sanitizer-x86_64-linux/build/llvm-project/compiler-rt/test/sanitizer_common/TestCases/Posix/fork_threaded.c -o /b/sanitizer-x86_64-linux/build/build_default/runtimes/runtimes-bins/compiler-rt/test/sanitizer_common/hwasan-x86_64-Linux/Posix/Output/fork_threaded.c.tmp && env HWASAN_OPTIONS=die_after_fork=0  /b/sanitizer-x86_64-linux/build/build_default/runtimes/runtimes-bins/compiler-rt/test/sanitizer_common/hwasan-x86_64-Linux/Posix/Output/fork_threaded.c.tmp
+ /b/sanitizer-x86_64-linux/build/build_default/./bin/clang -gline-tables-only -fsanitize=hwaddress -fuse-ld=lld -fsanitize-hwaddress-experimental-aliasing -m64 -funwind-tables -I/b/sanitizer-x86_64-linux/build/llvm-project/compiler-rt/test -ldl -O0 /b/sanitizer-x86_64-linux/build/llvm-project/compiler-rt/test/sanitizer_common/TestCases/Posix/fork_threaded.c -o /b/sanitizer-x86_64-linux/build/build_default/runtimes/runtimes-bins/compiler-rt/test/sanitizer_common/hwasan-x86_64-Linux/Posix/Output/fork_threaded.c.tmp
+ env HWASAN_OPTIONS=die_after_fork=0 /b/sanitizer-x86_64-linux/build/build_default/runtimes/runtimes-bins/compiler-rt/test/sanitizer_common/hwasan-x86_64-Linux/Posix/Output/fork_threaded.c.tmp
HWAddressSanitizer:DEADLYSIGNAL
==3185787==ERROR: HWAddressSanitizer: SEGV on unknown address 0x000000000048 (pc 0x624962d971db bp 0x73408e3a0000 sp 0x73408bbfd250 T3185887)
==3185787==The signal is caused by a READ memory access.
==3185787==Hint: address points to the zero page.
==3185889==ERROR: HWAddressSanitizer: tag-mismatch on address 0x581400005140 at pc 0x624962d53e97
READ of size 155 at 0x581400005140 tags: 04/00 (ptr/mem) in thread T2

--

********************
Testing:  0.. 10.. 20.. 30.. 40.. 50.. 60.. 70.. 80.. 90..

1 warning(s) in tests
Slowest Tests:
--------------------------------------------------------------------------
900.06s: SanitizerCommon-hwasan-x86_64-Linux :: Posix/fork_threaded.c
132.62s: libFuzzer-x86_64-default-Linux :: out-of-process-fuzz.test
125.61s: libFuzzer-x86_64-libcxx-Linux :: out-of-process-fuzz.test
118.20s: libFuzzer-x86_64-static-libcxx-Linux :: out-of-process-fuzz.test
55.17s: ThreadSanitizer-x86_64 :: bench_threads.cpp
44.42s: libFuzzer-i386-libcxx-Linux :: fork_corpus_groups.test
43.39s: libFuzzer-x86_64-default-Linux :: fork_corpus_groups.test
43.22s: libFuzzer-i386-default-Linux :: fork.test
43.01s: libFuzzer-x86_64-default-Linux :: fork.test
42.96s: libFuzzer-i386-static-libcxx-Linux :: fork_corpus_groups.test
42.76s: libFuzzer-i386-libcxx-Linux :: fork.test
42.74s: libFuzzer-i386-default-Linux :: fork_corpus_groups.test
42.54s: libFuzzer-i386-default-Linux :: value-profile-switch.test
41.75s: libFuzzer-i386-static-libcxx-Linux :: fork.test
41.41s: libFuzzer-x86_64-libcxx-Linux :: fork_corpus_groups.test

@dalg24 dalg24 deleted the increase_atomic_ref_required_alignment_of_small_types branch August 1, 2024 16:47
tru pushed a commit to llvmbot/llvm-project that referenced this pull request Aug 2, 2024
…vm#99654)

This patch increases the alignment requirement for std::atomic_ref
such that we can guarantee lockfree operations more often. Specifically,
we require types that are 1, 2, 4, 8, or 16 bytes in size to be aligned
to at least their size to be used with std::atomic_ref.

This is the case for most types, however a notable exception is
`long long` on x86, which is 8 bytes in length but has an alignment
of 4.

As a result of this patch, one has to be more careful about the
alignment of objects used with std::atomic_ref. Failure to provide
a properly-aligned object to std::atomic_ref is a precondition
violation and is technically UB. On the flipside, this allows us
to provide an atomic_ref that is actually lockfree more often,
which is an important QOI property.

More information in the discussion at llvm#99570 (comment).

Co-authored-by: Louis Dionne <ldionne.2@gmail.com>
(cherry picked from commit 59ca618)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
libc++ libc++ C++ Standard Library. Not GNU libstdc++. Not libc++abi.
Projects
Development

Successfully merging this pull request may close these issues.

6 participants