[libc++] Start implementing std::datapar::simd #139919

Status: Open, 1 commit to merge into base: main

@philnik777 (Contributor) commented May 14, 2025

This patch starts implementing P1928R15 and P3287R3.
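
For readers unfamiliar with the papers, here is a minimal sketch of the data-parallel model that `std::datapar::simd` exposes: a fixed number of lanes, a broadcast constructor, elementwise arithmetic, and comparisons that yield one result per lane. `toy_simd` is a hypothetical stand-in written with `std::array` and plain loops. It is not the libc++ implementation and does not use the compiler vector extensions from this patch.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical toy type (an assumption for illustration only): N lanes of T
// with elementwise semantics, implemented with loops instead of vector types.
template <class T, std::size_t N>
struct toy_simd {
  std::array<T, N> lanes{};

  constexpr toy_simd() = default;

  // Broadcast: every lane receives the same value.
  constexpr toy_simd(T value) {
    for (auto& lane : lanes)
      lane = value;
  }

  // Load one value per lane from a range of exactly N elements.
  constexpr explicit toy_simd(const std::array<T, N>& values) : lanes(values) {}

  constexpr T operator[](std::size_t i) const { return lanes[i]; }

  // Arithmetic is applied lane by lane.
  friend constexpr toy_simd operator+(const toy_simd& a, const toy_simd& b) {
    toy_simd r;
    for (std::size_t i = 0; i != N; ++i)
      r.lanes[i] = a.lanes[i] + b.lanes[i];
    return r;
  }

  // Comparisons return one bool per lane rather than a single bool.
  friend constexpr std::array<bool, N> operator<(const toy_simd& a, const toy_simd& b) {
    std::array<bool, N> m{};
    for (std::size_t i = 0; i != N; ++i)
      m[i] = a.lanes[i] < b.lanes[i];
    return m;
  }
};
```

In the patch itself the per-lane loops are replaced by operations on `__vector_size__` types, and the comparison result is wrapped in `basic_simd_mask` instead of an array of `bool`.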

@philnik777 philnik777 requested a review from a team as a code owner May 14, 2025 14:53
@llvmbot added the libc++ label (libc++ C++ Standard Library. Not GNU libstdc++. Not libc++abi.) May 14, 2025
@llvmbot (Member) commented May 14, 2025

@llvm/pr-subscribers-libcxx

Author: Nikolas Klauser (philnik777)

Changes

Patch is 117.36 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/139919.diff

52 Files Affected:

  • (modified) libcxx/include/CMakeLists.txt (+7)
  • (added) libcxx/include/__simd/abi.h (+165)
  • (added) libcxx/include/__simd/basic_simd.h (+350)
  • (added) libcxx/include/__simd/basic_simd_mask.h (+141)
  • (added) libcxx/include/__simd/simd_flags.h (+106)
  • (added) libcxx/include/__type_traits/pack_utils.h (+30)
  • (added) libcxx/include/__type_traits/standard_types.h (+88)
  • (added) libcxx/include/simd (+24)
  • (added) libcxx/test/libcxx/numerics/simd/implementation_defined_conversions.pass.cpp (+42)
  • (added) libcxx/test/std/numerics/simd/simd.class/aliases.compile.pass.cpp (+47)
  • (added) libcxx/test/std/numerics/simd/simd.class/simd.binary/add.pass.cpp (+44)
  • (added) libcxx/test/std/numerics/simd/simd.class/simd.binary/bitand.pass.cpp (+54)
  • (added) libcxx/test/std/numerics/simd/simd.class/simd.binary/bitor.pass.cpp (+54)
  • (added) libcxx/test/std/numerics/simd/simd.class/simd.binary/bitshift_left.pass.cpp (+54)
  • (added) libcxx/test/std/numerics/simd/simd.class/simd.binary/bitshift_right.pass.cpp (+56)
  • (added) libcxx/test/std/numerics/simd/simd.class/simd.binary/divide.pass.cpp (+44)
  • (added) libcxx/test/std/numerics/simd/simd.class/simd.binary/modulo.pass.cpp (+53)
  • (added) libcxx/test/std/numerics/simd/simd.class/simd.binary/multiply.pass.cpp (+46)
  • (added) libcxx/test/std/numerics/simd/simd.class/simd.binary/subtract.pass.cpp (+44)
  • (added) libcxx/test/std/numerics/simd/simd.class/simd.binary/xor.pass.cpp (+54)
  • (added) libcxx/test/std/numerics/simd/simd.class/simd.cassign/add.pass.cpp (+46)
  • (added) libcxx/test/std/numerics/simd/simd.class/simd.cassign/bitand.pass.cpp (+55)
  • (added) libcxx/test/std/numerics/simd/simd.class/simd.cassign/bitor.pass.cpp (+55)
  • (added) libcxx/test/std/numerics/simd/simd.class/simd.cassign/bitshift_left.pass.cpp (+55)
  • (added) libcxx/test/std/numerics/simd/simd.class/simd.cassign/bitshift_right.pass.cpp (+57)
  • (added) libcxx/test/std/numerics/simd/simd.class/simd.cassign/divide.pass.cpp (+46)
  • (added) libcxx/test/std/numerics/simd/simd.class/simd.cassign/modulo.pass.cpp (+55)
  • (added) libcxx/test/std/numerics/simd/simd.class/simd.cassign/multiply.pass.cpp (+46)
  • (added) libcxx/test/std/numerics/simd/simd.class/simd.cassign/subtract.pass.cpp (+46)
  • (added) libcxx/test/std/numerics/simd/simd.class/simd.cassign/xor.pass.cpp (+54)
  • (added) libcxx/test/std/numerics/simd/simd.class/simd.comparison/equality.pass.cpp (+66)
  • (added) libcxx/test/std/numerics/simd/simd.class/simd.comparison/ordering.pass.cpp (+77)
  • (added) libcxx/test/std/numerics/simd/simd.class/simd.ctor/broadcast.pass.cpp (+91)
  • (added) libcxx/test/std/numerics/simd/simd.class/simd.ctor/range.mask.pass.cpp (+46)
  • (added) libcxx/test/std/numerics/simd/simd.class/simd.ctor/range.pass.cpp (+46)
  • (added) libcxx/test/std/numerics/simd/simd.class/simd.unary/identity.pass.cpp (+43)
  • (added) libcxx/test/std/numerics/simd/simd.class/simd.unary/invert.pass.cpp (+53)
  • (added) libcxx/test/std/numerics/simd/simd.class/simd.unary/negation.pass.cpp (+44)
  • (added) libcxx/test/std/numerics/simd/simd.class/simd.unary/not.pass.cpp (+45)
  • (added) libcxx/test/std/numerics/simd/simd.class/simd.unary/postdec.pass.cpp (+46)
  • (added) libcxx/test/std/numerics/simd/simd.class/simd.unary/postinc.pass.cpp (+46)
  • (added) libcxx/test/std/numerics/simd/simd.class/simd.unary/predec.pass.cpp (+45)
  • (added) libcxx/test/std/numerics/simd/simd.class/simd.unary/preinc.pass.cpp (+45)
  • (added) libcxx/test/std/numerics/simd/simd.class/subscript.assert.pass.cpp (+34)
  • (added) libcxx/test/std/numerics/simd/simd.class/traits.compile.pass.cpp (+39)
  • (added) libcxx/test/std/numerics/simd/simd.flags.compile.pass.cpp (+52)
  • (added) libcxx/test/std/numerics/simd/simd.mask.class/simd.mask.ctor/broadcast.pass.cpp (+44)
  • (added) libcxx/test/std/numerics/simd/simd.mask.class/subscript.assert.pass.cpp (+34)
  • (added) libcxx/test/std/numerics/simd/simd.mask.class/traits.compile.pass.cpp (+37)
  • (added) libcxx/test/std/numerics/simd/simd.mask.reductions.pass.cpp (+248)
  • (added) libcxx/test/std/numerics/simd/utils.h (+47)
  • (modified) libcxx/test/support/type_algorithms.h (+15-16)
diff --git a/libcxx/include/CMakeLists.txt b/libcxx/include/CMakeLists.txt
index 255a0474c0f6b..f6fcfa3fa8ed2 100644
--- a/libcxx/include/CMakeLists.txt
+++ b/libcxx/include/CMakeLists.txt
@@ -714,6 +714,10 @@ set(files
   __ranges/view_interface.h
   __ranges/views.h
   __ranges/zip_view.h
+  __simd/abi.h
+  __simd/basic_simd.h
+  __simd/basic_simd_mask.h
+  __simd/simd_flags.h
   __split_buffer
   __std_mbstate_t.h
   __stop_token/atomic_unique_lock.h
@@ -863,6 +867,7 @@ set(files
   __type_traits/maybe_const.h
   __type_traits/nat.h
   __type_traits/negation.h
+  __type_traits/pack_utils.h
   __type_traits/promote.h
   __type_traits/rank.h
   __type_traits/remove_all_extents.h
@@ -875,6 +880,7 @@ set(files
   __type_traits/remove_reference.h
   __type_traits/remove_volatile.h
   __type_traits/result_of.h
+  __type_traits/standard_types.h
   __type_traits/strip_signature.h
   __type_traits/type_identity.h
   __type_traits/type_list.h
@@ -1029,6 +1035,7 @@ set(files
   semaphore
   set
   shared_mutex
+  simd
   source_location
   span
   sstream
diff --git a/libcxx/include/__simd/abi.h b/libcxx/include/__simd/abi.h
new file mode 100644
index 0000000000000..7f54b02a05de8
--- /dev/null
+++ b/libcxx/include/__simd/abi.h
@@ -0,0 +1,165 @@
+//===----------------------------------------------------------------------===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+
+#ifndef _LIBCPP___SIMD_ABI_H
+#define _LIBCPP___SIMD_ABI_H
+
+#include <__concepts/convertible_to.h>
+#include <__concepts/equality_comparable.h>
+#include <__config>
+#include <__cstddef/size_t.h>
+#include <__type_traits/standard_types.h>
+#include <__utility/integer_sequence.h>
+#include <cstdint>
+
+#if _LIBCPP_STD_VER >= 26
+
+_LIBCPP_BEGIN_NAMESPACE_STD
+namespace datapar {
+
+template <class _Tp>
+inline constexpr bool __is_vectorizable_type_v = __is_standard_integer_type_v<_Tp> || __is_character_type_v<_Tp>;
+
+template <>
+inline constexpr bool __is_vectorizable_type_v<float> = true;
+
+template <>
+inline constexpr bool __is_vectorizable_type_v<double> = true;
+
+template <class _From, class _To>
+concept __value_preserving_convertible = requires(_From __from) { _To{__from}; };
+
+template <class _Tp>
+concept __constexpr_wrapper_like =
+    convertible_to<_Tp, decltype(_Tp::value)> && equality_comparable_with<_Tp, decltype(_Tp::value)> &&
+    bool_constant<_Tp() == _Tp::value>::value &&
+    bool_constant<static_cast<decltype(_Tp::value)>(_Tp()) == _Tp::value>::value;
+
+// [simd.expos]
+using __simd_size_type = int;
+
+template <class _Tp>
+struct __deduce_abi;
+
+template <class _Tp, __simd_size_type _Np>
+  requires __is_vectorizable_type_v<_Tp> && (_Np <= 64)
+using __deduce_abi_t = __deduce_abi<_Tp>::template __apply<_Np>;
+
+template <class _Tp>
+using __native_abi = __deduce_abi<_Tp>::template __apply<4>;
+
+template <class _Tp, class _Abi>
+inline constexpr __simd_size_type __simd_size_v = 0;
+
+template <size_t>
+struct __integer_from_impl;
+
+template <>
+struct __integer_from_impl<1> {
+  using type = uint8_t;
+};
+
+template <>
+struct __integer_from_impl<2> {
+  using type = uint16_t;
+};
+
+template <>
+struct __integer_from_impl<4> {
+  using type = uint32_t;
+};
+
+template <>
+struct __integer_from_impl<8> {
+  using type = uint64_t;
+};
+
+template <size_t _Bytes>
+using __integer_from = __integer_from_impl<_Bytes>::type;
+
+// ABI Types
+
+template <class _Tp, __simd_size_type _Np>
+struct __vector_size_abi {
+  using _SimdT [[__gnu__::__vector_size__(_Np * sizeof(_Tp))]] = _Tp;
+  using _MaskT [[__gnu__::__vector_size__(_Np * sizeof(_Tp))]] = __integer_from<sizeof(_Tp)>;
+
+  _LIBCPP_ALWAYS_INLINE static constexpr _SimdT __select(_MaskT __mask, _SimdT __true, _SimdT __false) {
+    return __mask ? __true : __false;
+  }
+
+#  ifdef _LIBCPP_COMPILER_CLANG_BASED
+  using _BoolVec __attribute__((__ext_vector_type__(_Np))) = bool;
+
+  static constexpr auto __int_size = _Np <= 8 ? 8 : _Np <= 16 ? 16 : _Np <= 32 ? 32 : 64;
+  static_assert(__int_size >= _Np);
+
+  using _IntSizeBoolVec __attribute__((__ext_vector_type__(__int_size))) = bool;
+
+  _LIBCPP_ALWAYS_INLINE static constexpr auto __mask_to_int(_BoolVec __mask) noexcept {
+    return [&]<size_t... _Origs, size_t... _Fillers>(index_sequence<_Origs...>, index_sequence<_Fillers...>)
+               _LIBCPP_ALWAYS_INLINE {
+                 auto __vec = __builtin_convertvector(
+                     __builtin_shufflevector(__mask, _BoolVec{}, _Origs..., ((void)_Fillers, _Np)...), _IntSizeBoolVec);
+                 if constexpr (_Np <= 8)
+                   return __builtin_bit_cast(unsigned char, __vec);
+                 else if constexpr (_Np <= 16)
+                   return __builtin_bit_cast(unsigned short, __vec);
+                 else if constexpr (_Np <= 32)
+                   return __builtin_bit_cast(unsigned int, __vec);
+                 else
+                   return __builtin_bit_cast(unsigned long long, __vec);
+               }(make_index_sequence<_Np>{}, make_index_sequence<__int_size - _Np>{});
+  }
+
+  _LIBCPP_ALWAYS_INLINE static constexpr bool __any_of(_MaskT __mask) noexcept {
+    return __builtin_reduce_or(__builtin_convertvector(__mask, _BoolVec));
+  }
+
+  _LIBCPP_ALWAYS_INLINE static constexpr bool __all_of(_MaskT __mask) noexcept {
+    return __builtin_reduce_and(__builtin_convertvector(__mask, _BoolVec));
+  }
+
+  _LIBCPP_ALWAYS_INLINE static constexpr __simd_size_type __reduce_count(_MaskT __mask) noexcept {
+    return __builtin_reduce_add(__builtin_convertvector(__builtin_convertvector(__mask, _BoolVec), _MaskT));
+  }
+
+  _LIBCPP_ALWAYS_INLINE static constexpr __simd_size_type __reduce_min_index(_MaskT __mask) noexcept {
+    return __builtin_ctzg(__mask_to_int(__builtin_convertvector(__mask, _BoolVec)));
+  }
+
+  _LIBCPP_ALWAYS_INLINE static constexpr __simd_size_type __reduce_max_index(_MaskT __mask) noexcept {
+    return __int_size - 1 - __builtin_clzg(__mask_to_int(__builtin_convertvector(__mask, _BoolVec)));
+  }
+#  else
+  _LIBCPP_ALWAYS_INLINE static constexpr bool __any_of(_MaskT __mask) noexcept {
+    for (size_t __i = 0; __i != _Np; ++__i) {
+      if (__mask[__i])
+        return true;
+    }
+    return false;
+  }
+#  endif
+};
+
+template <class _Tp>
+  requires __is_vectorizable_type_v<_Tp>
+struct __deduce_abi<_Tp> {
+  template <__simd_size_type _Np>
+  using __apply = __vector_size_abi<_Tp, _Np>;
+};
+
+template <class _Tp, __simd_size_type _Np>
+inline constexpr __simd_size_type __simd_size_v<_Tp, __vector_size_abi<_Tp, _Np>> = _Np;
+
+} // namespace datapar
+_LIBCPP_END_NAMESPACE_STD
+
+#endif // _LIBCPP_STD_VER >= 26
+
+#endif // _LIBCPP___SIMD_ABI_H
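
The mask reductions in `abi.h` above (`__mask_to_int`, `__reduce_count`, `__reduce_min_index`, `__reduce_max_index`) pack one bit per lane into an integer and then use bit-counting builtins. A minimal sketch of that idea follows, assuming GCC or Clang for the `__builtin_popcountll`, `__builtin_ctzll` and `__builtin_clzll` builtins. The names `pack_mask`, `mask_any`, `mask_all`, `mask_count`, `mask_min_index` and `mask_max_index` are hypothetical, and the sketch always packs into 64 bits, whereas the patch picks the smallest integer width that fits.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper: pack one bool per lane into an integer, lane 0 in bit 0.
template <std::size_t N>
constexpr std::uint64_t pack_mask(const std::array<bool, N>& mask) {
  static_assert(N >= 1 && N <= 64, "this sketch supports 1 to 64 lanes");
  std::uint64_t bits = 0;
  for (std::size_t i = 0; i != N; ++i)
    bits |= static_cast<std::uint64_t>(mask[i]) << i;
  return bits;
}

template <std::size_t N>
constexpr bool mask_any(const std::array<bool, N>& mask) {
  return pack_mask(mask) != 0;
}

template <std::size_t N>
constexpr bool mask_all(const std::array<bool, N>& mask) {
  // All N low bits set; the shift amount stays in [0, 63] for N in [1, 64].
  return pack_mask(mask) == (~std::uint64_t{0} >> (64 - N));
}

// Number of lanes that are true.
template <std::size_t N>
inline int mask_count(const std::array<bool, N>& mask) {
  return __builtin_popcountll(pack_mask(mask));
}

// Index of the lowest true lane. Precondition: at least one lane is true.
template <std::size_t N>
inline int mask_min_index(const std::array<bool, N>& mask) {
  return __builtin_ctzll(pack_mask(mask));
}

// Index of the highest true lane. Precondition: at least one lane is true.
template <std::size_t N>
inline int mask_max_index(const std::array<bool, N>& mask) {
  return 63 - __builtin_clzll(pack_mask(mask));
}
```

As in the patch, the two index functions have no meaningful result for an all-false mask, so callers of this sketch would need to check `mask_any` first.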
diff --git a/libcxx/include/__simd/basic_simd.h b/libcxx/include/__simd/basic_simd.h
new file mode 100644
index 0000000000000..acffa012d9ba1
--- /dev/null
+++ b/libcxx/include/__simd/basic_simd.h
@@ -0,0 +1,350 @@
+//===----------------------------------------------------------------------===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+
+#ifndef _LIBCPP___SIMD_BASIC_SIMD_H
+#define _LIBCPP___SIMD_BASIC_SIMD_H
+
+#include <__assert>
+#include <__concepts/convertible_to.h>
+#include <__config>
+#include <__memory/assume_aligned.h>
+#include <__ranges/concepts.h>
+#include <__simd/abi.h>
+#include <__simd/basic_simd_mask.h>
+#include <__simd/simd_flags.h>
+#include <__type_traits/is_arithmetic.h>
+#include <__type_traits/pack_utils.h>
+#include <__type_traits/remove_const.h>
+#include <__type_traits/remove_cvref.h>
+#include <__utility/integer_sequence.h>
+
+#if _LIBCPP_STD_VER >= 26
+
+_LIBCPP_BEGIN_NAMESPACE_STD
+
+namespace datapar {
+
+_LIBCPP_DIAGNOSTIC_PUSH
+_LIBCPP_CLANG_DIAGNOSTIC_IGNORED("-Wpsabi")
+template <class _Tp, class _Abi = __native_abi<_Tp>>
+class basic_simd {
+public:
+  using value_type = _Tp;
+  using mask_type  = basic_simd_mask<sizeof(_Tp), _Abi>;
+  using abi_type   = _Abi;
+
+private:
+  using __data_t = abi_type::_SimdT;
+
+  __data_t __data_;
+
+  _LIBCPP_ALWAYS_INLINE static constexpr __data_t __broadcast(value_type __value) {
+    return [&]<size_t... _Indices>(index_sequence<_Indices...>) _LIBCPP_ALWAYS_INLINE noexcept {
+      return __data_t{((void)_Indices, __value)...};
+    }(make_index_sequence<size()>{});
+  }
+
+  template <class _Up>
+  _LIBCPP_ALWAYS_INLINE static constexpr __data_t __load_from_pointer(const _Up* __ptr) {
+    return [&]<size_t... _Indices>(index_sequence<_Indices...>) _LIBCPP_ALWAYS_INLINE noexcept {
+      return __data_t{__ptr[_Indices]...};
+    }(make_index_sequence<size()>{});
+  }
+
+public:
+  static constexpr integral_constant<__simd_size_type, __simd_size_v<value_type, abi_type>> size{};
+
+  constexpr basic_simd() noexcept = default;
+
+  // [simd.ctor]
+  template <convertible_to<value_type> _Up, class _From = remove_cvref_t<_Up>>
+    requires(__value_preserving_convertible<_From, value_type> ||
+             (!is_arithmetic_v<_From> && !__constexpr_wrapper_like<_From>) ||
+             (__constexpr_wrapper_like<_From> && is_arithmetic_v<remove_const_t<decltype(_From::value)>> &&
+              bool_constant<(static_cast<value_type>(_From::value) == _From::value)>::value))
+  _LIBCPP_HIDE_FROM_ABI constexpr basic_simd(_Up&& __value) noexcept : __data_{__broadcast(__value)} {}
+
+  // TODO: converting constructor
+  // TODO: generator constructor
+  // TODO: flag constructor
+  // TODO: mask flag constructor
+
+  template <ranges::contiguous_range _Range, class... _Flags>
+  _LIBCPP_HIDE_FROM_ABI constexpr basic_simd(_Range&& __range, simd_flags<_Flags...> = {}) noexcept
+    requires(ranges::size(__range) == size())
+  {
+    static_assert(__is_vectorizable_type_v<ranges::range_value_t<_Range>>, "Range has to be of a vectorizable type");
+    static_assert(__contains_type_v<__type_list<_Flags...>, __convert_flag> ||
+                      __value_preserving_convertible<ranges::range_value_t<_Range>, value_type>,
+                  "implicit conversion is not value preserving - consider using std::datapar::simd_flag_convert");
+    auto* __ptr = std::assume_aligned<__get_align_for<value_type, _Flags...>>(std::to_address(ranges::begin(__range)));
+    __data_     = __load_from_pointer(__ptr);
+  }
+
+  template <ranges::contiguous_range _Range, class... _Flags>
+  _LIBCPP_HIDE_FROM_ABI constexpr basic_simd(
+      _Range&& __range, const mask_type& __mask, simd_flags<_Flags...> = {}) noexcept
+    requires(ranges::size(__range) == size())
+  {
+    static_assert(__is_vectorizable_type_v<ranges::range_value_t<_Range>>, "Range has to be of a vectorizable type");
+    static_assert(__contains_type_v<__type_list<_Flags...>, __convert_flag> ||
+                      __value_preserving_convertible<ranges::range_value_t<_Range>, value_type>,
+                  "implicit conversion is not value preserving - consider using std::datapar::simd_flag_convert");
+    auto* __ptr = std::assume_aligned<__get_align_for<value_type, _Flags...>>(std::to_address(ranges::begin(__range)));
+    __data_     = abi_type::__select(__mask.__data_, __load_from_pointer(__ptr), __broadcast(0));
+  }
+
+  // libc++ extensions
+  _LIBCPP_ALWAYS_INLINE constexpr explicit basic_simd(__data_t __data) noexcept : __data_(__data) {}
+
+  // [simd.subscr]
+  _LIBCPP_HIDE_FROM_ABI constexpr value_type operator[](__simd_size_type __index) const noexcept {
+    _LIBCPP_ASSERT_VALID_ELEMENT_ACCESS(__index >= 0 && __index < size(), "simd::operator[] out of bounds");
+    return __data_[__index];
+  }
+
+  // [simd.unary]
+
+  _LIBCPP_HIDE_FROM_ABI constexpr basic_simd& operator++() noexcept
+    requires requires(value_type __v) { ++__v; }
+  {
+    __data_ += 1;
+    return *this;
+  }
+
+  _LIBCPP_HIDE_FROM_ABI constexpr basic_simd operator++(int) noexcept
+    requires requires(value_type __v) { __v++; }
+  {
+    auto __ret = *this;
+    ++*this;
+    return __ret;
+  }
+
+  _LIBCPP_HIDE_FROM_ABI constexpr basic_simd& operator--() noexcept
+    requires requires(value_type __v) { --__v; }
+  {
+    __data_ -= 1;
+    return *this;
+  }
+
+  _LIBCPP_HIDE_FROM_ABI constexpr basic_simd operator--(int) noexcept
+    requires requires(value_type __v) { __v--; }
+  {
+    auto __ret = *this;
+    --*this;
+    return __ret;
+  }
+
+  _LIBCPP_HIDE_FROM_ABI constexpr mask_type operator!() const noexcept
+    requires requires(value_type __v) { !__v; }
+  {
+    return mask_type(!__data_);
+  }
+
+  _LIBCPP_HIDE_FROM_ABI constexpr basic_simd operator~() const noexcept
+    requires requires(value_type __v) { ~__v; }
+  {
+    return basic_simd(~__data_);
+  }
+
+  _LIBCPP_HIDE_FROM_ABI constexpr basic_simd operator+() const noexcept
+    requires requires(value_type __v) { +__v; }
+  {
+    return basic_simd(+__data_);
+  }
+
+  _LIBCPP_HIDE_FROM_ABI constexpr basic_simd operator-() const noexcept
+    requires requires(value_type __v) { -__v; }
+  {
+    return basic_simd(-__data_);
+  }
+
+  // [simd.binary]
+
+  _LIBCPP_HIDE_FROM_ABI friend constexpr basic_simd operator+(const basic_simd& __lhs, const basic_simd& __rhs) noexcept
+    requires requires(value_type __v) { __v + __v; }
+  {
+    return basic_simd(__lhs.__data_ + __rhs.__data_);
+  }
+
+  _LIBCPP_HIDE_FROM_ABI friend constexpr basic_simd operator-(const basic_simd& __lhs, const basic_simd& __rhs) noexcept
+    requires requires(value_type __v) { __v - __v; }
+  {
+    return basic_simd(__lhs.__data_ - __rhs.__data_);
+  }
+
+  _LIBCPP_HIDE_FROM_ABI friend constexpr basic_simd operator*(const basic_simd& __lhs, const basic_simd& __rhs) noexcept
+    requires requires(value_type __v) { __v * __v; }
+  {
+    return basic_simd(__lhs.__data_ * __rhs.__data_);
+  }
+
+  _LIBCPP_HIDE_FROM_ABI friend constexpr basic_simd operator/(const basic_simd& __lhs, const basic_simd& __rhs) noexcept
+    requires requires(value_type __v) { __v / __v; }
+  {
+    return basic_simd(__lhs.__data_ / __rhs.__data_);
+  }
+
+  _LIBCPP_HIDE_FROM_ABI friend constexpr basic_simd operator%(const basic_simd& __lhs, const basic_simd& __rhs) noexcept
+    requires requires(value_type __v) { __v % __v; }
+  {
+    return basic_simd(__lhs.__data_ % __rhs.__data_);
+  }
+
+  _LIBCPP_HIDE_FROM_ABI friend constexpr basic_simd operator&(const basic_simd& __lhs, const basic_simd& __rhs) noexcept
+    requires requires(value_type __v) { __v & __v; }
+  {
+    return basic_simd(__lhs.__data_ & __rhs.__data_);
+  }
+
+  _LIBCPP_HIDE_FROM_ABI friend constexpr basic_simd operator|(const basic_simd& __lhs, const basic_simd& __rhs) noexcept
+    requires requires(value_type __v) { __v | __v; }
+  {
+    return basic_simd(__lhs.__data_ | __rhs.__data_);
+  }
+
+  _LIBCPP_HIDE_FROM_ABI friend constexpr basic_simd operator^(const basic_simd& __lhs, const basic_simd& __rhs) noexcept
+    requires requires(value_type __v) { __v ^ __v; }
+  {
+    return basic_simd(__lhs.__data_ ^ __rhs.__data_);
+  }
+
+  _LIBCPP_HIDE_FROM_ABI friend constexpr basic_simd
+  operator<<(const basic_simd& __lhs, const basic_simd& __rhs) noexcept
+    requires requires(value_type __v) { __v << __v; }
+  {
+    return basic_simd(__lhs.__data_ << __rhs.__data_);
+  }
+
+  _LIBCPP_HIDE_FROM_ABI friend constexpr basic_simd
+  operator>>(const basic_simd& __lhs, const basic_simd& __rhs) noexcept
+    requires requires(value_type __v) { __v >> __v; }
+  {
+    return basic_simd(__lhs.__data_ >> __rhs.__data_);
+  }
+
+  // [simd.cassign]
+
+  _LIBCPP_HIDE_FROM_ABI friend constexpr basic_simd& operator+=(basic_simd& __lhs, const basic_simd& __rhs) noexcept
+    requires requires(value_type __v) { __v += __v; }
+  {
+    __lhs.__data_ = __lhs.__data_ + __rhs.__data_;
+    return __lhs;
+  }
+
+  _LIBCPP_HIDE_FROM_ABI friend constexpr basic_simd& operator-=(basic_simd& __lhs, const basic_simd& __rhs) noexcept
+    requires requires(value_type __v) { __v -= __v; }
+  {
+    __lhs.__data_ = __lhs.__data_ - __rhs.__data_;
+    return __lhs;
+  }
+
+  _LIBCPP_HIDE_FROM_ABI friend constexpr basic_simd& operator*=(basic_simd& __lhs, const basic_simd& __rhs) noexcept
+    requires requires(value_type __v) { __v *= __v; }
+  {
+    __lhs.__data_ = __lhs.__data_ * __rhs.__data_;
+    return __lhs;
+  }
+
+  _LIBCPP_HIDE_FROM_ABI friend constexpr basic_simd& operator/=(basic_simd& __lhs, const basic_simd& __rhs) noexcept
+    requires requires(value_type __v) { __v /= __v; }
+  {
+    __lhs.__data_ = __lhs.__data_ / __rhs.__data_;
+    return __lhs;
+  }
+
+  _LIBCPP_HIDE_FROM_ABI friend constexpr basic_simd& operator%=(basic_simd& __lhs, const basic_simd& __rhs) noexcept
+    requires requires(value_type __v) { __v %= __v; }
+  {
+    __lhs.__data_ = __lhs.__data_ % __rhs.__data_;
+    return __lhs;
+  }
+
+  _LIBCPP_HIDE_FROM_ABI friend constexpr basic_simd& operator&=(basic_simd& __lhs, const basic_simd& __rhs) noexcept
+    requires requires(value_type __v) { __v &= __v; }
+  {
+    __lhs.__data_ = __lhs.__data_ & __rhs.__data_;
+    return __lhs;
+  }
+
+  _LIBCPP_HIDE_FROM_ABI friend constexpr basic_simd& operator|=(basic_simd& __lhs, const basic_simd& __rhs) noexcept
+    requires requires(value_type __v) { __v |= __v; }
+  {
+    __lhs.__data_ = __lhs.__data_ | __rhs.__data_;
+    return __lhs;
+  }
+
+  _LIBCPP_HIDE_FROM_ABI friend constexpr basic_simd& operator^=(basic_simd& __lhs, const basic_simd& __rhs) noexcept
+    requires requires(value_type __v) { __v ^= __v; }
+  {
+    __lhs.__data_ = __lhs.__data_ ^ __rhs.__data_;
+    return __lhs;
+  }
+
+  _LIBCPP_HIDE_FROM_ABI friend constexpr basic_simd& operator<<=(basic_simd& __lhs, const basic_simd& __rhs) noexcept
+    requires requires(value_type __v) { __v <<= __v; }
+  {
+    __lhs.__data_ = __lhs.__data_ << __rhs.__data_;
+    return __lhs;
+  }
+
+  _LIBCPP_HIDE_FROM_ABI friend constexpr basic_simd& operator>>=(basic_simd& __lhs, const basic_simd& __rhs) noexcept
+    requires requires(value_type __v) { __v >>= __v; }
+  {
+    __lhs.__data_ = __lhs.__data_ >> __rhs.__data_;
+    return __lhs;
+  }
+
+  // [simd.comparisons]
+  _LIBCPP_HIDE_FROM_ABI friend constexpr mask_type operator==(const basic_simd& __lhs, const basic_simd& __rhs) noexcept
+    requires requires(value_type __v) { __v == __v; }
+  {
+    return mask_type(__lhs.__data_ == __rhs.__data_);
+  }
+
+  _LIBCPP_HIDE_FROM_ABI friend constexpr mask_type operator!=(const basic_simd& __lhs, const basic_simd& __rhs) noexcept
+    requires requires(value_type __v) { __v != __v; }
+  {
+    return mask_type(!(__lhs.__data_ == __rhs.__data_));
+  }
+
+  _LIBCPP_HIDE_FROM_ABI friend constexpr mask_type operator<(const basic_simd& __lhs, const basic_simd& __rhs) noexcept
+    requires requires(value_type __v) { __v < __v; }
+  {
+    return mask_type(__lhs.__data_ < __rhs.__data_);
+  }
+
+  _LIBCPP_HIDE_FROM_ABI friend constexpr mask_type operator>=(const basic_simd& __lhs, const basic_simd& __rhs) noexcept
+    requires requires(value_type __v) { __v >= __v; }
+  {
+    return mask_type(__rhs.__data_ <= __lhs.__data_);
+  }
+
+  _LIBCPP_HIDE_FROM_ABI friend constexpr mask_type operator>(const basic_simd& __lhs, const basic_simd& __rhs) noexcept
+    requires requires(value_type __v) { __v > __v; }
+  {
+    return mask_type(__rhs.__data_ < __lhs.__data_);
+  }
+
+  _LIBCPP_HIDE_FROM_ABI friend constexpr mask_type operator<=(const basic_simd& __lhs, const basic_simd& __rhs) noexcept
+    requires requires(value_type __v) { __v <= __v; }
+  {
+    return mask_type(__lhs.__data_ <= __rhs.__data_);
+  }
+};
+_LIBCPP_DIAGNOSTIC_POP
+
+template <class _Tp, __simd_size_type _Np = __simd_size_v<_Tp, __native_abi<_Tp>>>
+using simd = basic_simd<_Tp, __deduce_abi_t<_Tp, _Np>>;
+
+} // namespace datapar
+_LIBCPP_END_NAMESPACE_STD
+
+#endif // _LIBCPP_STD_VER >= 26
+
+#endif // _LIBCPP___SIMD_BASIC_SIMD_H
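
The broadcast and range constructors above depend on `__value_preserving_convertible`, which is defined in `abi.h` as `requires(_From __from) { _To{__from}; }`. That check works because list-initialization rejects narrowing conversions. The following is a minimal sketch of the same test written with C++17 SFINAE instead of a concept. The trait name `is_value_preserving` is hypothetical and is not part of the patch.

```cpp
#include <cassert>
#include <type_traits>
#include <utility>

// Hypothetical trait: true when To{from} is well-formed for an lvalue of type
// From, which excludes narrowing conversions.
template <class From, class To, class = void>
struct is_value_preserving : std::false_type {};

template <class From, class To>
struct is_value_preserving<From, To, std::void_t<decltype(To{std::declval<From&>()})>>
    : std::true_type {};

// Widening conversions are accepted.
static_assert(is_value_preserving<float, double>::value, "float -> double keeps every value");
static_assert(is_value_preserving<short, int>::value, "short -> int keeps every value");

// Narrowing conversions are rejected, including integer -> floating point.
static_assert(!is_value_preserving<double, float>::value, "double -> float may lose precision");
static_assert(!is_value_preserving<int, short>::value, "int -> short may truncate");
static_assert(!is_value_preserving<int, float>::value, "int -> float counts as narrowing");
```

One consequence worth noting is that a non-constant `int` to `double` conversion also counts as narrowing under the language rules, so under this check such a load would need the conversion flag that the `static_assert` messages in the range constructors point to.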
diff --git a/libcxx/include/__simd/basic_simd_mask.h b/libcxx/include/__simd/basic_simd_mask.h
new file mode 100644
index 0000000000000..b2f93d1c9705b
--- /dev/null
+++ b/libcxx/include/__simd/basic_simd_mask.h
@@ -0,0 +1,141 @@
+//===------...
[truncated]

⚠️ C/C++ code formatter, clang-format found issues in your code. ⚠️

You can test this locally with the following command:
git-clang-format --diff HEAD~1 HEAD --extensions ,cpp,h -- libcxx/include/__simd/abi.h libcxx/include/__simd/basic_simd.h libcxx/include/__simd/basic_simd_mask.h libcxx/include/__simd/simd_flags.h libcxx/include/__type_traits/pack_utils.h libcxx/include/__type_traits/standard_types.h libcxx/include/simd libcxx/test/libcxx/numerics/simd/implementation_defined_conversions.pass.cpp libcxx/test/std/numerics/simd/simd.class/aliases.compile.pass.cpp libcxx/test/std/numerics/simd/simd.class/simd.binary/add.pass.cpp libcxx/test/std/numerics/simd/simd.class/simd.binary/bitand.pass.cpp libcxx/test/std/numerics/simd/simd.class/simd.binary/bitor.pass.cpp libcxx/test/std/numerics/simd/simd.class/simd.binary/bitshift_left.pass.cpp libcxx/test/std/numerics/simd/simd.class/simd.binary/bitshift_right.pass.cpp libcxx/test/std/numerics/simd/simd.class/simd.binary/divide.pass.cpp libcxx/test/std/numerics/simd/simd.class/simd.binary/modulo.pass.cpp libcxx/test/std/numerics/simd/simd.class/simd.binary/multiply.pass.cpp libcxx/test/std/numerics/simd/simd.class/simd.binary/subtract.pass.cpp libcxx/test/std/numerics/simd/simd.class/simd.binary/xor.pass.cpp libcxx/test/std/numerics/simd/simd.class/simd.cassign/add.pass.cpp libcxx/test/std/numerics/simd/simd.class/simd.cassign/bitand.pass.cpp libcxx/test/std/numerics/simd/simd.class/simd.cassign/bitor.pass.cpp libcxx/test/std/numerics/simd/simd.class/simd.cassign/bitshift_left.pass.cpp libcxx/test/std/numerics/simd/simd.class/simd.cassign/bitshift_right.pass.cpp libcxx/test/std/numerics/simd/simd.class/simd.cassign/divide.pass.cpp libcxx/test/std/numerics/simd/simd.class/simd.cassign/modulo.pass.cpp libcxx/test/std/numerics/simd/simd.class/simd.cassign/multiply.pass.cpp libcxx/test/std/numerics/simd/simd.class/simd.cassign/subtract.pass.cpp libcxx/test/std/numerics/simd/simd.class/simd.cassign/xor.pass.cpp libcxx/test/std/numerics/simd/simd.class/simd.comparison/equality.pass.cpp 
libcxx/test/std/numerics/simd/simd.class/simd.comparison/ordering.pass.cpp libcxx/test/std/numerics/simd/simd.class/simd.ctor/broadcast.pass.cpp libcxx/test/std/numerics/simd/simd.class/simd.ctor/range.mask.pass.cpp libcxx/test/std/numerics/simd/simd.class/simd.ctor/range.pass.cpp libcxx/test/std/numerics/simd/simd.class/simd.unary/identity.pass.cpp libcxx/test/std/numerics/simd/simd.class/simd.unary/invert.pass.cpp libcxx/test/std/numerics/simd/simd.class/simd.unary/negation.pass.cpp libcxx/test/std/numerics/simd/simd.class/simd.unary/not.pass.cpp libcxx/test/std/numerics/simd/simd.class/simd.unary/postdec.pass.cpp libcxx/test/std/numerics/simd/simd.class/simd.unary/postinc.pass.cpp libcxx/test/std/numerics/simd/simd.class/simd.unary/predec.pass.cpp libcxx/test/std/numerics/simd/simd.class/simd.unary/preinc.pass.cpp libcxx/test/std/numerics/simd/simd.class/subscript.assert.pass.cpp libcxx/test/std/numerics/simd/simd.class/traits.compile.pass.cpp libcxx/test/std/numerics/simd/simd.flags.compile.pass.cpp libcxx/test/std/numerics/simd/simd.mask.class/simd.mask.ctor/broadcast.pass.cpp libcxx/test/std/numerics/simd/simd.mask.class/subscript.assert.pass.cpp libcxx/test/std/numerics/simd/simd.mask.class/traits.compile.pass.cpp libcxx/test/std/numerics/simd/simd.mask.reductions.pass.cpp libcxx/test/std/numerics/simd/utils.h libcxx/test/support/type_algorithms.h
View the diff from clang-format here.
diff --git a/libcxx/include/__simd/basic_simd.h b/libcxx/include/__simd/basic_simd.h
index acffa012d..d0897ad63 100644
--- a/libcxx/include/__simd/basic_simd.h
+++ b/libcxx/include/__simd/basic_simd.h
@@ -179,7 +179,7 @@ public:
   }
 
   _LIBCPP_HIDE_FROM_ABI friend constexpr basic_simd operator*(const basic_simd& __lhs, const basic_simd& __rhs) noexcept
-    requires requires(value_type __v) { __v * __v; }
+    requires requires(value_type __v) { __v* __v; }
   {
     return basic_simd(__lhs.__data_ * __rhs.__data_);
   }
diff --git a/libcxx/include/__type_traits/standard_types.h b/libcxx/include/__type_traits/standard_types.h
index f94599259..35d195b39 100644
--- a/libcxx/include/__type_traits/standard_types.h
+++ b/libcxx/include/__type_traits/standard_types.h
@@ -70,10 +70,10 @@ inline constexpr bool __is_character_type_v<char> = true;
 template <>
 inline constexpr bool __is_character_type_v<wchar_t> = true;
 
-#if _LIBCPP_HAS_CHAR8_T
+#  if _LIBCPP_HAS_CHAR8_T
 template <>
 inline constexpr bool __is_character_type_v<char8_t> = true;
-#endif
+#  endif
 
 template <>
 inline constexpr bool __is_character_type_v<char16_t> = true;
diff --git a/libcxx/test/std/numerics/simd/simd.class/simd.binary/bitshift_left.pass.cpp b/libcxx/test/std/numerics/simd/simd.class/simd.binary/bitshift_left.pass.cpp
index 2236ebdcc..aaa4060e8 100644
--- a/libcxx/test/std/numerics/simd/simd.class/simd.binary/bitshift_left.pass.cpp
+++ b/libcxx/test/std/numerics/simd/simd.class/simd.binary/bitshift_left.pass.cpp
@@ -40,7 +40,9 @@ constexpr bool test() {
   static_assert(has_bitshift_left<dp::simd<int>>);
 
   types::for_each(types::vectorizable_float_types{}, []<class T> {
-    simd_utils::test_sizes([]<int N>(std::integral_constant<int, N>) { static_assert(!has_bitshift_left<dp::simd<T, N>>); });
+    simd_utils::test_sizes([]<int N>(std::integral_constant<int, N>) {
+      static_assert(!has_bitshift_left<dp::simd<T, N>>);
+    });
   });
 
   return true;
diff --git a/libcxx/test/std/numerics/simd/simd.class/simd.cassign/bitshift_left.pass.cpp b/libcxx/test/std/numerics/simd/simd.class/simd.cassign/bitshift_left.pass.cpp
index b254a8d2f..0e588ce50 100644
--- a/libcxx/test/std/numerics/simd/simd.class/simd.cassign/bitshift_left.pass.cpp
+++ b/libcxx/test/std/numerics/simd/simd.class/simd.cassign/bitshift_left.pass.cpp
@@ -41,7 +41,9 @@ constexpr bool test() {
   static_assert(has_bitshift_left<dp::simd<int>>);
 
   types::for_each(types::vectorizable_float_types{}, []<class T> {
-    simd_utils::test_sizes([]<int N>(std::integral_constant<int, N>) { static_assert(!has_bitshift_left<dp::simd<T, N>>); });
+    simd_utils::test_sizes([]<int N>(std::integral_constant<int, N>) {
+      static_assert(!has_bitshift_left<dp::simd<T, N>>);
+    });
   });
 
   return true;
diff --git a/libcxx/test/std/numerics/simd/simd.class/simd.comparison/ordering.pass.cpp b/libcxx/test/std/numerics/simd/simd.class/simd.comparison/ordering.pass.cpp
index 69b5bcd07..b19032068 100644
--- a/libcxx/test/std/numerics/simd/simd.class/simd.comparison/ordering.pass.cpp
+++ b/libcxx/test/std/numerics/simd/simd.class/simd.comparison/ordering.pass.cpp
@@ -58,8 +58,8 @@ constexpr bool test() {
   });
   types::for_each(types::vectorizable_float_types{}, []<class T> {
     constexpr auto nan = std::numeric_limits<T>::quiet_NaN();
-    dp::simd<T, 4> a = std::array<T, 4>{nan, nan, nan, nan};
-    dp::simd<T, 4> b = a;
+    dp::simd<T, 4> a   = std::array<T, 4>{nan, nan, nan, nan};
+    dp::simd<T, 4> b   = a;
     assert(dp::none_of(a < b));
     assert(dp::none_of(a > b));
     assert(dp::none_of(a <= b));
diff --git a/libcxx/test/std/numerics/simd/simd.flags.compile.pass.cpp b/libcxx/test/std/numerics/simd/simd.flags.compile.pass.cpp
index 097903cd4..97e533f68 100644
--- a/libcxx/test/std/numerics/simd/simd.flags.compile.pass.cpp
+++ b/libcxx/test/std/numerics/simd/simd.flags.compile.pass.cpp
@@ -48,5 +48,7 @@ static_assert(test<convert_flag_t>(dp::simd_flags<>{}, dp::simd_flags<convert_fl
 static_assert(test<convert_flag_t>(dp::simd_flags<convert_flag_t>{}, dp::simd_flags<convert_flag_t>{}));
 static_assert(test<overaligned_flag_t<1>>(dp::simd_flags<overaligned_flag_t<1>>{}, dp::simd_flags<>{}));
 static_assert(test<overaligned_flag_t<1>>(dp::simd_flags<>{}, dp::simd_flags<overaligned_flag_t<1>>{}));
-static_assert(test<overaligned_flag_t<1>>(dp::simd_flags<overaligned_flag_t<1>>{}, dp::simd_flags<overaligned_flag_t<1>>{}));
-static_assert(test<overaligned_flag_t<16>>(dp::simd_flags<overaligned_flag_t<16>>{}, dp::simd_flags<overaligned_flag_t<1>>{}));
+static_assert(test<overaligned_flag_t<1>>(dp::simd_flags<overaligned_flag_t<1>>{},
+                                          dp::simd_flags<overaligned_flag_t<1>>{}));
+static_assert(test<overaligned_flag_t<16>>(dp::simd_flags<overaligned_flag_t<16>>{},
+                                           dp::simd_flags<overaligned_flag_t<1>>{}));

using __deduce_abi_t = __deduce_abi<_Tp>::template __apply<_Np>;

template <class _Tp>
using __native_abi = __deduce_abi<_Tp>::template __apply<4>;
Member

Should this really be hardcoded to 4?


template <class _Tp, __simd_size_type _Np>
struct __vector_size_abi {
using _SimdT [[__gnu__::__vector_size__(_Np * sizeof(_Tp))]] = _Tp;
Member

I think it's fine to use the functionality provided by the compiler-provided SIMD type (e.g. the operators), however this makes it unclear/implicit what API is required from a _SimdT. So for example, if we were to implement a specialization of __vector_size_abi for complex types which requires a custom struct _SimdT, it wouldn't be obvious what API that _SimdT should have. I think it would be acceptable to solve that problem by simply documenting the API that we're expecting and to ensure that we don't use any other properties beyond that from basic_simd.h & friends.


_LIBCPP_BEGIN_NAMESPACE_STD

template <class _Tp>
Member

Is there a reason why we can't use the normal type traits like is_integral & friends? That should be documented here in a comment.

Comment on lines +70 to +71
template <>
inline constexpr bool __is_character_type_v<wchar_t> = true;
Member

_LIBCPP_HAS_WIDE_CHARACTERS?

inline constexpr bool __is_vectorizable_type_v<double> = true;

template <class _From, class _To>
concept __value_preserving_convertible = requires(_From __from) { _To{__from}; };
Member

I think this at least requires:

requires (integral<_From> && integral<_To>) || (floating_point<_From> && floating_point<_To>)

Or something similar, maybe __is_vectorizable_type instead. But either way this needs a requirement to make it clear that this isn't a very general utility despite its name.


This isn't correct. int32_t -> float64_t is value-preserving, as is int16_t -> float32_t. However e.g. int32_t -> uint32_t is not value-preserving.

Member

Did you folks have an implementation in mind for the notion of being value-preserving? From https://eel.is/c++draft/simd#general-8:

The conversion from an arithmetic type U to a vectorizable type T is value-preserving if all possible values of U can be represented with type T.

Is there a clever implementation trick to do this, or do we need to list them manually, or is there something logical that we can do and we're not seeing?


IEEE floating point numbers of type T can exactly represent integers up to +/-(1 << std::numeric_limits<T>::digits).

So something like

template <typename From, typename To>
concept value_preserving = std::integral<From> ?
               (std::floating_point<To> ? std::numeric_limits<To>::digits >=
                                          8*sizeof(From) - std::signed_integral<From> :
           /* handle conversion to integer */     ) :
           /* handle conversion from non-integer */;

?

Contributor Author

Isn't this just numeric_limits<To>::digits >= numeric_limits<From>::digits?


Yes, that's a much better way. Just add an extra check to make sure that is_signed_v<To> || !is_signed_v<From>.

Are these supposed to work with <stdfloat>? If so, we also want To's exponent to be at least as large. In particular, float16 has more digits than bfloat16, but a smaller exponent, so neither can represent all values from the other.


I don't know of any trick. This is what I implemented:

  template <typename _From, typename _To>
    concept __arithmetic_only_value_preserving_convertible_to
      = convertible_to<_From, _To> and __arithmetic<_From> and __arithmetic<_To>
          and not (is_signed_v<_From> and is_unsigned_v<_To>)
          and numeric_limits<_From>::digits <= numeric_limits<_To>::digits
          and numeric_limits<_From>::max() <= numeric_limits<_To>::max()
          and numeric_limits<_From>::lowest() >= numeric_limits<_To>::lowest();

  template <typename _From, typename _To>
    concept __value_preserving_convertible_to
      = __arithmetic_only_value_preserving_convertible_to<_From, _To>
          or (__complex_like<_To> and __arithmetic_only_value_preserving_convertible_to<
                                        _From, typename _To::value_type>);

I think bfloat16_t and float16_t need another check besides the one for digits. bfloat16_t is not vectorizable yet, but it's arithmetic at least. The max and lowest compares are mixed-type comparisons, but so far I have convinced myself that even with implicit conversions this concept always gives the right answer.

struct __deduce_abi;

template <class _Tp, __simd_size_type _Np>
requires __is_vectorizable_type_v<_Tp> && (_Np <= 64)
Member

I think this 64 value should be a static constexpr with at least a comment explaining why we chose that value over something else. It also needs to be documented in the documentation for implementation-defined values.

If the rationale is as simple as "this is for MSVC compatibility", that's fine.


Note that simd<int, N> is always a complete type, independent of N. You can only turn it into a disabled specialization if N is "too large". (64 is the minimum required by the spec; you're free to support a higher width.)

Member

@philnik777 I think we're missing the whole https://eel.is/c++draft/simd#overview-1 in the current state of the patch. IIUC, we'd need something like:

// disabled "specialization"
template <class T, class ABI>
class basic_simd {
  basic_simd() = delete;
  basic_simd(basic_simd const&) = delete;
  basic_simd& operator=(basic_simd const&) = delete;
  ~basic_simd() = delete;

  using value_type = T;
  using abi_type = ABI;
  using mask_type = ???;
};

// enabled specialization
template <class T, class ABI>
  requires __is_vectorizable_type<T> && (N-from-deduce-abi-t <= 64)
class basic_simd {
  // what you have right now
};

@mattkretz Is there an intent that users can specialize basic_simd for their own ABI types? If not, @philnik777 we should mark it as [[no_specializations]].

@philnik777 Either way, this would need a test.


There's no intent for other ABI tags than what the implementation defines. But I'm not sure why we should ban partial specializations of basic_simd for other non-impl-defined ABI tags. Seems like a potentially useful hack that shouldn't cost us any extra work, no?

using __simd_size_type = int;

template <class _Tp>
struct __deduce_abi;
Member

I would suggest forward-declaring __vector_size_abi and defining __deduce_abi right here instead of the other way around. Since __deduce_abi is a simple metafunction, it feels kinda weird to have a forward declaration for it and to require jumping around.

(!is_arithmetic_v<_From> && !__constexpr_wrapper_like<_From>) ||
(__constexpr_wrapper_like<_From> && is_arithmetic_v<remove_const_t<decltype(_From::value)>> &&
bool_constant<(static_cast<value_type>(_From::value) == _From::value)>::value))
_LIBCPP_HIDE_FROM_ABI constexpr basic_simd(_Up&& __value) noexcept : __data_{__broadcast(__value)} {}
Member

https://eel.is/c++draft/simd#ctor-4 says there's a conditional explicit here. Can you double-check all the other constructors for this mistake, and also ensure that we've got tests for those?

Actually, it looks like you got your constraints wrong. You seem to have mistaken the explicit(see-below) condition for the Constraints clause.

}

// libc++ extensions
_LIBCPP_ALWAYS_INLINE constexpr explicit basic_simd(__data_t __data) noexcept : __data_(__data) {}
Member

Do we have a libc++ specific test validating this extension? We need one.

}

template <class _Up>
_LIBCPP_ALWAYS_INLINE static constexpr __data_t __load_from_pointer(const _Up* __ptr) {
Member

You use _LIBCPP_ALWAYS_INLINE in a few places. Based on our discussion just now, this is for the (quite interesting) reason that you want to avoid ABI differences based on compiler flags since these functions traffic in SIMD types. I'd like to introduce a macro like #define _LIBCPP_SIMD_SENSITIVE_ABI _LIBCPP_ALWAYS_INLINE (strawman name) that we would use on any function whose ABI is sensitive to SIMD-related compiler flags. This is also where we should document the reason for that function and its mode of operation (i.e. that _LIBCPP_ALWAYS_INLINE solves the problem of having mismatched function call ABIs by removing the existence of a function call in the first place).

This should also probably come with a clang-tidy check to ensure that we don't mess up. Otherwise, people could end up with a very very confusing ODR violation.
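
A rough sketch of what such a macro could look like outside of libc++. SIMD_SENSITIVE_ABI mirrors the strawman name above, and its expansion is an assumption for illustration; the attribute and the vector extension used here are GCC/Clang-specific.

```cpp
// Strawman macro (hypothetical): every function that passes or returns a
// compiler vector type is force-inlined, so no out-of-line call exists whose
// calling convention could differ between translation units built with
// different SIMD flags.
#define SIMD_SENSITIVE_ABI [[gnu::always_inline]] inline

// Generic 16-byte vector of int (GCC/Clang vector extension).
typedef int int_vec __attribute__((vector_size(16)));

SIMD_SENSITIVE_ABI int_vec add_lanes(int_vec lhs, int_vec rhs) { return lhs + rhs; }

// Scalar wrapper: only ints cross this function's call boundary.
inline int first_lane_sum(int a, int b) {
  int_vec lhs = {a, 0, 0, 0};
  int_vec rhs = {b, 0, 0, 0};
  int_vec sum = add_lanes(lhs, rhs);
  return sum[0];
}
```

Keeping the rationale next to the macro definition, as suggested, gives one place to explain why the attribute is there.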


// TODO: generating constructor

// libc++ extension
Member

Let's make sure we have tests for those.

Comment on lines +95 to +96
static_assert(__contains_type_v<__type_list<_Flags...>, __convert_flag> ||
__value_preserving_convertible<ranges::range_value_t<_Range>, value_type>,
Member

Suggested change
static_assert(__contains_type_v<__type_list<_Flags...>, __convert_flag> ||
__value_preserving_convertible<ranges::range_value_t<_Range>, value_type>,
static_assert(__value_preserving_convertible<ranges::range_value_t<_Range>, value_type> || __contains_type_v<__type_list<_Flags...>, __convert_flag>,

This feels a lot more natural to read.

@ldionne ldionne requested a review from joy2myself May 30, 2025 15:50
@ldionne
Member

ldionne commented May 30, 2025

@joy2myself Please don't hesitate to review this since you have domain expertise!

@mattkretz

mattkretz commented Jun 2, 2025 via email

Comment on lines +150 to +155
template <class _Tp>
  requires __is_vectorizable_type_v<_Tp>
struct __deduce_abi<_Tp> {
  template <__simd_size_type _Np>
  using __apply = __vector_size_abi<_Tp, _Np>;
};


I might be missing another specialization, but this will lead to ABI troubles. Consider simd<int, 8>. As defined you'll get the ABI of a [[__gnu__::__vector_size__(8 * sizeof(int))]] type. But that changes ABI depending on AVX compiler flags (two XMM registers vs. one YMM register). The intent of ABI tags is to have a different basic_simd specialization for simd<int, 8> depending on -mavx2.

Member

Ah, thanks for the heads up -- this is an interesting comment and something we missed. Nikolas was handling some of these calling convention differences via _LIBCPP_ALWAYS_INLINE, see this comment, but I think neither of us realized that they would impact any type that also contains one of these [[__gnu__::__vector_size__(8 * sizeof(int))]] types. So it looks like we need to somehow encode the ABI differences in this type, and I think we'll be able to drop all of the uses of _LIBCPP_ALWAYS_INLINE as a result.

I might suggest something like this:

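enum class _SimdAbiAffectingFlags : unsigned {
  _None     = 0,
  _AVX2     = 1 << 0,
  _Whatever = 1 << 1,
  // etc...
};

// Scoped enums have no built-in operator|, so provide one.
constexpr _SimdAbiAffectingFlags operator|(_SimdAbiAffectingFlags __lhs, _SimdAbiAffectingFlags __rhs) {
  return static_cast<_SimdAbiAffectingFlags>(static_cast<unsigned>(__lhs) | static_cast<unsigned>(__rhs));
}

inline constexpr _SimdAbiAffectingFlags _CurrentFlags = _SimdAbiAffectingFlags::_None
#ifdef __AVX2__
  | _SimdAbiAffectingFlags::_AVX2
#endif
#ifdef __WHATEVER__
  | _SimdAbiAffectingFlags::_Whatever
#endif
;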

and then we pass _CurrentFlags into __vector_size_abi. This is pseudo-code but I guess something like that would work.


The representation of masks affects ABI (vector masks vs. bit masks). Also consider AVX w/o AVX2: you might want to change the ABI of fp simd but not for integral simd.

I also chose the approach that at some point my ABI tags become public API. So even with -mavx2 the user can explicitly instantiate basic_simd with the SSE ABI. This can be useful for I/O or interfacing with shared libraries.

With Clang you have the simplification that __vector_size__ supports any multiple of the value type's sizeof. GCC only supports powers of 2. That informed my ABI choices. (I don't want to require 64 Bytes for a simd<int, 12> — with native 128-bit vectors.)


Oh, and the ODR issues are more fine-grained than the ABI issues. E.g. SSE2, SSE3, SSSE3, SSE4a, XOP, SSE4.1, SSE4.2 all use the same ABI, but code-gen for inline functions can be different. Most importantly, running SSE4 code on an SSE3 system results in a SIGILL. Expect users to compile their TUs multiple times with different compiler flags, link them into one binary and then dispatch according to CPUID. 💣
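
A minimal sketch of the "ABI tags as public API" idea from this thread, assuming one tag per register width. vec_abi, sse_abi, avx_abi, and native_abi are hypothetical names, and the macro-based selection is x86-flavoured for illustration only. The native tag follows the compiler flags, while either tag can still be named explicitly.

```cpp
#include <type_traits>

// Hypothetical ABI tag: one distinct type per register width in bytes.
template <int Bytes>
struct vec_abi {
  static constexpr int bytes = Bytes;
};
using sse_abi = vec_abi<16>;
using avx_abi = vec_abi<32>;

// The default tag depends on the target flags of the current translation unit.
#if defined(__AVX2__)
using native_abi = avx_abi;
#else
using native_abi = sse_abi;
#endif

template <class T, class Abi>
struct basic_simd {
  static constexpr int size() { return Abi::bytes / static_cast<int>(sizeof(T)); }
  T lanes[Abi::bytes / sizeof(T)];
};

// Different tags give different specializations, so a type built for one
// register width is never confused with one built for another.
static_assert(!std::is_same_v<basic_simd<int, sse_abi>, basic_simd<int, avx_abi>>);
// The narrower ABI stays nameable regardless of the flags in effect.
static_assert(sizeof(basic_simd<int, sse_abi>) == 16);
```

This only addresses the type identity side; as noted above, code generation differences between inline functions are a separate and finer-grained problem.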

Comment on lines +9 to +11
// <simd>

// REQUIRES: std-at-least-c++26
Member

Please make sure to describe what is being tested in this file, i.e. what implementation-defined constructor.


// <simd>

// Test hardening assertions for std::datapar::simd.
Member

This assertion is actually not hardened in the Standard. We should move it to libcxx/test/libcxx. If the spec said Hardened Precondition, then it would be a hardened one.

@@ -0,0 +1,34 @@
//===----------------------------------------------------------------------===//
Member

Not attached to this file: let's add a test for operator[] even if it looks silly/redundant, just for completeness. And in that test we can validate details like the fact that the method is const-qualified, etc.

_LIBCPP_ALWAYS_INLINE constexpr explicit basic_simd(__data_t __data) noexcept : __data_(__data) {}

// [simd.subscr]
_LIBCPP_HIDE_FROM_ABI constexpr value_type operator[](__simd_size_type __index) const noexcept {
Member

// noexcept strengthened (we do this in e.g. expected and I think it's a good idea)

Member

I think this file should be named unary_plus.pass.cpp -- identity.pass.cpp seems unusual and not super obvious.

std::array<T, N> arr;
std::iota(std::begin(arr), std::end(arr), 0);
const dp::simd<T, N> vec(arr); // make sure operator+ is const
std::same_as<dp::simd<T, N>> auto&& ret = +vec;
Member

Suggested change
std::same_as<dp::simd<T, N>> auto&& ret = +vec;
std::same_as<dp::simd<T, N>> decltype(auto) ret = +vec;

Or else, is there a reason for using auto&&?

Comment on lines +122 to +132
template <size_t _Bytes, class _Abi>
_LIBCPP_HIDE_FROM_ABI constexpr __simd_size_type
reduce_min_index(const basic_simd_mask<_Bytes, _Abi>& __mask) noexcept {
  return _Abi::__reduce_min_index(__mask.__data_);
}

template <size_t _Bytes, class _Abi>
_LIBCPP_HIDE_FROM_ABI constexpr __simd_size_type
reduce_max_index(const basic_simd_mask<_Bytes, _Abi>& __mask) noexcept {
  return _Abi::__reduce_max_index(__mask.__data_);
}
Member

Are the preconditions "any_of(k) is true." missing here?

And we may also need negative tests for them.
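
As a rough illustration of checking that precondition, here is a toy version that uses std::array<bool, N> in place of basic_simd_mask and plain assert in place of whatever assertion macro libc++ would use. All names are hypothetical.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Toy mask helpers; std::array<bool, N> stands in for basic_simd_mask.
template <std::size_t N>
constexpr bool toy_any_of(const std::array<bool, N>& mask) {
  for (bool lane : mask)
    if (lane)
      return true;
  return false;
}

template <std::size_t N>
constexpr int toy_reduce_min_index(const std::array<bool, N>& mask) {
  // Precondition from the comment above: any_of(k) is true.
  assert(toy_any_of(mask) && "reduce_min_index requires at least one true lane");
  for (std::size_t i = 0; i < N; ++i)
    if (mask[i])
      return static_cast<int>(i);
  return -1; // not reached when the precondition holds
}

template <std::size_t N>
constexpr int toy_reduce_max_index(const std::array<bool, N>& mask) {
  assert(toy_any_of(mask) && "reduce_max_index requires at least one true lane");
  for (std::size_t i = N; i > 0; --i)
    if (mask[i - 1])
      return static_cast<int>(i - 1);
  return -1; // not reached when the precondition holds
}

static_assert(toy_reduce_min_index(std::array<bool, 4>{false, true, true, false}) == 1);
static_assert(toy_reduce_max_index(std::array<bool, 4>{false, true, true, false}) == 2);
```

A negative test would then pass an all-false mask and expect the assertion to fire.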
