-
Notifications
You must be signed in to change notification settings - Fork 13.6k
[libc++] Start implementing std::datapar::simd #139919
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,165 @@ | ||
//===----------------------------------------------------------------------===// | ||
// | ||
// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions. | ||
// See https://llvm.org/LICENSE.txt for license information. | ||
// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception | ||
// | ||
//===----------------------------------------------------------------------===// | ||
|
||
#ifndef _LIBCPP___SIMD_ABI_H | ||
#define _LIBCPP___SIMD_ABI_H | ||
|
||
#include <__concepts/convertible_to.h> | ||
#include <__concepts/equality_comparable.h> | ||
#include <__config> | ||
#include <__cstddef/size_t.h> | ||
#include <__type_traits/standard_types.h> | ||
#include <__utility/integer_sequence.h> | ||
#include <cstdint> | ||
|
||
#if _LIBCPP_STD_VER >= 26 | ||
|
||
_LIBCPP_BEGIN_NAMESPACE_STD | ||
namespace datapar { | ||
|
||
template <class _Tp> | ||
inline constexpr bool __is_vectorizable_type_v = __is_standard_integer_type_v<_Tp> || __is_character_type_v<_Tp>; | ||
|
||
template <> | ||
inline constexpr bool __is_vectorizable_type_v<float> = true; | ||
|
||
template <> | ||
inline constexpr bool __is_vectorizable_type_v<double> = true; | ||
|
||
template <class _From, class _To> | ||
concept __value_preserving_convertible = requires(_From __from) { _To{__from}; }; | ||
|
||
template <class _Tp> | ||
concept __constexpr_wrapper_like = | ||
convertible_to<_Tp, decltype(_Tp::value)> && equality_comparable_with<_Tp, decltype(_Tp::value)> && | ||
bool_constant<_Tp() == _Tp::value>::value && | ||
bool_constant<static_cast<decltype(_Tp::value)>(_Tp()) == _Tp::value>::value; | ||
|
||
// [simd.expos] | ||
using __simd_size_type = int; | ||
|
||
template <class _Tp> | ||
struct __deduce_abi; | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I would suggest forward-declaring |
||
|
||
template <class _Tp, __simd_size_type _Np> | ||
requires __is_vectorizable_type_v<_Tp> && (_Np <= 64) | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I think this 64 value should be a If the rationale is as simple as "this is for MSVC compatibility", that's fine. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Note that There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. @philnik777 I think we're missing the whole https://eel.is/c++draft/simd#overview-1 in the current state of the patch. IIUC, we'd need something like: // disabled "specialization"
template <class T, class ABI>
class basic_simd {
basic_simd() = delete;
basic_simd(basic_simd const&) = delete;
basic_simd& operator=(basic_simd const&) = delete;
~basic_simd() = delete;
using value_type = T;
using abi_type = ABI;
using mask_type = ???;
};
// enabled specialization
template <class T, class ABI>
requires __is_vectorizable_type<T> && (N-from-deduce-abi-t <= 64)
class basic_simd {
// what you have right now
}; @mattkretz Is there an intent that users can specialize @philnik777 Either way, this would need a test. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. There's no intent for other ABI tags than what the implementation defines. But I'm not sure why we should ban partial specializations of |
||
using __deduce_abi_t = __deduce_abi<_Tp>::template __apply<_Np>; | ||
|
||
template <class _Tp> | ||
using __native_abi = __deduce_abi<_Tp>::template __apply<4>; | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Should this really be hardcoded to |
||
|
||
template <class _Tp, class _Abi> | ||
inline constexpr __simd_size_type __simd_size_v = 0; | ||
|
||
template <size_t> | ||
struct __integer_from_impl; | ||
|
||
template <> | ||
struct __integer_from_impl<1> { | ||
using type = uint8_t; | ||
}; | ||
|
||
template <> | ||
struct __integer_from_impl<2> { | ||
using type = uint16_t; | ||
}; | ||
|
||
template <> | ||
struct __integer_from_impl<4> { | ||
using type = uint32_t; | ||
}; | ||
|
||
template <> | ||
struct __integer_from_impl<8> { | ||
using type = uint64_t; | ||
}; | ||
|
||
template <size_t _Bytes> | ||
using __integer_from = __integer_from_impl<_Bytes>::type; | ||
|
||
// ABI Types | ||
|
||
template <class _Tp, __simd_size_type _Np> | ||
struct __vector_size_abi { | ||
ldionne marked this conversation as resolved.
Show resolved
Hide resolved
|
||
using _SimdT [[__gnu__::__vector_size__(_Np * sizeof(_Tp))]] = _Tp; | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I think it's fine to use the functionality provided by the compiler-provided SIMD type (e.g. the operators), however this makes it unclear/implicit what API is required from a |
||
using _MaskT [[__gnu__::__vector_size__(_Np * sizeof(_Tp))]] = __integer_from<sizeof(_Tp)>; | ||
|
||
_LIBCPP_ALWAYS_INLINE constexpr _SimdT __select(_MaskT __mask, _SimdT __true, _SimdT __false) { | ||
return __mask ? __true : __false; | ||
} | ||
|
||
# ifdef _LIBCPP_COMPILER_CLANG_BASED | ||
using _BoolVec __attribute__((__ext_vector_type__(_Np))) = bool; | ||
|
||
static constexpr auto __int_size = _Np <= 8 ? 8 : _Np <= 16 ? 16 : _Np <= 32 ? 32 : 64; | ||
static_assert(__int_size >= _Np); | ||
|
||
using _IntSizeBoolVec __attribute__((__ext_vector_type__(__int_size))) = bool; | ||
|
||
_LIBCPP_ALWAYS_INLINE static constexpr auto __mask_to_int(_BoolVec __mask) noexcept { | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Let's make whatever we can private to make it clear what's the intended API of |
||
return [&]<size_t... _Origs, size_t... _Fillers>(index_sequence<_Origs...>, index_sequence<_Fillers...>) | ||
_LIBCPP_ALWAYS_INLINE { | ||
auto __vec = __builtin_convertvector( | ||
__builtin_shufflevector(__mask, _BoolVec{}, _Origs..., ((void)_Fillers, _Np)...), _IntSizeBoolVec); | ||
if constexpr (_Np <= 8) | ||
return __builtin_bit_cast(unsigned char, __vec); | ||
else if constexpr (_Np <= 16) | ||
return __builtin_bit_cast(unsigned short, __vec); | ||
else if constexpr (_Np <= 32) | ||
return __builtin_bit_cast(unsigned int, __vec); | ||
else | ||
return __builtin_bit_cast(unsigned long long, __vec); | ||
}(make_index_sequence<_Np>{}, make_index_sequence<__int_size - _Np>{}); | ||
} | ||
|
||
_LIBCPP_ALWAYS_INLINE static constexpr bool __any_of(_MaskT __mask) noexcept { | ||
return __builtin_reduce_or(__builtin_convertvector(__mask, _BoolVec)); | ||
} | ||
|
||
_LIBCPP_ALWAYS_INLINE static constexpr bool __all_of(_MaskT __mask) noexcept { | ||
return __builtin_reduce_and(__builtin_convertvector(__mask, _BoolVec)); | ||
} | ||
|
||
_LIBCPP_ALWAYS_INLINE static constexpr __simd_size_type __reduce_count(_MaskT __mask) noexcept { | ||
return __builtin_reduce_add(__builtin_convertvector(__builtin_convertvector(__mask, _BoolVec), _MaskT)); | ||
} | ||
|
||
_LIBCPP_ALWAYS_INLINE static constexpr __simd_size_type __reduce_min_index(_MaskT __mask) noexcept { | ||
return __builtin_ctzg(__mask_to_int(__builtin_convertvector(__mask, _BoolVec))); | ||
} | ||
|
||
_LIBCPP_ALWAYS_INLINE static constexpr __simd_size_type __reduce_max_index(_MaskT __mask) noexcept { | ||
return __int_size - 1 - __builtin_clzg(__mask_to_int(__builtin_convertvector(__mask, _BoolVec))); | ||
} | ||
# else | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I think this needs to be implemented on GCC before the patch can land. |
||
_LIBCPP_ALWAYS_INLINE constexpr bool __any_of(_MaskT __mask) noexcept { | ||
for (size_t __i = 0; __i != _Np; ++__i) { | ||
if (__mask[__i]) | ||
return true; | ||
} | ||
return false; | ||
} | ||
# endif | ||
}; | ||
|
||
template <class _Tp> | ||
requires __is_vectorizable_type_v<_Tp> | ||
struct __deduce_abi<_Tp> { | ||
template <__simd_size_type _Np> | ||
using __apply = __vector_size_abi<_Tp, _Np>; | ||
}; | ||
Comment on lines
+150
to
+155
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I might be missing another specialization, but this will lead to ABI troubles. Consider There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Ah, thanks for the heads up -- this is an interesting comment and something we missed. Nikolas was handling some of these calling convention differences via I might suggest something like this: enum class _SimdAbiAffectingFlags {
_None = 1,
_AVX2 = 1 << 2,
_Whatever = 1 << 3,
// etc...
};
consteval _SimdAbiAffectingFlags _CurrentFlags = _SimdAbiAffectingFlags::_None
#ifdef __AVX2__
| _SimdAbiAffectingFlags::_AVX2
#endif
#ifdef __WHATEVER__
| _SimdAbiAffectingFlags::_Whatever
#endif
; and then we pass There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. The representation of masks affects ABI (vector masks vs. bit masks). Also consider AVX w/o AVX2: you might want to change the ABI of fp simd but not for integral simd. I also chose the approach that at some point my ABI tags become public API. So even with With Clang you have the simplification that There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Oh, and the ODR issues are more fine-grained than the ABI issues. E.g. SSE2, SSE3, SSSE3, SSE4a, XOP, SSE4.1, SSE4.2 all use the same ABI, but code-gen for inline functions can be different. Most importantly running SSE4 code on an SSE3 systems is a SIGILL. Expect users to compile their TUs multiple times with different compiler flags, link them into one binary and then dispatch according to CPUID. 💣 |
||
|
||
template <class _Tp, __simd_size_type _Np> | ||
inline constexpr __simd_size_type __simd_size_v<_Tp, __vector_size_abi<_Tp, _Np>> = _Np; | ||
|
||
} // namespace datapar | ||
_LIBCPP_END_NAMESPACE_STD | ||
|
||
#endif // _LIBCPP_STD_VER >= 26 | ||
|
||
#endif // _LIBCPP___SIMD_ABI_H |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this at least requires:
Or something similar, maybe
__is_vectorizable_type
instead. But either way this needs a requirement to make it clear that this isn't a very general utility despite its name.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This isn't correct.
int32_t -> float64_t
is value-preserving, as isint16_t -> float32_t
. However e.g.int32_t -> uint32_t
is not value-preserving.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Did you folks have an implementation in mind for the notion of being value-preserving? From https://eel.is/c++draft/simd#general-8:
Is there a clever implementation trick to do this, or do we need to list them manually, or is there something logical that we can do and we're not seeing?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IEEE floating point numbers of type
T~ can exactly represent integers up to
+/-(1 << std::numeric_limits::digits()`).So something like
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Isn't this just
numeric_limits<To>::digits >= numeric_limits<From>::digits
?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, that's a much better way. Just add an extra check to make sure that
is_signed_v<To> || !is_signed_v<From>
.Are these supposed to work with
<stdfloat>
? If so, we also wantTo
's exponent to be at least as large. In particular,float16
has more digits thanbfloat16
, but a smaller exponent, so neither can represent all values from the other.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't know of any trick. This is what I implemented:
I think
bfloat16_t
andfloat16_t
need another check besides only fordigits
.bfloat16_t
is not vectorizable yet, but it's arithmetic at least. Themax
andlowest
compares are mixed-type comparisons but I so far I have convinced myself that even with implicit conversions this concept always gives the right answer.