Skip to content

[SYCL] Add support for scalar logical operators with group algorithms #9298

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 10 commits into from
Jun 2, 2023
5 changes: 5 additions & 0 deletions clang/lib/Sema/SPIRVBuiltins.td
Original file line number Diff line number Diff line change
Expand Up @@ -952,3 +952,8 @@ foreach name = ["GroupUMin", "GroupUMax"] in {
foreach name = ["GroupSMin", "GroupSMax"] in {
def : SPVBuiltin<name, [ASIGenTypeN, UInt, UInt, ASIGenTypeN], Attr.Convergent>;
}

// TODO: These builtins need to support vectors of bool.
foreach name = ["GroupLogicalAndKHR", "GroupLogicalOrKHR"] in {
def : SPVBuiltin<name, [Bool, UInt, UInt, Bool], Attr.Convergent>;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

According to SPV_KHR_uniform_group_instructions:

'Result Type' must be a scalar or vector of Boolean type.
The type of 'X' must be the same as 'Result Type'.

I guess this change misses boolean vectors support. Note: I'm fine if that is outlined into a separate PR, but then we probably need to update the title to explicitly mention that only scalars are supported for now.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch, on the SYCL side, I think vector types should still work with the logical operators group algorithms, as the the current implementation of the group algorithms implement the vector group algorithms using the scalar group algorithms. But the SPIRV builtin will only be able to support a scalar value, so I can add a TODO comment mentioning that. I'll also add a test that uses vector types.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are a couple issues with the current implementation for group algorithms for vector types and the return type of operator&& for vec<bool, N>, so I will address those issues in another PR. I've renamed the PR to only mention support for scalar operations now.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are a couple issues with the current implementation for group algorithms for vector types and the return type of operator&& for vec<bool, N>, so I will address those issues in another PR. I've renamed the PR to only mention support for scalar operations now.

Works for me, thanks!

}
8 changes: 8 additions & 0 deletions libclc/ptx-nvidiacl/libspirv/group/collectives.cl
Original file line number Diff line number Diff line change
Expand Up @@ -225,6 +225,8 @@ __clc__SubgroupBitwiseAny(uint op, bool predicate, bool *carry) {
#define __CLC_XOR(x, y) (x ^ y)
#define __CLC_AND(x, y) (x & y)
#define __CLC_MUL(x, y) (x * y)
#define __CLC_LOGICAL_OR(x, y) (x || y)
#define __CLC_LOGICAL_AND(x, y) (x && y)

#define __DEFINE_CLC_COMPLEX_MUL(TYPE) \
_CLC_DEF _CLC_OVERLOAD _CLC_CONVERGENT complex_##TYPE __clc_complex_mul( \
Expand Down Expand Up @@ -424,6 +426,9 @@ __CLC_SUBGROUP_COLLECTIVE(BitwiseAndKHR, __CLC_AND, long, ~0l)
__CLC_SUBGROUP_COLLECTIVE(BitwiseOrKHR, __CLC_OR, long, 0l)
__CLC_SUBGROUP_COLLECTIVE(BitwiseXorKHR, __CLC_XOR, long, 0l)

__CLC_SUBGROUP_COLLECTIVE(LogicalOrKHR, __CLC_LOGICAL_OR, bool, false)
__CLC_SUBGROUP_COLLECTIVE(LogicalAndKHR, __CLC_LOGICAL_AND, bool, true)

#undef __CLC_SUBGROUP_COLLECTIVE_BODY
#undef __CLC_SUBGROUP_COLLECTIVE
#undef __CLC_SUBGROUP_COLLECTIVE_REDUX
Expand Down Expand Up @@ -592,6 +597,9 @@ __CLC_GROUP_COLLECTIVE(BitwiseAndKHR, __CLC_AND, long, ~0l)
__CLC_GROUP_COLLECTIVE(BitwiseOrKHR, __CLC_OR, long, 0l)
__CLC_GROUP_COLLECTIVE(BitwiseXorKHR, __CLC_XOR, long, 0l)

__CLC_GROUP_COLLECTIVE(LogicalOrKHR, __CLC_LOGICAL_OR, bool, false)
__CLC_GROUP_COLLECTIVE(LogicalAndKHR, __CLC_LOGICAL_AND, bool, true)

// half requires additional mangled entry points
#define __CLC_GROUP_COLLECTIVE__DF16(MANGLED_NAME, SPIRV_DISPATCH) \
_CLC_DEF _CLC_CONVERGENT half MANGLED_NAME(uint scope, uint op, half x) { \
Expand Down
3 changes: 3 additions & 0 deletions sycl/include/sycl/detail/spirv.hpp
Original file line number Diff line number Diff line change
Expand Up @@ -1170,6 +1170,9 @@ __SYCL_GROUP_COLLECTIVE_OVERLOAD(BitwiseOrKHR)
__SYCL_GROUP_COLLECTIVE_OVERLOAD(BitwiseXorKHR)
__SYCL_GROUP_COLLECTIVE_OVERLOAD(BitwiseAndKHR)

__SYCL_GROUP_COLLECTIVE_OVERLOAD(LogicalAndKHR)
__SYCL_GROUP_COLLECTIVE_OVERLOAD(LogicalOrKHR)

} // namespace spirv
} // namespace detail
} // __SYCL_INLINE_VER_NAMESPACE(_V1)
Expand Down
16 changes: 16 additions & 0 deletions sycl/include/sycl/ext/oneapi/functional.hpp
Original file line number Diff line number Diff line change
Expand Up @@ -34,6 +34,7 @@ struct GroupOpISigned {};
struct GroupOpIUnsigned {};
struct GroupOpFP {};
struct GroupOpC {};
struct GroupOpBool {};

template <typename T, typename = void> struct GroupOpTag;

Expand All @@ -60,6 +61,11 @@ struct GroupOpTag<
using type = GroupOpC;
};

template <typename T>
struct GroupOpTag<T, std::enable_if_t<detail::is_genbool<T>::value>> {
using type = GroupOpBool;
};

#define __SYCL_CALC_OVERLOAD(GroupTag, SPIRVOperation, BinaryOperation) \
template <__spv::GroupOperation O, typename Group, typename T> \
static T calc(Group g, GroupTag, T x, BinaryOperation) { \
Expand Down Expand Up @@ -91,6 +97,16 @@ __SYCL_CALC_OVERLOAD(GroupOpIUnsigned, BitwiseXorKHR, sycl::bit_xor<T>)
__SYCL_CALC_OVERLOAD(GroupOpISigned, BitwiseAndKHR, sycl::bit_and<T>)
__SYCL_CALC_OVERLOAD(GroupOpIUnsigned, BitwiseAndKHR, sycl::bit_and<T>)

__SYCL_CALC_OVERLOAD(GroupOpBool, LogicalAndKHR, sycl::logical_and<T>)
__SYCL_CALC_OVERLOAD(GroupOpISigned, LogicalAndKHR, sycl::logical_and<T>)
__SYCL_CALC_OVERLOAD(GroupOpIUnsigned, LogicalAndKHR, sycl::logical_and<T>)
__SYCL_CALC_OVERLOAD(GroupOpFP, LogicalAndKHR, sycl::logical_and<T>)

__SYCL_CALC_OVERLOAD(GroupOpBool, LogicalOrKHR, sycl::logical_or<T>)
__SYCL_CALC_OVERLOAD(GroupOpISigned, LogicalOrKHR, sycl::logical_or<T>)
__SYCL_CALC_OVERLOAD(GroupOpIUnsigned, LogicalOrKHR, sycl::logical_or<T>)
__SYCL_CALC_OVERLOAD(GroupOpFP, LogicalOrKHR, sycl::logical_or<T>)

#undef __SYCL_CALC_OVERLOAD

template <__spv::GroupOperation O, typename Group, typename T,
Expand Down
16 changes: 14 additions & 2 deletions sycl/include/sycl/functional.hpp
Original file line number Diff line number Diff line change
Expand Up @@ -20,8 +20,20 @@ template <typename T = void> using multiplies = std::multiplies<T>;
template <typename T = void> using bit_and = std::bit_and<T>;
template <typename T = void> using bit_or = std::bit_or<T>;
template <typename T = void> using bit_xor = std::bit_xor<T>;
template <typename T = void> using logical_and = std::logical_and<T>;
template <typename T = void> using logical_or = std::logical_or<T>;

// std:logical_and/std::logical_or with a non-void type returns bool,
// sycl requires returning T.
template <typename T = void> struct logical_and {
T operator()(const T &lhs, const T &rhs) { return lhs && rhs; }
};

template <> struct logical_and<void> : std::logical_and<void> {};

template <typename T = void> struct logical_or {
T operator()(const T &lhs, const T &rhs) { return lhs || rhs; }
};

template <> struct logical_or<void> : std::logical_or<void> {};

template <typename T = void> struct minimum {
T operator()(const T &lhs, const T &rhs) const {
Expand Down
2 changes: 1 addition & 1 deletion sycl/include/sycl/group_algorithm.hpp
Original file line number Diff line number Diff line change
Expand Up @@ -92,7 +92,7 @@ template <typename T>
using native_op_list =
type_list<sycl::plus<T>, sycl::bit_or<T>, sycl::bit_xor<T>,
sycl::bit_and<T>, sycl::maximum<T>, sycl::minimum<T>,
sycl::multiplies<T>>;
sycl::multiplies<T>, sycl::logical_or<T>, sycl::logical_and<T>>;

template <typename T, typename BinaryOperation> struct is_native_op {
static constexpr bool value =
Expand Down
6 changes: 5 additions & 1 deletion sycl/include/sycl/known_identity.hpp
Original file line number Diff line number Diff line change
Expand Up @@ -55,12 +55,16 @@ using IsBitXOR =

template <typename T, class BinaryOperation>
using IsLogicalAND = std::bool_constant<
std::is_same_v<BinaryOperation, std::logical_and<T>> ||
std::is_same_v<BinaryOperation, std::logical_and<void>> ||
std::is_same_v<BinaryOperation, sycl::logical_and<T>> ||
std::is_same_v<BinaryOperation, sycl::logical_and<void>>>;

template <typename T, class BinaryOperation>
using IsLogicalOR =
std::bool_constant<std::is_same_v<BinaryOperation, sycl::logical_or<T>> ||
std::bool_constant<std::is_same_v<BinaryOperation, std::logical_or<T>> ||
std::is_same_v<BinaryOperation, std::logical_or<void>> ||
std::is_same_v<BinaryOperation, sycl::logical_or<T>> ||
std::is_same_v<BinaryOperation, sycl::logical_or<void>>>;

template <typename T>
Expand Down
14 changes: 14 additions & 0 deletions sycl/test-e2e/GroupAlgorithm/exclusive_scan_sycl2020.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -188,6 +188,20 @@ int main() {
test<class KernelNameBitAndI>(q, input_small, output_small,
sycl::bit_and<int>(), ~0);

test<class LogicalOrInt>(q, input, output, sycl::logical_or<int>(), 0);
test<class LogicalAndInt>(q, input, output, sycl::logical_and<int>(), 1);

std::array<bool, N> bool_input = {};
std::array<bool, N> bool_output = {};
test<class LogicalOrBool>(q, bool_input, bool_output,
sycl::logical_or<bool>(), false);
test<class LogicalOrVoid>(q, bool_input, bool_output, sycl::logical_or<>(),
false);
test<class LogicalAndBool>(q, bool_input, bool_output,
sycl::logical_and<bool>(), true);
test<class LogicalAndVoid>(q, bool_input, bool_output, sycl::logical_and<>(),
true);

// as part of SYCL_EXT_ONEAPI_COMPLEX_ALGORITHMS (
// https://github.com/intel/llvm/pull/5108/ ) joint_exclusive_scan and
// exclusive_scan_over_group now operate on std::complex but limited to the
Expand Down
14 changes: 14 additions & 0 deletions sycl/test-e2e/GroupAlgorithm/inclusive_scan_sycl2020.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -186,6 +186,20 @@ int main() {
test<class KernelNameBitAndI>(q, input_small, output_small,
sycl::bit_and<int>(), ~0);

test<class LogicalOrInt>(q, input, output, sycl::logical_or<int>(), 0);
test<class LogicalAndInt>(q, input, output, sycl::logical_and<int>(), 1);

std::array<bool, N> bool_input = {};
std::array<bool, N> bool_output = {};
test<class LogicalOrBool>(q, bool_input, bool_output,
sycl::logical_or<bool>(), false);
test<class LogicalOrVoid>(q, bool_input, bool_output, sycl::logical_or<>(),
false);
test<class LogicalAndBool>(q, bool_input, bool_output,
sycl::logical_and<bool>(), true);
test<class LogicalAndVoid>(q, bool_input, bool_output, sycl::logical_and<>(),
true);

// as part of SYCL_EXT_ONEAPI_COMPLEX_ALGORITHMS (
// https://github.com/intel/llvm/pull/5108/ ) joint_inclusive_scan and
// inclusive_scan_over_group now operate on std::complex limited to using the
Expand Down
14 changes: 14 additions & 0 deletions sycl/test-e2e/GroupAlgorithm/reduce_sycl2020.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -96,6 +96,20 @@ int main() {
test<class KernelNameBitXorI>(q, input, output, sycl::bit_xor<int>(), 0);
test<class KernelNameBitAndI>(q, input, output, sycl::bit_and<int>(), ~0);

test<class LogicalOrInt>(q, input, output, sycl::logical_or<int>(), 0);
test<class LogicalAndInt>(q, input, output, sycl::logical_and<int>(), 1);

std::array<bool, N> bool_input = {};
std::array<bool, 6> bool_output = {};
test<class LogicalOrBool>(q, bool_input, bool_output,
sycl::logical_or<bool>(), false);
test<class LogicalOrVoid>(q, bool_input, bool_output, sycl::logical_or<>(),
false);
test<class LogicalAndBool>(q, bool_input, bool_output,
sycl::logical_and<bool>(), true);
test<class LogicalAndVoid>(q, bool_input, bool_output, sycl::logical_and<>(),
true);

// as part of SYCL_EXT_ONEAPI_COMPLEX_ALGORITHMS (
// https://github.com/intel/llvm/pull/5108/ ) joint_reduce and
// reduce_over_group now operate on std::complex limited to using the
Expand Down