[SYCL] Add reduction overloads accepting span #6019
Conversation
Signed-off-by: John Pennycook <john.pennycook@intel.com>
Splits the functionality of a regular reducer across two classes: 1) reducer contains the work-item's private data and exposes the subscript operator ([]) to access an individual element. 2) reducer_element contains a pointer to one element from the reducer, and exposes the regular combine interface. Signed-off-by: John Pennycook <john.pennycook@intel.com>
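A minimal sketch of that split (hypothetical, simplified names -- not the actual DPC++ headers): the reducer owns the work-item's private storage for the whole span, and its subscript operator hands out a reducer_element view over a single element.

#include <cstddef>

// Hypothetical, simplified sketch of the reducer / reducer_element split.
template <typename T, typename BinaryOperation> class reducer_element {
public:
  reducer_element(T *Elem, BinaryOperation BOp) : MElem(Elem), MBOp(BOp) {}

  // The familiar scalar combine() interface, applied to one span element.
  reducer_element &combine(const T &Partial) {
    *MElem = MBOp(*MElem, Partial);
    return *this;
  }

private:
  T *MElem;             // points into the owning reducer's private storage
  BinaryOperation MBOp; // the reduction's binary operation
};

template <typename T, typename BinaryOperation, std::size_t Extent>
class span_reducer {
public:
  // Subscripting yields a per-element view exposing combine().
  reducer_element<T, BinaryOperation> operator[](std::size_t I) {
    return {&MValue[I], MBOp};
  }

private:
  T MValue[Extent] = {}; // the work-item's private copy of the whole span
  BinaryOperation MBOp{};
};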
The specialization allows the type of the reduction (span<T, Extent>) to be separate from the type of the span (T). T is used to determine the reduction algorithm that should be used, and to create temporary storage. A new static member function "num_elements" is added to all reduction_impl specializations to distinguish between scalar reductions and array reductions. A scalar reduction (all existing implementations of reduction_impl) always has (num_elements == 1); an array reduction (this new specialization) has (num_elements == Extent), where Extent is the extent of the span. Signed-off-by: John Pennycook <john.pennycook@intel.com>
Each of the existing reduction implementations (for a single reduction object) can be extended to support spans by looping over the number of elements in the reduction. If (num_elements == 1), the loop has a single iteration and degenerates to the behavior of the reduction implementation prior to this commit. If (num_elements > 1), the loop iterates over each reduction element in turn. Note that the getElement() function allows the scalar and array reduction implementations to be the same without specializing for either case, while allowing a difference in storage (a single T vs an array of Ts). This is especially convenient because a scalar reduction is equivalent to an array reduction with a single element. If (num_elements > 1), the implementation currently reduces each element separately. This allows array reductions to use the same amount of work-group local memory as a scalar reduction using the same T, but at the expense of additional synchronization calls. Signed-off-by: John Pennycook <john.pennycook@intel.com>
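For context, a hedged usage sketch of the overload this PR adds (the histogram example, queue setup and sizes are arbitrary; the overload itself follows the SYCL 2020 span reduction interface):

#include <sycl/sycl.hpp>
#include <algorithm>
#include <cstddef>

int main() {
  sycl::queue Q;
  constexpr std::size_t Bins = 16;
  int *Hist = sycl::malloc_shared<int>(Bins, Q);
  std::fill(Hist, Hist + Bins, 0);

  Q.submit([&](sycl::handler &CGH) {
     // The reduction variable is a span over the histogram storage.
     auto Red = sycl::reduction(sycl::span<int, Bins>(Hist, Bins),
                                sycl::plus<int>());
     CGH.parallel_for(sycl::nd_range<1>{1024, 64}, Red,
                      [=](sycl::nd_item<1> It, auto &Reducer) {
                        // operator[] selects one element of the span; the
                        // element exposes the usual combine() interface.
                        std::size_t Bin = It.get_global_linear_id() % Bins;
                        Reducer[Bin].combine(1);
                      });
   }).wait();

  sycl::free(Hist, Q);
  return 0;
}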
@pvchupin: I didn't realize that the entire
Yes, that would be good, unless it's hard to guard properly. We did something similar in changes like #4303. I should let @v-klochkov and @steffenlarsen review in more detail.
Necessary because span.hpp is only available >= C++17. Signed-off-by: John Pennycook <john.pennycook@intel.com>
@Pennycook - I didn't realize that the sycl span stuff had been put behind a C++17 guard. If @romanovvlad did it he likely had a reason, but I recall when working on sycl span that the intent was explicitly to bring that C++20 feature into SYCL. But maybe that was back before we were worrying about C++14 vs C++17. I'd have to go through that header, which was mostly adapted from libcxx, but I think it should mostly be compatible with C++14, excepting the deduction guides, of course. Though, IIRC, those guides don't just provide nice-to-have functionality; I think they may be central.
We have a number of folks who asked to have these guards set properly so that the SYCL headers can compile even if the host compiler doesn't fully support C++17, by effectively disabling the corresponding APIs that require C++17 features. If it becomes problematic we can reconsider.
Great work, @Pennycook! I will do a more thorough read-through of it once it is out of draft, but in general I think it looks good. My primary concern is the amount of duplicated code with the specialization of reduction_impl. Would it make sense to move more of the common code into a common superclass? Hopefully it would make maintenance a little easier.
I have opened some threads on the PR for discussing the open questions, I hope that is okay. I feel it's a little easier to keep track of the different discussions that way. 😄
@@ -100,5 +100,57 @@ reduction(T *Var, const T &Identity, BinaryOperation Combiner,
  return {Var, Identity, Combiner, InitializeToIdentity};
}

#if __cplusplus >= 201703L
- Is there a better way to define the specialization of reducer to avoid potential ABI breakage in the future?

This implementation works by privatizing the entire span on each work-item, which will fail for sufficiently large spans. Eventually we'll need to choose a different reducer implementation based on the Extent parameter. Should reducer and reducer_element be SFINAE'd away for certain Extent values, even in this first implementation?
Do you have a specific scenario in mind where this would break ABI? Seems to me like this is all in headers and since the extent is a template parameter I am not sure if user-code incompatibility should be much of a concern.
That said, if we know what Extent we need to stop at, I don't mind an enable_if masking the definition out or alternatively a static_assert telling the user that the given Extent isn't yet supported. Is the Extent limit resource-constrained? If so, I think we may as well let the user do as wild reductions as their hardware allows until we have a good alternative path to take based on some heuristic.
Do you have a specific scenario in mind where this would break ABI? Seems to me like this is all in headers and since the extent is a template parameter I am not sure if user-code incompatibility should be much of a concern.
No specific scenario in mind -- I'm just wary of committing a reducer implementation and somehow being stuck with it.
Is the Extent limit resource-constrained? If so, I think we may as well let the user do as wild reductions as their hardware allows until we have a good alternative path to take based on some heuristic.
The current implementation assumes that there are sufficient resources to give each work-item its own copy of the range being reduced (in private memory). So I think you're right that the limit will be different on different hardware.
I can foresee a few different implementations of the reducer class:
- The current implementation (work-item privatization)
- An implementation that uses atomics to a partially privatized copy (e.g. work-group privatization)
- An implementation that uses atomics directly to the reduction variable (no privatization)
I was thinking of doing something like creating an enum for different reduction implementations, and then basing the SFINAE on that. So the current implementation would only be available if reduction_algorithm == work_item_privatization or something like that, and every reduction we create today would have that trait set. If we change the reduction() function later to create reductions with different algorithms for different extents, any reducer already compiled to use work_item_privatization would still work.
Do you think something like that would make sense?
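A rough sketch of that enum-based dispatch (all names are illustrative, not taken from the patch):

#include <cstddef>
#include <type_traits>

// The three strategies listed above, as a compile-time trait.
enum class reduction_algorithm {
  work_item_privatization,  // current approach: private copy per work-item
  work_group_privatization, // atomics into a work-group private copy
  atomic_to_global          // atomics directly on the reduction variable
};

template <typename T, std::size_t Extent, reduction_algorithm Algorithm,
          typename Enable = void>
class reducer; // other algorithms get their own specializations later

// Only the work-item privatization strategy exists today.
template <typename T, std::size_t Extent, reduction_algorithm Algorithm>
class reducer<T, Extent, Algorithm,
              std::enable_if_t<Algorithm ==
                               reduction_algorithm::work_item_privatization>> {
  T MValue[Extent] = {}; // whole span privatized in each work-item
};

A reducer compiled against the work_item_privatization specialization keeps working even if reduction() later selects a different algorithm for large extents, which is the backward-compatibility property being discussed.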
It makes sense to me. Quite frankly I expected we would eventually reach a point where we would need to expose different reduction strategies to the user and what you suggest is pretty much how I expected we would do it.
How exactly we would want to approach that though is a good question. It sounds like this choice would be user-facing and as such we would probably need an extension. That said, it seems like it would be a great excuse to bring compile-time properties to reduction.
I've restructured things to avoid code duplication in 111ea2f. While I was doing it I've added a few new template arguments that we can use for specialization.
I'm using an algorithm class to store all the implementation-specific stuff because it gives us a bit more flexibility than an enum, but I think once we've got a few more of these we should definitely think about exposing a few compile-time properties. We've already had a request for a few different properties (see #1621). A good test of this new class structure would be whether we can move the existing SYCL_REDUCTION_DETERMINISTIC into something like a deterministic_reduction_algorithm class and just plug it in.
I'm hoping that the combination of Dims, Extent and Algorithm will be enough to do the filtering for the parameter pack. 🤞
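A loose sketch of how such an algorithm class could carry implementation choices (hypothetical names; whether SYCL_REDUCTION_DETERMINISTIC actually moves there is the open question above):

#include <cstddef>

// Determinism becomes a property of the algorithm type rather than a macro.
struct default_reduction_algorithm {
  static constexpr bool is_deterministic = false;
};

struct deterministic_reduction_algorithm {
  static constexpr bool is_deterministic = true;
};

template <typename T, typename BinaryOperation, int Dims, std::size_t Extent,
          typename Algorithm = default_reduction_algorithm>
class reduction_impl_sketch {
  // Specializations can key on Dims, Extent and Algorithm -- the same
  // combination mentioned above for filtering the parameter pack.
};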
@@ -100,5 +100,57 @@ reduction(T *Var, const T &Identity, BinaryOperation Combiner,
  return {Var, Identity, Combiner, InitializeToIdentity};
}

#if __cplusplus >= 201703L
- How should this be extended to the reduction form accepting a parameter pack?
The current parameter pack implementation performs each step of a reduction algorithm for all reductions (i.e. initializes all temporary storage, computes all partial sums, updates all global storage) which prevents looping over the elements. I can't decide whether it's better to try and squeeze the new reducer functionality into the existing framework (i.e. by using more memory than required) or if we need to revisit the design of the parameter pack solution knowing that SYCL 2020 has added different reducer types that may require different algorithms. (I'm leaning towards the second option).
I am happy to have it as part of a separate PR, but I think in the end it would definitely make sense to have the variadic reductions work with span as well. If you think retrofitting the current implementation to the extent-based iteration strategy is more hassle than it is worth, I don't think there's a reason not to consider a refactor of the implementation. I suspect @v-klochkov has a better mental model of the magical variadic version, so maybe he has some ideas of feasible ways to make it work.
Ok, great! I do think it would be a lot of work to retrofit things -- the resulting code wouldn't be very efficient for spans, and I suspect we'd just be putting off refactoring work that we'll need to do later.
If there was some way to filter out different kinds of reductions from the pack, I think we could implement something quite clean... Scalar reductions could use the existing code, the current span reductions could be forwarded somewhere else, and if we introduce a new kind of reduction/algorithm later we could handle them with a similar filter. I have no idea how to go about implementing such a thing, though -- @v-klochkov, does that sound feasible?
If the question is to add sycl::span to existing variadic-pack implementation, then it should be doable.
If we also want to use a different algorithm for each reduction in the pack, depending on the type and binary operation, then it is more complex work. I thought it would be variadic pack filters, similar to these (or maybe exactly these):
/// For each index 'I' from the given indices pack 'Is' this function returns
/// an index sequence consisting of only those 'I's for which the 'FunctorT'
/// applied to 'T[I]' returns true.
template <typename... T, typename FunctorT, size_t... Is>
constexpr auto filterSequence(FunctorT F, std::index_sequence<Is...> Indices) {
return filterSequenceHelper<T...>(F, Indices);
}
That work also requires some caution about device code quality, and checking how those tricks with sequences are translated into the final code. At the higher C++ level I hoped the front-end could do/eliminate most of the work because all the values would be known at compile time, generating something simple in the end, but that is not guaranteed...
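For illustration, a hedged sketch of the kind of predicate such a filter could key on, assuming num_elements is usable in a constant expression (names are illustrative, not the actual helpers):

#include <cstddef>
#include <type_traits>

// Illustrative predicates only: scalar reductions report a single element,
// span reductions report their Extent, so the filter can split the pack.
template <typename Reduction>
struct IsScalarReduction
    : std::bool_constant<(Reduction::num_elements() == 1)> {};

template <typename Reduction>
struct IsArrayReduction
    : std::bool_constant<(Reduction::num_elements() > 1)> {};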
The duplication in
Of course! Thank you for doing it. 👍
Always wise to fear the ABI. In general we primarily need to worry about classes that cross the library boundary. Of course, that is not always easy to track but most often templated classes do not and so are safe to alter. There are some exceptions to this, such as accessors, but since reduction variables and the associated
Hi John,
I started reviewing the code. It will take more time for me as I almost forgot how reductions were implemented.
I see! Thanks for the explanation. I think that should make things easier to tidy up -- if I don't have to leave the original
Co-authored-by: Vyacheslav Klochkov <vyacheslav.n.klochkov@intel.com>
                         : memory_scope::device;
}

// TODO: Skip atomic operations with the identity
What does this comment mean? Don't define them for known_identity=true case, or that it is possible to detect when the operand is identity, or something else?
I'm thinking about cases where the operand is the identity. This is really just a note to myself at this point -- while working on the span side of things, I realized that it's possible to define a reduction over a large span but only update a portion of it. Skipping over those elements that the work-item didn't touch might improve performance by reducing the number of atomics.
This is probably the sort of thing that is best left to a compile-time property mechanism (as @steffenlarsen mentioned), though. The user (hopefully) has a good idea of whether they're going to write to every element of the span on every work-item, and we could use that information to tune the implementation.
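A sketch of the skip being described, assuming a plus-style reduction with a known identity (illustrative only, not the actual atomic_combine implementation):

#include <sycl/sycl.hpp>

// Illustrative only: skip the atomic update when the work-item's partial
// result for an element is still the identity, e.g. because the work-item
// never touched that span element.
template <typename T>
void atomic_combine_element(T *Global, const T &Partial, const T &Identity) {
  if (Partial == Identity)
    return; // combining the identity is a no-op, so save the atomic
  sycl::atomic_ref<T, sycl::memory_order::relaxed, sycl::memory_scope::device>
      Ref(*Global);
  Ref.fetch_add(Partial); // assumes a sycl::plus<> reduction
}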
This commit refactors the reduction implementation to avoid code duplication, and additionally makes a number of changes to the class hierarchy for future extensibility. All functionality and member variables expected to be common to all reductions are moved to a common base class (reduction_impl_common). The existing reduction_impl_base is unsuitable for this purpose because it was deliberately designed not to be a template class. The reduction_impl is now templated on a reduction algorithm, with any functionality related to the current (default) algorithm encapsulated in the default_reduction_algorithm class. This template is carried from the reduction to any reducers it creates, enabling future specialization of both reduction and reducer for interesting combinations of type, extent and properties. The reducer class is simplified using CRTP to avoid duplicate definitions of combine() and atomic_combine() for scalar and array reductions. The notion of a reducer's "dimensionality" is now tied to the dimensionality of the reduction being performed (i.e. 0 for scalars, 1 for spans) and not to the dimensionality of the input accessor/buffer. This will simplify extending reductions to true multi-dimensional array reductions (i.e. with md_span). Signed-off-by: John Pennycook <john.pennycook@intel.com>
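A very rough CRTP sketch of that idea (hypothetical, simplified names): combine() is written once, and getElement() hides whether the derived reducer stores one T or an array of Ts.

#include <cstddef>

template <typename Derived, typename T, typename BinaryOperation>
struct combiner {
  // Written once for both scalar and array reducers.
  void combine(const T &Partial, std::size_t E = 0) {
    Derived &Self = static_cast<Derived &>(*this);
    T &Elem = Self.getElement(E);
    Elem = Self.MBinOp(Elem, Partial);
  }
};

// Scalar reducer: a single value, so getElement() ignores the index.
template <typename T, typename BinaryOperation>
struct scalar_reducer
    : combiner<scalar_reducer<T, BinaryOperation>, T, BinaryOperation> {
  T MValue{};
  BinaryOperation MBinOp{};
  T &getElement(std::size_t) { return MValue; }
};

// Array (span) reducer: Extent values, one per span element.
template <typename T, typename BinaryOperation, std::size_t Extent>
struct array_reducer
    : combiner<array_reducer<T, BinaryOperation, Extent>, T, BinaryOperation> {
  T MValue[Extent] = {};
  BinaryOperation MBinOp{};
  T &getElement(std::size_t E) { return MValue[E]; }
};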
The definition of atomic_combine() did not correctly compute the offsets for reductions with Extent > 1. This was hidden by a bug in the associated tests. Signed-off-by: John Pennycook <john.pennycook@intel.com>
Scalar and array (span) reductions require different algorithms, preventing the original scalar reduction code from being generalized to cover spans. Specifically, the original scalar reduction code assumes that each stage of a reduction (initializing reducers, combining reducers in local memory, writing reducers back to global memory) can be applied to all reductions in a pack at the same time. This is incompatible with the current implementation of array reductions, which iterates over all elements of a span in turn at each stage. The implementation introduced in this commit filters the scalar and array reductions from the reduction pack and handles them separately. All scalar reductions are handled in parallel (as before), while each array reduction is handled separately. In future, the code handling the reduction packs could be generalized to handle more reduction/reducer types while making fewer assumptions about the reduction algorithm. Signed-off-by: John Pennycook <john.pennycook@intel.com>
There's an ESIMD-related failure that I don't understand, but otherwise things seem to be working. I found an issue with
Previously assumed specific reducer implementation and template arguments. Signed-off-by: John Pennycook <john.pennycook@intel.com>
ESIMD part LGTM
Hi John,
The patch is great!!!! I like the new ideas, the code refactoring and am enjoying reviewing this PR.
I couldn't start reviewing it earlier and it also takes longer than expected. My apologies.
I am still going through the code. Please see the first portion of my comments.
reduction(T *Var, const T &Identity, BinaryOperation Combiner,
          const property_list &PropList = {}) {
  bool InitializeToIdentity =
      PropList.has_property<property::reduction::initialize_to_identity>();
  return {Var, Identity, Combiner, InitializeToIdentity};
}

#if __cplusplus >= 201703L
/// Constructs a reduction object using the reduction variable referenced by
/// the given sycl::span \p Span, reduction operation \p Combiner, and
Minor comment: It is not a problem introduced by your patch, I see this comment is copy-pasted. I am just adding this comment here for future reference/reminder.
The reference "\p Combiner" in the comment is invalid because the argument list does not have it; it has only the type 'BinaryOperation'.
Co-authored-by: Vyacheslav Klochkov <vyacheslav.n.klochkov@intel.com>
Co-authored-by: Vyacheslav Klochkov <vyacheslav.n.klochkov@intel.com>
Co-authored-by: Vyacheslav Klochkov <vyacheslav.n.klochkov@intel.com>
Signed-off-by: John Pennycook <john.pennycook@intel.com>
Co-authored-by: Vyacheslav Klochkov <vyacheslav.n.klochkov@intel.com>
Signed-off-by: John Pennycook <john.pennycook@intel.com>
Signed-off-by: John Pennycook <john.pennycook@intel.com>
std::tuple is still used for reducers (as in the original code). Signed-off-by: John Pennycook <john.pennycook@intel.com>
Awesome work! A few - mostly cosmetic - comments but other than that I think @v-klochkov got the details covered.
if (LID == 0) {
  size_t GrID = NDIt.get_group_linear_id();
  writeReduSumsToOutAccs<Pow2WG, IsOneWG>(
      GrID, WGSize, (std::tuple<Reductions...> *)nullptr, OutAccsTuple,
Is the cast required?
I didn't write this code (I just moved it into reduCGFuncImplScalar) but I think the cast is required unless we change the definition of writeReduSumsToOutAccs.
My understanding is that this is basically a dummy argument that's being used to help the compiler deduce the Reductions... template of writeReduSumsToOutAccs. I think it should be possible to remove this pointer argument completely by passing Reductions... as an explicit template argument, though? Something like:
writeReduSumsToOutAccs<Pow2WG, IsOneWG, Reductions...>(
    GrID, WGSize, OutAccsTuple,
    LocalAccsTuple, BOPsTuple, IdentitiesTuple, InitToIdentityProps,
    ReduIndices);
Should I make that change?
That code/pointer was needed to help template argument deduction.
The fix is already huge and extremely complex. Let's keep optimization/improvement of such places for separate PRs.
Signed-off-by: John Pennycook <john.pennycook@intel.com>
Signed-off-by: John Pennycook <john.pennycook@intel.com>
span<T, 1> is still an array reduction. Signed-off-by: John Pennycook <john.pennycook@intel.com>
John, thank you for this epic work!
I don't have any comments at this moment. The change-set looks great!
BTW, this reduction implementation is getting more and more classes.
I completely agree. There are a few other things we could do to simplify the code before we draw those diagrams, which might be worth it... We've already discussed how removing the legacy
Yes, that would be the best option - to have only the variadic pack implementation. The problem with that is that the implementation is more complex and challenges the compiler a lot, causing all kinds of visible and hidden errors in clang and in the GPU/CPU RT compilers as well. The worst of those are the so-called "flaky" issues.
Ah, I see. More of a long-term goal, then. Makes sense. Is there anything else I need to do here before this can be merged, @v-klochkov, @steffenlarsen? I'll update the original PR description to reflect the commit message I think we should use.
John, I can merge it now, the testing passed. If you send me the fixed commit message, I'll substitute the original comment.
I've changed the original comment, but don't have permissions to merge. Please go ahead!
@Pennycook - if you have some tests for reduction+span, would you please upload them to intel/llvm-test-suite and add a link here?
* ext::oneapi::reduction removed in #6634
* sycl::item in kernel supported since #7478
* sycl::range + many reductions implemented in #7456
* CPU reduction performance implemented in #6164
* span support implemented in #6019

There might be other things that have been implemented already, but I cannot immediately identify them, if any.
A new static member function "num_elements" is added to all reduction
specializations to distinguish between scalar reductions and array reductions.
A scalar reduction (all existing implementations of reduction) always
has (num_elements == 1); an array reduction (this new specialization) has
(num_elements == Extent), where Extent is the extent of the span.
If (num_elements > 1), the implementation currently reduces each element
separately. This allows array reductions to use the same amount of work-group
local memory as a scalar reduction using the same T, but at the expense of
additional synchronization calls.
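A host-side sketch of that trade-off (illustrative only): Scratch stands in for the work-group local buffer, and on a device each iteration of the outer loop would end with a barrier, which is the extra synchronization being traded for memory.

#include <cstddef>
#include <vector>

template <typename T, typename BinaryOperation>
void reduce_span_per_element(const std::vector<std::vector<T>> &PrivateCopies,
                             std::vector<T> &Result, BinaryOperation BOp,
                             T Identity) {
  std::vector<T> Scratch(PrivateCopies.size()); // one slot per "work-item"
  for (std::size_t E = 0; E < Result.size(); ++E) {
    // Stage element E from every work-item's private copy into the scratch
    // buffer (standing in for work-group local memory) ...
    for (std::size_t W = 0; W < PrivateCopies.size(); ++W)
      Scratch[W] = PrivateCopies[W][E];
    // ... then combine; on a device each element would need its own barrier.
    T Acc = Identity;
    for (const T &V : Scratch)
      Acc = BOp(Acc, V);
    Result[E] = BOp(Result[E], Acc);
  }
}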
The notion of a reducer's "dimensionality" is now tied to the dimensionality of
the reduction being performed (i.e. 0 for scalars, 1 for spans) and not to the
dimensionality of the input accessor/buffer. This will simplify extending
reductions to true multi-dimensional array reductions (i.e. with md_span).
In future, the code handling the reduction packs could be generalized to
handle more reduction/reducer types while making fewer assumptions about
the reduction algorithm.
Signed-off-by: John Pennycook john.pennycook@intel.com