[SYCL] Add reduction overloads accepting span #6019
Commits on Apr 18, 2022
[SYCL] Add reduction overloads accepting span
Signed-off-by: John Pennycook <john.pennycook@intel.com>
Commit: c0de8c6

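The overloads added here let one reduction variable cover several values at once, for example the bins of a histogram. The code below is a minimal, hypothetical sketch of the intended semantics, not the DPC++ implementation. It runs work-items one after another. Each work-item gets a private copy of all Extent elements, initialized to the identity. The private copies are then combined element-wise into the caller's storage. It assumes that the storage's original contents take part in the result, as in SYCL 2020 unless `initialize_to_identity` is requested. `std::array` stands in for `sycl::span` so that the sketch builds without SYCL, and `emulate_span_reduction` is an illustrative name.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <functional>

// Hypothetical sequential model of an array (span) reduction.
// std::array<T, Extent> stands in for sycl::span<T, Extent>.
template <typename T, std::size_t Extent, typename BinaryOp, typename Body>
void emulate_span_reduction(std::array<T, Extent>& out, T identity,
                            BinaryOp combiner, std::size_t num_work_items,
                            Body body) {
  for (std::size_t wi = 0; wi < num_work_items; ++wi) {
    // Each work-item starts from a private copy filled with the identity.
    std::array<T, Extent> priv;
    priv.fill(identity);
    // The kernel body may update any subset of the private elements.
    body(wi, priv);
    // Private partial results are combined element-wise into the output,
    // so the original contents of `out` take part in the result.
    for (std::size_t e = 0; e < Extent; ++e) {
      out[e] = combiner(out[e], priv[e]);
    }
  }
}
```

For example, take a zero-initialized `std::array<int, 4> hist` and call `emulate_span_reduction(hist, 0, std::plus<int>(), 10, [](std::size_t wi, auto& bins) { bins[wi % 4] += 1; });`. This leaves `hist` holding {3, 3, 2, 2}.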
[SYCL] Add specialization of reducer for spans
Splits the functionality of a regular reducer across two classes:
1) reducer contains the work-item's private data and exposes the subscript operator ([]) to access an individual element.
2) reducer_element contains a pointer to one element from the reducer, and exposes the regular combine interface.

Signed-off-by: John Pennycook <john.pennycook@intel.com>
Commit: 1642d2f

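The following simplified, hypothetical C++17 sketch illustrates the split between reducer and reducer_element. The class names follow the commit message. The members, the constructor signatures and the inspection helper `data()` are assumptions made for illustration. The real classes carry more machinery, such as identity handling and atomic_combine(), which is omitted here.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <functional>

// Hypothetical reducer_element: refers to ONE element of a reducer's private
// storage and exposes the usual combine() interface for that element only.
template <typename T, typename BinaryOp>
class reducer_element {
 public:
  reducer_element(T* elem, BinaryOp op) : elem_(elem), op_(op) {}

  reducer_element& combine(const T& partial) {
    *elem_ = op_(*elem_, partial);
    return *this;
  }

 private:
  T* elem_;  // points into the owning reducer's private array
  BinaryOp op_;
};

// Hypothetical reducer for an array reduction: owns the work-item's private
// copy of all Extent elements and only exposes operator[] to select one.
template <typename T, std::size_t Extent, typename BinaryOp>
class reducer {
 public:
  reducer(T identity, BinaryOp op) : op_(op) { data_.fill(identity); }

  reducer_element<T, BinaryOp> operator[](std::size_t i) {
    assert(i < Extent);
    return reducer_element<T, BinaryOp>(&data_[i], op_);
  }

  // Read-only view of the private data (for inspection in this sketch only).
  const std::array<T, Extent>& data() const { return data_; }

 private:
  std::array<T, Extent> data_;
  BinaryOp op_;
};
```

With this shape, code written against `r[i].combine(x)` stays close to the scalar `r.combine(x)` form.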
[SYCL] Add specialization of reduction_impl
The specialization allows the type of the reduction (span<T, Extent>) to be separate from the element type of the span (T). T is used to determine which reduction algorithm should be used, and to create temporary storage.

A new static member function, num_elements, is added to all reduction_impl specializations to distinguish between scalar reductions and array reductions. A scalar reduction (every existing implementation of reduction_impl) always has num_elements == 1; an array reduction (this new specialization) has num_elements == Extent, where Extent is the extent of the span.

Signed-off-by: John Pennycook <john.pennycook@intel.com>
Commit: 86e9ef5

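Below is a hypothetical sketch of how a static num_elements() can size temporary storage in generic code, whether the reduction is scalar or array. The `_sketch` types are stand-ins for the reduction_impl specializations, not their real definitions. `value_type` is the element type T in both cases. This matches the point above that T, rather than span<T, Extent>, drives the choice of algorithm and temporary storage.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <type_traits>

// Hypothetical stand-in for a scalar reduction_impl over T.
template <typename T>
struct scalar_reduction_sketch {
  using value_type = T;
  static constexpr std::size_t num_elements() { return 1; }
};

// Hypothetical stand-in for the span<T, Extent> specialization: the element
// type T is kept separate from the reduction type.
template <typename T, std::size_t Extent>
struct span_reduction_sketch {
  using value_type = T;
  static constexpr std::size_t num_elements() { return Extent; }
};

// Per-work-item temporary storage sized from num_elements(); a scalar
// reduction simply gets an array of length 1.
template <typename Reduction>
using private_storage_t =
    std::array<typename Reduction::value_type, Reduction::num_elements()>;
```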
Each of the existing reduction implementations (for a single reduction object) can be extended to support spans by looping over the number of elements in the reduction. If num_elements == 1, the loop has a single iteration and degenerates to the behavior of the reduction implementation prior to this commit. If num_elements > 1, the loop iterates over each reduction element in turn.

Note that the getElement() function allows the scalar and array reduction implementations to be the same without specializing for either case, while still allowing their storage to differ (a single T vs. an array of Ts). This is especially convenient because a scalar reduction is equivalent to an array reduction with a single element.

If num_elements > 1, the implementation currently reduces each element separately. This allows array reductions to use the same amount of work-group local memory as a scalar reduction using the same T, at the expense of additional synchronization calls.

Signed-off-by: John Pennycook <john.pennycook@intel.com>
Commit: c0e0cec

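The per-element loop and the role of getElement() can be modelled sequentially. In this hypothetical sketch, `getElement` is a pair of free-function overloads rather than the PR's actual helper, and `work_group_reduce` is an illustrative name. The barriers a real kernel needs are only marked in comments, and the work-group size is assumed to be a power of two. One code path handles both a scalar (one iteration) and an array of Extent elements. The local-memory buffer only ever holds one T per work-item.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical getElement() overloads: element e of an array, or the single
// value of a scalar (the index is ignored).
template <typename T, std::size_t N>
T& getElement(std::array<T, N>& storage, std::size_t e) { return storage[e]; }

template <typename T>
T& getElement(T& storage, std::size_t /*e*/) { return storage; }

// Sequential model of a work-group tree reduction. priv[wi] is work-item wi's
// private partial result; local_mem has one slot per work-item (the same
// footprint as a scalar reduction of T) and is reused for every element.
// Assumes priv.size() is a power of two.
template <std::size_t NumElements, typename Storage, typename T, typename BinaryOp>
void work_group_reduce(std::vector<Storage>& priv, Storage& out,
                       std::vector<T>& local_mem, BinaryOp op) {
  const std::size_t wg_size = priv.size();
  assert(local_mem.size() >= wg_size);
  // One iteration for a scalar; NumElements iterations for an array.
  for (std::size_t e = 0; e < NumElements; ++e) {
    for (std::size_t wi = 0; wi < wg_size; ++wi) {
      local_mem[wi] = getElement(priv[wi], e);
    }
    // (barrier in a real kernel)
    for (std::size_t stride = wg_size / 2; stride > 0; stride /= 2) {
      for (std::size_t wi = 0; wi < stride; ++wi) {
        local_mem[wi] = op(local_mem[wi], local_mem[wi + stride]);
      }
      // (barrier after every step: the extra synchronization per element)
    }
    getElement(out, e) = op(getElement(out, e), local_mem[0]);
  }
}
```

The outer loop over `e` is where the design trades local memory for synchronization. Reducing all elements at once would need `NumElements` slots per work-item but only one pass of barriers.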
[SYCL] Disable span reductions before C++17
Necessary because span.hpp is only available in C++17 and later.

Signed-off-by: John Pennycook <john.pennycook@intel.com>
Commit: 1c97a16

Commits on Apr 21, 2022
Co-authored-by: Vyacheslav Klochkov <vyacheslav.n.klochkov@intel.com>
Commit: 807440c

[SYCL][NFC] Refactor to avoid code duplication
This commit refactors the reduction implementation to avoid code duplication, and also makes several changes to the class hierarchy for future extensibility.

All functionality and member variables expected to be common to all reductions are moved to a common base class (reduction_impl_common). The existing reduction_impl_base is unsuitable for this purpose because it was deliberately designed not to be a template class.

reduction_impl is now templated on a reduction algorithm, with all functionality related to the current (default) algorithm encapsulated in the default_reduction_algorithm class. This template parameter is carried from the reduction to any reducers it creates, enabling future specialization of both reduction and reducer for interesting combinations of type, extent and properties.

The reducer class is simplified using CRTP to avoid duplicate definitions of combine() and atomic_combine() for scalar and array reductions.

The notion of a reducer's "dimensionality" is now tied to the dimensionality of the reduction being performed (i.e. 0 for scalars, 1 for spans), not to the dimensionality of the input accessor/buffer. This will simplify extending reductions to true multi-dimensional array reductions (i.e. with md_span).

Signed-off-by: John Pennycook <john.pennycook@intel.com>
Commit: 111ea2f

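A hypothetical, stripped-down sketch can illustrate the CRTP change. combine() is written once in a base class and reaches the storage of whichever class derives from it. Here the derived class is either a reducer that owns a single scalar or a handle to one element of an array reducer. The class names, `value_ref()` and the default-constructed `BinaryOp` are assumptions made for illustration. They are not the PR's actual declarations, and atomic_combine() is omitted.

```cpp
#include <cassert>
#include <functional>

// Hypothetical CRTP base: combine() is defined once and obtains the storage
// location from the derived class via value_ref().
template <typename Derived, typename T, typename BinaryOp>
class combiner {
 public:
  Derived& combine(const T& partial) {
    Derived& self = static_cast<Derived&>(*this);
    T& value = self.value_ref();
    value = BinaryOp()(value, partial);
    return self;
  }
};

// Scalar reducer: owns a single private T.
template <typename T, typename BinaryOp>
class scalar_reducer
    : public combiner<scalar_reducer<T, BinaryOp>, T, BinaryOp> {
 public:
  explicit scalar_reducer(T identity) : value_(identity) {}
  T& value_ref() { return value_; }

 private:
  T value_;
};

// One element of an array reducer: refers to storage it does not own.
template <typename T, typename BinaryOp>
class element_reducer
    : public combiner<element_reducer<T, BinaryOp>, T, BinaryOp> {
 public:
  explicit element_reducer(T* elem) : elem_(elem) {}
  T& value_ref() { return *elem_; }

 private:
  T* elem_;
};
```

Because combine() returns `Derived&`, calls can be chained on either kind of reducer without virtual dispatch.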
Commits on Apr 26, 2022
[SYCL] Fix bug introduced during refactor
The definition of atomic_combine() did not correctly compute the offsets for reductions with Extent > 1. This was hidden by a bug in the associated tests.

Signed-off-by: John Pennycook <john.pennycook@intel.com>
Commit: f672248

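For an array reduction, atomic_combine() has to map private element e onto element e of the output. The commit does not say exactly how the offsets were miscomputed. The following is therefore only a hypothetical sketch of the correct mapping for a sum. It uses `std::atomic<T>` objects in place of the SYCL atomic operations used by the real code. `atomic_combine_plus` is an illustrative name. T is assumed to be an integral type, because `fetch_add` on a floating-point `std::atomic` requires C++20.

```cpp
#include <array>
#include <atomic>
#include <cassert>
#include <cstddef>

// Hypothetical sketch: combine a work-item's private array into global
// storage with one atomic operation per element. Element e of the private
// data must land at offset e of the output; reusing a single offset for all
// elements would only be correct when Extent == 1.
template <typename T, std::size_t Extent>
void atomic_combine_plus(std::atomic<T>* out, const std::array<T, Extent>& priv) {
  for (std::size_t e = 0; e < Extent; ++e) {
    // Relaxed ordering: only the final totals are read, after all updates.
    out[e].fetch_add(priv[e], std::memory_order_relaxed);
  }
}
```

A test that only ever uses Extent == 1 cannot tell a correct offset computation from one that always writes to element 0. That is one way a bug of this kind can go unnoticed.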
[SYCL] Add span support to packs of reductions
Scalar and array (span) reductions require different algorithms, which prevents the original scalar reduction code from being generalized to cover spans. Specifically, the original scalar reduction code assumes that each stage of a reduction (initializing reducers, combining reducers in local memory, writing reducers back to global memory) can be applied to all reductions in a pack at the same time. This is incompatible with the current implementation of array reductions, which iterates over all elements of a span in turn at each stage.

The implementation introduced in this commit filters the scalar and array reductions out of the reduction pack and handles them separately: all scalar reductions are handled in parallel (as before), while each array reduction is handled on its own.

In the future, the code handling reduction packs could be generalized to handle more reduction/reducer types while making fewer assumptions about the reduction algorithm.

Signed-off-by: John Pennycook <john.pennycook@intel.com>
Commit: 5207769

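The filtering of a reduction pack can be sketched at compile time with a C++17 fold expression. In this hypothetical sketch, `std::array` again stands in for a span reduction, and `is_array_reduction`, `plan_passes` and `schedule` are illustrative names. The function only records a schedule (the indices of the reductions handled in each pass) and does not run any kernels. Pass 0 collects every scalar reduction so that they can be processed together. Each array reduction gets a pass of its own, mirroring the strategy described in the commit.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <type_traits>
#include <utility>
#include <vector>

// Hypothetical trait: std::array<T, N> stands in for an array (span)
// reduction; every other type is treated as a scalar reduction.
template <typename R>
struct is_array_reduction : std::false_type {};
template <typename T, std::size_t N>
struct is_array_reduction<std::array<T, N>> : std::true_type {};

// passes[k] lists the indices (within the pack) handled in pass k.
using schedule = std::vector<std::vector<std::size_t>>;

template <typename... Rs, std::size_t... Is>
schedule plan_passes_impl(std::index_sequence<Is...>) {
  schedule passes(1);  // pass 0: all scalar reductions, handled together
  auto place = [&passes](std::size_t index, bool is_array) {
    if (is_array) {
      passes.push_back({index});  // one separate pass per array reduction
    } else {
      passes[0].push_back(index);
    }
  };
  // Comma fold: visits the pack left to right, in declaration order.
  (place(Is, is_array_reduction<Rs>::value), ...);
  return passes;
}

template <typename... Rs>
schedule plan_passes() {
  return plan_passes_impl<Rs...>(std::index_sequence_for<Rs...>{});
}
```

For a pack `<int, std::array<float, 4>, double, std::array<int, 2>>`, this yields the passes {0, 2}, {1} and {3}.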
Commits on Apr 27, 2022
[SYCL] Generalize esimd_verify test for reducers
The test previously assumed a specific reducer implementation and specific template arguments.

Signed-off-by: John Pennycook <john.pennycook@intel.com>
Commit: cab1b2f

Commits on Apr 29, 2022
Co-authored-by: Vyacheslav Klochkov <vyacheslav.n.klochkov@intel.com>
Commit: a9d18cc

Co-authored-by: Vyacheslav Klochkov <vyacheslav.n.klochkov@intel.com>
Commit: 0236111

Co-authored-by: Vyacheslav Klochkov <vyacheslav.n.klochkov@intel.com>
Commit: 7608d97

Clarify comment describing View template parameter
Signed-off-by: John Pennycook <john.pennycook@intel.com>
Commit: ca998d7

Co-authored-by: Vyacheslav Klochkov <vyacheslav.n.klochkov@intel.com>
Commit: af22456

Add comment to first reducer specialization
Signed-off-by: John Pennycook <john.pennycook@intel.com>
Commit: 657ff53

Remove redundant SFINAE from getIdentity()
Signed-off-by: John Pennycook <john.pennycook@intel.com>
Commit: 9553b7f

Commit: 12056fc

[SYCL] Revert to sycl::detail::tuple
std::tuple is still used for reducers (as in the original code).

Signed-off-by: John Pennycook <john.pennycook@intel.com>
Commit: cb030f2

Signed-off-by: John Pennycook <john.pennycook@intel.com>
Commit: c2663db

Signed-off-by: John Pennycook <john.pennycook@intel.com>
Commit: 90deb60

span<T, 1> is still an array reduction.

Signed-off-by: John Pennycook <john.pennycook@intel.com>
Commit: 6843a67
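The note that span<T, 1> is still an array reduction suggests that scalar vs. array should be decided by type, not inferred from an element count of 1. A hypothetical sketch of such a type-based check follows. `span_like` is an illustrative stand-in for `sycl::span`, and `is_span_reduction` is not the PR's actual trait.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>

// Hypothetical stand-in for sycl::span<T, Extent>.
template <typename T, std::size_t Extent>
struct span_like {
  T* data;
  static constexpr std::size_t num_elements() { return Extent; }
};

// Classify by type rather than by element count: span_like<T, 1> is still
// an array reduction even though it holds exactly one element.
template <typename R>
struct is_span_reduction : std::false_type {};
template <typename T, std::size_t Extent>
struct is_span_reduction<span_like<T, Extent>> : std::true_type {};
```

A check such as `num_elements() > 1` would route `span_like<int, 1>` down the scalar path. The partial specialization keeps it on the array path.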