Commit edaee9b

[SYCL][Doc] Add group sorting algorithms extension specification (#3514)
This extension adds the following:

* `joint_sort`, `sort_over_group` algorithms
* `sorter`, a special type that lets SYCL backends choose a method for sorting (e.g. radix sort, ...)
* 2 predefined sorters: `default_sorter`, `radix_sorter`
* Some issues that are mentioned in the Issues section

For future:

* Introduce interfaces for sorting (and for other Group algorithms) based on static arrays allocated in private memory for better performance.
* Introduce interfaces for key-value sorting (see the 2nd issue in the Issues section)

Signed-off-by: Fedorov, Andrey <andrey.fedorov@intel.com>
1 parent 613af3c commit edaee9b

2 files changed

+325
-0
lines changed

Lines changed: 324 additions & 0 deletions
@@ -0,0 +1,324 @@
= SYCL_EXT_ONEAPI_GROUP_SORT
:source-highlighter: coderay
:coderay-linenums-mode: table

// This section needs to be after the document title.
:doctype: book
:toc2:
:toc: left
:encoding: utf-8
:lang: en

:blank: pass:[ +]

// Set the default source code type in this document to C++,
// for syntax highlighting purposes. This is needed because
// docbook uses c++ and html5 uses cpp.
:language: {basebackend@docbook:c++:cpp}

== Notice

Copyright (c) 2021 Intel Corporation. All rights reserved.

IMPORTANT: This specification is a draft.

NOTE: Khronos(R) is a registered trademark and SYCL(TM) and SPIR(TM) are
trademarks of The Khronos Group Inc. OpenCL(TM) is a trademark of Apple Inc.
used by permission by Khronos.

NOTE: This document is better viewed when rendered as html with asciidoctor.
GitHub does not render image icons.

This extension is written against the SYCL 2020 revision 3 specification. All
references below to the "core SYCL specification" or to section numbers in the
SYCL specification refer to that revision.

== Introduction

This extension adds sorting functions to the group algorithms library and introduces Sorter objects.

== Feature test macro

This extension provides a feature-test macro as described in the core SYCL
specification section 6.3.3 "Feature test macros". Therefore, an
implementation supporting this extension must predefine the macro
`SYCL_EXT_ONEAPI_GROUP_SORT` to one of the values defined in the table below.
Applications can test for the existence of this macro to determine if the
implementation supports this feature, or applications can test the macro's
value to determine which of the extension's APIs the implementation supports.

[%header,cols="1,5"]
|===
|Value |Description
|1 |Initial extension version. Base features are supported.
|===
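
An application could test the macro along the following lines. This is only a sketch: `has_group_sort` and `sort_path` are hypothetical names used for illustration, and the SYCL header is deliberately left out so that the snippet also builds with a plain C++ compiler.

```cpp
#include <cassert>
#include <string>

// In a SYCL program, <sycl/sycl.hpp> would be included before this check;
// it is omitted here so the sketch also compiles without a SYCL toolchain.
#if defined(SYCL_EXT_ONEAPI_GROUP_SORT) && SYCL_EXT_ONEAPI_GROUP_SORT >= 1
constexpr bool has_group_sort = true;   // value 1: base features are supported
#else
constexpr bool has_group_sort = false;  // macro not defined: extension unavailable
#endif

// Hypothetical helper: names the code path an application would take.
inline std::string sort_path() {
  return has_group_sort ? "group sort extension" : "fallback sort";
}
```

Testing the value rather than only the presence of the macro leaves room for later versions of the extension to add APIs.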

==== Sorter

A Sorter is a type that encapsulates a sorting algorithm. A Sorter may hold parameters
that help the algorithm perform better. The data to sort is passed to `operator()`,
which contains the implementation of the sorting algorithm.
The signatures of `operator()` are as follows:

[source,c++]
----
template<typename Group, typename Ptr>
void operator()(Group g, Ptr first, Ptr last);

template<typename Group, typename T>
T operator()(Group g, T val);
----

At least one overload for `operator()` is required.

Table. `operator()` for Sorters.
|===
|`operator()`|Description

|`template<typename Group, typename Ptr>
void operator()(Group g, Ptr first, Ptr last);`
|Implements the sorting algorithm that is called by `joint_sort`.
Available only if `sycl::is_group_v<std::decay_t<Group>>` is true.
`first` and `last` must be the same for all work-items in the group.

|`template<typename Group, typename T>
T operator()(Group g, T val);`
|Implements the sorting algorithm that is called by `sort_over_group`.
Available only if `sycl::is_group_v<std::decay_t<Group>>` is true.
|===

Example of a custom Sorter:
[source,c++]
----
template<typename Compare>
class bubble_sort{
public:
    Compare comp;

    template<typename Group, typename Ptr>
    void operator()(Group g, Ptr first, Ptr last){
        size_t n = last - first;
        size_t idx = g.get_local_id().get(0);
        // Only the first work-item sorts; the others do no work.
        if(idx == 0)
            for(size_t i = 0; i < n; ++i)
                for(size_t j = i + 1; j < n; ++j)
                    if(comp(first[j], first[i]))
                        std::swap(first[i], first[j]);
    }
};
----
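
The calling convention of the custom Sorter can be tried on the host without a SYCL runtime. The sketch below assumes two hypothetical stand-in types, `mock_id` and `mock_group`, that provide only the `get_local_id().get(0)` call used by the example, and a hypothetical driver `run_on_mock_group` that invokes the sorter once per simulated work-item with the same `first` and `last`. It is not how a SYCL implementation dispatches a Sorter.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical host-only stand-ins for a SYCL id and group.
struct mock_id {
  std::size_t value;
  std::size_t get(int) const { return value; }
};
struct mock_group {
  std::size_t local_id;
  mock_id get_local_id() const { return mock_id{local_id}; }
};

// The custom Sorter from the example, repeated so the sketch is self-contained.
template <typename Compare>
class bubble_sort {
public:
  Compare comp;

  template <typename Group, typename Ptr>
  void operator()(Group g, Ptr first, Ptr last) {
    std::size_t n = static_cast<std::size_t>(last - first);
    std::size_t idx = g.get_local_id().get(0);
    if (idx == 0)  // only the first work-item sorts
      for (std::size_t i = 0; i < n; ++i)
        for (std::size_t j = i + 1; j < n; ++j)
          if (comp(first[j], first[i]))
            std::swap(first[i], first[j]);
  }
};

// Hypothetical driver: one sequential call per simulated work-item.
template <typename Sorter>
void run_on_mock_group(Sorter sorter, std::vector<int>& data, std::size_t group_size) {
  for (std::size_t id = 0; id < group_size; ++id)
    sorter(mock_group{id}, data.data(), data.data() + data.size());
}
```

Calling `run_on_mock_group(bubble_sort<std::less<int>>{}, data, 4)` leaves `data` in ascending order; only the simulated work-item with id 0 performs any swaps.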

==== Predefined Sorters

`radix_order` is an `enum` that defines the sorting order when `radix_sorter` is used.
Only ascending and descending orders are available.

[source,c++]
----
namespace sycl::ext::oneapi {

    enum class radix_order {
        ascending,
        descending
    };

}
----

SYCL provides the following predefined classes:

[source,c++]
----
namespace sycl::ext::oneapi {

    template<typename Compare = std::less<>>
    class default_sorter {
    public:
        default_sorter(Compare comp = Compare());

        template<typename Group, typename Ptr>
        void operator()(Group g, Ptr first, Ptr last);

        template<typename Group, typename T>
        T operator()(Group g, T val);
    };

    template<typename T, radix_order Order = radix_order::ascending, unsigned int BitsPerPass = 4>
    class radix_sorter {
    public:
        radix_sorter(const std::bitset<sizeof(T) * CHAR_BIT> mask =
            std::bitset<sizeof(T) * CHAR_BIT> (std::numeric_limits<unsigned long long>::max()));

        template<typename Group, typename Ptr>
        void operator()(Group g, Ptr first, Ptr last);

        template<typename Group>
        T operator()(Group g, T val);
    };

}
----

Table. Description of predefined Sorters.
|===
|Sorter|Description

|`template<typename Compare = std::less<>>
default_sorter`
|Uses a default sorting method chosen by an implementation-defined heuristic,
with `Compare` as the binary comparison function object.

|`template<typename T, radix_order Order = radix_order::ascending, unsigned int BitsPerPass = 4>
radix_sorter`
|Uses radix sort as the sorting method. `Order` specifies the sorting order.
`T` must be an arithmetic type.
`BitsPerPass` is the number of bits of each value that one pass processes.
For example, if a sequence of `int32_t` is sorted using `BitsPerPass == 4` then one
pass of the radix sort algorithm considers only 4 bits. The number of passes is `32/4=8`.
|===
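
The pass count in the example can be checked with a host-side least-significant-digit radix sort. This is a sketch in plain C++ under stated assumptions: `lsd_radix_sort` is a hypothetical helper, it handles only `std::uint32_t` keys in ascending order, and it says nothing about how an implementation of `radix_sorter` works internally.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: sorts 32-bit keys in ascending order, BitsPerPass bits
// at a time, and returns the number of passes that were performed.
template <unsigned BitsPerPass = 4>
unsigned lsd_radix_sort(std::vector<std::uint32_t>& keys) {
  static_assert(BitsPerPass >= 1 && BitsPerPass <= 8 && 32 % BitsPerPass == 0,
                "BitsPerPass must be 1, 2, 4 or 8 in this sketch");
  constexpr std::size_t buckets = std::size_t{1} << BitsPerPass;
  constexpr std::uint32_t digit_mask = static_cast<std::uint32_t>(buckets - 1);

  std::vector<std::uint32_t> tmp(keys.size());
  unsigned passes = 0;
  for (unsigned shift = 0; shift < 32; shift += BitsPerPass) {
    // offset[d + 1] first counts the keys whose current digit is d.
    std::array<std::size_t, buckets + 1> offset{};
    for (std::uint32_t k : keys) ++offset[((k >> shift) & digit_mask) + 1];
    // Prefix sum: offset[d] becomes the first output slot for digit d.
    for (std::size_t d = 0; d < buckets; ++d) offset[d + 1] += offset[d];
    // Stable scatter into tmp, then make tmp the current sequence.
    for (std::uint32_t k : keys) tmp[offset[(k >> shift) & digit_mask]++] = k;
    keys.swap(tmp);
    ++passes;
  }
  assert(std::is_sorted(keys.begin(), keys.end()));
  return passes;
}
```

With the default `BitsPerPass` of 4 the loop runs for shifts 0, 4, ..., 28, which is the `32/4=8` passes mentioned in the table. A larger `BitsPerPass` means fewer passes but more buckets per pass.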

Table. Constructors of the `default_sorter` class.
|===
|Constructor|Description

|`default_sorter(Compare comp = Compare())`
|Creates the `default_sorter` object using `comp`.
|===

Table. Member functions of the `default_sorter` class.
|===
|Member function|Description

|`template<typename Group, typename Ptr>
void operator()(Group g, Ptr first, Ptr last)`
|Implements a default sorting algorithm to be called by the `joint_sort` algorithm.

_Complexity_: Let `N` be `last - first`. `O(N*log_2(N))` comparisons.

|`template<typename Group, typename T>
T operator()(Group g, T val)`
|Implements a default sorting algorithm to be called by the `sort_over_group` algorithm.

_Complexity_: Let `N` be the work group size. `O(N*log_2(N))` comparisons.
|===

Table. Constructors of the `radix_sorter` class.
|===
|Constructor|Description

|`radix_sorter(const std::bitset<sizeof(T) * CHAR_BIT> mask = std::bitset<sizeof(T) * CHAR_BIT>
(std::numeric_limits<unsigned long long>::max()));`
|Creates the `radix_sorter` object to sort values considering only the bits
that correspond to 1 in `mask`.
|===
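
One way to read the `mask` parameter is that two keys are compared only on the bits that are set in `mask`. The host-side sketch below illustrates that reading for `T = std::uint32_t`. `key_mask`, `default_mask` and `masked_sort` are hypothetical names. The text above does not say how keys that are equal under the mask are ordered; the sketch uses `std::stable_sort`, which is an assumption.

```cpp
#include <algorithm>
#include <bitset>
#include <cassert>
#include <climits>
#include <cstdint>
#include <limits>
#include <vector>

using key_mask = std::bitset<sizeof(std::uint32_t) * CHAR_BIT>;

// Same expression as the constructor's default argument, for T = std::uint32_t.
// The bitset constructor keeps only the low 32 bits, so every bit is set.
inline key_mask default_mask() {
  return key_mask(std::numeric_limits<unsigned long long>::max());
}

// Hypothetical helper: ascending order on the bits selected by mask only.
inline void masked_sort(std::vector<std::uint32_t>& keys, const key_mask& mask) {
  const std::uint32_t m = static_cast<std::uint32_t>(mask.to_ulong());
  std::stable_sort(keys.begin(), keys.end(),
                   [m](std::uint32_t a, std::uint32_t b) { return (a & m) < (b & m); });
}
```

With a mask of `0xF0`, the keys `0x13` and `0x12` compare equal because they differ only in bits that the mask excludes. With the default mask every bit takes part and the result is an ordinary ascending sort.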

Table. Member functions of the `radix_sorter` class.
|===
|Member function|Description

|`template<typename Group, typename Ptr>
void operator()(Group g, Ptr first, Ptr last)`
|Implements the radix sort algorithm to be called by the `joint_sort` algorithm.

|`template<typename Group>
T operator()(Group g, T val)`
|Implements the radix sort algorithm to be called by the `sort_over_group` algorithm.
|===

==== Sort
The sort function from the {cpp} standard sorts elements with respect to
the binary comparison function object.

SYCL provides two similar algorithms:

`joint_sort` uses the work-items in a group to execute the corresponding
algorithm in parallel.

`sort_over_group` performs a sort over values held directly by the work-items
in a group, and the result returned to work-item `i` is the value in
position `i` of the ordered range.

[source,c++]
----
namespace sycl::ext::oneapi {
    template <typename Group, typename Ptr>
    void joint_sort(Group g, Ptr first, Ptr last); // (1)

    template <typename Group, typename Ptr, typename Compare>
    void joint_sort(Group g, Ptr first, Ptr last, Compare comp); // (2)

    template <typename Group, typename Ptr, typename Sorter>
    void joint_sort(Group g, Ptr first, Ptr last, Sorter sorter); // (3)

    template <typename Group, typename T>
    T sort_over_group(Group g, T val); // (4)

    template <typename Group, typename T, typename Compare>
    T sort_over_group(Group g, T val, Compare comp); // (5)

    template <typename Group, typename T, typename Sorter>
    T sort_over_group(Group g, T val, Sorter sorter); // (6)
}
----

_Constraints_: All functions are available only if `sycl::is_group_v<std::decay_t<Group>>`
is true and `Sorter` is a SYCL Sorter.

_Preconditions_: `first` and `last` must be the same for all work-items in the group.

1._Effects_: Sort the elements in the range `[first, last)`.
Elements are compared by `operator<`.

_Complexity_: Let `N` be `last - first`. `O(N*log_2(N))` comparisons.

2._Mandates_: `comp` must satisfy the requirements of `Compare` from
the {cpp} standard.

_Effects_: Sort the elements in the range `[first, last)` with respect to the
binary comparison function object `comp`.

_Complexity_: Let `N` be `last - first`. `O(N*log_2(N))` comparisons.

3._Effects_: Equivalent to: `sorter(g, first, last)`.

4._Returns_: The value returned on work-item `i` is the value in position `i`
of the ordered range resulting from sorting `val` from all work-items in the
`g` group. Elements are compared by `operator<`.
For multi-dimensional groups, the order of work-items in the group is
determined by their linear id.

_Complexity_: Let `N` be the work group size. `O(N*log_2(N))` comparisons.

5._Mandates_: `comp` must satisfy the requirements of `Compare` from
the {cpp} standard.

_Returns_: The value returned on work-item `i` is the value in position `i`
of the ordered range resulting from sorting `val` from all work-items in the
`g` group with respect to the binary comparison function object `comp`.
For multi-dimensional groups, the order of work-items in the group is
determined by their linear id.

_Complexity_: Let `N` be the work group size. `O(N*log_2(N))` comparisons.

6._Effects_: Equivalent to: `return sorter(g, val)`.
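
The _Returns_ clauses of (4) and (5) can be modelled on the host as follows. This is a sketch of the described result only, not of any implementation: `model_sort_over_group` and `linear_id` are hypothetical helpers, the vector index stands for the linear id of a work-item, and the two-dimensional linearization shown is the one used by the core SYCL specification.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical host-side model: vals[i] is the `val` passed by the work-item
// with linear id i. The result at index i is the value that work-item i
// would receive from sort_over_group.
template <typename T, typename Compare = std::less<>>
std::vector<T> model_sort_over_group(std::vector<T> vals, Compare comp = Compare()) {
  std::sort(vals.begin(), vals.end(), comp);
  return vals;
}

// Linear id of the work-item (id0, id1) in a two-dimensional group whose
// second dimension has size range1.
inline std::size_t linear_id(std::size_t id0, std::size_t id1, std::size_t range1) {
  return id0 * range1 + id1;
}
```

For a group of four work-items holding 7, 3, 9 and 1, the model returns 1 to work-item 0 and 9 to work-item 3. Passing `std::greater<>()` as the comparison reverses that assignment.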

== Issues

. The sort functions could also have interfaces that take static arrays in private memory.
The concern is that this may require changes to other group algorithms as well, since a sort
based on private memory is not very useful if the other algorithms in the chain use local
memory only.
. Key-value sorting based on projections could be a separate proposal.
The response to that idea needs to be investigated.
. Sorter traits could be useful if Finder, Reducer, or other objects
are added to the specification for use with other group algorithms, e.g. find, reduce.

== Revision History

[cols="5,15,15,70"]
[grid="rows"]
[options="header"]
|========================================
|Rev|Date|Author|Changes
|1|{docdate}|Andrey Fedorov|Initial public working draft
|========================================

sycl/doc/extensions/README.md

Lines changed: 1 addition & 0 deletions
@@ -38,6 +38,7 @@ DPC++ extensions status:
| [Level-Zero backend specification](LevelZeroBackend/LevelZeroBackend.md) | Supported | |
| [ITT annotations support](ITTAnnotations/ITTAnnotations.rst) | Supported | |
| [SYCL_EXT_ONEAPI_DEVICE_IF](DeviceIf/device_if.asciidoc) | Proposal | |
+| [SYCL_INTEL_group_sort](GroupAlgorithms/SYCL_INTEL_group_sort.asciidoc) | Proposal | |

Legend:
