SYCL Academy

This section shows the use of some SYCL features in order to perform a vector reduce operation.

Naive SYCL reduction

reduce_naive.cpp shows a reduction performed without using atomics or any other SYCL feature. It uses two separate kernels to achieve device synchronization between reductions step 1 and 2.

SYCL Atomics

reduce_atomic.cpp shows a reduction operation using local memory and global memory atomics. Workgroups perform a local reduction using local atomics and the result is added together using a single global memory sycl::atomic_ref.

SYCL Group Algorithms

reduce_group_algorithm.cpp replaces the local memory atomic add with a call to sycl::reduce_over_group, which can be expected to perform a similar operation to the device function workgroup_reduce_local_mem as seen in reduce_naive.cpp.

SYCL Reductions

reduce_sycl_reduction.cpp replaces the entire logic of the reduction with a SYCL reducer object. The reduction is a parameter of the parallel_for invocation, and there is another reducer object used as a parameter to the kernel lambda.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

README.md

README.md

SYCL Academy

Naive SYCL reduction

SYCL Atomics

SYCL Group Algorithms

SYCL Reductions

Files

README.md

Latest commit

History

README.md

File metadata and controls

SYCL Academy

Naive SYCL reduction

SYCL Atomics

SYCL Group Algorithms

SYCL Reductions