[SYCL-PTX] Add warp-reduce path in sub-group reduce #3949

steffenlarsen · 2021-06-17T11:31:33Z

PTX introduces new warp reduction instructions with sm_80. These changes add a path to select libclc sub-group collective functions that uses these warp reduction instructions when they are available.

As a side effect, future changes to libclc targeting nvptx can use __nvvm_reflect to differentiate between SM versions. This allows architecture-specific paths that are resolved after device code is linked with libclc.
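The dispatch described above can be modelled on the host as a rough sketch. This is not the libclc code from this PR: `model_sm_arch` is a hypothetical stand-in for the value an `__nvvm_reflect` query would fold to, the `>= 800` threshold is an assumed encoding of sm_80, and the shuffle-style butterfly is only an assumed shape for the pre-sm_80 fallback. All function names are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

#define WARP_SIZE 32

/* Hypothetical stand-in for the architecture value that an __nvvm_reflect
   query would resolve to after linking; here it is just a host variable. */
static int model_sm_arch = 700;

/* Fallback model: XOR butterfly, the pattern a shuffle-based reduction uses.
   After log2(WARP_SIZE) steps every lane holds the full sum. */
static uint32_t reduce_add_shuffle(const uint32_t in[WARP_SIZE]) {
  uint32_t cur[WARP_SIZE], next[WARP_SIZE];
  for (int lane = 0; lane < WARP_SIZE; ++lane)
    cur[lane] = in[lane];
  for (int offset = WARP_SIZE / 2; offset > 0; offset /= 2) {
    /* Each lane adds the value held by its partner lane (lane ^ offset). */
    for (int lane = 0; lane < WARP_SIZE; ++lane)
      next[lane] = cur[lane] + cur[lane ^ offset];
    for (int lane = 0; lane < WARP_SIZE; ++lane)
      cur[lane] = next[lane];
  }
  /* Every lane must agree on the result. */
  for (int lane = 1; lane < WARP_SIZE; ++lane)
    assert(cur[lane] == cur[0]);
  return cur[0];
}

/* Fast-path model: stands in for a single warp-wide reduction instruction. */
static uint32_t reduce_add_single(const uint32_t in[WARP_SIZE]) {
  uint32_t sum = 0;
  for (int lane = 0; lane < WARP_SIZE; ++lane)
    sum += in[lane];
  return sum;
}

/* Architecture-specific dispatch: on a real device the condition would be
   a constant once the reflect call is folded, leaving one branch. */
static uint32_t sub_group_reduce_add(const uint32_t in[WARP_SIZE]) {
  if (model_sm_arch >= 800)
    return reduce_add_single(in);
  return reduce_add_shuffle(in);
}
```

Both paths return the same value for the same input, so callers of the collective do not need to know which one was selected.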

Signed-off-by: Steffen Larsen <steffen.larsen@codeplay.com>

bader · 2021-06-21T08:55:22Z

/summary:run

* upstream/sycl: (489 commits)
  * [SYCL][NFC] Lower overhead of making plugin calls (intel#3982)
  * [SYCL][NFC] Use default macro initialization where applicable (intel#3979)
  * [SYCL] Enable SPV_INTEL_fpga_invocation_pipelining_attributes extension (intel#3864)
  * [SYCL] Disable reassociate pass to reduce register pressure (intel#3615)
  * [Driver][SYCL][FPGA] Restrict -O0 for FPGA with hardware (intel#3966)
  * [SYCL][NFC] Fix warnings coming out of SYCL headers. (intel#3978)
  * [SYCL] Fix bugs with recursion in SYCL kernel (intel#3958)
  * [SYCL][LevelZero] Add support to detect host->device and device->host transfers for USM (intel#3975)
  * [SYCL] Enable native FP atomics by default (intel#3869)
  * [sycl-post-link] Avoid copying from nullptr (intel#3963)
  * [SYCL-PTX] Add warp-reduce path in sub-group reduce (intel#3949)
  * [BuildBot] Uplift CPU/FPGAEMU RT version for CI Process (intel#3946)
  * Fix handling of complex constant expressions
  * Handle OpSpecConstantOp with CompositeExtract and CompositeInsert
  * Handle OpSpecConstantOp with VectorShuffle
  * [FuncSpec] Don't specialise functions with NoDuplicate instructions.
  * [mlir][linalg] Support low padding in subtensor(pad_tensor) lowering
  * [gn build] Port 208332d
  * [AMDGPU] Add Optimize VGPR LiveRange Pass.
  * [mlir][Linalg] NFC - Drop unused variable definition.
  * ...

steffenlarsen requested a review from bader as a code owner June 17, 2021 11:31

Fix formatting

ef7cbdd

Signed-off-by: Steffen Larsen <steffen.larsen@codeplay.com>

bader added the cuda CUDA back-end label Jun 21, 2021

bader approved these changes Jun 21, 2021

bader merged commit 78411c4 into intel:sycl Jun 21, 2021

bader added the libclc libclc project related issues label Jun 21, 2021

steffenlarsen deleted the steffen/libclc-subgroup-reduce-sm80-fast branch December 6, 2023 11:37
