[SYCL-PTX] Add warp-reduce path in sub-group reduce #3949

Merged

steffenlarsen (Contributor)

PTX introduces new warp reduction instructions with sm_80. These changes add a path to selected libclc sub-group collective functions that uses these warp reduction instructions when they are available.

As a consequence of these changes, future changes to libclc targeting NVPTX can use __nvvm_reflect to distinguish between SM versions, enabling architecture-specific paths that are resolved only after device code has been linked with libclc.

Signed-off-by: Steffen Larsen <steffen.larsen@codeplay.com>
steffenlarsen requested a review from bader as a code owner on June 17, 2021, 11:31
bader added the cuda (CUDA back-end) label on Jun 21, 2021
bader (Contributor) commented on Jun 21, 2021

/summary:run

bader merged commit 78411c4 into intel:sycl on Jun 21, 2021
bader added the libclc (libclc project related issues) label on Jun 21, 2021
alexbatashev added a commit to alexbatashev/llvm that referenced this pull request on Jun 24, 2021
* upstream/sycl: (489 commits)
  [SYCL][NFC] Lower overhead of making plugin calls (intel#3982)
  [SYCL][NFC] Use default macro initialization where applicable (intel#3979)
  [SYCL] Enable SPV_INTEL_fpga_invocation_pipelining_attributes extension (intel#3864)
  [SYCL] Disable reassociate pass to reduce register pressure (intel#3615)
  [Driver][SYCL][FPGA] Restrict -O0 for FPGA with hardware (intel#3966)
  [SYCL][NFC] Fix warnings coming out of SYCL headers. (intel#3978)
  [SYCL] Fix bugs with recursion in SYCL kernel (intel#3958)
  [SYCL][LevelZero] Add support to detect host->device and device->host transfers for USM (intel#3975)
  [SYCL] Enable native FP atomics by default (intel#3869)
  [sycl-post-link] Avoid copying from nullptr (intel#3963)
  [SYCL-PTX] Add warp-reduce path in sub-group reduce (intel#3949)
  [BuildBot] Uplift CPU/FPGAEMU RT version for CI Process (intel#3946)
  Fix handling of complex constant expressions
  Handle OpSpecConstantOp with CompositeExtract and CompositeInsert
  Handle OpSpecConstantOp with VectorShuffle
  [FuncSpec] Don't specialise functions with NoDuplicate instructions.
  [mlir][linalg] Support low padding in subtensor(pad_tensor) lowering
  [gn build] Port 208332d
  [AMDGPU] Add Optimize VGPR LiveRange Pass.
  [mlir][Linalg] NFC - Drop unused variable definition.
  ...
steffenlarsen deleted the steffen/libclc-subgroup-reduce-sm80-fast branch on December 6, 2023, 11:37