
[SYCL][CUDA] Don't enqueue an event wait on same CUDA stream #5099


Merged Dec 8, 2021 (1 commit)

@joeatodd (Contributor) commented Dec 7, 2021

Because CUDA streams execute in order, a cuStreamWaitEvent on a CUstream with which the CUevent is already associated is semantically a no-op: any work submitted to that stream afterwards already runs after the event. This patch skips submitting that unnecessary wait, removing the small runtime overhead it would otherwise incur.

This is a no-op because CUDA streams are in order. This patch avoids
the small overhead of submitting the wait.
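The check the patch describes can be sketched as follows. This is a minimal, self-contained C++ illustration, not the plugin's actual code: `StreamHandle`, `RecordedEvent`, and `enqueueDependencyWaits` are hypothetical names, and the `enqueueWait` callback stands in for a real `cuStreamWaitEvent` call so the sketch compiles without the CUDA driver API. It assumes the runtime tracks which stream each event was recorded on.

```cpp
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical stand-in for a CUstream handle, so the sketch builds without cuda.h.
using StreamHandle = const void*;

// Hypothetical event record: assumes the runtime remembers the stream the event was recorded on.
struct RecordedEvent {
  StreamHandle stream;
};

// Enqueues a wait on `target` for each dependency, skipping events recorded on `target` itself.
// `enqueueWait` stands in for something like cuStreamWaitEvent(target, event, 0).
// Returns the number of waits actually submitted.
std::size_t enqueueDependencyWaits(
    StreamHandle target, const std::vector<RecordedEvent>& deps,
    const std::function<void(StreamHandle, const RecordedEvent&)>& enqueueWait) {
  std::size_t submitted = 0;
  for (const RecordedEvent& ev : deps) {
    if (ev.stream == target) {
      // Same in-order stream: later work already runs after this event, so the wait is a no-op.
      continue;
    }
    // Event from a different stream: a real cross-stream dependency, so the wait is still needed.
    enqueueWait(target, ev);
    ++submitted;
  }
  return submitted;
}
```

Comparing stream handles is sufficient here because an in-order stream already orders every later submission after the point where the event was recorded; waits on events recorded on other streams are still enqueued as before.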
@joeatodd requested a review from a team as a code owner December 7, 2021 15:08
@joeatodd requested a review from s-kanaev December 7, 2021 15:08
@s-kanaev (Contributor) commented Dec 7, 2021

@joeatodd, I guess something is missing in the title of the patch?

@s-kanaev (Contributor) left a comment

LGTM

@bader (Contributor) commented Dec 8, 2021

> @joeatodd, I guess something is missing in the title of the patch?

@s-kanaev, could you specify what exactly is missing, please?

@s-kanaev (Contributor) commented Dec 8, 2021

Oh, it seems like I've misread the title.
The title is fine with me.

@bader merged commit 35729a7 into intel:sycl Dec 8, 2021
@joeatodd deleted the no-op-wait branch December 9, 2021 19:01
alexbatashev added a commit to alexbatashev/llvm that referenced this pull request Dec 11, 2021
* upstream/sycl: (725 commits)
  [SYCL] Translate ZE_RESULT_ERROR_INVALID_ARGUMENT error code from L0 RT (intel#5122)
  [SYCL][L0][Plugin] Call ZeCommandQueueCreate on demand (intel#5109)
  [SYCL] Switch to using blocking USM free for OpenCL GPU (intel#4928)
  [CI] Disable pack and upload steps (intel#5119)
  [SYCL] Disable submission of AssertInfoCopier for FPGA (intel#4780)
  [SYCL][SPIRV] Implement islessgreater with FOrdNotEqual instead (intel#5076)
  [SYCL] Fix typo in the name of the host-visible pool (intel#5073)
  [SYCL] Only call shutdown when DLL is being unloaded, not when process is terminating (intel#4983)
  [SYCL][CUDA][PI] Fix infinite loop when parallel_for range exceeds INT_MAX (intel#5095)
  [SYCL] Translate out-of-memory error codes from L0 RT (intel#5107)
  [SYCL] Fix a few warnings during build scripts configuration (intel#5082)
  [SYCL] Fix amdgpu openmp test (intel#5103)
  [SYCL] [FPGA] Create experimental headers for FPGA latency control (intel#5066)
  [SYCL][CUDA] Don't enqueue an event wait on same CUDA stream (intel#5099)
  Remove PR disable template (intel#5102)
  [BuildBot]Uplift CPU/FPGAEMU RT version (intel#5078)
  [SYCL] Fix the test to not depend on a specific line. (intel#5092)
  [CI] Provide libclc targets to build and test (intel#5091)
  Fix build of `check-llvm-spirv` target after 8f8001a
  Force opt to use new pass manager in pr52289 test after c34d157
  ...
alexbatashev added a commit to alexbatashev/llvm that referenced this pull request Dec 12, 2021
* upstream/sycl:
  [CI] Add container users to video group (intel#5101)
  [CI] More typo fixes in Nightly build (intel#5088)
  Revert "[CI] Disable pack and upload steps (intel#5119)" (intel#5125)
  [SYCL] Translate ZE_RESULT_ERROR_INVALID_ARGUMENT error code from L0 RT (intel#5122)
  [SYCL][L0][Plugin] Call ZeCommandQueueCreate on demand (intel#5109)
  [SYCL] Switch to using blocking USM free for OpenCL GPU (intel#4928)
  [CI] Disable pack and upload steps (intel#5119)
  [SYCL] Disable submission of AssertInfoCopier for FPGA (intel#4780)
  [SYCL][SPIRV] Implement islessgreater with FOrdNotEqual instead (intel#5076)
  [SYCL] Fix typo in the name of the host-visible pool (intel#5073)
  [SYCL] Only call shutdown when DLL is being unloaded, not when process is terminating (intel#4983)
  [SYCL][CUDA][PI] Fix infinite loop when parallel_for range exceeds INT_MAX (intel#5095)
  [SYCL] Translate out-of-memory error codes from L0 RT (intel#5107)
  [SYCL] Fix a few warnings during build scripts configuration (intel#5082)
  [SYCL] Fix amdgpu openmp test (intel#5103)
  [SYCL] [FPGA] Create experimental headers for FPGA latency control (intel#5066)
  [SYCL][CUDA] Don't enqueue an event wait on same CUDA stream (intel#5099)
  Remove PR disable template (intel#5102)
  [BuildBot]Uplift CPU/FPGAEMU RT version (intel#5078)

3 participants