Skip to content

[SYCL][CUDA] Fix context scope in kernel launch #4606

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 2 commits into from
Sep 21, 2021

Conversation

npmiller
Copy link
Contributor

The guessLocalWorkSize function uses the CUDA API so it needs an
active context, and there was no active ScopedContext when it was
called which may cause issue.

This fixes #2777

The `guessLocalWorkSize` function uses the CUDA API so it needs an
active context, and there was no active `ScopedContext` when it was
called which may cause issue.

This fixes intel#2777
@npmiller npmiller requested a review from a team as a code owner September 20, 2021 16:13
@npmiller npmiller requested a review from s-kanaev September 20, 2021 16:13
@bader bader added cuda CUDA back-end runtime Runtime library related issue labels Sep 20, 2021
steffenlarsen
steffenlarsen previously approved these changes Sep 20, 2021
Copy link
Contributor

@steffenlarsen steffenlarsen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

Beware that #4401 also implements this by putting the ScopedContext inside guessLocalWorkSize. I do not have a preference to whether or not it should be at API top-level or inside guessLocalWorkSize, but having it both places is redundant. Ping @JackAKirk for awareness.

@bader bader requested a review from JackAKirk September 21, 2021 06:13
@JackAKirk
Copy link
Contributor

Looks fine to me. You might want to wrap the ScopedContext into a try/catch block in case it throws PI_INVALID_CONTEXT, that can then be returned as pi_result in cuda_piEnqueueKernelLaunch; since this appears to be the standard practice in the file.

JackAKirk
JackAKirk previously approved these changes Sep 21, 2021
@npmiller
Copy link
Contributor Author

LGTM.

Beware that #4401 also implements this by putting the ScopedContext inside guessLocalWorkSize. I do not have a preference to whether or not it should be at API top-level or inside guessLocalWorkSize, but having it both places is redundant. Ping @JackAKirk for awareness.

Good catch, I think we should try to keep the ScopedContext at the top level as much as possible, because since they don't restore context, nested ScopedContext could lead to trouble, although I'm not sure we could have a scenario where nested ScopedContext would use different contexts but still.

Looks fine to me. You might want to wrap the ScopedContext into a try/catch block in case it throws PI_INVALID_CONTEXT, that can then be returned as pi_result in cuda_piEnqueueKernelLaunch; since this appears to be the standard practice in the file.

Good point, I'll extend the try/catch block.

@bader bader merged commit 0d3cc99 into intel:sycl Sep 21, 2021
JackAKirk pushed a commit to JackAKirk/llvm that referenced this pull request Sep 23, 2021
Signed-off-by: JackAKirk <jack.kirk@codeplay.com>
alexbatashev added a commit to alexbatashev/llvm that referenced this pull request Sep 24, 2021
* upstream/sycl: (2344 commits)
  [ESIMD] Rename slm_load4/slm_store4 to slm_load_rgba/slm_store_rgba (intel#4158)
  [SYCL] Avoid re-computing group_range in nd_item::get_group_range() (intel#4621)
  [clang-offload-extract] Ignore zero padding in .tgting section (intel#4622)
  [Driver][SYCL] Fix -fsycl-help output when redirected (intel#4619)
  [Driver][SYCL][FPGA] Do not unbundle aoco as an archive for hardware (intel#4477)
  [Driver][SYCL] Fix offload-bundler and offload-deps triples (intel#4616)
  [SYCL] Fix bit_cast for half type (intel#4603)
  [SYCL] Fix a typo in accessor::get_range method (intel#4556)
  [SYCL] Detach allocas from their dependencies regardless of linked alloca presence (intel#4573)
  [SYCL][L0] Make sure that we only query/sync host-visible events from the host. (intel#4613)
  Fix tests with wrong alias metadata
  [Driver][SYCL] Fixup arguments to llc call for PIC and code-model (intel#4614)
  [SYCL][L0] Add ownership control for LeveL-Zero kernel_bundle interop. (intel#4576)
  [SYCL][Driver] Expose llvm-foreach --jobs functionality through a driver option (intel#4543)
  [SYCL] Prevent stream buffer leak on constructor exception (intel#4594)
  [ESIMD] Replace mask_type_t with simd_mask to represent Gen predicates. (intel#4230)
  Fix for a bunch of fixed point integer SPIR-V instructions (intel#1213)
  Amend SingleElementVectorINTEL decoration use cases according to spec update (intel#1192)
  Enforce UserSemantic decoration if no FPGA decorations found
  [SYCL][CUDA] Fix context scope in kernel launch (intel#4606)
  ...
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cuda CUDA back-end runtime Runtime library related issue
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[SYCL][CUDA] threaded kernel dispatch does not work properly
4 participants