[SYCL][CUDA] Handle large Y/Z range dimensions. #7968
…cks per grid limits.
Please, add
https://github.com/intel/llvm/blob/sycl/CONTRIBUTING.md#pull-request I also recommend linking this PR to issue #7854 so that the issue is closed automatically when the PR is merged. https://docs.github.com/en/issues/tracking-your-work-with-issues/linking-a-pull-request-to-an-issue
/verify with intel/llvm-test-suite#1500
… size. - Improve variable namings
Looks good! 🚀
@steffenlarsen, thanks, it's updated.
Provides a test for intel/llvm#7968
…te#1500) Provides a test for intel#7968
The dimensions passed to sycl::range determine the blocks per grid and the threads per block. Currently, the threads-per-block calculation is performed only for the x dimension, so the blocks-per-grid values for the y and z dimensions passed to cuLaunchKernel come directly from the sycl::range arguments. cuLaunchKernel can then return an error when those values for the y or z dimension exceed 65535.
This PR adds a simple tuning of the threads per block for large (over 65535) Y and Z dimensions so that the associated blocks per grid fall within the allowed range.
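To illustrate the kind of tuning described above, here is a minimal sketch; it is not the code from this PR. The helper name `pickThreadsPerBlock`, the constant `kMaxBlocksPerGridYZ`, and the `maxThreads` parameter are hypothetical, and the sketch assumes that the global size of the dimension must be evenly divisible by the chosen threads per block. It searches for the smallest threads-per-block value that brings the blocks per grid down to 65535 or fewer.

```cpp
#include <cassert>
#include <cstddef>

// Limit on blocks per grid for the y and z dimensions, as quoted in the PR text.
constexpr std::size_t kMaxBlocksPerGridYZ = 65535;

// Hypothetical helper (not the PR's implementation): returns the smallest
// threads-per-block value t such that
//   - t divides globalSize evenly (an assumption of this sketch), and
//   - globalSize / t <= maxBlocks.
// Returns 0 if no such t exists within maxThreads.
std::size_t pickThreadsPerBlock(std::size_t globalSize,
                                std::size_t maxBlocks,
                                std::size_t maxThreads) {
  // Any t below ceil(globalSize / maxBlocks) would leave too many blocks.
  std::size_t start = (globalSize + maxBlocks - 1) / maxBlocks;
  if (start == 0) {
    start = 1;  // globalSize == 0: one thread per block is trivially fine.
  }
  for (std::size_t t = start; t <= maxThreads; ++t) {
    if (globalSize % t == 0 && globalSize / t <= maxBlocks) {
      return t;
    }
  }
  return 0;  // The caller has to handle this case, e.g. by reporting an error.
}
```

For a Y range of 65536, for example, `pickThreadsPerBlock(65536, kMaxBlocksPerGridYZ, 1024)` selects 2 threads per block, which gives 32768 blocks per grid. A range with no suitable divisor, such as the prime 65537, yields 0 in this sketch; how the actual plugin treats such ranges is not stated in the description above.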