-
Notifications
You must be signed in to change notification settings - Fork 769
[SYCL][CUDA][HIP] Device info query for maximum registers per block and targetted exception on out-of-registers for CUDA. #9106
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Merged
steffenlarsen
merged 22 commits into
intel:sycl
from
GeorgeWeb:georgi/sycl-cuda-max-regs-query-and-throw-on-out-of-regs
May 12, 2023
Merged
Changes from all commits
Commits
Show all changes
22 commits
Select commit
Hold shift + click to select a range
8f867f4
[SYCL][CUDA][HIP] Add device info query for maximum registers per blo…
GeorgeWeb ca1ceed
[SYCL][CUDA] Improve error propagation and handling for out of launch…
GeorgeWeb 33943d9
Apply missed git clang-format.
GeorgeWeb 33adc69
Fix wrong value return type (size_t -> uint32_t) for the HIP query - …
GeorgeWeb 93a4fa1
Update the out of registers test for CUDA to use more registers and e…
GeorgeWeb 3e0a80d
Add the missing sum part of the vadd and sum kernel for the out of re…
GeorgeWeb f3964ef
Add new symbols to ABI dumps for Windows
GeorgeWeb 7977029
Merge remote-tracking branch 'upstream/sycl' into georgi/sycl-cuda-ma…
GeorgeWeb 54bef4f
Add extension documentation and Feature Macro to enable it.
GeorgeWeb 38ad146
Merge upstream/sycl and fix conflicts
GeorgeWeb 322599f
Use the DeviceImplPtr changes in device_info
GeorgeWeb 512bb6f
Bump PI header minor version and add change notes
GeorgeWeb 0885d18
Update sycl ABI symbols for Linux and Windows
GeorgeWeb aa749c4
Apply clang-format to PI_H
GeorgeWeb b797b97
Update sycl/source/detail/error_handling/error_handling.cpp
GeorgeWeb ec15e8f
Update the usage example in the extension documentation.
GeorgeWeb c499af7
Remove unnecessary section from the extension doc
GeorgeWeb 66683fc
Remove leftover subsection from extension doc.
GeorgeWeb b41ece9
Merge remote-tracking branch 'upstream/sycl' into georgi/sycl-cuda-ma…
GeorgeWeb 9b0dd50
Merge remote-tracking branch 'upstream/sycl' into georgi/sycl-cuda-ma…
GeorgeWeb aaef3c8
Update related e2e tests run commands.
GeorgeWeb 8173dbd
Update sycl/doc/extensions/experimental/sycl_ext_codeplay_max_registe…
GeorgeWeb File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
49 changes: 49 additions & 0 deletions
49
...extensions/experimental/sycl_ext_codeplay_max_registers_per_work_group_query.md
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,49 @@ | ||
# sycl_ext_codeplay_max_registers_per_work_group_query | ||
|
||
## Notice | ||
|
||
This document describes an **experimental** API that applications can use to try | ||
out a new feature. Future versions of this API may change in ways that are | ||
incompatible with this experimental version. | ||
|
||
|
||
## Introduction | ||
|
||
This extension adds a new device information descriptor that provides the ability to query a device for the maximum number of registers available per work-group. | ||
|
||
OpenCL never offered such query due to the nature of being a very platform specific one - which is why it is also absent from SYCL. Now that SYCL supports back-ends where the register usage is a limiting resource factor of the possible maximum work-group size for a kernel, having the ability to query that limit is important for writing safe and portable code. | ||
|
||
## Feature test macro | ||
|
||
As encouraged by the SYCL specification, a feature-test macro, `SYCL_EXT_CODEPLAY_MAX_REGISTERS_PER_WORK_GROUP_QUERY`, is provided to determine whether this extension is implemented. | ||
|
||
## New device descriptor | ||
|
||
| Device descriptor | Return type | Description | | ||
| ------------------------------------------------------ | ----------- | ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | | ||
| ext::codeplay::experimental::info::device::max_registers_per_work_group | unsigned int | Returns the maximum number of registers available for use per work-group based on the capability of the device. | | ||
|
||
### Note | ||
|
||
## Examples | ||
|
||
```c++ | ||
sycl::device gpu = sycl::device{sycl::gpu_selector_v}; | ||
std::cout << gpu.get_info<sycl::info::device::name>() << '\n'; | ||
|
||
#ifdef SYCL_EXT_CODEPLAY_MAX_REGISTERS_PER_WORK_GROUP_QUERY | ||
unsigned int registers_per_group = gpu.get_info<sycl::ext::codeplay::experimental::info::device::max_registers_per_work_group>(); | ||
std::cout << "Max registers per work-group: " << registers_per_group << '\n'; | ||
#endif | ||
``` | ||
|
||
Ouputs to the console: | ||
|
||
Executed using the CUDA back-end on NVIDIA. | ||
|
||
``` | ||
NVIDIA ... | ||
Max registers per work-group: 65536 | ||
``` | ||
|
||
- See: [CUDA Toolkit Documentation](https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#compute-capabilities) |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
24 changes: 24 additions & 0 deletions
24
sycl/test-e2e/Basic/max_registers_per_work_group_query.cpp
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,24 @@ | ||
// REQUIRES: cuda || hip | ||
// RUN: %{build} -o %t.out | ||
// RUN: %{run} %t.out | ||
|
||
#include <sycl/sycl.hpp> | ||
|
||
int main() { | ||
sycl::queue q; | ||
sycl::device dev = q.get_device(); | ||
|
||
#if !defined(SYCL_EXT_CODEPLAY_MAX_REGISTERS_PER_WORK_GROUP_QUERY) | ||
#error SYCL_EXT_CODEPLAY_MAX_REGISTERS_PER_WORK_GROUP_QUERY is not defined! | ||
#endif | ||
|
||
auto max_regs_per_wg = | ||
dev.get_info<sycl::ext::codeplay::experimental::info::device:: | ||
max_registers_per_work_group>(); | ||
std::cout << "Max register per work-group: " << max_regs_per_wg << std::endl; | ||
|
||
assert(max_regs_per_wg > 0); | ||
|
||
std::cout << "Passed!" << std::endl; | ||
return 0; | ||
} |
Oops, something went wrong.
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Where is this extension documented?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've now added
sycl/doc/extensions/experimental/sycl_ext_codeplay_max_registers_per_work_group_query.md
. I have seen another extension fornum_regs
having just an inline documentation where it was added in the PI header, but thought I'd add a full doc forsycl_ext_codeplay_max_registers_per_work_group
with a MACRO.Does that sound okay to you? And if so, and I am missing something, please let me know. Thank you! :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you! Yes, I am okay with that and I think I see the one you are referring to. That one might have slipped through the cracks, but the general rule is that public interfaces like that should either be specification or extension interfaces.
@AerialMantis - On a related note, are there any particular rhyme or reason to when we use
codeplay
in the extension features vs.intel
and/oroneapi
?