-
Notifications
You must be signed in to change notification settings - Fork 787
[SYCL] Fix handling of subgroup info queries #8859
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -54,6 +54,18 @@ int main() { | |
const size_t prefWGSizeMult = krn.get_info< | ||
info::kernel_device_specific::preferred_work_group_size_multiple>(dev); | ||
assert(prefWGSizeMult > 0); | ||
const cl_uint maxSgSize = | ||
krn.get_info<info::kernel_device_specific::max_sub_group_size>(dev); | ||
assert(0 < maxSgSize && maxSgSize <= wgSize); | ||
const cl_uint compileSgSize = | ||
krn.get_info<info::kernel_device_specific::compile_sub_group_size>(dev); | ||
assert(compileSgSize <= maxSgSize); | ||
const cl_uint maxNumSg = | ||
krn.get_info<info::kernel_device_specific::max_num_sub_groups>(dev); | ||
assert(0 < maxNumSg); | ||
const cl_uint compileNumSg = | ||
krn.get_info<info::kernel_device_specific::compile_num_sub_groups>(dev); | ||
assert(compileNumSg <= maxNumSg); | ||
Comment on lines
+66
to
+68
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Unrelated to the PR, but that's a bit weird query. I don't think that we have a way to specify amount of sub-groups a kernel should have. Do we really need that query in SYCL spec? Tagging @gmlueck here to comment. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I need to ask @Pennycook what this query means. Is this supposed to correspond to
How would a SYCL application specify the number of subgroups at the source code level? You can specify the maximum number of work-items in a sub-group via There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
I think so. But I have no idea what is the mechanism of specifying that value on kernels even for OpenCL. I think it is also undocumented in there. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I've always assumed that an implementation could derive this from a combination of Based on some quick searching, though, I'm not sure if that was the intent. In OpenCL-Docs#447, @bashbaug notes that there isn't a way to set this property in OpenCL unless you're providing SPIR-V (which defines a There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. My pedantic reading of the spec is that There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I think you're right for SYCL, and that comes from the relaxations we put in for sub-group sizes. Table 34 in OpenCL says:
...so I think what I said holds for OpenCL. We deliberately made SYCL sub-groups very flexible to give implementations a lot of freedom, but we probably went too far. I think all implementations either: 1) have the behavior mandated by OpenCL; or 2) have similar guarantees applied to the inner-most dimension of the work-group. If we clarified in a future version of SYCL that only these two interpretations were legal, I think an implementation could still reason about the number of sub-groups given the work-group size. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
@gmlueck, can you please clarify why the annotation only guarantees a maximum sub-group size? I understood that an implementation had to use the sub-group size that was specified and could not choose a smaller one. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I created a Khronos issue to track this spec clarification: https://gitlab.khronos.org/sycl/Specification/-/issues/651 (Requires Khronos access.) There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
When the work-group size is not evenly divisible by the sub-group size, some sub-group(s) must have a smaller size. The SYCL spec does not currently provide any requirements. With today's wording, an implementation could make several (or all) sub-groups with a smaller size. There isn't even a requirement in the spec about the size of sub-groups when the work-group size is evenly divisible by the sub-group size. |
||
|
||
try { | ||
krn.get_info<sycl::info::kernel_device_specific::global_work_size>(dev); | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I originally thought that there would be a check for OpenCL version supported by the device before calling
clGetKernelSubGroupInfo
at all. However, this approach seems simpler and more performant