-
Notifications
You must be signed in to change notification settings - Fork 769
[SYCL] Fix handling of subgroup info queries #8859
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Update `opencl` PI plugin info queries to prevent crashes when the device does not support subgroups. Signed-off-by: Michael Aziz <michael.aziz@intel.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall LGTM
const cl_uint compileNumSg = | ||
krn.get_info<info::kernel_device_specific::compile_num_sub_groups>(dev); | ||
assert(compileNumSg <= maxNumSg); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unrelated to the PR, but that's a bit weird query. I don't think that we have a way to specify amount of sub-groups a kernel should have. Do we really need that query in SYCL spec? Tagging @gmlueck here to comment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I need to ask @Pennycook what this query means. Is this supposed to correspond to CL_KERNEL_COMPILE_NUM_SUB_GROUPS
in OpenCL?
Returns the number of sub-groups per work-group specified in the kernel source or IL. If the sub-group count is not specified then 0 is returned.
How would a SYCL application specify the number of subgroups at the source code level? You can specify the maximum number of work-items in a sub-group via [[sycl::reqd_sub_group_size]]
, but that's not the same as the number of sub-groups.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this supposed to correspond to CL_KERNEL_COMPILE_NUM_SUB_GROUPS in OpenCL?
I think so. But I have no idea what is the mechanism of specifying that value on kernels even for OpenCL. I think it is also undocumented in there.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've always assumed that an implementation could derive this from a combination of [[sycl::reqd_sub_group_size]]
and [[sycl::reqd_work_group_size]]
-- if both are specified, then the number of sub-groups has also been specified. Even though devices are free to choose how work-groups are divided into sub-groups, I think that once a kernel has been compiled for a specific device the number of sub-groups should be known.
Based on some quick searching, though, I'm not sure if that was the intent. In OpenCL-Docs#447, @bashbaug notes that there isn't a way to set this property in OpenCL unless you're providing SPIR-V (which defines a SubgroupsPerWorkgroup
ExecutionMode).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My pedantic reading of the spec is that [[sycl::reqd_sub_group_size]]
only guarantees a maximum on the sub-group size. I think an implementation is conformant even if it chooses a smaller sub-group size. Therefore, I think the combination of [[sycl::reqd_sub_group_size]]
and [[sycl::reqd_work_group_size]]
does not necessarily specify a particular number of sub-groups.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you're right for SYCL, and that comes from the relaxations we put in for sub-group sizes.
Table 34 in OpenCL says:
All sub-groups must be the same size, while the last subgroup in any work-group (i.e. the subgroup with the maximum index) could be the same or smaller size.
...so I think what I said holds for OpenCL.
We deliberately made SYCL sub-groups very flexible to give implementations a lot of freedom, but we probably went too far. I think all implementations either: 1) have the behavior mandated by OpenCL; or 2) have similar guarantees applied to the inner-most dimension of the work-group. If we clarified in a future version of SYCL that only these two interpretations were legal, I think an implementation could still reason about the number of sub-groups given the work-group size.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My pedantic reading of the spec is that
[[sycl::reqd_sub_group_size]]
only guarantees a maximum on the sub-group size. I think an implementation is conformant even if it chooses a smaller sub-group size. Therefore, I think the combination of[[sycl::reqd_sub_group_size]]
and[[sycl::reqd_work_group_size]]
does not necessarily specify a particular number of sub-groups.
@gmlueck, can you please clarify why the annotation only guarantees a maximum sub-group size? I understood that an implementation had to use the sub-group size that was specified and could not choose a smaller one.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I created a Khronos issue to track this spec clarification:
https://gitlab.khronos.org/sycl/Specification/-/issues/651
(Requires Khronos access.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I understood that an implementation had to use the sub-group size that was specified and could not choose a smaller one.
When the work-group size is not evenly divisible by the sub-group size, some sub-group(s) must have a smaller size. The SYCL spec does not currently provide any requirements. With today's wording, an implementation could make several (or all) sub-groups with a smaller size. There isn't even a requirement in the spec about the size of sub-groups when the work-group size is evenly divisible by the sub-group size.
@@ -1313,6 +1313,32 @@ pi_result piKernelGetSubGroupInfo(pi_kernel kernel, pi_device device, | |||
cast<cl_kernel_sub_group_info>(param_name), input_value_size, input_value, | |||
sizeof(size_t), &ret_val, param_value_size_ret)); | |||
|
|||
if (ret_err == CL_INVALID_OPERATION) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I originally thought that there would be a check for OpenCL version supported by the device before calling clGetKernelSubGroupInfo
at all. However, this approach seems simpler and more performant
Signed-off-by: Michael Aziz <michael.aziz@intel.com>
Test failures (unrelated to this change):
|
This reverts commit 2408035.
Update
opencl
PI plugin info queries to prevent crashes when the device does not support subgroups.