Skip to content

[SYCL] Fix handling of subgroup info queries #8859

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 2 commits into from
Apr 4, 2023

Conversation

0x12CC
Copy link
Contributor

@0x12CC 0x12CC commented Mar 29, 2023

Update opencl PI plugin info queries to prevent crashes when the device does not support subgroups.

Update `opencl` PI plugin info queries to prevent crashes when the
device does not support subgroups.

Signed-off-by: Michael Aziz <michael.aziz@intel.com>
@0x12CC 0x12CC requested a review from a team as a code owner March 29, 2023 15:22
@0x12CC 0x12CC requested a review from againull March 29, 2023 15:22
@AlexeySachkov AlexeySachkov requested a review from a team March 29, 2023 17:38
Copy link
Contributor

@AlexeySachkov AlexeySachkov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall LGTM

Comment on lines +66 to +68
const cl_uint compileNumSg =
krn.get_info<info::kernel_device_specific::compile_num_sub_groups>(dev);
assert(compileNumSg <= maxNumSg);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unrelated to the PR, but that's a bit weird query. I don't think that we have a way to specify amount of sub-groups a kernel should have. Do we really need that query in SYCL spec? Tagging @gmlueck here to comment.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I need to ask @Pennycook what this query means. Is this supposed to correspond to CL_KERNEL_COMPILE_NUM_SUB_GROUPS in OpenCL?

Returns the number of sub-groups per work-group specified in the kernel source or IL. If the sub-group count is not specified then 0 is returned.

How would a SYCL application specify the number of subgroups at the source code level? You can specify the maximum number of work-items in a sub-group via [[sycl::reqd_sub_group_size]], but that's not the same as the number of sub-groups.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this supposed to correspond to CL_KERNEL_COMPILE_NUM_SUB_GROUPS in OpenCL?

I think so. But I have no idea what is the mechanism of specifying that value on kernels even for OpenCL. I think it is also undocumented in there.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've always assumed that an implementation could derive this from a combination of [[sycl::reqd_sub_group_size]] and [[sycl::reqd_work_group_size]] -- if both are specified, then the number of sub-groups has also been specified. Even though devices are free to choose how work-groups are divided into sub-groups, I think that once a kernel has been compiled for a specific device the number of sub-groups should be known.

Based on some quick searching, though, I'm not sure if that was the intent. In OpenCL-Docs#447, @bashbaug notes that there isn't a way to set this property in OpenCL unless you're providing SPIR-V (which defines a SubgroupsPerWorkgroup ExecutionMode).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My pedantic reading of the spec is that [[sycl::reqd_sub_group_size]] only guarantees a maximum on the sub-group size. I think an implementation is conformant even if it chooses a smaller sub-group size. Therefore, I think the combination of [[sycl::reqd_sub_group_size]] and [[sycl::reqd_work_group_size]] does not necessarily specify a particular number of sub-groups.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you're right for SYCL, and that comes from the relaxations we put in for sub-group sizes.

Table 34 in OpenCL says:

All sub-groups must be the same size, while the last subgroup in any work-group (i.e. the subgroup with the maximum index) could be the same or smaller size.

...so I think what I said holds for OpenCL.

We deliberately made SYCL sub-groups very flexible to give implementations a lot of freedom, but we probably went too far. I think all implementations either: 1) have the behavior mandated by OpenCL; or 2) have similar guarantees applied to the inner-most dimension of the work-group. If we clarified in a future version of SYCL that only these two interpretations were legal, I think an implementation could still reason about the number of sub-groups given the work-group size.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My pedantic reading of the spec is that [[sycl::reqd_sub_group_size]] only guarantees a maximum on the sub-group size. I think an implementation is conformant even if it chooses a smaller sub-group size. Therefore, I think the combination of [[sycl::reqd_sub_group_size]] and [[sycl::reqd_work_group_size]] does not necessarily specify a particular number of sub-groups.

@gmlueck, can you please clarify why the annotation only guarantees a maximum sub-group size? I understood that an implementation had to use the sub-group size that was specified and could not choose a smaller one.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I created a Khronos issue to track this spec clarification:

https://gitlab.khronos.org/sycl/Specification/-/issues/651

(Requires Khronos access.)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I understood that an implementation had to use the sub-group size that was specified and could not choose a smaller one.

When the work-group size is not evenly divisible by the sub-group size, some sub-group(s) must have a smaller size. The SYCL spec does not currently provide any requirements. With today's wording, an implementation could make several (or all) sub-groups with a smaller size. There isn't even a requirement in the spec about the size of sub-groups when the work-group size is evenly divisible by the sub-group size.

@@ -1313,6 +1313,32 @@ pi_result piKernelGetSubGroupInfo(pi_kernel kernel, pi_device device,
cast<cl_kernel_sub_group_info>(param_name), input_value_size, input_value,
sizeof(size_t), &ret_val, param_value_size_ret));

if (ret_err == CL_INVALID_OPERATION) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I originally thought that there would be a check for OpenCL version supported by the device before calling clGetKernelSubGroupInfo at all. However, this approach seems simpler and more performant

Signed-off-by: Michael Aziz <michael.aziz@intel.com>
@0x12CC 0x12CC temporarily deployed to aws March 29, 2023 23:41 — with GitHub Actions Inactive
@0x12CC 0x12CC temporarily deployed to aws March 30, 2023 03:29 — with GitHub Actions Inactive
@0x12CC
Copy link
Contributor Author

0x12CC commented Mar 30, 2023

Test failures (unrelated to this change):

@bader bader requested a review from AlexeySachkov April 1, 2023 02:42
@AlexeySachkov AlexeySachkov merged commit 2408035 into intel:sycl Apr 4, 2023
@0x12CC 0x12CC deleted the subgroup_info branch April 4, 2023 15:16
dm-vodopyanov added a commit that referenced this pull request Apr 5, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants