Skip to content

[SYCL] Fix handling of subgroup info queries #8859

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 2 commits into from
Apr 4, 2023
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
26 changes: 26 additions & 0 deletions sycl/plugins/opencl/pi_opencl.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -1313,6 +1313,32 @@ pi_result piKernelGetSubGroupInfo(pi_kernel kernel, pi_device device,
cast<cl_kernel_sub_group_info>(param_name), input_value_size, input_value,
sizeof(size_t), &ret_val, param_value_size_ret));

if (ret_err == CL_INVALID_OPERATION) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I originally thought that there would be a check for OpenCL version supported by the device before calling clGetKernelSubGroupInfo at all. However, this approach seems simpler and more performant

// clGetKernelSubGroupInfo returns CL_INVALID_OPERATION if the device does
// not support subgroups.

if (param_name == PI_KERNEL_MAX_NUM_SUB_GROUPS) {
ret_val = 1; // Minimum required by SYCL 2020 spec
ret_err = CL_SUCCESS;
} else if (param_name == PI_KERNEL_COMPILE_NUM_SUB_GROUPS) {
ret_val = 0; // Not specified by kernel
ret_err = CL_SUCCESS;
} else if (param_name == PI_KERNEL_MAX_SUB_GROUP_SIZE) {
// Return the maximum work group size for the kernel
size_t kernel_work_group_size = 0;
pi_result pi_ret_err = piKernelGetGroupInfo(
kernel, device, PI_KERNEL_GROUP_INFO_WORK_GROUP_SIZE, sizeof(size_t),
&kernel_work_group_size, nullptr);
if (pi_ret_err != PI_SUCCESS)
return pi_ret_err;
ret_val = kernel_work_group_size;
ret_err = CL_SUCCESS;
} else if (param_name == PI_KERNEL_COMPILE_SUB_GROUP_SIZE_INTEL) {
ret_val = 0; // Not specified by kernel
ret_err = CL_SUCCESS;
}
}

if (ret_err != CL_SUCCESS)
return cast<pi_result>(ret_err);

Expand Down
12 changes: 12 additions & 0 deletions sycl/test-e2e/Basic/kernel_info.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -54,6 +54,18 @@ int main() {
const size_t prefWGSizeMult = krn.get_info<
info::kernel_device_specific::preferred_work_group_size_multiple>(dev);
assert(prefWGSizeMult > 0);
const cl_uint maxSgSize =
krn.get_info<info::kernel_device_specific::max_sub_group_size>(dev);
assert(0 < maxSgSize && maxSgSize <= wgSize);
const cl_uint compileSgSize =
krn.get_info<info::kernel_device_specific::compile_sub_group_size>(dev);
assert(compileSgSize <= maxSgSize);
const cl_uint maxNumSg =
krn.get_info<info::kernel_device_specific::max_num_sub_groups>(dev);
assert(0 < maxNumSg);
const cl_uint compileNumSg =
krn.get_info<info::kernel_device_specific::compile_num_sub_groups>(dev);
assert(compileNumSg <= maxNumSg);
Comment on lines +66 to +68
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unrelated to the PR, but that's a bit weird query. I don't think that we have a way to specify amount of sub-groups a kernel should have. Do we really need that query in SYCL spec? Tagging @gmlueck here to comment.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I need to ask @Pennycook what this query means. Is this supposed to correspond to CL_KERNEL_COMPILE_NUM_SUB_GROUPS in OpenCL?

Returns the number of sub-groups per work-group specified in the kernel source or IL. If the sub-group count is not specified then 0 is returned.

How would a SYCL application specify the number of subgroups at the source code level? You can specify the maximum number of work-items in a sub-group via [[sycl::reqd_sub_group_size]], but that's not the same as the number of sub-groups.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this supposed to correspond to CL_KERNEL_COMPILE_NUM_SUB_GROUPS in OpenCL?

I think so. But I have no idea what is the mechanism of specifying that value on kernels even for OpenCL. I think it is also undocumented in there.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've always assumed that an implementation could derive this from a combination of [[sycl::reqd_sub_group_size]] and [[sycl::reqd_work_group_size]] -- if both are specified, then the number of sub-groups has also been specified. Even though devices are free to choose how work-groups are divided into sub-groups, I think that once a kernel has been compiled for a specific device the number of sub-groups should be known.

Based on some quick searching, though, I'm not sure if that was the intent. In OpenCL-Docs#447, @bashbaug notes that there isn't a way to set this property in OpenCL unless you're providing SPIR-V (which defines a SubgroupsPerWorkgroup ExecutionMode).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My pedantic reading of the spec is that [[sycl::reqd_sub_group_size]] only guarantees a maximum on the sub-group size. I think an implementation is conformant even if it chooses a smaller sub-group size. Therefore, I think the combination of [[sycl::reqd_sub_group_size]] and [[sycl::reqd_work_group_size]] does not necessarily specify a particular number of sub-groups.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you're right for SYCL, and that comes from the relaxations we put in for sub-group sizes.

Table 34 in OpenCL says:

All sub-groups must be the same size, while the last subgroup in any work-group (i.e. the subgroup with the maximum index) could be the same or smaller size.

...so I think what I said holds for OpenCL.

We deliberately made SYCL sub-groups very flexible to give implementations a lot of freedom, but we probably went too far. I think all implementations either: 1) have the behavior mandated by OpenCL; or 2) have similar guarantees applied to the inner-most dimension of the work-group. If we clarified in a future version of SYCL that only these two interpretations were legal, I think an implementation could still reason about the number of sub-groups given the work-group size.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My pedantic reading of the spec is that [[sycl::reqd_sub_group_size]] only guarantees a maximum on the sub-group size. I think an implementation is conformant even if it chooses a smaller sub-group size. Therefore, I think the combination of [[sycl::reqd_sub_group_size]] and [[sycl::reqd_work_group_size]] does not necessarily specify a particular number of sub-groups.

@gmlueck, can you please clarify why the annotation only guarantees a maximum sub-group size? I understood that an implementation had to use the sub-group size that was specified and could not choose a smaller one.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I created a Khronos issue to track this spec clarification:

https://gitlab.khronos.org/sycl/Specification/-/issues/651

(Requires Khronos access.)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I understood that an implementation had to use the sub-group size that was specified and could not choose a smaller one.

When the work-group size is not evenly divisible by the sub-group size, some sub-group(s) must have a smaller size. The SYCL spec does not currently provide any requirements. With today's wording, an implementation could make several (or all) sub-groups with a smaller size. There isn't even a requirement in the spec about the size of sub-groups when the work-group size is evenly divisible by the sub-group size.


try {
krn.get_info<sycl::info::kernel_device_specific::global_work_size>(dev);
Expand Down