[SYCL][DOC][Group sort] Let memory_required methods accept memory_scope::device as a parameter #17238

andreyfe1 · 2025-02-28T10:43:22Z

It was found that some hardware backends can return the memory size for the whole device much less than just simple num_groups * memory_for_each_work_group.
So, after discussion it was suggested to add a possibility to have memory_scope::device for memory_required of default sorters.

Signed-off-by: Fedorov, Andrey andrey.fedorov@intel.com

andreyfe1 · 2025-02-28T10:46:50Z

sycl/doc/extensions/experimental/sycl_ext_oneapi_group_sort.asciidoc

    [=](sycl::nd_item<1> id) {
      auto ptr = acc.get_pointer() + id.get_group(0) * n;

      my_sycl::joint_sort(
        // create group helper using deduction guides
        my_sycl::group_with_scratchpad(
          id.get_group(),
-          sycl::span{scratch.get_pointer(), temp_memory_size}
+          sycl::span{temp, temp_memory_size}


For reviewers.
We pass only a global pointer to the temporary memory. We don't need to split it into work-groups, e.g. temp + id.get_group(0) * n. Underlying backend has to understand how to split the temporary memory between work-groups

andreyfe1 · 2025-02-28T11:18:42Z

sycl/doc/extensions/experimental/sycl_ext_oneapi_group_sort.asciidoc

    [=](sycl::nd_item<1> id) {
      auto ptr = acc.get_pointer() + id.get_group(0) * n;

      my_sycl::joint_sort(
        // create group helper using deduction guides
        my_sycl::group_with_scratchpad(
          id.get_group(),
-          sycl::span{scratch.get_pointer(), temp_memory_size}
+          sycl::span{temp, temp_memory_size}


I have a concern regarding how joint_sort can distinguish if the temporary memory size was given using memory_scope::device or memory_scope::work_group.
So, if some implementation defined functionality can be developed to grab this information, it's fine. If not, we need to add some extra parameters for sorting functions calling within the kernel

andreyfe1 · 2025-03-13T12:57:15Z

Ping @intel/dpcpp-specification-reviewers

Pennycook · 2025-03-14T10:43:54Z

sycl/doc/extensions/experimental/sycl_ext_oneapi_group_sort.asciidoc

+If `scope == sycl::memory_scope::device`,
+`range_size` is the size of the global range for `sycl::nd_range`
+that was used to run the kernel;
+if `scope == sycl::memory_scope::work_group`,


I think this comment really applies to the whole pull request, but I'm commenting here so it can be threaded.

The way that I thought this worked was that if a developer was using a work-group to sort one array per work-group (i.e., the first argument to joint_sort is a sycl::group) then they should use memory_scope::work_group, and that if a developer was using a sub-group to sort one array per sub-group (i.e., the first argument to joint_sort is a sycl::sub_group) then they should use memory_scope::sub_group.

The logical extension of that would be that if a developer wants to use the entire device to sort a single array, then the first argument to sycl::joint_sort should be a root_group and they should use memory_scope::device.

But what you are introducing here seems to be a way for developers to query the total amount of memory required by the device when you run independent sort calls within each work-group or sub-group. I can see the appeal of doing that (because the device might be able to re-use memory across work-groups) but if I'm right about what you're trying to do here than I don't think it's a good idea to try and retrofit it into your existing interface.

I think the fact that you're changing what the memory_scope argument means is what leads to your concern in https://github.com/intel/llvm/pull/17238/files#r1975245222 -- there is no way for joint_sort to reason about how the user constructed the scratchpad. I think if you want to expose this functionality, you need to redesign the scratchpad in a way that allows the user to express exactly what they want to use the scratchpad for, and that allows the implementation to track that information until the point where the scratchpad is actually used.

@Pennycook, thanks for your comment! Actually there is no direct connection between where to execute the algo and how temporary memory is allocated. I also see it in SYCL in general, e.g. users can call joint_reduce on data allocated with local or global memory: memory allocation and execution are separated. So, it was just a coincidence for us that memory and execution scopes were intersected.

Regarding revising scratchpad. I'd also support that since it's natural for the memory structure to handle the memory scope of underlined pointer. I'll think how to organize it. I'm also thinking about having the memory scope as a template parameter for sorters. It can be used as following:

using my_joint_sorter = default_sorters::joint_sorter<memory_scope::device>; ... auto temp_memory_size = my_joint_sorter::memory_required(d, n_elements); ... // device level joint_sort(group, first, last, my_joint_sorter{temp}); ...

In this case joint_sort knows the memory scope of temp.
Any feedback is welcome.

using my_joint_sorter = default_sorters::joint_sorter<memory_scope::device>; ... auto temp_memory_size = my_joint_sorter::memory_required(d, n_elements); ... // device level joint_sort(group, first, last, my_joint_sorter{temp}); ...

In this case joint_sort knows the memory scope of temp. Any feedback is welcome.

This seems a little confusing to me. I'd read joint_sorter<memory_scope::device> as being a sorter that sorts at device-scope, not one that allocates at device-scope -- basically the same problem as before. This is going to get even more confusing when we introduce user-defined groups, because there might not even be a one-to-one mapping between groups and memory scopes.

I think to make this really explicit, you might need to do something like require developers to specify both the execution_scope/Group type and the memory_scope as separate arguments. Group type might be more future-proof:

// Define a sorter that sorts per work-group, using a single allocation at device memory scope using my_joint_sorter = default_sorters::joint_sorter<sycl::work_group, memory_scope::device>; // Get the amount of memory required for the single allocation at device memory scope auto temp_memory_size = my_joint_sorter::memory_required(d, n_elements); ... // "group" here must be a sycl::work_group because that's what the sorter expects joint_sort(group, first, last, my_joint_sorter{temp});

vs

// Define a sorter that sorts per work-group, using separate memory per work-group using my_joint_sorter = default_sorters::joint_sorter<sycl::work_group, memory_scope::work_group>; // Get the amount of memory required for each work-group auto temp_memory_size = my_joint_sorter::memory_required(d, n_elements); ... // "group" here must be a sycl::work_group because that's what the sorter expects joint_sort(group, first, last, my_joint_sorter{temp});

vs

// Define a sorter that sorts for the whole device, using a single allocation using my_joint_sorter = default_sorters::joint_sorter<sycl::root_group, memory_scope::device>; // Get the amount of memory required auto temp_memory_size = my_joint_sorter::memory_required(d, n_elements); ... // "group" here must be a sycl::root_group because that's what the sorter expects joint_sort(group, first, last, my_joint_sorter{temp});

make a possibility for memory_scope::device

ea85c96

andreyfe1 requested a review from a team as a code owner February 28, 2025 10:43

andreyfe1 commented Feb 28, 2025

View reviewed changes

change the time

23cb82c

Pennycook requested changes Mar 14, 2025

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

[SYCL][DOC][Group sort] Let memory_required methods accept memory_scope::device as a parameter #17238

[SYCL][DOC][Group sort] Let memory_required methods accept memory_scope::device as a parameter #17238

Uh oh!

andreyfe1 commented Feb 28, 2025

Uh oh!

andreyfe1 Feb 28, 2025

Uh oh!

andreyfe1 Feb 28, 2025

Uh oh!

andreyfe1 commented Mar 13, 2025

Uh oh!

Pennycook Mar 14, 2025

Uh oh!

andreyfe1 Mar 21, 2025

Uh oh!

Pennycook Mar 25, 2025

Uh oh!

Uh oh!

[SYCL][DOC][Group sort] Let memory_required methods accept memory_scope::device as a parameter #17238

Are you sure you want to change the base?

[SYCL][DOC][Group sort] Let memory_required methods accept memory_scope::device as a parameter #17238

Uh oh!

Conversation

andreyfe1 commented Feb 28, 2025

Uh oh!

andreyfe1 Feb 28, 2025

Choose a reason for hiding this comment

Uh oh!

andreyfe1 Feb 28, 2025

Choose a reason for hiding this comment

Uh oh!

andreyfe1 commented Mar 13, 2025

Uh oh!

Pennycook Mar 14, 2025

Choose a reason for hiding this comment

Uh oh!

andreyfe1 Mar 21, 2025

Choose a reason for hiding this comment

Uh oh!

Pennycook Mar 25, 2025

Choose a reason for hiding this comment

Uh oh!

Uh oh!