Add threadblock map transformation #2116
Conversation
Looks quite good overall. I have minor comments/questions.
@@ -0,0 +1,285 @@
# Copyright 2019-2023 ETH Zurich and the DaCe authors. All rights reserved.
Update year
Done
@@ -0,0 +1,285 @@
# Copyright 2019-2023 ETH Zurich and the DaCe authors. All rights reserved.
""" This module contains classes and functions that implement the grid-strided map tiling
Update docstring
Done
map_entry = self.map_entry

# Find the state that contains the map entry
state = next(state for node, state in sdfg.all_nodes_recursive() if node == map_entry)
Seems a bit wasteful. Would it make sense to follow the {graph, graph.parent_graph, ...}
chain until you find a state?
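The reviewer's suggestion can be sketched in plain Python. This is a minimal self-contained illustration with stand-in classes (`Graph`, `State`, and `find_enclosing_state` are illustrative names, not the real DaCe API): walk the `{graph, graph.parent_graph, ...}` chain upward until a state is reached.

```python
# Stand-in classes; in DaCe these would be control-flow graphs and SDFG states.
class Graph:
    def __init__(self, parent_graph=None):
        self.parent_graph = parent_graph

class State(Graph):
    """Stand-in for an SDFG state."""

def find_enclosing_state(graph):
    # Follow the parent-graph chain upward; stop at the first State (or None).
    while graph is not None and not isinstance(graph, State):
        graph = graph.parent_graph
    return graph
```

Compared to scanning `all_nodes_recursive()`, this walks only the (short) nesting chain instead of the whole SDFG.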
Yes, this is wasteful. It can be solved even more simply: I overlooked that the input graph is actually the state containing map_entry. I have corrected this and updated the function signature with type annotations (and renamed graph to state). Thank you!
return False  # Already has GPU-scheduled inner scope — does not apply

# Check if the map is nested inside another GPU-scheduled map
parent_map_tuple = helpers.get_parent_map(state, map_entry)
I do not think that we support a GPU-device map inside any other GPU-scheduled map, but I may not be up-to-date with developments. However, in such a case, you can omit the following check.
Actually, nested GPU_Device maps are supported and are (rarely) used (see tests/codegen/warp_specialization_test.py).
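For illustration, the parent-map check under discussion can be modeled in plain Python. This is a stand-in sketch (the schedule names and the `parent_of` mapping are assumptions standing in for `helpers.get_parent_map`), not the DaCe implementation:

```python
# Assumed GPU schedule names for this sketch.
GPU_SCHEDULES = {'GPU_Device', 'GPU_ThreadBlock'}

def has_gpu_parent_map(map_entry, parent_of):
    # Map entries are modeled as (name, schedule) tuples; parent_of maps each
    # entry to its parent map entry (or None), like get_parent_map does.
    current = parent_of.get(map_entry)
    while current is not None:
        if current[1] in GPU_SCHEDULES:
            return True  # Nested inside a GPU-scheduled map
        current = parent_of.get(current)
    return False
```

The point of the check is that the transformation should not fire for a GPU_Device map that already sits inside another GPU-scheduled scope.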
new_kernel_entry.map.gpu_block_size = gpu_block_size

# Catch any unexpected mismatches of inserted threadblock map's block size and the used block size
tb_size = to_3d_dims([symbolic.overapproximate(sz) for sz in thread_block_map_entry.map.range.size()[::-1]])
Is this a sanity/debug check, or can this actually fail? In the latter case, would it make sense to move this check to the can_be_applied method and just not apply the transformation?
This was only a sanity check for me; it never failed. I have removed it.
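For context, a helper like the to_3d_dims seen in the diff above typically pads a dimension list to a CUDA-style (x, y, z) triple. This is a hypothetical stand-in for illustration only; the real helper in the PR may differ:

```python
def to_3d_dims(dims):
    # Pad a list of up to three sizes with 1s so it always forms a
    # CUDA-style (x, y, z) block-dimension triple.
    dims = list(dims)
    assert len(dims) <= 3, 'more than three dimensions cannot map to a block'
    return dims + [1] * (3 - len(dims))
```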
Description
I implemented a transformation which adds an explicit GPU_ThreadBlock-scheduled map to a GPU_Device-scheduled map if it does not already have one (in certain cases). Note that thread blocks are always present and used in CUDA kernels. Making them explicit in the SDFG is good practice and promotes modularity.
I implemented this transformation as part of my Master's thesis work, aiming to closely replicate the previous behavior to ensure backwards compatibility. Since only minimal changes were needed to make it useful in combination with CUDACodeGen as well, I am submitting it as a pull request.
Example
Below is an example illustrating the effect of the transformation on a simple SDFG. The transformation adds an explicit GPU_ThreadBlock-scheduled map with default block sizes obtained from the configuration.
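The effect can be modeled in plain Python (a sketch of the assumed semantics, not DaCe code): making the implicit thread-block loop explicit splits a flat GPU_Device iteration range into a grid of blocks of a default block size, mirroring CUDA's blockIdx/threadIdx decomposition.

```python
def blocked_indices(n, block_size=32):
    # Enumerate iterations the way the transformed SDFG would:
    # an outer grid loop (GPU_Device map) over thread blocks and an
    # inner loop (GPU_ThreadBlock map) over threads within a block.
    covered = []
    num_blocks = (n + block_size - 1) // block_size  # ceil(n / block_size)
    for block in range(num_blocks):                  # GPU_Device map
        for thread in range(block_size):             # GPU_ThreadBlock map
            i = block * block_size + thread
            if i < n:                                # guard the remainder block
                covered.append(i)
    return covered
```

Every index of the original flat range is covered exactly once, including the partially filled last block.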