[CK_TILE] Refactor `UniversalGemm::MakeA/B/C/DBlockViews` to allow caller to pass descriptors directly by assistant-librarian[bot] · Pull Request #4295 · ROCm/rocm-libraries

assistant-librarian · 2026-02-03T21:51:47Z

Proposed changes

Currently UniversalGemmKernel::MakeA/B/C/DBlockViews create tensor views directly from strides and sizes. This PR factors the descriptor creation out and adds overloaded definitions, so the caller can create the descriptors separately instead of passing explicit strides. There are no functional changes.
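Here is a minimal sketch of the pattern described above. It uses simplified, hypothetical types: `Desc2D`, `TensorView` and `MakeDefaultADescriptor` are illustrative stand-ins, not CK_TILE's actual descriptor, `make_tensor_view` or `make_naive_tensor_view` machinery. A separate utility builds the descriptor. A new overload accepts a caller-supplied descriptor, and the original stride-based signature delegates to that overload using the default descriptor.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical, simplified 2-D descriptor: lengths plus strides.
// (Real CK_TILE descriptors are compile-time compositions of transforms.)
struct Desc2D
{
    std::array<std::size_t, 2> lengths;
    std::array<std::size_t, 2> strides;

    std::size_t offset(std::size_t i, std::size_t j) const
    {
        return i * strides[0] + j * strides[1];
    }
};

// Hypothetical, simplified tensor view: a pointer plus a descriptor.
template <typename T, typename Desc>
struct TensorView
{
    T* data;
    Desc desc;
    T& operator()(std::size_t i, std::size_t j) const { return data[desc.offset(i, j)]; }
};

// Step 1: default descriptor creation factored out into a utility (row-major A, M x K).
inline Desc2D MakeDefaultADescriptor(std::size_t M, std::size_t K, std::size_t stride_A)
{
    return Desc2D{{M, K}, {stride_A, 1}};
}

// Step 2: new overload that takes a caller-provided descriptor of any type.
template <typename T, typename Desc>
TensorView<T, Desc> MakeABlockView(T* a_ptr, const Desc& a_desc)
{
    return TensorView<T, Desc>{a_ptr, a_desc};
}

// Step 3: the original stride-based entry point now just forwards the default descriptor.
template <typename T>
auto MakeABlockView(T* a_ptr, std::size_t M, std::size_t K, std::size_t stride_A)
{
    return MakeABlockView(a_ptr, MakeDefaultADescriptor(M, K, stride_A));
}

// Usage sketch: the old-style call and an explicit (transposed) descriptor hit the same element.
inline void UsageSketch()
{
    float a[6] = {0, 1, 2, 3, 4, 5};                                // 2 x 3, row-major
    auto v_default  = MakeABlockView(a, 2, 3, 3);                   // stride-based call
    auto v_explicit = MakeABlockView(a, Desc2D{{3, 2}, {1, 3}});    // caller-built descriptor
    assert(v_default(1, 2) == v_explicit(2, 1));                    // both read a[5]
}
```

Because the stride-based overload only forwards to the descriptor overload, existing call sites can stay unchanged while new callers supply their own descriptors.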

This will enable a similar refactoring of RunGemm, so that derived kernels such as BatchedContractionKernel can avoid creating separate versions (PR ROCm/composable_kernel#3457).
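The following hypothetical sketch shows why caller-supplied descriptors help a derived kernel. `GemmKernelBase`, `ContractionLikeKernel` and `MergedMDesc` are invented for illustration and do not reflect the actual `BatchedContractionKernel` code. In the sketch, the derived kernel maps a GEMM row index onto two non-contiguous tensor dimensions and reuses the base block-view factory instead of defining its own copy.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical base kernel: the block-view factory accepts any descriptor type
// that provides offset(row, col). (Simplified; not the real CK_TILE interface.)
struct GemmKernelBase
{
    template <typename T, typename Desc>
    struct View
    {
        T* data;
        Desc desc;
        T& operator()(std::size_t m, std::size_t k) const { return data[desc.offset(m, k)]; }
    };

    template <typename T, typename Desc>
    static View<T, Desc> MakeABlockView(T* a_ptr, const Desc& a_desc)
    {
        return View<T, Desc>{a_ptr, a_desc};
    }
};

// Hypothetical descriptor a contraction kernel might need: the GEMM M index is
// the merge of two tensor dimensions (M0, M1) that are not contiguous in memory.
struct MergedMDesc
{
    std::size_t len_m1;    // extent of the inner merged dimension
    std::size_t stride_m0; // stride of the outer merged dimension
    std::size_t stride_m1; // stride of the inner merged dimension
    std::size_t stride_k;  // stride of the K dimension

    std::size_t offset(std::size_t m, std::size_t k) const
    {
        return (m / len_m1) * stride_m0 + (m % len_m1) * stride_m1 + k * stride_k;
    }
};

// Hypothetical derived kernel: reuses the base factory with its own descriptor.
struct ContractionLikeKernel : GemmKernelBase
{
    template <typename T>
    static auto MakeAView(T* a_ptr, const MergedMDesc& desc)
    {
        return MakeABlockView(a_ptr, desc);
    }
};

// Usage sketch: a tensor laid out as [M0=2][K=3][M1=2]; GEMM row m = m0 * 2 + m1.
inline void UsageSketch()
{
    float a[12];
    for (int i = 0; i < 12; ++i)
        a[i] = static_cast<float>(i);
    auto view = ContractionLikeKernel::MakeAView(a, MergedMDesc{2, 6, 1, 2});
    assert(view(3, 1) == 9.0f); // m0 = 1, m1 = 1, k = 1 -> offset 6 + 1 + 2 = 9
}
```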

Checklist

Please put an x into the boxes that apply. You can also fill these out after creating the PR. If you're not sure, please don't hesitate to ask.

- [x] I have added tests relevant to the introduced functionality, and the unit tests are passing locally
- [ ] I have added the test to REGRESSION_TESTS list defined at the top of CMakeLists.txt in tests/CMakeLists.txt, IF the test takes more than 30 seconds to run.
- [ ] I have added inline documentation which enables the maintainers with understanding the motivation
- [ ] I have removed the stale documentation which is no longer relevant after this pull request
- [ ] (If this change is user-facing) I have added release notes which provide the end users with a brief summary of the improvement from this pull request
- [x] I have run clang-format on all changed files
- [ ] Any dependent changes have been merged

Discussion

Since the logic within the MakeXBlockViews functions chains together operations on tuples, the descriptors are also passed as tuples. Adding a template parameter for the type of the input tuple was therefore the simplest way to enable the overload without too much verbiage. For MakeCBlockView, however, this adds a complication, because the templated definitions are prone to overlap. For now, the collision is avoided by reordering the arguments of the descriptor version. It's not a great solution, so feel free to suggest a better one.
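The overlap described above can be reproduced with plain C++ function templates. In this sketch, all names and signatures are hypothetical simplifications, since the actual `MakeCBlockView` parameters are not given in this description. Both overloads are templates whose remaining parameter has a deduced type. If the descriptor version were declared as `(float*, const DescTuple&)`, it would be equivalent to the stride-based `(float*, const KArgs&)` template, and the second definition would be a redefinition error. To keep the two overloads distinct, the descriptor version takes the descriptor tuple first.

```cpp
#include <cassert>
#include <cstddef>
#include <tuple>

// Hypothetical simplified descriptor and view types (not the CK_TILE ones).
struct Desc2D
{
    std::size_t rows, cols, row_stride;
    std::size_t offset(std::size_t i, std::size_t j) const { return i * row_stride + j; }
};

struct CView
{
    float* data;
    Desc2D desc;
    float& operator()(std::size_t i, std::size_t j) const { return data[desc.offset(i, j)]; }
};

// Hypothetical kernel arguments for the stride-based path.
struct KernelArgs
{
    std::size_t M, N, stride_C;
};

// Descriptor overload, templated on the tuple type. Parameter order: descriptors first.
template <typename DescTuple>
CView MakeCBlockView(const DescTuple& c_descs, float* c_ptr)
{
    return CView{c_ptr, std::get<0>(c_descs)};
}

// Stride-based overload, also a template (hypothetically over the args type).
// Had the overload above used the order (float*, const DescTuple&), the two
// templates would be equivalent and this definition would not compile.
template <typename KArgs>
CView MakeCBlockView(float* c_ptr, const KArgs& kargs)
{
    return MakeCBlockView(std::make_tuple(Desc2D{kargs.M, kargs.N, kargs.stride_C}), c_ptr);
}

// Usage sketch: both paths produce equivalent views of a 2 x 3 row-major C.
inline void UsageSketch()
{
    float c[6] = {0, 1, 2, 3, 4, 5};
    CView from_args = MakeCBlockView(c, KernelArgs{2, 3, 3});
    CView from_desc = MakeCBlockView(std::make_tuple(Desc2D{2, 3, 3}), c);
    assert(from_args(1, 2) == 5.0f && from_desc(1, 2) == 5.0f);
}
```

A tag parameter or a constraint on the template parameter (SFINAE, or a C++20 `requires` clause) would be other standard ways to keep two such templates distinct without reordering arguments.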

🔁 Imported from ROCm/composable_kernel#3467
🧑‍💻 Originally authored by @amd-meskelin

This adds utility functions to construct default tensor descriptors for A, B, C and D tensors and refactors the Make{A,B,C,D}BlockWindows to call make_tensor_view using the utility functions instead of directly calling make_naive_tensor_view, allowing for further refactors later.

This adds overloaded versions of the block window creation functions that allow the caller to specify explicit descriptors instead of the default ones, and reimplements the existing definitions by calling the new ones using default descriptors.


Co-authored-by: Matti Eskelinen <matti.eskelinen@amd.com>
Co-authored-by: systems-assistant[bot] <systems-assistant[bot]@users.noreply.github.com>
Co-authored-by: Illia Silin <98187287+illsilin@users.noreply.github.com>
Co-authored-by: Thomas Ning <Thomas.Ning@amd.com>

amd-meskelin and others added 6 commits January 15, 2026 14:25

Generate the descriptors explicitly as separate tuples

fc38355

Merge branch 'develop' into meskelin/refactor-makegemmtensorviews

1ecb8b0

Merge branch 'develop' into meskelin/refactor-makegemmtensorviews

95ca0c3

Merge commit '95ca0c373aa654d6403b0164c31e02c8b759e9ee' into import/develop/ROCm_composable_kernel/pr-3467

c071130

assistant-librarian bot requested review from afagaj, andriy-ca, aosewski, asleepzzz, bartekxk, carlushuang, geyyer, illsilin, poyenc, qianfengz and tenpercent as code owners February 3, 2026 21:51

assistant-librarian bot added the imported pr label Feb 3, 2026

assistant-librarian bot requested review from ThomasNing and coderfeli as code owners February 3, 2026 21:51

github-actions bot added the project: composablekernel label Feb 3, 2026

assistant-librarian bot added the external contribution label Feb 3, 2026

Merge branch 'develop' into import/develop/ROCm_composable_kernel/pr-3467

56d026f

illsilin requested a review from a team as a code owner February 17, 2026 20:54

illsilin assigned ThomasNing Feb 17, 2026

Merge branch 'develop' into import/develop/ROCm_composable_kernel/pr-3467

1bfcf91

ThomasNing approved these changes Feb 24, 2026


ThomasNing merged commit fa2cfc8 into develop Feb 24, 2026
18 checks passed

ThomasNing deleted the import/develop/ROCm_composable_kernel/pr-3467 branch February 24, 2026 20:43


ThomasNing merged 8 commits into develop from import/develop/ROCm_composable_kernel/pr-3467

assistant-librarian bot commented Feb 3, 2026 • edited by amd-meskelin