[MLIR][GPU] Add a pattern to rewrite gpu.subgroup_id #137671

lialan · 2025-04-28T17:15:18Z

This patch implements a rewrite pattern for transforming gpu.subgroup_id to:

subgroup_id = linearized_thread_id / gpu.subgroup_size

where:

linearized_thread_id = thread_id.x + block_dim.x * (thread_id.y + block_dim.y * thread_id.z)
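
As a rough illustration of the arithmetic only, the two formulas above can be sketched in plain host-side C++. This is not MLIR code: the `Dim3` struct and the functions `linearizedThreadId` and `subgroupId` are hypothetical names introduced here, and the subgroup size is passed as an ordinary parameter, whereas the pattern obtains it from gpu.subgroup_size.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper type for illustration; not part of MLIR.
struct Dim3 {
  uint64_t x, y, z;
};

// linearized_thread_id = tid.x + dim.x * (tid.y + dim.y * tid.z)
// The x coordinate varies fastest, then y, then z.
uint64_t linearizedThreadId(Dim3 tid, Dim3 blockDim) {
  return tid.x + blockDim.x * (tid.y + blockDim.y * tid.z);
}

// subgroup_id = linearized_thread_id / subgroup_size (unsigned division).
uint64_t subgroupId(Dim3 tid, Dim3 blockDim, uint64_t subgroupSize) {
  assert(subgroupSize != 0 && "subgroup size must be non-zero");
  return linearizedThreadId(tid, blockDim) / subgroupSize;
}
```

For example, with a 64x2x1 block and a subgroup size of 32, thread (5, 1, 0) has linearized ID 5 + 64 * (1 + 2 * 0) = 69 and therefore falls in subgroup 69 / 32 = 2.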

github-actions · 2025-04-28T17:17:48Z

✅ With the latest revision this PR passed the C/C++ code formatter.


llvmbot · 2025-04-28T18:15:17Z

@llvm/pr-subscribers-mlir-gpu

@llvm/pr-subscribers-mlir

Author: Alan Li (lialan)

Changes

This patch implements a rewrite pattern for transforming gpu.subgroup_id to:

subgroup_id = linearized_thread_id / gpu.subgroup_size

where:

linearized_thread_id = thread_id.x + block_dim.x * (thread_id.y + block_dim.y * thread_id.z)

Full diff: https://github.com/llvm/llvm-project/pull/137671.diff

4 Files Affected:

(modified) mlir/include/mlir/Dialect/GPU/Transforms/Passes.h (+5)
(modified) mlir/lib/Dialect/GPU/CMakeLists.txt (+1)
(added) mlir/lib/Dialect/GPU/Transforms/SubgroupIdRewriter.cpp (+82)
(added) mlir/test/Dialect/GPU/subgroupId-rewrite.mlir (+26)

diff --git a/mlir/include/mlir/Dialect/GPU/Transforms/Passes.h b/mlir/include/mlir/Dialect/GPU/Transforms/Passes.h
index a13ad33df29cd..cbb990e603a38 100644
--- a/mlir/include/mlir/Dialect/GPU/Transforms/Passes.h
+++ b/mlir/include/mlir/Dialect/GPU/Transforms/Passes.h
@@ -39,6 +39,10 @@ class FuncOp;
 /// Collect a set of patterns to rewrite GlobalIdOp op within the GPU dialect.
 void populateGpuGlobalIdPatterns(RewritePatternSet &patterns);
 
+/// Collect a set of patterns to rewrite SubgroupIdOp op within the GPU
+/// dialect.
+void populateGpuSubgroupIdPatterns(RewritePatternSet &patterns);
+
 /// Collect a set of patterns to rewrite shuffle ops within the GPU dialect.
 void populateGpuShufflePatterns(RewritePatternSet &patterns);
 
@@ -88,6 +92,7 @@ inline void populateGpuRewritePatterns(RewritePatternSet &patterns) {
   populateGpuAllReducePatterns(patterns);
   populateGpuGlobalIdPatterns(patterns);
   populateGpuShufflePatterns(patterns);
+  populateGpuSubgroupIdPatterns(patterns);
 }
 
 namespace gpu {
diff --git a/mlir/lib/Dialect/GPU/CMakeLists.txt b/mlir/lib/Dialect/GPU/CMakeLists.txt
index be6492a22f34f..e21fa501bae6b 100644
--- a/mlir/lib/Dialect/GPU/CMakeLists.txt
+++ b/mlir/lib/Dialect/GPU/CMakeLists.txt
@@ -40,6 +40,7 @@ add_mlir_dialect_library(MLIRGPUTransforms
   Transforms/ROCDLAttachTarget.cpp
   Transforms/ShuffleRewriter.cpp
   Transforms/SPIRVAttachTarget.cpp
+  Transforms/SubgroupIdRewriter.cpp
   Transforms/SubgroupReduceLowering.cpp
 
   OBJECT
diff --git a/mlir/lib/Dialect/GPU/Transforms/SubgroupIdRewriter.cpp b/mlir/lib/Dialect/GPU/Transforms/SubgroupIdRewriter.cpp
new file mode 100644
index 0000000000000..1c322c1016c01
--- /dev/null
+++ b/mlir/lib/Dialect/GPU/Transforms/SubgroupIdRewriter.cpp
@@ -0,0 +1,82 @@
+//===- SubgroupIdRewriter.cpp - Implementation of SubgroupId rewriting ----===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+//
+// This file implements in-dialect rewriting of the gpu.subgroup_id op for archs
+// where:
+// subgroup_id = (tid.x + dim.x * (tid.y + dim.y * tid.z)) / subgroup_size
+//
+//===----------------------------------------------------------------------===//
+
+#include "mlir/Dialect/GPU/IR/GPUDialect.h"
+#include "mlir/Dialect/GPU/Transforms/Passes.h"
+#include "mlir/Dialect/Index/IR/IndexOps.h"
+#include "mlir/IR/Builders.h"
+#include "mlir/IR/PatternMatch.h"
+#include "mlir/Pass/Pass.h"
+
+using namespace mlir;
+
+namespace {
+struct GpuSubgroupIdRewriter final : OpRewritePattern<gpu::SubgroupIdOp> {
+  using OpRewritePattern<gpu::SubgroupIdOp>::OpRewritePattern;
+
+  LogicalResult matchAndRewrite(gpu::SubgroupIdOp op,
+                                PatternRewriter &rewriter) const override {
+    // Calculation of the thread's subgroup identifier.
+    //
+    // The process involves mapping the thread's 3D identifier within its
+    // block (tid.x, tid.y, tid.z) to a 1D linear index.
+    // This linearization assumes a layout where the x-dimension (dim.x)
+    // varies most rapidly (i.e., it is the innermost dimension).
+    //
+    // The formula for the linearized thread index is:
+    // L = tid.x + dim.x * (tid.y + (dim.y * tid.z))
+    //
+    // Subsequently, the range of linearized indices [0, N_threads-1] is
+    // divided into consecutive, non-overlapping segments, each representing
+    // a subgroup of size 'subgroup_size'.
+    //
+    // Example Partitioning (N = subgroup_size):
+    // | Subgroup 0      | Subgroup 1      | Subgroup 2      | ... |
+    // | Indices 0..N-1  | Indices N..2N-1 | Indices 2N..3N-1| ... |
+    //
+    // The subgroup identifier is obtained via integer division of the
+    // linearized thread index by the predefined 'subgroup_size'.
+    //
+    // subgroup_id = floor( L / subgroup_size )
+    //             = (tid.x + dim.x * (tid.y + dim.y * tid.z)) /
+    //             subgroup_size
+
+    auto loc = op->getLoc();
+
+    Value dimX = rewriter.create<gpu::BlockDimOp>(loc, gpu::Dimension::x);
+    Value dimY = rewriter.create<gpu::BlockDimOp>(loc, gpu::Dimension::y);
+    Value tidX = rewriter.create<gpu::ThreadIdOp>(loc, gpu::Dimension::x);
+    Value tidY = rewriter.create<gpu::ThreadIdOp>(loc, gpu::Dimension::y);
+    Value tidZ = rewriter.create<gpu::ThreadIdOp>(loc, gpu::Dimension::z);
+
+    Value dimYxIdZ = rewriter.create<index::MulOp>(loc, dimY, tidZ);
+    Value dimYxIdZPlusIdY = rewriter.create<index::AddOp>(loc, dimYxIdZ, tidY);
+    Value dimYxIdZPlusIdYTimesDimX =
+        rewriter.create<index::MulOp>(loc, dimX, dimYxIdZPlusIdY);
+    Value IdXPlusDimYxIdZPlusIdYTimesDimX =
+        rewriter.create<index::AddOp>(loc, tidX, dimYxIdZPlusIdYTimesDimX);
+    Value subgroupSize = rewriter.create<gpu::SubgroupSizeOp>(
+        loc, rewriter.getIndexType(), /*upper_bound = */ nullptr);
+    Value subgroupIdOp = rewriter.create<index::DivUOp>(
+        loc, IdXPlusDimYxIdZPlusIdYTimesDimX, subgroupSize);
+    rewriter.replaceOp(op, {subgroupIdOp});
+    return success();
+  }
+};
+
+} // namespace
+
+void mlir::populateGpuSubgroupIdPatterns(RewritePatternSet &patterns) {
+  patterns.add<GpuSubgroupIdRewriter>(patterns.getContext());
+}
diff --git a/mlir/test/Dialect/GPU/subgroupId-rewrite.mlir b/mlir/test/Dialect/GPU/subgroupId-rewrite.mlir
new file mode 100644
index 0000000000000..02fcb2ba21dad
--- /dev/null
+++ b/mlir/test/Dialect/GPU/subgroupId-rewrite.mlir
@@ -0,0 +1,26 @@
+// RUN: mlir-opt --test-gpu-rewrite -split-input-file %s | FileCheck %s
+
+module {
+  // CHECK-LABEL: func.func @subgroupId
+  // CHECK-SAME: (%[[SZ:.*]]: index, %[[MEM:.*]]: memref<index, 1>) {
+  func.func @subgroupId(%sz : index, %mem: memref<index, 1>) {
+    gpu.launch blocks(%bx, %by, %bz) in (%grid_x = %sz, %grid_y = %sz, %grid_z = %sz)
+               threads(%tx, %ty, %tz) in (%block_x = %sz, %block_y = %sz, %block_z = %sz) {
+      // CHECK: %[[DIMX:.*]] = gpu.block_dim  x
+      // CHECK-NEXT: %[[DIMY:.*]] = gpu.block_dim  y
+      // CHECK-NEXT: %[[TIDX:.*]] = gpu.thread_id  x
+      // CHECK-NEXT: %[[TIDY:.*]] = gpu.thread_id  y
+      // CHECK-NEXT: %[[TIDZ:.*]] = gpu.thread_id  z
+      // CHECK-NEXT: %[[T0:.*]] = index.mul %[[DIMY]], %[[TIDZ]]
+      // CHECK-NEXT: %[[T1:.*]] = index.add %[[T0]], %[[TIDY]]
+      // CHECK-NEXT: %[[T2:.*]] = index.mul %[[DIMX]], %[[T1]]
+      // CHECK-NEXT: %[[T3:.*]] = index.add %[[TIDX]], %[[T2]]
+      // CHECK-NEXT: %[[T4:.*]] = gpu.subgroup_size : index
+      // CHECK-NEXT: %[[T5:.*]] = index.divu %[[T3]], %[[T4]]
+      %idz = gpu.subgroup_id : index
+      memref.store %idz, %mem[] : memref<index, 1>
+      gpu.terminator
+    }
+    return
+  }
+}
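
The partitioning described in the comment inside the pattern (consecutive, non-overlapping segments of the linearized index range) can be checked with a small host-side C++ sketch. The function `threadsPerSubgroup` is a hypothetical helper, not part of the patch; it simply applies the same formula to every thread of a block and counts how many threads map to each subgroup ID. Whether a target really forms a partial trailing subgroup this way is hardware-dependent; the sketch only mirrors the arithmetic.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical helper: for a block of dimX * dimY * dimZ threads, count how
// many threads the formula assigns to each subgroup ID.
std::vector<uint64_t> threadsPerSubgroup(uint64_t dimX, uint64_t dimY,
                                         uint64_t dimZ,
                                         uint64_t subgroupSize) {
  assert(subgroupSize != 0 && "subgroup size must be non-zero");
  uint64_t numThreads = dimX * dimY * dimZ;
  // Round up so that a partial trailing subgroup gets its own slot.
  uint64_t numSubgroups = (numThreads + subgroupSize - 1) / subgroupSize;
  std::vector<uint64_t> counts(numSubgroups, 0);
  for (uint64_t z = 0; z < dimZ; ++z)
    for (uint64_t y = 0; y < dimY; ++y)
      for (uint64_t x = 0; x < dimX; ++x) {
        // Same linearization as the pattern: x varies fastest.
        uint64_t linear = x + dimX * (y + dimY * z);
        ++counts[linear / subgroupSize];
      }
  return counts;
}
```

With an 8x4x2 block and a subgroup size of 32, the 64 threads split into two subgroups of 32 threads each. With a 10x1x1 block and a subgroup size of 4, the counts are 4, 4, and 2, because the unsigned division leaves the last subgroup partially filled.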

lialan · 2025-04-28T18:17:09Z

@krzysz00 suggested moving the decomposition of gpu.subgroup_id into the GPU dialect itself.

Copilot

Pull Request Overview

This PR implements a new rewrite pattern for the GPU dialect to transform gpu.subgroup_id using a linearized thread ID calculation.

Introduces the GpuSubgroupIdRewriter rewrite pattern in SubgroupIdRewriter.cpp
Updates the passes header to register the new rewrite pattern

Reviewed Changes

Copilot reviewed 2 out of 4 changed files in this pull request and generated 1 comment.

File	Description
mlir/lib/Dialect/GPU/Transforms/SubgroupIdRewriter.cpp	Implements the rewrite pattern for gpu.subgroup_id.
mlir/include/mlir/Dialect/GPU/Transforms/Passes.h	Registers the new SubgroupId rewrite pattern.

Files not reviewed (2)

mlir/lib/Dialect/GPU/CMakeLists.txt: Language not supported
mlir/test/Dialect/GPU/subgroupId-rewrite.mlir: Language not supported


krzysz00

Minor notes, ideas seem fine

krzysz00 · 2025-04-28T20:29:17Z

mlir/include/mlir/Dialect/GPU/Transforms/Passes.h

@@ -88,6 +92,7 @@ inline void populateGpuRewritePatterns(RewritePatternSet &patterns) {
  populateGpuAllReducePatterns(patterns);
  populateGpuGlobalIdPatterns(patterns);
  populateGpuShufflePatterns(patterns);
+  populateGpuSubgroupIdPatterns(patterns);


Let's make sure that this doesn't end up in the SPIR-V lowerings, which seem to have an alternate approach to this op

I see. The SPIR-V lowering links the get_sub_group_id function to get the subgroup ID, so I think we should just remove this line from populateGpuRewritePatterns.

Side note: SPIR-V uses the same calculation to compute subgroup_id.

I eventually stripped it from this function.

krzysz00 · 2025-04-28T20:30:20Z

mlir/lib/Dialect/GPU/Transforms/SubgroupIdRewriter.cpp

+    Value dimYxIdZ = rewriter.create<index::MulOp>(loc, dimY, tidZ);
+    Value dimYxIdZPlusIdY = rewriter.create<index::AddOp>(loc, dimYxIdZ, tidY);
+    Value dimYxIdZPlusIdYTimesDimX =
+        rewriter.create<index::MulOp>(loc, dimX, dimYxIdZPlusIdY);


I think using arith:: instead of index:: is fine here: none of these values are at a point where the issues that motivated the index dialect are a problem.

... Hey, why'd this get landed with index::?


kuhar · 2025-04-29T01:45:28Z

mlir/test/Dialect/GPU/subgroupId-rewrite.mlir

@@ -0,0 +1,26 @@
+// RUN: mlir-opt --test-gpu-rewrite -split-input-file %s | FileCheck %s
+
+module {


This contains both an explicit module and an implicit one. I don't think we need both. Can we drop one of them?


kuhar

LGTM % use of the index dialect


lialan force-pushed the lialan/rewrite_subgroup_id branch from 2cc88a6 to 8c603f0 Compare April 28, 2025 17:22

lialan force-pushed the lialan/rewrite_subgroup_id branch from 8c603f0 to a19415f Compare April 28, 2025 18:14

lialan marked this pull request as ready for review April 28, 2025 18:14

llvmbot added mlir:gpu mlir labels Apr 28, 2025

lialan mentioned this pull request Apr 28, 2025

[MLIR][ROCDL] Add conversion for gpu.subgroup_id to ROCDL #136405

Closed

lialan requested review from krzysz00, kuhar and Copilot April 28, 2025 18:15

Copilot AI reviewed Apr 28, 2025


Update mlir/lib/Dialect/GPU/Transforms/SubgroupIdRewriter.cpp

fbe3bd1

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

krzysz00 reviewed Apr 28, 2025


lialan mentioned this pull request Apr 28, 2025

Lower linalg.copy to direct global load iree-org/iree#20568

Open

lialan force-pushed the lialan/rewrite_subgroup_id branch from e3e30d1 to bbde763 Compare April 29, 2025 01:38

kuhar reviewed Apr 29, 2025


remove

e316a6e

lialan force-pushed the lialan/rewrite_subgroup_id branch from bbde763 to e316a6e Compare April 29, 2025 03:04

Update mlir/lib/Dialect/GPU/Transforms/SubgroupIdRewriter.cpp

74912ac

Co-authored-by: Jakub Kuderski <kubakuderski@gmail.com>

lialan requested review from kuhar and krzysz00 April 29, 2025 13:26

kuhar approved these changes Apr 29, 2025


lialan merged commit ac65b2c into llvm:main Apr 29, 2025
11 checks passed
