Skip to content

[MLIR][ROCDL] Lower gpu.subgroup_size to wavefrontsize #137360

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 4 commits into from
Apr 25, 2025

Conversation

lialan
Copy link
Member

@lialan lialan commented Apr 25, 2025

No description provided.

Copy link
Contributor

@Copilot Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR lowers gpu.subgroup_id by converting it to a wavefront size via a new lowering pattern. Key changes include:

  • Adding a helper function to adjust bitwidths (truncOrExtToLLVMType).
  • Introducing the GPUSubgroupSizeOpToROCDL conversion pattern.
  • Updating the conversion pattern registration to include the new subgroup size lowering.
Files not reviewed (3)
  • mlir/include/mlir/Dialect/LLVMIR/ROCDLOps.td: Language not supported
  • mlir/test/Conversion/GPUToROCDL/gpu-to-rocdl.mlir: Language not supported
  • mlir/test/Target/LLVMIR/rocdl.mlir: Language not supported
Comments suppressed due to low confidence (1)

mlir/lib/Conversion/GPUToROCDL/LowerGpuOpsToROCDLOps.cpp:55

  • [nitpick] Consider renaming 'truncOrExtToLLVMType' to a more descriptive name such as 'adjustIntBitwidthToIndexType' to more clearly reflect its purpose.
// Truncate or extend the result depending on the index bitwidth specified by the LLVMTypeConverter options.

@llvmbot
Copy link
Member

llvmbot commented Apr 25, 2025

@llvm/pr-subscribers-mlir-gpu

@llvm/pr-subscribers-mlir

Author: Alan Li (lialan)

Changes

Full diff: https://github.com/llvm/llvm-project/pull/137360.diff

4 Files Affected:

  • (modified) mlir/include/mlir/Dialect/LLVMIR/ROCDLOps.td (+2)
  • (modified) mlir/lib/Conversion/GPUToROCDL/LowerGpuOpsToROCDLOps.cpp (+36-1)
  • (modified) mlir/test/Conversion/GPUToROCDL/gpu-to-rocdl.mlir (+7-3)
  • (modified) mlir/test/Target/LLVMIR/rocdl.mlir (+4)
diff --git a/mlir/include/mlir/Dialect/LLVMIR/ROCDLOps.td b/mlir/include/mlir/Dialect/LLVMIR/ROCDLOps.td
index 186a4f53f93cb..93e59e0e7e6be 100644
--- a/mlir/include/mlir/Dialect/LLVMIR/ROCDLOps.td
+++ b/mlir/include/mlir/Dialect/LLVMIR/ROCDLOps.td
@@ -216,6 +216,8 @@ def ROCDL_BlockIdXOp : ROCDL_SpecialIdRegisterOp<"workgroup.id.x">;
 def ROCDL_BlockIdYOp : ROCDL_SpecialIdRegisterOp<"workgroup.id.y">;
 def ROCDL_BlockIdZOp : ROCDL_SpecialIdRegisterOp<"workgroup.id.z">;
 
+def ROCDL_WavefrontSizeOp : ROCDL_SpecialIdRegisterOp<"wavefrontsize">;
+
 //===----------------------------------------------------------------------===//
 // Thread range and Block range
 //===----------------------------------------------------------------------===//
diff --git a/mlir/lib/Conversion/GPUToROCDL/LowerGpuOpsToROCDLOps.cpp b/mlir/lib/Conversion/GPUToROCDL/LowerGpuOpsToROCDLOps.cpp
index e6dd6f135884e..d17fb4716d331 100644
--- a/mlir/lib/Conversion/GPUToROCDL/LowerGpuOpsToROCDLOps.cpp
+++ b/mlir/lib/Conversion/GPUToROCDL/LowerGpuOpsToROCDLOps.cpp
@@ -52,6 +52,25 @@ namespace mlir {
 
 using namespace mlir;
 
+// Truncate or extend the result depending on the index bitwidth specified
+// by the LLVMTypeConverter options.
+static Value truncOrExtToLLVMType(ConversionPatternRewriter &rewriter,
+                                  Location loc, Value value,
+                                  const LLVMTypeConverter &converter) {
+  int64_t intWidth = cast<IntegerType>(value.getType()).getWidth();
+  int64_t indexBitwidth = converter.getIndexTypeBitwidth();
+  auto indexBitwidthType =
+      IntegerType::get(rewriter.getContext(), converter.getIndexTypeBitwidth());
+  // TODO: use <=> in C++20.
+  if (indexBitwidth > intWidth) {
+    return rewriter.create<LLVM::SExtOp>(loc, indexBitwidthType, value);
+  }
+  if (indexBitwidth < intWidth) {
+    return rewriter.create<LLVM::TruncOp>(loc, indexBitwidthType, value);
+  }
+  return value;
+}
+
 /// Returns true if the given `gpu.func` can be safely called using the bare
 /// pointer calling convention.
 static bool canBeCalledWithBarePointers(gpu::GPUFuncOp func) {
@@ -113,6 +132,20 @@ struct GPULaneIdOpToROCDL : ConvertOpToLLVMPattern<gpu::LaneIdOp> {
   }
 };
 
+struct GPUSubgroupSizeOpToROCDL : ConvertOpToLLVMPattern<gpu::SubgroupSizeOp> {
+  using ConvertOpToLLVMPattern::ConvertOpToLLVMPattern;
+  LogicalResult
+  matchAndRewrite(gpu::SubgroupSizeOp op, gpu::SubgroupSizeOp::Adaptor adaptor,
+                  ConversionPatternRewriter &rewriter) const override {
+    Value wavefrontOp = rewriter.create<ROCDL::WavefrontSizeOp>(
+        op.getLoc(), IntegerType::get(rewriter.getContext(), 32));
+    wavefrontOp = truncOrExtToLLVMType(rewriter, op.getLoc(), wavefrontOp,
+                                       *getTypeConverter());
+    rewriter.replaceOp(op, {wavefrontOp});
+    return success();
+  }
+};
+
 struct GPUShuffleOpLowering : public ConvertOpToLLVMPattern<gpu::ShuffleOp> {
   using ConvertOpToLLVMPattern<gpu::ShuffleOp>::ConvertOpToLLVMPattern;
 
@@ -405,7 +438,9 @@ void mlir::populateGpuToROCDLConversionPatterns(
   // TODO: Add alignment for workgroup memory
   patterns.add<GPUDynamicSharedMemoryOpLowering>(converter);
 
-  patterns.add<GPUShuffleOpLowering, GPULaneIdOpToROCDL>(converter);
+  patterns
+      .add<GPUShuffleOpLowering, GPULaneIdOpToROCDL, GPUSubgroupSizeOpToROCDL>(
+          converter);
 
   populateMathToROCDLConversionPatterns(converter, patterns);
 }
diff --git a/mlir/test/Conversion/GPUToROCDL/gpu-to-rocdl.mlir b/mlir/test/Conversion/GPUToROCDL/gpu-to-rocdl.mlir
index 071cae9d5789f..5e3cad0cf26b0 100644
--- a/mlir/test/Conversion/GPUToROCDL/gpu-to-rocdl.mlir
+++ b/mlir/test/Conversion/GPUToROCDL/gpu-to-rocdl.mlir
@@ -11,7 +11,7 @@ gpu.module @test_module {
   func.func @gpu_index_ops()
       -> (index, index, index, index, index, index,
           index, index, index, index, index, index,
-          index) {
+          index, index) {
     // CHECK32-NOT: = llvm.sext %{{.*}} : i32 to i64
 
     // CHECK: rocdl.workitem.id.x : i32
@@ -59,12 +59,16 @@ gpu.module @test_module {
     // CHECK: = llvm.sext %{{.*}} : i32 to i64
     %laneId = gpu.lane_id
 
+    // CHECK: = rocdl.wavefrontsize : i32
+    // CHECK: = llvm.sext %{{.*}} : i32 to i64
+    %subgroupSize = gpu.subgroup_size : index
+
     func.return %tIdX, %tIdY, %tIdZ, %bDimX, %bDimY, %bDimZ,
                %bIdX, %bIdY, %bIdZ, %gDimX, %gDimY, %gDimZ,
-               %laneId
+               %laneId, %subgroupSize
         : index, index, index, index, index, index,
           index, index, index, index, index, index,
-          index
+          index, index
   }
 }
 
diff --git a/mlir/test/Target/LLVMIR/rocdl.mlir b/mlir/test/Target/LLVMIR/rocdl.mlir
index 3db1f7b2b6427..d12316101382b 100644
--- a/mlir/test/Target/LLVMIR/rocdl.mlir
+++ b/mlir/test/Target/LLVMIR/rocdl.mlir
@@ -32,6 +32,10 @@ llvm.func @rocdl_special_regs() -> i32 {
 
   // CHECK: call range(i64 1, 65) i64 @__ockl_get_local_size(i32 0)
   %14 = rocdl.workgroup.dim.x range <i32, 1, 65> : i64
+
+  // CHECK: call i64 $llvm.amdgcn.wavefrontsize()
+  %15 = rocdl.wavefrontsize : i32
+
   llvm.return %1 : i32
 }
 

@lialan lialan force-pushed the lialan/wavfrontsize branch from 71b4ca0 to de4364f Compare April 25, 2025 17:51
Copy link
Contributor

@krzysz00 krzysz00 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

... forgot to submit my review lol

Copy link
Member

@kuhar kuhar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good overall, thanks for adding this

@@ -52,6 +52,25 @@ namespace mlir {

using namespace mlir;

// Truncate or extend the result depending on the index bitwidth specified
// by the LLVMTypeConverter options.
static Value truncOrExtToLLVMType(ConversionPatternRewriter &rewriter,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we also use this in the code below, e.g., GPULaneIdOpToROCDL?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@lialan lialan changed the title [MLIR][ROCDL] Lower gpu.subgroup_id to wavefrontsize [MLIR][ROCDL] Lower gpu.subgroup_size to wavefrontsize Apr 25, 2025
Copy link
Contributor

@krzysz00 krzysz00 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One note about how the range is being set (and a suggestion), otherwise LGTM

if (auto upperBoundAttr = op.getUpperBoundAttr()) {
bounds = rewriter.getAttr<LLVM::ConstantRangeAttr>(
/*bitWidth=*/32, /*lower=*/32,
/*upper=*/op.getUpperBoundAttr().getInt());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add 1 here - LLVM uses a <= x < b ranges and MLIR size ops use a <= upper bound

LLVM::ConstantRangeAttr bounds = nullptr;
if (auto upperBoundAttr = op.getUpperBoundAttr()) {
bounds = rewriter.getAttr<LLVM::ConstantRangeAttr>(
/*bitWidth=*/32, /*lower=*/32,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should the lower bound be 64 before gfx10?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

that is right!

@lialan lialan merged commit 6e47937 into llvm:main Apr 25, 2025
11 checks passed
@lialan lialan deleted the lialan/wavfrontsize branch April 25, 2025 23:21
raikonenfnu added a commit that referenced this pull request Apr 26, 2025
…7439)

This PR addressed a bug from #137360. which was using
GPUSubgroupSizeToROCDL to patterns function that do not have a valid
constructor for it. This is causing compilation error below:

error: constructor inherited by 'GPUSubgroupSizeOpToROCDL' from base
class 'ConvertOpToLLVMPattern<mlir::gpu::SubgroupSizeOp>' is implicitly
deleted

Signed-off-by: Stanley Winata <stanley.winata@amd.com>
raikonenfnu added a commit to iree-org/iree that referenced this pull request Apr 26, 2025
Updated LIT test from landing llvm/llvm-project#136640 which folds
linalg.index when size is unit dim (1).

Added chipSet argument into populateGpuToROCDLConversionPatterns based
on changes in llvm/llvm-project#137360

This patch carries revert of

llvm/llvm-project#133231. This PR breaks fp_to_subbytes and
emulation_subbyte_types on llvm-cpu tests. iree-test-deps. tracker issue
in #20645.

llvm/llvm-project#137122. StableHLO and Torch-mlir needs to update their
usage of GreedyRewriteConfig to use fluent API. i.e
enableRegionSimplification = VS setRegionSimplificationLevel

llvm/llvm-project#135970. StableHLO has issue with VHLO_IntegerAttr and
APInt not being updated. StableHLO needs to be updated with that PR's
change for us to be able to integrate.

llvm/llvm-project#121389. Torch-MLIR needs to be
updated with that PR's change for us to be able to integrate.

---------

Signed-off-by: Stanley Winata <stanley.winata@amd.com>
jyli0116 pushed a commit to jyli0116/llvm-project that referenced this pull request Apr 28, 2025
jyli0116 pushed a commit to jyli0116/llvm-project that referenced this pull request Apr 28, 2025
…m#137439)

This PR addressed a bug from llvm#137360. which was using
GPUSubgroupSizeToROCDL to patterns function that do not have a valid
constructor for it. This is causing compilation error below:

error: constructor inherited by 'GPUSubgroupSizeOpToROCDL' from base
class 'ConvertOpToLLVMPattern<mlir::gpu::SubgroupSizeOp>' is implicitly
deleted

Signed-off-by: Stanley Winata <stanley.winata@amd.com>
llvm-sync bot pushed a commit to arm/arm-toolchain that referenced this pull request May 6, 2025
…oROCDL (#137439)

This PR addressed a bug from llvm/llvm-project#137360. which was using
GPUSubgroupSizeToROCDL to patterns function that do not have a valid
constructor for it. This is causing compilation error below:

error: constructor inherited by 'GPUSubgroupSizeOpToROCDL' from base
class 'ConvertOpToLLVMPattern<mlir::gpu::SubgroupSizeOp>' is implicitly
deleted

Signed-off-by: Stanley Winata <stanley.winata@amd.com>
IanWood1 pushed a commit to IanWood1/llvm-project that referenced this pull request May 6, 2025
IanWood1 pushed a commit to IanWood1/llvm-project that referenced this pull request May 6, 2025
…m#137439)

This PR addressed a bug from llvm#137360. which was using
GPUSubgroupSizeToROCDL to patterns function that do not have a valid
constructor for it. This is causing compilation error below:

error: constructor inherited by 'GPUSubgroupSizeOpToROCDL' from base
class 'ConvertOpToLLVMPattern<mlir::gpu::SubgroupSizeOp>' is implicitly
deleted

Signed-off-by: Stanley Winata <stanley.winata@amd.com>
IanWood1 pushed a commit to IanWood1/llvm-project that referenced this pull request May 6, 2025
IanWood1 pushed a commit to IanWood1/llvm-project that referenced this pull request May 6, 2025
…m#137439)

This PR addressed a bug from llvm#137360. which was using
GPUSubgroupSizeToROCDL to patterns function that do not have a valid
constructor for it. This is causing compilation error below:

error: constructor inherited by 'GPUSubgroupSizeOpToROCDL' from base
class 'ConvertOpToLLVMPattern<mlir::gpu::SubgroupSizeOp>' is implicitly
deleted

Signed-off-by: Stanley Winata <stanley.winata@amd.com>
IanWood1 pushed a commit to IanWood1/llvm-project that referenced this pull request May 6, 2025
IanWood1 pushed a commit to IanWood1/llvm-project that referenced this pull request May 6, 2025
…m#137439)

This PR addressed a bug from llvm#137360. which was using
GPUSubgroupSizeToROCDL to patterns function that do not have a valid
constructor for it. This is causing compilation error below:

error: constructor inherited by 'GPUSubgroupSizeOpToROCDL' from base
class 'ConvertOpToLLVMPattern<mlir::gpu::SubgroupSizeOp>' is implicitly
deleted

Signed-off-by: Stanley Winata <stanley.winata@amd.com>
KyleHerndon pushed a commit to KyleHerndon/iree that referenced this pull request May 7, 2025
Updated LIT test from landing llvm/llvm-project#136640 which folds
linalg.index when size is unit dim (1).

Added chipSet argument into populateGpuToROCDLConversionPatterns based
on changes in llvm/llvm-project#137360

This patch carries revert of

llvm/llvm-project#133231. This PR breaks fp_to_subbytes and
emulation_subbyte_types on llvm-cpu tests. iree-test-deps. tracker issue
in iree-org#20645.

llvm/llvm-project#137122. StableHLO and Torch-mlir needs to update their
usage of GreedyRewriteConfig to use fluent API. i.e
enableRegionSimplification = VS setRegionSimplificationLevel

llvm/llvm-project#135970. StableHLO has issue with VHLO_IntegerAttr and
APInt not being updated. StableHLO needs to be updated with that PR's
change for us to be able to integrate.

llvm/llvm-project#121389. Torch-MLIR needs to be
updated with that PR's change for us to be able to integrate.

---------

Signed-off-by: Stanley Winata <stanley.winata@amd.com>
Ankur-0429 pushed a commit to Ankur-0429/llvm-project that referenced this pull request May 9, 2025
Ankur-0429 pushed a commit to Ankur-0429/llvm-project that referenced this pull request May 9, 2025
…m#137439)

This PR addressed a bug from llvm#137360. which was using
GPUSubgroupSizeToROCDL to patterns function that do not have a valid
constructor for it. This is causing compilation error below:

error: constructor inherited by 'GPUSubgroupSizeOpToROCDL' from base
class 'ConvertOpToLLVMPattern<mlir::gpu::SubgroupSizeOp>' is implicitly
deleted

Signed-off-by: Stanley Winata <stanley.winata@amd.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants