-
Notifications
You must be signed in to change notification settings - Fork 13.5k
[MLIR][ROCDL] Lower gpu.subgroup_size
to wavefrontsize
#137360
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR lowers gpu.subgroup_id by converting it to a wavefront size via a new lowering pattern. Key changes include:
- Adding a helper function to adjust bitwidths (truncOrExtToLLVMType).
- Introducing the GPUSubgroupSizeOpToROCDL conversion pattern.
- Updating the conversion pattern registration to include the new subgroup size lowering.
Files not reviewed (3)
- mlir/include/mlir/Dialect/LLVMIR/ROCDLOps.td: Language not supported
- mlir/test/Conversion/GPUToROCDL/gpu-to-rocdl.mlir: Language not supported
- mlir/test/Target/LLVMIR/rocdl.mlir: Language not supported
Comments suppressed due to low confidence (1)
mlir/lib/Conversion/GPUToROCDL/LowerGpuOpsToROCDLOps.cpp:55
- [nitpick] Consider renaming 'truncOrExtToLLVMType' to a more descriptive name such as 'adjustIntBitwidthToIndexType' to more clearly reflect its purpose.
// Truncate or extend the result depending on the index bitwidth specified by the LLVMTypeConverter options.
@llvm/pr-subscribers-mlir-gpu @llvm/pr-subscribers-mlir Author: Alan Li (lialan) ChangesFull diff: https://github.com/llvm/llvm-project/pull/137360.diff 4 Files Affected:
diff --git a/mlir/include/mlir/Dialect/LLVMIR/ROCDLOps.td b/mlir/include/mlir/Dialect/LLVMIR/ROCDLOps.td
index 186a4f53f93cb..93e59e0e7e6be 100644
--- a/mlir/include/mlir/Dialect/LLVMIR/ROCDLOps.td
+++ b/mlir/include/mlir/Dialect/LLVMIR/ROCDLOps.td
@@ -216,6 +216,8 @@ def ROCDL_BlockIdXOp : ROCDL_SpecialIdRegisterOp<"workgroup.id.x">;
def ROCDL_BlockIdYOp : ROCDL_SpecialIdRegisterOp<"workgroup.id.y">;
def ROCDL_BlockIdZOp : ROCDL_SpecialIdRegisterOp<"workgroup.id.z">;
+def ROCDL_WavefrontSizeOp : ROCDL_SpecialIdRegisterOp<"wavefrontsize">;
+
//===----------------------------------------------------------------------===//
// Thread range and Block range
//===----------------------------------------------------------------------===//
diff --git a/mlir/lib/Conversion/GPUToROCDL/LowerGpuOpsToROCDLOps.cpp b/mlir/lib/Conversion/GPUToROCDL/LowerGpuOpsToROCDLOps.cpp
index e6dd6f135884e..d17fb4716d331 100644
--- a/mlir/lib/Conversion/GPUToROCDL/LowerGpuOpsToROCDLOps.cpp
+++ b/mlir/lib/Conversion/GPUToROCDL/LowerGpuOpsToROCDLOps.cpp
@@ -52,6 +52,25 @@ namespace mlir {
using namespace mlir;
+// Truncate or extend the result depending on the index bitwidth specified
+// by the LLVMTypeConverter options.
+static Value truncOrExtToLLVMType(ConversionPatternRewriter &rewriter,
+ Location loc, Value value,
+ const LLVMTypeConverter &converter) {
+ int64_t intWidth = cast<IntegerType>(value.getType()).getWidth();
+ int64_t indexBitwidth = converter.getIndexTypeBitwidth();
+ auto indexBitwidthType =
+ IntegerType::get(rewriter.getContext(), converter.getIndexTypeBitwidth());
+ // TODO: use <=> in C++20.
+ if (indexBitwidth > intWidth) {
+ return rewriter.create<LLVM::SExtOp>(loc, indexBitwidthType, value);
+ }
+ if (indexBitwidth < intWidth) {
+ return rewriter.create<LLVM::TruncOp>(loc, indexBitwidthType, value);
+ }
+ return value;
+}
+
/// Returns true if the given `gpu.func` can be safely called using the bare
/// pointer calling convention.
static bool canBeCalledWithBarePointers(gpu::GPUFuncOp func) {
@@ -113,6 +132,20 @@ struct GPULaneIdOpToROCDL : ConvertOpToLLVMPattern<gpu::LaneIdOp> {
}
};
+struct GPUSubgroupSizeOpToROCDL : ConvertOpToLLVMPattern<gpu::SubgroupSizeOp> {
+ using ConvertOpToLLVMPattern::ConvertOpToLLVMPattern;
+ LogicalResult
+ matchAndRewrite(gpu::SubgroupSizeOp op, gpu::SubgroupSizeOp::Adaptor adaptor,
+ ConversionPatternRewriter &rewriter) const override {
+ Value wavefrontOp = rewriter.create<ROCDL::WavefrontSizeOp>(
+ op.getLoc(), IntegerType::get(rewriter.getContext(), 32));
+ wavefrontOp = truncOrExtToLLVMType(rewriter, op.getLoc(), wavefrontOp,
+ *getTypeConverter());
+ rewriter.replaceOp(op, {wavefrontOp});
+ return success();
+ }
+};
+
struct GPUShuffleOpLowering : public ConvertOpToLLVMPattern<gpu::ShuffleOp> {
using ConvertOpToLLVMPattern<gpu::ShuffleOp>::ConvertOpToLLVMPattern;
@@ -405,7 +438,9 @@ void mlir::populateGpuToROCDLConversionPatterns(
// TODO: Add alignment for workgroup memory
patterns.add<GPUDynamicSharedMemoryOpLowering>(converter);
- patterns.add<GPUShuffleOpLowering, GPULaneIdOpToROCDL>(converter);
+ patterns
+ .add<GPUShuffleOpLowering, GPULaneIdOpToROCDL, GPUSubgroupSizeOpToROCDL>(
+ converter);
populateMathToROCDLConversionPatterns(converter, patterns);
}
diff --git a/mlir/test/Conversion/GPUToROCDL/gpu-to-rocdl.mlir b/mlir/test/Conversion/GPUToROCDL/gpu-to-rocdl.mlir
index 071cae9d5789f..5e3cad0cf26b0 100644
--- a/mlir/test/Conversion/GPUToROCDL/gpu-to-rocdl.mlir
+++ b/mlir/test/Conversion/GPUToROCDL/gpu-to-rocdl.mlir
@@ -11,7 +11,7 @@ gpu.module @test_module {
func.func @gpu_index_ops()
-> (index, index, index, index, index, index,
index, index, index, index, index, index,
- index) {
+ index, index) {
// CHECK32-NOT: = llvm.sext %{{.*}} : i32 to i64
// CHECK: rocdl.workitem.id.x : i32
@@ -59,12 +59,16 @@ gpu.module @test_module {
// CHECK: = llvm.sext %{{.*}} : i32 to i64
%laneId = gpu.lane_id
+ // CHECK: = rocdl.wavefrontsize : i32
+ // CHECK: = llvm.sext %{{.*}} : i32 to i64
+ %subgroupSize = gpu.subgroup_size : index
+
func.return %tIdX, %tIdY, %tIdZ, %bDimX, %bDimY, %bDimZ,
%bIdX, %bIdY, %bIdZ, %gDimX, %gDimY, %gDimZ,
- %laneId
+ %laneId, %subgroupSize
: index, index, index, index, index, index,
index, index, index, index, index, index,
- index
+ index, index
}
}
diff --git a/mlir/test/Target/LLVMIR/rocdl.mlir b/mlir/test/Target/LLVMIR/rocdl.mlir
index 3db1f7b2b6427..d12316101382b 100644
--- a/mlir/test/Target/LLVMIR/rocdl.mlir
+++ b/mlir/test/Target/LLVMIR/rocdl.mlir
@@ -32,6 +32,10 @@ llvm.func @rocdl_special_regs() -> i32 {
// CHECK: call range(i64 1, 65) i64 @__ockl_get_local_size(i32 0)
%14 = rocdl.workgroup.dim.x range <i32, 1, 65> : i64
+
+ // CHECK: call i64 $llvm.amdgcn.wavefrontsize()
+ %15 = rocdl.wavefrontsize : i32
+
llvm.return %1 : i32
}
|
71b4ca0
to
de4364f
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
... forgot to submit my review lol
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good overall, thanks for adding this
@@ -52,6 +52,25 @@ namespace mlir { | |||
|
|||
using namespace mlir; | |||
|
|||
// Truncate or extend the result depending on the index bitwidth specified | |||
// by the LLVMTypeConverter options. | |||
static Value truncOrExtToLLVMType(ConversionPatternRewriter &rewriter, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we also use this in the code below, e.g., GPULaneIdOpToROCDL
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is refactored in the other PR: https://github.com/llvm/llvm-project/pull/136405/files#diff-cd4257dddc1cb3043071e5c7641774615ffd685cc779acf70a47a3e83401b515R141
gpu.subgroup_id
to wavefrontsize
gpu.subgroup_size
to wavefrontsize
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One note about how the range is being set (and a suggestion), otherwise LGTM
if (auto upperBoundAttr = op.getUpperBoundAttr()) { | ||
bounds = rewriter.getAttr<LLVM::ConstantRangeAttr>( | ||
/*bitWidth=*/32, /*lower=*/32, | ||
/*upper=*/op.getUpperBoundAttr().getInt()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add 1 here - LLVM uses a <= x < b ranges and MLIR size ops use a <= upper bound
LLVM::ConstantRangeAttr bounds = nullptr; | ||
if (auto upperBoundAttr = op.getUpperBoundAttr()) { | ||
bounds = rewriter.getAttr<LLVM::ConstantRangeAttr>( | ||
/*bitWidth=*/32, /*lower=*/32, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should the lower bound be 64 before gfx10?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
that is right!
…7439) This PR addressed a bug from #137360. which was using GPUSubgroupSizeToROCDL to patterns function that do not have a valid constructor for it. This is causing compilation error below: error: constructor inherited by 'GPUSubgroupSizeOpToROCDL' from base class 'ConvertOpToLLVMPattern<mlir::gpu::SubgroupSizeOp>' is implicitly deleted Signed-off-by: Stanley Winata <stanley.winata@amd.com>
Updated LIT test from landing llvm/llvm-project#136640 which folds linalg.index when size is unit dim (1). Added chipSet argument into populateGpuToROCDLConversionPatterns based on changes in llvm/llvm-project#137360 This patch carries revert of llvm/llvm-project#133231. This PR breaks fp_to_subbytes and emulation_subbyte_types on llvm-cpu tests. iree-test-deps. tracker issue in #20645. llvm/llvm-project#137122. StableHLO and Torch-mlir needs to update their usage of GreedyRewriteConfig to use fluent API. i.e enableRegionSimplification = VS setRegionSimplificationLevel llvm/llvm-project#135970. StableHLO has issue with VHLO_IntegerAttr and APInt not being updated. StableHLO needs to be updated with that PR's change for us to be able to integrate. llvm/llvm-project#121389. Torch-MLIR needs to be updated with that PR's change for us to be able to integrate. --------- Signed-off-by: Stanley Winata <stanley.winata@amd.com>
…m#137439) This PR addressed a bug from llvm#137360. which was using GPUSubgroupSizeToROCDL to patterns function that do not have a valid constructor for it. This is causing compilation error below: error: constructor inherited by 'GPUSubgroupSizeOpToROCDL' from base class 'ConvertOpToLLVMPattern<mlir::gpu::SubgroupSizeOp>' is implicitly deleted Signed-off-by: Stanley Winata <stanley.winata@amd.com>
…oROCDL (#137439) This PR addressed a bug from llvm/llvm-project#137360. which was using GPUSubgroupSizeToROCDL to patterns function that do not have a valid constructor for it. This is causing compilation error below: error: constructor inherited by 'GPUSubgroupSizeOpToROCDL' from base class 'ConvertOpToLLVMPattern<mlir::gpu::SubgroupSizeOp>' is implicitly deleted Signed-off-by: Stanley Winata <stanley.winata@amd.com>
…m#137439) This PR addressed a bug from llvm#137360. which was using GPUSubgroupSizeToROCDL to patterns function that do not have a valid constructor for it. This is causing compilation error below: error: constructor inherited by 'GPUSubgroupSizeOpToROCDL' from base class 'ConvertOpToLLVMPattern<mlir::gpu::SubgroupSizeOp>' is implicitly deleted Signed-off-by: Stanley Winata <stanley.winata@amd.com>
…m#137439) This PR addressed a bug from llvm#137360. which was using GPUSubgroupSizeToROCDL to patterns function that do not have a valid constructor for it. This is causing compilation error below: error: constructor inherited by 'GPUSubgroupSizeOpToROCDL' from base class 'ConvertOpToLLVMPattern<mlir::gpu::SubgroupSizeOp>' is implicitly deleted Signed-off-by: Stanley Winata <stanley.winata@amd.com>
…m#137439) This PR addressed a bug from llvm#137360. which was using GPUSubgroupSizeToROCDL to patterns function that do not have a valid constructor for it. This is causing compilation error below: error: constructor inherited by 'GPUSubgroupSizeOpToROCDL' from base class 'ConvertOpToLLVMPattern<mlir::gpu::SubgroupSizeOp>' is implicitly deleted Signed-off-by: Stanley Winata <stanley.winata@amd.com>
Updated LIT test from landing llvm/llvm-project#136640 which folds linalg.index when size is unit dim (1). Added chipSet argument into populateGpuToROCDLConversionPatterns based on changes in llvm/llvm-project#137360 This patch carries revert of llvm/llvm-project#133231. This PR breaks fp_to_subbytes and emulation_subbyte_types on llvm-cpu tests. iree-test-deps. tracker issue in iree-org#20645. llvm/llvm-project#137122. StableHLO and Torch-mlir needs to update their usage of GreedyRewriteConfig to use fluent API. i.e enableRegionSimplification = VS setRegionSimplificationLevel llvm/llvm-project#135970. StableHLO has issue with VHLO_IntegerAttr and APInt not being updated. StableHLO needs to be updated with that PR's change for us to be able to integrate. llvm/llvm-project#121389. Torch-MLIR needs to be updated with that PR's change for us to be able to integrate. --------- Signed-off-by: Stanley Winata <stanley.winata@amd.com>
…m#137439) This PR addressed a bug from llvm#137360. which was using GPUSubgroupSizeToROCDL to patterns function that do not have a valid constructor for it. This is causing compilation error below: error: constructor inherited by 'GPUSubgroupSizeOpToROCDL' from base class 'ConvertOpToLLVMPattern<mlir::gpu::SubgroupSizeOp>' is implicitly deleted Signed-off-by: Stanley Winata <stanley.winata@amd.com>
No description provided.