
[mlir][gpu][spirv] Remove rotation semantics of gpu.shuffle up/down #139105


Open
Hsiangkai wants to merge 8 commits into main from gpu-shuffle-up-down

Conversation

Hsiangkai
Contributor

@Hsiangkai Hsiangkai commented May 8, 2025

According to the description of the gpu.shuffle operation, shuffle up/down rotates values within the subgroup: it applies a modulo to the shifted value when computing the result lane ID. This is inconsistent with the SPIR-V definition of shuffle up/down and with the NVVM definition of data movement within a subgroup.

The NVVM spec says:

"If the computed source lane index j is in range, the returned i32 value will be the value of %a from lane j; otherwise, it will be the value of %a from the current thread."

That is, the original value is kept if the result lane ID is out of range.

The SPIR-V spec for OpGroupNonUniformShuffleUp and OpGroupNonUniformShuffleDown says:

"The resulting value is undefined if Delta is greater than the current invocation’s id within the scope or if the identified invocation is not in scope restricted tangle."

That is, the result is undefined if the result lane ID is out of range.

In short, neither specification involves circular movement for shuffle up/down. This patch removes the circular movement from gpu.shuffle up/down and lowers gpu.shuffle up/down directly to the SPIR-V OpGroupNonUniformShuffleUp and OpGroupNonUniformShuffleDown instructions.
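
For concreteness, here is a small illustrative sketch of the difference; the width, offset, and lane numbers below are hypothetical examples, not taken from the patch:

```mlir
%val    = arith.constant 42.0 : f32
%offset = arith.constant 4 : i32
%width  = arith.constant 16 : i32
// Shuffle down: lane i reads %val from lane i + 4.
// Old (rotation) semantics: lane 14 wraps around to lane (14 + 4) mod 16 = 2.
// New semantics: lane 14 targets lane 18, which is out of range, so there is
// no wrap-around; the behavior follows the lowering target instead
// (undefined value in SPIR-V, the lane's own value in NVVM).
%result, %valid = gpu.shuffle down %val, %offset, %width : f32
```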

Reference:
https://docs.nvidia.com/cuda/archive/12.2.1/nvvm-ir-spec/index.html#data-movement
https://registry.khronos.org/SPIR-V/specs/unified1/SPIRV.html#OpGroupNonUniformShuffleUp
https://registry.khronos.org/SPIR-V/specs/unified1/SPIRV.html#OpGroupNonUniformShuffleDown

@llvmbot
Member

llvmbot commented May 8, 2025

@llvm/pr-subscribers-mlir-gpu

@llvm/pr-subscribers-mlir-spirv

Author: Hsiangkai Wang (Hsiangkai)

Changes

Convert

gpu.shuffle down %val, %offset, %width

to

spirv.GroupNonUniformRotateKHR <Subgroup> %val, %offset, cluster_size(%width)

Convert

gpu.shuffle up %val, %offset, %width

to

%down_offset = arith.subi %width, %offset
spirv.GroupNonUniformRotateKHR <Subgroup> %val, %down_offset, cluster_size(%width)
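
As a worked instance (using the same constants as the new lowering test), %width = 16 and %offset = 4 give %down_offset = 16 - 4 = 12, since rotating every lane down by 12 within a 16-lane cluster moves each value up by 4, modulo the wrap-around at the cluster boundary:

```mlir
%c4  = arith.constant 4 : i32
%c16 = arith.constant 16 : i32
%val = arith.constant 42.0 : f32
// down_offset = width - offset = 16 - 4 = 12.
%down_offset = arith.subi %c16, %c4 : i32
// Rotating down by 12 in a 16-lane cluster is equivalent to rotating up by 4.
%r = spirv.GroupNonUniformRotateKHR <Subgroup> %val, %down_offset, cluster_size(%c16) : f32, i32, i32 -> f32
```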

In addition, update the spirv.GroupNonUniformRotateKHR assembly format to be consistent with the other SPIR-V non-uniform operations.


Full diff: https://github.com/llvm/llvm-project/pull/139105.diff

4 Files Affected:

  • (modified) mlir/include/mlir/Dialect/SPIRV/IR/SPIRVNonUniformOps.td (+3-3)
  • (modified) mlir/lib/Conversion/GPUToSPIRV/GPUToSPIRV.cpp (+10-2)
  • (modified) mlir/test/Conversion/GPUToSPIRV/shuffle.mlir (+57)
  • (modified) mlir/test/Dialect/SPIRV/IR/non-uniform-ops.mlir (+9-9)
diff --git a/mlir/include/mlir/Dialect/SPIRV/IR/SPIRVNonUniformOps.td b/mlir/include/mlir/Dialect/SPIRV/IR/SPIRVNonUniformOps.td
index 2dd3dbd28d436..3fdaff2470cba 100644
--- a/mlir/include/mlir/Dialect/SPIRV/IR/SPIRVNonUniformOps.td
+++ b/mlir/include/mlir/Dialect/SPIRV/IR/SPIRVNonUniformOps.td
@@ -1404,8 +1404,8 @@ def SPIRV_GroupNonUniformRotateKHROp : SPIRV_Op<"GroupNonUniformRotateKHR", [
 
     ```mlir
     %four = spirv.Constant 4 : i32
-    %0 = spirv.GroupNonUniformRotateKHR <Subgroup>, %value, %delta : f32, i32 -> f32
-    %1 = spirv.GroupNonUniformRotateKHR <Workgroup>, %value, %delta,
+    %0 = spirv.GroupNonUniformRotateKHR <Subgroup> %value, %delta : f32, i32 -> f32
+    %1 = spirv.GroupNonUniformRotateKHR <Workgroup> %value, %delta,
          clustersize(%four) : f32, i32, i32 -> f32
     ```
   }];
@@ -1429,7 +1429,7 @@ def SPIRV_GroupNonUniformRotateKHROp : SPIRV_Op<"GroupNonUniformRotateKHR", [
   );
 
   let assemblyFormat = [{
-    $execution_scope `,` $value `,` $delta (`,` `cluster_size` `(` $cluster_size^ `)`)? attr-dict `:` type($value) `,` type($delta) (`,` type($cluster_size)^)? `->` type(results)
+    $execution_scope $value `,` $delta (`,` `cluster_size` `(` $cluster_size^ `)`)? attr-dict `:` type($value) `,` type($delta) (`,` type($cluster_size)^)? `->` type(results)
   }];
 }
 
diff --git a/mlir/lib/Conversion/GPUToSPIRV/GPUToSPIRV.cpp b/mlir/lib/Conversion/GPUToSPIRV/GPUToSPIRV.cpp
index 3cc64b82950b5..fabbebef41b21 100644
--- a/mlir/lib/Conversion/GPUToSPIRV/GPUToSPIRV.cpp
+++ b/mlir/lib/Conversion/GPUToSPIRV/GPUToSPIRV.cpp
@@ -450,8 +450,16 @@ LogicalResult GPUShuffleConversion::matchAndRewrite(
     result = rewriter.create<spirv::GroupNonUniformShuffleOp>(
         loc, scope, adaptor.getValue(), adaptor.getOffset());
     break;
-  default:
-    return rewriter.notifyMatchFailure(shuffleOp, "unimplemented shuffle mode");
+  case gpu::ShuffleMode::DOWN:
+    result = rewriter.create<spirv::GroupNonUniformRotateKHROp>(
+        loc, scope, adaptor.getValue(), adaptor.getOffset(), shuffleOp.getWidth());
+    break;
+  case gpu::ShuffleMode::UP: {
+    Value offsetForShuffleDown = rewriter.create<arith::SubIOp>(loc, shuffleOp.getWidth(), adaptor.getOffset());
+    result = rewriter.create<spirv::GroupNonUniformRotateKHROp>(
+        loc, scope, adaptor.getValue(), offsetForShuffleDown, shuffleOp.getWidth());
+    break;
+  }
   }
 
   rewriter.replaceOp(shuffleOp, {result, trueVal});
diff --git a/mlir/test/Conversion/GPUToSPIRV/shuffle.mlir b/mlir/test/Conversion/GPUToSPIRV/shuffle.mlir
index d3d8ec0dab40f..5d7d3c81577e3 100644
--- a/mlir/test/Conversion/GPUToSPIRV/shuffle.mlir
+++ b/mlir/test/Conversion/GPUToSPIRV/shuffle.mlir
@@ -72,3 +72,60 @@ gpu.module @kernels {
 }
 
 }
+
+// -----
+
+module attributes {
+  gpu.container_module,
+  spirv.target_env = #spirv.target_env<#spirv.vce<v1.4, [Shader, GroupNonUniformShuffle, GroupNonUniformRotateKHR], []>,
+    #spirv.resource_limits<subgroup_size = 16>>
+} {
+
+gpu.module @kernels {
+  // CHECK-LABEL:  spirv.func @shuffle_down()
+  gpu.func @shuffle_down() kernel
+    attributes {spirv.entry_point_abi = #spirv.entry_point_abi<workgroup_size = [16, 1, 1]>} {
+    %offset = arith.constant 4 : i32
+    %width = arith.constant 16 : i32
+    %val = arith.constant 42.0 : f32
+
+    // CHECK: %[[OFFSET:.+]] = spirv.Constant 4 : i32
+    // CHECK: %[[WIDTH:.+]] = spirv.Constant 16 : i32
+    // CHECK: %[[VAL:.+]] = spirv.Constant 4.200000e+01 : f32
+    // CHECK: %{{.+}} = spirv.Constant true
+    // CHECK: %{{.+}} = spirv.GroupNonUniformRotateKHR <Subgroup> %[[VAL]], %[[OFFSET]], cluster_size(%[[WIDTH]]) : f32, i32, i32 -> f32
+    %result, %valid = gpu.shuffle down %val, %offset, %width : f32
+    gpu.return
+  }
+}
+
+}
+
+// -----
+
+module attributes {
+  gpu.container_module,
+  spirv.target_env = #spirv.target_env<#spirv.vce<v1.4, [Shader, GroupNonUniformShuffle, GroupNonUniformRotateKHR], []>,
+    #spirv.resource_limits<subgroup_size = 16>>
+} {
+
+gpu.module @kernels {
+  // CHECK-LABEL:  spirv.func @shuffle_up()
+  gpu.func @shuffle_up() kernel
+    attributes {spirv.entry_point_abi = #spirv.entry_point_abi<workgroup_size = [16, 1, 1]>} {
+    %offset = arith.constant 4 : i32
+    %width = arith.constant 16 : i32
+    %val = arith.constant 42.0 : f32
+
+    // CHECK: %[[OFFSET:.+]] = spirv.Constant 4 : i32
+    // CHECK: %[[WIDTH:.+]] = spirv.Constant 16 : i32
+    // CHECK: %[[VAL:.+]] = spirv.Constant 4.200000e+01 : f32
+    // CHECK: %{{.+}} = spirv.Constant true
+    // CHECK: %[[DOWN_OFFSET:.+]] = spirv.Constant 12 : i32
+    // CHECK: %{{.+}} = spirv.GroupNonUniformRotateKHR <Subgroup> %[[VAL]], %[[DOWN_OFFSET]], cluster_size(%[[WIDTH]]) : f32, i32, i32 -> f32
+    %result, %valid = gpu.shuffle up %val, %offset, %width : f32
+    gpu.return
+  }
+}
+
+}
diff --git a/mlir/test/Dialect/SPIRV/IR/non-uniform-ops.mlir b/mlir/test/Dialect/SPIRV/IR/non-uniform-ops.mlir
index bf383d3837b6e..6990f2b3751f5 100644
--- a/mlir/test/Dialect/SPIRV/IR/non-uniform-ops.mlir
+++ b/mlir/test/Dialect/SPIRV/IR/non-uniform-ops.mlir
@@ -613,8 +613,8 @@ func.func @group_non_uniform_logical_xor(%val: i32) -> i32 {
 
 // CHECK-LABEL: @group_non_uniform_rotate_khr
 func.func @group_non_uniform_rotate_khr(%val: f32, %delta: i32) -> f32 {
-  // CHECK: %{{.+}} = spirv.GroupNonUniformRotateKHR <Subgroup>, %{{.+}} : f32, i32 -> f32
-  %0 = spirv.GroupNonUniformRotateKHR <Subgroup>, %val, %delta : f32, i32 -> f32
+  // CHECK: %{{.+}} = spirv.GroupNonUniformRotateKHR <Subgroup> %{{.+}} : f32, i32 -> f32
+  %0 = spirv.GroupNonUniformRotateKHR <Subgroup> %val, %delta : f32, i32 -> f32
   return %0: f32
 }
 
@@ -622,9 +622,9 @@ func.func @group_non_uniform_rotate_khr(%val: f32, %delta: i32) -> f32 {
 
 // CHECK-LABEL: @group_non_uniform_rotate_khr
 func.func @group_non_uniform_rotate_khr(%val: f32, %delta: i32) -> f32 {
-  // CHECK: %{{.+}} = spirv.GroupNonUniformRotateKHR <Workgroup>, %{{.+}} : f32, i32, i32 -> f32
+  // CHECK: %{{.+}} = spirv.GroupNonUniformRotateKHR <Workgroup> %{{.+}} : f32, i32, i32 -> f32
   %four = spirv.Constant 4 : i32
-  %0 = spirv.GroupNonUniformRotateKHR <Workgroup>, %val, %delta, cluster_size(%four) : f32, i32, i32 -> f32
+  %0 = spirv.GroupNonUniformRotateKHR <Workgroup> %val, %delta, cluster_size(%four) : f32, i32, i32 -> f32
   return %0: f32
 }
 
@@ -633,7 +633,7 @@ func.func @group_non_uniform_rotate_khr(%val: f32, %delta: i32) -> f32 {
 func.func @group_non_uniform_rotate_khr(%val: f32, %delta: i32) -> f32 {
   %four = spirv.Constant 4 : i32
   // expected-error @+1 {{execution scope must be 'Workgroup' or 'Subgroup'}}
-  %0 = spirv.GroupNonUniformRotateKHR <Device>, %val, %delta, cluster_size(%four) : f32, i32, i32 -> f32
+  %0 = spirv.GroupNonUniformRotateKHR <Device> %val, %delta, cluster_size(%four) : f32, i32, i32 -> f32
   return %0: f32
 }
 
@@ -642,7 +642,7 @@ func.func @group_non_uniform_rotate_khr(%val: f32, %delta: i32) -> f32 {
 func.func @group_non_uniform_rotate_khr(%val: f32, %delta: si32) -> f32 {
   %four = spirv.Constant 4 : i32
   // expected-error @+1 {{op operand #1 must be 8/16/32/64-bit signless/unsigned integer, but got 'si32'}}
-  %0 = spirv.GroupNonUniformRotateKHR <Subgroup>, %val, %delta, cluster_size(%four) : f32, si32, i32 -> f32
+  %0 = spirv.GroupNonUniformRotateKHR <Subgroup> %val, %delta, cluster_size(%four) : f32, si32, i32 -> f32
   return %0: f32
 }
 
@@ -651,7 +651,7 @@ func.func @group_non_uniform_rotate_khr(%val: f32, %delta: si32) -> f32 {
 func.func @group_non_uniform_rotate_khr(%val: f32, %delta: i32) -> f32 {
   %four = spirv.Constant 4 : si32
   // expected-error @+1 {{op operand #2 must be 8/16/32/64-bit signless/unsigned integer, but got 'si32'}}
-  %0 = spirv.GroupNonUniformRotateKHR <Subgroup>, %val, %delta, cluster_size(%four) : f32, i32, si32 -> f32
+  %0 = spirv.GroupNonUniformRotateKHR <Subgroup> %val, %delta, cluster_size(%four) : f32, i32, si32 -> f32
   return %0: f32
 }
 
@@ -659,7 +659,7 @@ func.func @group_non_uniform_rotate_khr(%val: f32, %delta: i32) -> f32 {
 
 func.func @group_non_uniform_rotate_khr(%val: f32, %delta: i32, %four: i32) -> f32 {
   // expected-error @+1 {{cluster size operand must come from a constant op}}
-  %0 = spirv.GroupNonUniformRotateKHR <Subgroup>, %val, %delta, cluster_size(%four) : f32, i32, i32 -> f32
+  %0 = spirv.GroupNonUniformRotateKHR <Subgroup> %val, %delta, cluster_size(%four) : f32, i32, i32 -> f32
   return %0: f32
 }
 
@@ -668,6 +668,6 @@ func.func @group_non_uniform_rotate_khr(%val: f32, %delta: i32, %four: i32) -> f
 func.func @group_non_uniform_rotate_khr(%val: f32, %delta: i32) -> f32 {
   %five = spirv.Constant 5 : i32
   // expected-error @+1 {{cluster size operand must be a power of two}}
-  %0 = spirv.GroupNonUniformRotateKHR <Subgroup>, %val, %delta, cluster_size(%five) : f32, i32, i32 -> f32
+  %0 = spirv.GroupNonUniformRotateKHR <Subgroup> %val, %delta, cluster_size(%five) : f32, i32, i32 -> f32
   return %0: f32
 }

@llvmbot
Member

llvmbot commented May 8, 2025

@llvm/pr-subscribers-mlir


github-actions bot commented May 8, 2025

✅ With the latest revision this PR passed the C/C++ code formatter.

@IgWod-IMG
Contributor

In addition, update the spirv.GroupNonUniformRotateKHR assembly format to be consistent with the other SPIR-V non-uniform operations.

Apologies for being pedantic, but it'd probably be better to submit this as a separate PR. It feels a bit out of place to submit it with this change. But I'm still relatively new to the project, so I'm not sure what the (written and unwritten) rules around it are :)

@Hsiangkai
Contributor Author

In addition, update the spirv.GroupNonUniformRotateKHR assembly format to be consistent with the other SPIR-V non-uniform operations.

Apologies for being pedantic, but it'd probably be better to submit this as a separate PR. It feels a bit out of place to submit it with this change. But I'm still relatively new to the project, so I'm not sure what the (written and unwritten) rules around it are :)

Makes sense. I created another PR for the reformatting: #139224

@Hsiangkai Hsiangkai force-pushed the gpu-shuffle-up-down branch from dbc412f to 8a5b9ac on May 9, 2025 09:03
@Hsiangkai Hsiangkai requested a review from krzysz00 May 20, 2025 09:18
@Hsiangkai Hsiangkai force-pushed the gpu-shuffle-up-down branch from 8a5b9ac to 86c6037 on May 21, 2025 12:57
@Hsiangkai Hsiangkai changed the title [mlir][gpu][spirv] Add patterns for gpu.shuffle up/down [mlir][gpu][spirv] Remove rotation semantics of gpu.shuffle up/down Jun 4, 2025
Hsiangkai added 8 commits June 6, 2025 11:12
Convert

gpu.shuffle down %val, %offset, %width

to

spirv.GroupNonUniformRotateKHR <Subgroup> %val, %offset, cluster_size(%width)

Convert

gpu.shuffle up %val, %offset, %width

to

%down_offset = arith.subi %width, %offset
spirv.GroupNonUniformRotateKHR <Subgroup> %val, %down_offset, cluster_size(%width)
There is no such semantic in SPIR-V OpGroupNonUniformShuffleUp and
OpGroupNonUniformShuffleDown. In addition, there is no such semantic in
the NVVM shfl intrinsics.

Refer to NVVM IR spec
https://docs.nvidia.com/cuda/archive/12.2.1/nvvm-ir-spec/index.html#data-movement

"If the computed source lane index j is in range, the returned i32 value
will be the value of %a from lane j; otherwise, it will be the the value
of %a from the current thread. If the thread corresponding to lane j is
inactive, then the returned i32 value is undefined."
@Hsiangkai Hsiangkai force-pushed the gpu-shuffle-up-down branch from 3369f34 to 6c7be29 on June 6, 2025 10:48