[mlir][nvvm] Introduce nvvm.barrier OP #81487
Conversation
This PR introduces the `nvvm.barrier` OP to the NVVM dialect. Currently, NVVM only supports `nvvm.barrier0`, which synchronizes all threads using barrier resource 0. The new `nvvm.barrier` takes two arguments, both optional: the barrier resource and the number of threads. This added flexibility allows selective synchronization of threads within a CTA, aligning with the capabilities provided by the LLVM intrinsics and the PTX model. IMHO, the goal is to deprecate `nvvm.barrier0` in favor of the more generic and powerful `nvvm.barrier`.

```
// Equivalent to nvvm.barrier0 or __syncthreads in CUDA.
nvvm.barrier

// Synchronize all threads using the 3rd barrier resource.
nvvm.barrier resource = 3

// Synchronize %numberOfThreads threads using the default (first) barrier resource.
nvvm.barrier number_of_threads = %numberOfThreads

// Synchronize %numberOfThreads threads using the 3rd barrier resource.
nvvm.barrier resource = 3 number_of_threads = %numberOfThreads
```
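To make the selective-synchronization point concrete, here is a minimal sketch (not part of this PR; the function name, block structure, and the choice of barrier resource 1 with 128 threads are illustrative) in which only the first 128 threads of a CTA meet at a named barrier:

```
// Illustrative sketch only; uses the assembly format introduced in this PR.
llvm.func @partial_sync() {
  // Thread index within the CTA.
  %tid = nvvm.read.ptx.sreg.tid.x : i32
  %c128 = llvm.mlir.constant(128 : i32) : i32
  // Only the first 128 threads take part in the named barrier.
  %in_group = llvm.icmp "slt" %tid, %c128 : i32
  llvm.cond_br %in_group, ^sync, ^done
^sync:
  // Barrier resource 1, expecting 128 threads to arrive.
  nvvm.barrier resource = 1 number_of_threads = %c128
  llvm.br ^done
^done:
  llvm.return
}
```

Threads outside the group never touch the barrier, which is something `nvvm.barrier0` cannot express. Note that in PTX the thread count of a named barrier must be a multiple of the warp size, hence 128 here.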
@durga4github could you please review this?
@llvm/pr-subscribers-mlir-llvm @llvm/pr-subscribers-mlir

Author: Guray Ozen (grypp)

Changes

This PR introduces the `nvvm.barrier` OP to the NVVM dialect. Currently, NVVM only supports `nvvm.barrier0`, which synchronizes all threads using barrier resource 0. The new `nvvm.barrier` takes the barrier resource and the number of threads as optional arguments, allowing selective synchronization of threads within a CTA, in line with the LLVM intrinsics and the PTX model. I think we can deprecate `nvvm.barrier0` in favor of the more generic `nvvm.barrier`.
Full diff: https://github.com/llvm/llvm-project/pull/81487.diff

3 Files Affected:
diff --git a/mlir/include/mlir/Dialect/LLVMIR/NVVMOps.td b/mlir/include/mlir/Dialect/LLVMIR/NVVMOps.td
index 37e525a139d4ad..1369ff1988037c 100644
--- a/mlir/include/mlir/Dialect/LLVMIR/NVVMOps.td
+++ b/mlir/include/mlir/Dialect/LLVMIR/NVVMOps.td
@@ -377,6 +377,26 @@ def NVVM_Barrier0Op : NVVM_Op<"barrier0"> {
let assemblyFormat = "attr-dict";
}
+def NVVM_BarrierOp : NVVM_Op<"barrier"> {
+  let arguments = (ins
+    DefaultValuedAttr<ConfinedAttr<I32Attr, [IntMinValue<0>, IntMaxValue<15>]>, "0">:$barrierResource,
+    Optional<I32>:$numberOfThreads);
+  string llvmBuilder = [{
+    auto syncThreads = builder.getInt32($barrierResource);
+    if ($numberOfThreads) {
+      createIntrinsicCall(builder, llvm::Intrinsic::nvvm_barrier,
+                          {syncThreads, $numberOfThreads});
+    } else {
+      if ($barrierResource == 0)
+        createIntrinsicCall(builder, llvm::Intrinsic::nvvm_barrier0);
+      else
+        createIntrinsicCall(builder, llvm::Intrinsic::nvvm_barrier_n,
+                            {syncThreads});
+    }
+  }];
+  let assemblyFormat = "(`resource` `=` $barrierResource^)? (`number_of_threads` `=` $numberOfThreads^)? attr-dict";
+}
+
def NVVM_ClusterArriveOp : NVVM_Op<"cluster.arrive"> {
let arguments = (ins OptionalAttr<UnitAttr>:$aligned);
diff --git a/mlir/test/Dialect/LLVMIR/nvvm.mlir b/mlir/test/Dialect/LLVMIR/nvvm.mlir
index ce483ddab22a0e..297712a47e7830 100644
--- a/mlir/test/Dialect/LLVMIR/nvvm.mlir
+++ b/mlir/test/Dialect/LLVMIR/nvvm.mlir
@@ -43,6 +43,20 @@ func.func @llvm_nvvm_barrier0() {
llvm.return
}
+// CHECK-LABEL: llvm.func @llvm_nvvm_barrier
+// CHECK-SAME: (%[[barId:.*]]: i32)
+llvm.func @llvm_nvvm_barrier(%numberOfThreads : i32) {
+ // CHECK: nvvm.barrier
+ nvvm.barrier
+ // CHECK: nvvm.barrier resource = 3
+ nvvm.barrier resource = 3
+ // CHECK: nvvm.barrier number_of_threads = %[[barId]]
+ nvvm.barrier number_of_threads = %numberOfThreads
+ // CHECK: nvvm.barrier resource = 4 number_of_threads = %[[barId]]
+ nvvm.barrier resource = 4 number_of_threads = %numberOfThreads
+ llvm.return
+}
+
// CHECK-LABEL: @llvm_nvvm_cluster_arrive
func.func @llvm_nvvm_cluster_arrive() {
// CHECK: nvvm.cluster.arrive
diff --git a/mlir/test/Target/LLVMIR/nvvmir.mlir b/mlir/test/Target/LLVMIR/nvvmir.mlir
index 49f9426daabc21..17eba33fdce71b 100644
--- a/mlir/test/Target/LLVMIR/nvvmir.mlir
+++ b/mlir/test/Target/LLVMIR/nvvmir.mlir
@@ -80,6 +80,20 @@ llvm.func @llvm_nvvm_barrier0() {
llvm.return
}
+// CHECK-LABEL: @llvm_nvvm_barrier(
+// CHECK-SAME: i32 %[[barId:.*]])
+llvm.func @llvm_nvvm_barrier(%numberOfThreads : i32) {
+ // CHECK: call void @llvm.nvvm.barrier0()
+ nvvm.barrier
+ // CHECK: call void @llvm.nvvm.barrier.n(i32 3)
+ nvvm.barrier resource = 3
+ // CHECK: call void @llvm.nvvm.barrier(i32 0, i32 %[[barId]])
+ nvvm.barrier number_of_threads = %numberOfThreads
+ // CHECK: call void @llvm.nvvm.barrier(i32 4, i32 %[[barId]])
+ nvvm.barrier resource = 4 number_of_threads = %numberOfThreads
+ llvm.return
+}
+
// CHECK-LABEL: @llvm_nvvm_cluster_arrive
llvm.func @llvm_nvvm_cluster_arrive() {
// CHECK: call void @llvm.nvvm.barrier.cluster.arrive()
✅ With the latest revision this PR passed the C/C++ code formatter.
Looks clean now. Thank you for the naming update!