[mlir][nvvm] Introduce nvvm.barrier OP #81487

Merged
merged 3 commits into from
Feb 14, 2024

@grypp grypp commented Feb 12, 2024

This PR introduces the `nvvm.barrier` op to the NVVM dialect. Currently, NVVM only supports `nvvm.barrier0`, which synchronizes all threads using barrier resource 0.

The new `nvvm.barrier` takes two optional arguments: the barrier resource (default 0, valid range 0 to 15) and the number of threads. This flexibility allows a subset of the threads within a CTA to synchronize, matching what the LLVM intrinsics and the PTX model already provide.

IMHO, the goal should be to deprecate `nvvm.barrier0` in favor of the more generic and powerful `nvvm.barrier`.

```
// Equivalent to nvvm.barrier0, or __syncthreads() in CUDA.
nvvm.barrier

// Synchronize all threads using barrier resource 3.
nvvm.barrier resource = 3

// Synchronize %numberOfThreads threads using the default barrier resource (0).
nvvm.barrier number_of_threads = %numberOfThreads

// Synchronize %numberOfThreads threads using barrier resource 3.
nvvm.barrier resource = 3 number_of_threads = %numberOfThreads
```
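
For readers who want the four textual forms spelled out, here is a minimal C++ sketch that renders them from a resource number and an optional thread-count SSA name. `formatBarrier` is a hypothetical helper, not part of MLIR, and it assumes the printer omits the `resource` clause when the attribute holds its default value 0, which is what the round-trip test in this PR suggests.

```cpp
#include <cstdint>
#include <optional>
#include <string>

// Hypothetical helper (not an MLIR API): renders the textual form of
// nvvm.barrier for a resource number and an optional thread-count SSA name.
std::string formatBarrier(
    int32_t resource = 0,
    const std::optional<std::string> &numThreads = std::nullopt) {
  std::string text = "nvvm.barrier";
  // Assumption: the resource clause is elided when it is the default (0).
  if (resource != 0)
    text += " resource = " + std::to_string(resource);
  // The thread-count clause appears only when the operand is present.
  if (numThreads)
    text += " number_of_threads = " + *numThreads;
  return text;
}
```

Calling `formatBarrier()` gives the bare `nvvm.barrier` form, while `formatBarrier(3, std::string("%numberOfThreads"))` gives the last form in the block above.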

grypp commented Feb 12, 2024

@durga4github could you please review this?

llvmbot commented Feb 12, 2024

@llvm/pr-subscribers-mlir-llvm

@llvm/pr-subscribers-mlir

Author: Guray Ozen (grypp)

Changes

Full diff: https://github.com/llvm/llvm-project/pull/81487.diff

3 Files Affected:

  • (modified) mlir/include/mlir/Dialect/LLVMIR/NVVMOps.td (+20)
  • (modified) mlir/test/Dialect/LLVMIR/nvvm.mlir (+14)
  • (modified) mlir/test/Target/LLVMIR/nvvmir.mlir (+14)
diff --git a/mlir/include/mlir/Dialect/LLVMIR/NVVMOps.td b/mlir/include/mlir/Dialect/LLVMIR/NVVMOps.td
index 37e525a139d4ad..1369ff1988037c 100644
--- a/mlir/include/mlir/Dialect/LLVMIR/NVVMOps.td
+++ b/mlir/include/mlir/Dialect/LLVMIR/NVVMOps.td
@@ -377,6 +377,26 @@ def NVVM_Barrier0Op : NVVM_Op<"barrier0"> {
   let assemblyFormat = "attr-dict";
 }
 
+def NVVM_BarrierOp : NVVM_Op<"barrier"> {
+  let arguments = (ins     
+    DefaultValuedAttr<ConfinedAttr<I32Attr, [IntMinValue<0>, IntMaxValue<15>]>, "0">:$barrierResource,
+    Optional<I32>:$numberOfThreads);
+  string llvmBuilder = [{
+    auto syncThreads = builder.getInt32($barrierResource);
+    if ($numberOfThreads) {
+      createIntrinsicCall(builder, llvm::Intrinsic::nvvm_barrier,
+                { syncThreads, $numberOfThreads});
+    } else {
+      if($barrierResource == 0)
+        createIntrinsicCall(builder, llvm::Intrinsic::nvvm_barrier0);
+      else
+        createIntrinsicCall(builder, llvm::Intrinsic::nvvm_barrier_n,
+                { syncThreads});
+    }
+  }];
+  let assemblyFormat = "(`resource` `=` $barrierResource^)? (`number_of_threads` `=` $numberOfThreads^)? attr-dict";
+}
+
 def NVVM_ClusterArriveOp : NVVM_Op<"cluster.arrive"> {
   let arguments = (ins OptionalAttr<UnitAttr>:$aligned);
 
diff --git a/mlir/test/Dialect/LLVMIR/nvvm.mlir b/mlir/test/Dialect/LLVMIR/nvvm.mlir
index ce483ddab22a0e..297712a47e7830 100644
--- a/mlir/test/Dialect/LLVMIR/nvvm.mlir
+++ b/mlir/test/Dialect/LLVMIR/nvvm.mlir
@@ -43,6 +43,20 @@ func.func @llvm_nvvm_barrier0() {
   llvm.return
 }
 
+// CHECK-LABEL: llvm.func @llvm_nvvm_barrier
+// CHECK-SAME: (%[[barId:.*]]: i32)
+llvm.func @llvm_nvvm_barrier(%numberOfThreads : i32) {
+  // CHECK: nvvm.barrier
+  nvvm.barrier 
+  // CHECK: nvvm.barrier resource = 3
+  nvvm.barrier resource = 3
+  // CHECK: nvvm.barrier number_of_threads = %[[barId]]
+  nvvm.barrier number_of_threads = %numberOfThreads
+  // CHECK: nvvm.barrier resource = 4 number_of_threads = %[[barId]]
+  nvvm.barrier resource = 4 number_of_threads = %numberOfThreads
+  llvm.return
+}
+
 // CHECK-LABEL: @llvm_nvvm_cluster_arrive
 func.func @llvm_nvvm_cluster_arrive() {
   // CHECK: nvvm.cluster.arrive
diff --git a/mlir/test/Target/LLVMIR/nvvmir.mlir b/mlir/test/Target/LLVMIR/nvvmir.mlir
index 49f9426daabc21..17eba33fdce71b 100644
--- a/mlir/test/Target/LLVMIR/nvvmir.mlir
+++ b/mlir/test/Target/LLVMIR/nvvmir.mlir
@@ -80,6 +80,20 @@ llvm.func @llvm_nvvm_barrier0() {
   llvm.return
 }
 
+// CHECK-LABEL: @llvm_nvvm_barrier(
+// CHECK-SAME: i32 %[[barId:.*]])
+llvm.func @llvm_nvvm_barrier(%numberOfThreads : i32) {
+  // CHECK: call void @llvm.nvvm.barrier0()
+  nvvm.barrier 
+  // CHECK: call void @llvm.nvvm.barrier.n(i32 3)
+  nvvm.barrier resource = 3
+  // CHECK: call void @llvm.nvvm.barrier(i32 0, i32 %[[barId]])
+  nvvm.barrier number_of_threads = %numberOfThreads
+  // CHECK: call void @llvm.nvvm.barrier(i32 4, i32 %[[barId]])
+  nvvm.barrier resource = 4 number_of_threads = %numberOfThreads
+  llvm.return
+}
+
 // CHECK-LABEL: @llvm_nvvm_cluster_arrive
 llvm.func @llvm_nvvm_cluster_arrive() {
   // CHECK: call void @llvm.nvvm.barrier.cluster.arrive()
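
The branch structure of the `llvmBuilder` in the diff can be modeled in a few lines of plain C++. This is only a sketch: `lowerBarrier` is a hypothetical function that returns the LLVM IR call text instead of building IR, and the range check stands in for the `ConfinedAttr` bounds, which the real op enforces through attribute verification rather than inside the builder.

```cpp
#include <cstdint>
#include <optional>
#include <stdexcept>
#include <string>

// Hypothetical model of the intrinsic selection; it returns the LLVM IR
// call text rather than emitting IR through an IRBuilder.
std::string lowerBarrier(int32_t resource,
                         const std::optional<std::string> &numThreads) {
  // Stand-in for the ConfinedAttr bounds on $barrierResource (0..15).
  if (resource < 0 || resource > 15)
    throw std::out_of_range("barrier resource must be in [0, 15]");
  const std::string id = "i32 " + std::to_string(resource);
  // A thread count always selects the two-operand intrinsic.
  if (numThreads)
    return "call void @llvm.nvvm.barrier(" + id + ", i32 " + *numThreads + ")";
  // No thread count: resource 0 maps to barrier0, anything else to barrier.n.
  if (resource == 0)
    return "call void @llvm.nvvm.barrier0()";
  return "call void @llvm.nvvm.barrier.n(" + id + ")";
}
```

The three return paths correspond to the `CHECK` lines in `mlir/test/Target/LLVMIR/nvvmir.mlir`: no arguments yields `barrier0`, a resource alone yields `barrier.n`, and any thread count yields the two-operand `barrier`.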

github-actions bot commented Feb 13, 2024

✅ With the latest revision this PR passed the C/C++ code formatter.

@durga4github durga4github left a comment

Looks clean now. Thank you for the naming update!

@grypp grypp merged commit b5d694b into llvm:main Feb 14, 2024
@grypp grypp deleted the nvvm-barrier branch February 14, 2024 08:23