[NVPTX] Add Intrinsics for applypriority.* #127989
Conversation
@llvm/pr-subscribers-backend-nvptx

Author: Abhilash Majumder (abhilash1910)

Changes

[NVPTX] Add ApplyPriority intrinsics

This PR adds applypriority.* intrinsics with the relevant eviction priorities.
For more information, refer to the PTX ISA.

Full diff: https://github.com/llvm/llvm-project/pull/127989.diff

4 Files Affected:
diff --git a/llvm/docs/NVPTXUsage.rst b/llvm/docs/NVPTXUsage.rst
index 675b458c41e7b..61ae07816bdfd 100644
--- a/llvm/docs/NVPTXUsage.rst
+++ b/llvm/docs/NVPTXUsage.rst
@@ -630,6 +630,30 @@ uses and eviction priority which can be accessed by the '``.level::eviction_prio
For more information, refer to the PTX ISA
`<https://docs.nvidia.com/cuda/parallel-thread-execution/#data-movement-and-conversion-instructions-prefetch-prefetchu>`_.
+'``llvm.nvvm.applypriority.*``'
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+Syntax:
+"""""""
+
+.. code-block:: llvm
+
+ declare void @llvm.nvvm.applypriority.global.L2.evict.normal(ptr addrspace(1) %global_ptr, i64 %size)
+ declare void @llvm.nvvm.applypriority.L2.evict.normal(ptr %ptr, i64 %size)
+
+Overview:
+"""""""""
+
+The '``@llvm.nvvm.applypriority.*``' intrinsics apply the cache eviction priority specified by the
+``.level::eviction_priority`` qualifier to the address range ``[a..a+size)`` in the specified cache
+level. If no state space is specified, generic addressing is used. If the specified address does
+not fall within the address window of the ``.global`` state space, the behavior is undefined.
+The ``size`` operand is an integer constant that specifies the amount of data, in bytes, in the
+specified cache level on which the priority is to be applied. The only supported value for ``size`` is 128.
+
+For more information, refer to the PTX ISA
+`<https://docs.nvidia.com/cuda/parallel-thread-execution/#data-movement-and-conversion-instructions-applypriority>`_.
+
'``llvm.nvvm.cp.async.bulk.tensor.g2s.tile.[1-5]d``'
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
diff --git a/llvm/include/llvm/IR/IntrinsicsNVVM.td b/llvm/include/llvm/IR/IntrinsicsNVVM.td
index c32bf0318b5d6..eff860bc3a850 100644
--- a/llvm/include/llvm/IR/IntrinsicsNVVM.td
+++ b/llvm/include/llvm/IR/IntrinsicsNVVM.td
@@ -5043,6 +5043,16 @@ def int_nvvm_prefetch_global_L2_evict_last: Intrinsic<[], [llvm_global_ptr_ty],
def int_nvvm_prefetchu_L1 : Intrinsic<[], [llvm_ptr_ty],
[IntrArgMemOnly, ReadOnly<ArgIndex<0>>, NoCapture<ArgIndex<0>>]>;
+def int_nvvm_applypriority_global_L2_evict_normal
+ : Intrinsic<[], [llvm_global_ptr_ty, llvm_i64_ty],
+ [IntrArgMemOnly, ReadOnly<ArgIndex<0>>, NoCapture<ArgIndex<0>>,
+ ImmArg<ArgIndex<1>>]>;
+
+def int_nvvm_applypriority_L2_evict_normal
+ : Intrinsic<[], [llvm_ptr_ty, llvm_i64_ty],
+ [IntrArgMemOnly, ReadOnly<ArgIndex<0>>, NoCapture<ArgIndex<0>>,
+ ImmArg<ArgIndex<1>>]>;
+
// Intrinsics for Bulk Copy using TMA (non-tensor)
// From Global to Shared Cluster
diff --git a/llvm/lib/Target/NVPTX/NVPTXIntrinsics.td b/llvm/lib/Target/NVPTX/NVPTXIntrinsics.td
index ed7963f35a7c7..78e0621fb52d9 100644
--- a/llvm/lib/Target/NVPTX/NVPTXIntrinsics.td
+++ b/llvm/lib/Target/NVPTX/NVPTXIntrinsics.td
@@ -789,6 +789,22 @@ def PREFETCH_GLOBAL_L2_EVICT_LAST : NVPTXInst<(outs), (ins Int64Regs:$addr),
def PREFETCHU_L1 : PREFETCH_INTRS<"prefetchu.L1">;
+// Applypriority intrinsics
+multiclass APPLYPRIORITY_L2_INTRS<string addr> {
+ defvar InstName = "applypriority."
+ # !if(!eq(addr, ""), "", addr # ".")
+ # "L2::evict_normal";
+
+ def APPLYPRIORITY_L2 : NVPTXInst<(outs), (ins Int64Regs:$addr, Int64Regs:$size),
+ InstName # " [$addr], $size;",
+ [(!cast<Intrinsic>("int_nvvm_" # !subst("::", "_", !subst(".", "_", InstName)))
+ i64:$addr, i64:$size)]>,
+ Requires<[hasPTX<80>, hasSM<90>]>;
+}
+
+defm APPLYPRIORITY_L2_EVICT_NORMAL : APPLYPRIORITY_L2_INTRS<"">;
+defm APPLYPRIORITY_GLOBAL_L2_EVICT_NORMAL : APPLYPRIORITY_L2_INTRS<"global">;
+
//-----------------------------------
// MBarrier Functions
//-----------------------------------
diff --git a/llvm/test/CodeGen/NVPTX/applypriority.ll b/llvm/test/CodeGen/NVPTX/applypriority.ll
new file mode 100644
index 0000000000000..51998f4d850c2
--- /dev/null
+++ b/llvm/test/CodeGen/NVPTX/applypriority.ll
@@ -0,0 +1,34 @@
+; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 5
+; RUN: llc < %s -mtriple=nvptx64 -mcpu=sm_90 -mattr=+ptx80| FileCheck --check-prefixes=CHECK-PTX64 %s
+; RUN: %if ptxas-12.3 %{ llc < %s -mtriple=nvptx64 -mcpu=sm_90 -mattr=+ptx80| %ptxas-verify -arch=sm_90 %}
+
+target triple = "nvptx64-nvidia-cuda"
+
+declare void @llvm.nvvm.applypriority.global.L2.evict.normal(ptr addrspace(1) %global_ptr, i64 %size)
+declare void @llvm.nvvm.applypriority.L2.evict.normal(ptr %ptr, i64 %size)
+
+define void @applypriority_global_L2(ptr addrspace(1) %global_ptr, i64 %size) {
+; CHECK-PTX64-LABEL: applypriority_global_L2(
+; CHECK-PTX64: {
+; CHECK-PTX64-NEXT: .reg .b64 %rd<2>;
+; CHECK-PTX64-EMPTY:
+; CHECK-PTX64-NEXT: // %bb.0:
+; CHECK-PTX64-NEXT: ld.param.u64 %rd1, [applypriority_global_L2_param_0];
+; CHECK-PTX64-NEXT: applypriority.global.L2::evict_normal [%rd1], 128;
+; CHECK-PTX64-NEXT: ret;
+ tail call void @llvm.nvvm.applypriority.global.L2.evict.normal(ptr addrspace(1) %global_ptr, i64 128)
+ ret void
+}
+
+define void @applypriority_L2(ptr %ptr, i64 %size) {
+; CHECK-PTX64-LABEL: applypriority_L2(
+; CHECK-PTX64: {
+; CHECK-PTX64-NEXT: .reg .b64 %rd<2>;
+; CHECK-PTX64-EMPTY:
+; CHECK-PTX64-NEXT: // %bb.0:
+; CHECK-PTX64-NEXT: ld.param.u64 %rd1, [applypriority_L2_param_0];
+; CHECK-PTX64-NEXT: applypriority.L2::evict_normal [%rd1], 128;
+; CHECK-PTX64-NEXT: ret;
+ tail call void @llvm.nvvm.applypriority.L2.evict.normal(ptr %ptr, i64 128)
+ ret void
+}
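
A minimal usage sketch, beyond the lit test in the diff: it is illustrative only and not part of the patch. The function name and the pointer %gp are hypothetical, and the i64 size operand must be the constant 128, matching the ImmArg constraint and the documented restriction.

; Apply L2 evict_normal priority to the same 128-byte window through both variants.
define void @applypriority_example(ptr addrspace(1) %gp) {
  ; Global state-space form: %gp must point into the .global window.
  tail call void @llvm.nvvm.applypriority.global.L2.evict.normal(ptr addrspace(1) %gp, i64 128)
  ; Generic form: the same location reached through a generic pointer.
  %p = addrspacecast ptr addrspace(1) %gp to ptr
  tail call void @llvm.nvvm.applypriority.L2.evict.normal(ptr %p, i64 128)
  ret void
}

declare void @llvm.nvvm.applypriority.global.L2.evict.normal(ptr addrspace(1), i64)
declare void @llvm.nvvm.applypriority.L2.evict.normal(ptr, i64)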
The latest revision looks good to me.
@durga4github could you please help to merge? Thanks.
[NVPTX] Add ApplyPriority intrinsics

This PR adds applypriority.* intrinsics with relevant eviction priorities.

* The lowering is handled from nvvm to nvptx tablegen directly (see the mapping sketch below).
* Lit tests are added as part of applypriority.ll.
* The generated PTX is verified with a 12.3 ptxas executable.
* Added docs for these intrinsics in NVPTXUsage.rst.

For more information, refer to the PTX ISA:
https://docs.nvidia.com/cuda/parallel-thread-execution/#data-movement-and-conversion-instructions-applypriority

Co-authored-by: abmajumder <abmajumder@nvidia.com>
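
For reference, the mapping below restates what the lit test in the diff already shows: each intrinsic call lowers one-to-one onto a PTX applypriority instruction, with the 128-byte size emitted as an immediate (the %rd1 register name is the one chosen by llc in the test).

call void @llvm.nvvm.applypriority.global.L2.evict.normal(ptr addrspace(1) %p, i64 128)
; -> applypriority.global.L2::evict_normal [%rd1], 128;
call void @llvm.nvvm.applypriority.L2.evict.normal(ptr %p, i64 128)
; -> applypriority.L2::evict_normal [%rd1], 128;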