
[NVPTX] Add SM versions for 101 and 120 #124155


Merged (1 commit into llvm:main, Jan 24, 2025)

Conversation

durga4github
Contributor

This patch adds SM and PTX versions for SM 101, SM 120, and their arch-accelerated variants.

All of these are supported in CUDA 12.8. sm_120/sm_120a require PTX 8.7; the rest require PTX 8.6.


Signed-off-by: Durgadoss R <durgadossr@nvidia.com>
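The SM-to-PTX requirements described above can be summarized as a small lookup. The sketch below is purely illustrative (not LLVM code, and `MIN_PTX_FOR_SM`/`min_ptx_version` are hypothetical names); it mirrors the `Proc<"sm_...", [SM..., PTX...]>` associations added by this patch.

```python
# Hypothetical helper mirroring the Proc definitions in this patch:
# each new SM arch is associated with the minimum PTX ISA it requires.
MIN_PTX_FOR_SM = {
    "sm_100": 86, "sm_100a": 86,
    "sm_101": 86, "sm_101a": 86,
    "sm_120": 87, "sm_120a": 87,
}

def min_ptx_version(sm: str) -> str:
    """Return the minimum PTX ISA version for an SM arch as 'major.minor'."""
    v = MIN_PTX_FOR_SM[sm]
    return f"{v // 10}.{v % 10}"
```

For example, `min_ptx_version("sm_120a")` yields "8.7", matching the `.version 8.7` lines checked in the test file.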
@llvmbot
Member

llvmbot commented Jan 23, 2025

@llvm/pr-subscribers-backend-nvptx

Author: Durgadoss R (durga4github)

Changes



Full diff: https://github.com/llvm/llvm-project/pull/124155.diff

3 Files Affected:

  • (modified) llvm/lib/Target/NVPTX/NVPTX.td (+9-2)
  • (modified) llvm/lib/Target/NVPTX/NVPTXInstrInfo.td (+3)
  • (modified) llvm/test/CodeGen/NVPTX/sm-version.ll (+24)
diff --git a/llvm/lib/Target/NVPTX/NVPTX.td b/llvm/lib/Target/NVPTX/NVPTX.td
index 3ca8b4d294079c..5467ae011a2081 100644
--- a/llvm/lib/Target/NVPTX/NVPTX.td
+++ b/llvm/lib/Target/NVPTX/NVPTX.td
@@ -35,15 +35,18 @@ class FeaturePTX<int version>:
                     "Use PTX version " # version>;
 
 foreach sm = [20, 21, 30, 32, 35, 37, 50, 52, 53,
-              60, 61, 62, 70, 72, 75, 80, 86, 87, 89, 90, 100] in
+              60, 61, 62, 70, 72, 75, 80, 86, 87,
+              89, 90, 100, 101, 120] in
   def SM#sm: FeatureSM<""#sm, !mul(sm, 10)>;
 
 def SM90a: FeatureSM<"90a", 901>;
 def SM100a: FeatureSM<"100a", 1001>;
+def SM101a: FeatureSM<"101a", 1011>;
+def SM120a: FeatureSM<"120a", 1201>;
 
 foreach version = [32, 40, 41, 42, 43, 50, 60, 61, 62, 63, 64, 65,
                    70, 71, 72, 73, 74, 75, 76, 77, 78,
-                   80, 81, 82, 83, 84, 85, 86] in
+                   80, 81, 82, 83, 84, 85, 86, 87] in
   def PTX#version: FeaturePTX<version>;
 
 //===----------------------------------------------------------------------===//
@@ -76,6 +79,10 @@ def : Proc<"sm_90", [SM90, PTX78]>;
 def : Proc<"sm_90a", [SM90a, PTX80]>;
 def : Proc<"sm_100", [SM100, PTX86]>;
 def : Proc<"sm_100a", [SM100a, PTX86]>;
+def : Proc<"sm_101", [SM101, PTX86]>;
+def : Proc<"sm_101a", [SM101a, PTX86]>;
+def : Proc<"sm_120", [SM120, PTX87]>;
+def : Proc<"sm_120a", [SM120a, PTX87]>;
 
 def NVPTXInstrInfo : InstrInfo {
 }
diff --git a/llvm/lib/Target/NVPTX/NVPTXInstrInfo.td b/llvm/lib/Target/NVPTX/NVPTXInstrInfo.td
index a076fde8ee7676..f17799c1300153 100644
--- a/llvm/lib/Target/NVPTX/NVPTXInstrInfo.td
+++ b/llvm/lib/Target/NVPTX/NVPTXInstrInfo.td
@@ -172,6 +172,9 @@ class hasSM<int version>: Predicate<"Subtarget->getSmVersion() >= " # version>;
 
 // Explicit records for arch-accelerated SM versions
 def hasSM90a : Predicate<"Subtarget->getFullSmVersion() == 901">;
+def hasSM100a : Predicate<"Subtarget->getFullSmVersion() == 1001">;
+def hasSM101a : Predicate<"Subtarget->getFullSmVersion() == 1011">;
+def hasSM120a : Predicate<"Subtarget->getFullSmVersion() == 1201">;
 
 // non-sync shfl instructions are not available on sm_70+ in PTX6.4+
 def hasSHFL : Predicate<"!(Subtarget->getSmVersion() >= 70"
diff --git a/llvm/test/CodeGen/NVPTX/sm-version.ll b/llvm/test/CodeGen/NVPTX/sm-version.ll
index 0e37d6e4b0d87f..ce9a1b1b161dce 100644
--- a/llvm/test/CodeGen/NVPTX/sm-version.ll
+++ b/llvm/test/CodeGen/NVPTX/sm-version.ll
@@ -16,6 +16,12 @@
 ; RUN: llc < %s -mtriple=nvptx -mcpu=sm_86 | FileCheck %s --check-prefix=SM86
 ; RUN: llc < %s -mtriple=nvptx -mcpu=sm_90 | FileCheck %s --check-prefix=SM90
 ; RUN: llc < %s -mtriple=nvptx -mcpu=sm_90a | FileCheck %s --check-prefix=SM90a
+; RUN: llc < %s -mtriple=nvptx -mcpu=sm_100 | FileCheck %s --check-prefix=SM100
+; RUN: llc < %s -mtriple=nvptx -mcpu=sm_100a | FileCheck %s --check-prefix=SM100a
+; RUN: llc < %s -mtriple=nvptx -mcpu=sm_101 | FileCheck %s --check-prefix=SM101
+; RUN: llc < %s -mtriple=nvptx -mcpu=sm_101a | FileCheck %s --check-prefix=SM101a
+; RUN: llc < %s -mtriple=nvptx -mcpu=sm_120 | FileCheck %s --check-prefix=SM120
+; RUN: llc < %s -mtriple=nvptx -mcpu=sm_120a | FileCheck %s --check-prefix=SM120a
 
 ; RUN: llc < %s -mtriple=nvptx64 -mcpu=sm_20 | FileCheck %s --check-prefix=SM20
 ; RUN: llc < %s -mtriple=nvptx64 -mcpu=sm_21 | FileCheck %s --check-prefix=SM21
@@ -35,6 +41,12 @@
 ; RUN: llc < %s -mtriple=nvptx64 -mcpu=sm_86 | FileCheck %s --check-prefix=SM86
 ; RUN: llc < %s -mtriple=nvptx64 -mcpu=sm_90 | FileCheck %s --check-prefix=SM90
 ; RUN: llc < %s -mtriple=nvptx64 -mcpu=sm_90a | FileCheck %s --check-prefix=SM90a
+; RUN: llc < %s -mtriple=nvptx64 -mcpu=sm_100 | FileCheck %s --check-prefix=SM100
+; RUN: llc < %s -mtriple=nvptx64 -mcpu=sm_100a | FileCheck %s --check-prefix=SM100a
+; RUN: llc < %s -mtriple=nvptx64 -mcpu=sm_101 | FileCheck %s --check-prefix=SM101
+; RUN: llc < %s -mtriple=nvptx64 -mcpu=sm_101a | FileCheck %s --check-prefix=SM101a
+; RUN: llc < %s -mtriple=nvptx64 -mcpu=sm_120 | FileCheck %s --check-prefix=SM120
+; RUN: llc < %s -mtriple=nvptx64 -mcpu=sm_120a | FileCheck %s --check-prefix=SM120a
 
 ; SM20: .version 3.2
 ; SM21: .version 3.2
@@ -54,6 +66,12 @@
 ; SM86: .version 7.1
 ; SM90: .version 7.8
 ; SM90a: .version 8.0
+; SM100: .version 8.6
+; SM100a: .version 8.6
+; SM101: .version 8.6
+; SM101a: .version 8.6
+; SM120: .version 8.7
+; SM120a: .version 8.7
 
 ; SM20: .target sm_20
 ; SM21: .target sm_21
@@ -73,3 +91,9 @@
 ; SM86: .target sm_86
 ; SM90: .target sm_90
 ; SM90a: .target sm_90a
+; SM100: .target sm_100
+; SM100a: .target sm_100a
+; SM101: .target sm_101
+; SM101a: .target sm_101a
+; SM120: .target sm_120
+; SM120a: .target sm_120a
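The diff encodes a "full SM version" per architecture: `FeatureSM` uses `!mul(sm, 10)` for base architectures, and the arch-accelerated variants add 1 (e.g. `SM90a` is 901, `SM120a` is 1201), which is what the `hasSM*a` predicates compare against via `getFullSmVersion()`. A minimal sketch of that encoding, assuming this reading of the TableGen (the function name is hypothetical):

```python
# Sketch of the NVPTX full-SM-version encoding used in NVPTX.td:
# base archs encode as sm * 10; arch-accelerated ('a') variants add 1.
def full_sm_version(sm: int, arch_accelerated: bool = False) -> int:
    return sm * 10 + (1 if arch_accelerated else 0)
```

So `full_sm_version(120, arch_accelerated=True)` gives 1201, the value matched by the new `hasSM120a` predicate.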

@durga4github durga4github merged commit 965ff7f into llvm:main Jan 24, 2025
10 checks passed
@durga4github durga4github deleted the durgadossr/nvptx_add_sm101a branch January 24, 2025 11:09
@llvm-ci
Collaborator

llvm-ci commented Jan 24, 2025

LLVM Buildbot has detected a new failure on builder flang-aarch64-dylib running on linaro-flang-aarch64-dylib while building llvm at step 5 "build-unified-tree".

Full details are available at: https://lab.llvm.org/buildbot/#/builders/50/builds/9352

Here is the relevant piece of the build log for reference:
Step 5 (build-unified-tree) failure: build (failure)
...
584.640 [1605/1/5215] Building CXX object tools/mlir/lib/Dialect/IRDL/CMakeFiles/obj.MLIRIRDL.dir/IR/IRDL.cpp.o
584.787 [1604/1/5216] Building CXX object tools/mlir/lib/Dialect/IRDL/CMakeFiles/obj.MLIRIRDL.dir/IRDLVerifiers.cpp.o
585.051 [1603/1/5217] Building CXX object tools/mlir/lib/Dialect/Linalg/TransformOps/CMakeFiles/obj.MLIRLinalgTransformOps.dir/GPUHeuristics.cpp.o
585.237 [1602/1/5218] Building CXX object tools/mlir/lib/Dialect/Linalg/TransformOps/CMakeFiles/obj.MLIRLinalgTransformOps.dir/DialectExtension.cpp.o
585.336 [1601/1/5219] Building CXX object tools/mlir/lib/CAPI/Dialect/CMakeFiles/obj.MLIRCAPIArith.dir/Arith.cpp.o
585.500 [1600/1/5220] Building CXX object tools/mlir/lib/Dialect/Linalg/IR/CMakeFiles/obj.MLIRLinalgDialect.dir/LinalgInterfaces.cpp.o
585.875 [1599/1/5221] Building CXX object tools/mlir/lib/Dialect/Linalg/IR/CMakeFiles/obj.MLIRLinalgDialect.dir/ValueBoundsOpInterfaceImpl.cpp.o
586.053 [1598/1/5222] Building CXX object tools/mlir/lib/Dialect/Linalg/IR/CMakeFiles/obj.MLIRLinalgDialect.dir/LinalgDialect.cpp.o
586.259 [1597/1/5223] Building CXX object tools/mlir/lib/Dialect/Linalg/IR/CMakeFiles/obj.MLIRLinalgDialect.dir/LinalgOps.cpp.o
595.677 [1596/1/5224] Building CXX object tools/mlir/test/lib/IR/CMakeFiles/MLIRTestIR.dir/TestSymbolUses.cpp.o
FAILED: tools/mlir/test/lib/IR/CMakeFiles/MLIRTestIR.dir/TestSymbolUses.cpp.o 
/usr/local/bin/c++ -DGTEST_HAS_RTTI=0 -DMLIR_INCLUDE_TESTS -D_DEBUG -D_GLIBCXX_ASSERTIONS -D_GNU_SOURCE -D__STDC_CONSTANT_MACROS -D__STDC_FORMAT_MACROS -D__STDC_LIMIT_MACROS -I/home/tcwg-buildbot/worker/flang-aarch64-dylib/build/tools/mlir/test/lib/IR -I/home/tcwg-buildbot/worker/flang-aarch64-dylib/llvm-project/mlir/test/lib/IR -I/home/tcwg-buildbot/worker/flang-aarch64-dylib/build/tools/mlir/include -I/home/tcwg-buildbot/worker/flang-aarch64-dylib/llvm-project/mlir/include -I/home/tcwg-buildbot/worker/flang-aarch64-dylib/build/include -I/home/tcwg-buildbot/worker/flang-aarch64-dylib/llvm-project/llvm/include -I/home/tcwg-buildbot/worker/flang-aarch64-dylib/llvm-project/mlir/test/lib/IR/../Dialect/Test -I/home/tcwg-buildbot/worker/flang-aarch64-dylib/build/tools/mlir/test/lib/IR/../Dialect/Test -fPIC -fno-semantic-interposition -fvisibility-inlines-hidden -Werror=date-time -Werror=unguarded-availability-new -Wall -Wextra -Wno-unused-parameter -Wwrite-strings -Wcast-qual -Wmissing-field-initializers -pedantic -Wno-long-long -Wc++98-compat-extra-semi -Wimplicit-fallthrough -Wcovered-switch-default -Wno-noexcept-type -Wnon-virtual-dtor -Wdelete-non-virtual-dtor -Wsuggest-override -Wstring-conversion -Wmisleading-indentation -Wctad-maybe-unsupported -fdiagnostics-color -ffunction-sections -fdata-sections -Wundef -Werror=mismatched-tags -O3 -DNDEBUG -std=c++17  -fno-exceptions -funwind-tables -fno-rtti -UNDEBUG -MD -MT tools/mlir/test/lib/IR/CMakeFiles/MLIRTestIR.dir/TestSymbolUses.cpp.o -MF tools/mlir/test/lib/IR/CMakeFiles/MLIRTestIR.dir/TestSymbolUses.cpp.o.d -o tools/mlir/test/lib/IR/CMakeFiles/MLIRTestIR.dir/TestSymbolUses.cpp.o -c /home/tcwg-buildbot/worker/flang-aarch64-dylib/llvm-project/mlir/test/lib/IR/TestSymbolUses.cpp
In file included from /home/tcwg-buildbot/worker/flang-aarch64-dylib/llvm-project/mlir/test/lib/IR/TestSymbolUses.cpp:9:
/home/tcwg-buildbot/worker/flang-aarch64-dylib/llvm-project/mlir/test/lib/IR/../Dialect/Test/TestOps.h:148:10: fatal error: 'TestOps.h.inc' file not found
  148 | #include "TestOps.h.inc"
      |          ^~~~~~~~~~~~~~~
1 error generated.
ninja: build stopped: subcommand failed.

copybara-service bot pushed a commit to openxla/xla that referenced this pull request Jan 28, 2025
Imported from GitHub PR #21822

Created `ShouldUsePtxExtension` helper for the extension suffix (this will also be used for sm120, etc).

CUDA 12.8 was recently released, which supports PTX 8.7, but that is not supported by the integrated LLVM (support added in llvm/llvm-project#124155), so leaving the association with PTX 8.6 - this doesn't raise warnings during compilation.

Copybara import of the project:

--
267cf74 by Sergey Kozub <skozub@nvidia.com>:

Add support for SM100a architecture (Blackwell)

Merging this change closes #21822

FUTURE_COPYBARA_INTEGRATE_REVIEW=#21822 from openxla:devel/sm100a 267cf74
PiperOrigin-RevId: 720655796
copybara-service bot pushed a commit to openxla/xla that referenced this pull request Jan 29, 2025
…(Blackwell)

Imported from GitHub PR #22029

In addition to SM120a, also add SM101a mentioned in the PTX 8.7 spec (https://docs.nvidia.com/cuda/parallel-thread-execution/#release-notes), which is a slight variation of SM100a.

Bumping the max supported PTX version to 8.7, as the LLVM PR (llvm/llvm-project#124155) adding the support is now integrated to OpenXLA.
Copybara import of the project:

--
be59b7a by Sergey Kozub <skozub@nvidia.com>:

[XLA:GPU] Add support for SM101a and SM120a architectures (Blackwell)

Merging this change closes #22029

FUTURE_COPYBARA_INTEGRATE_REVIEW=#22029 from openxla:devel/sm120a be59b7a
PiperOrigin-RevId: 721049239