[CUDA] Include PTX in non-RDC mode using the new driver #84367

jhuber6 · 2024-03-07T19:46:30Z

Summary:
The old driver embeds PTX in non-RDC mode, and so does the nvcc compiler.
The new driver currently does not do this, so we should keep it
consistent in this case. This simply requires adding the assembler
output as an input to the offloading action that gets fed to fatbin.
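The change is small enough to model outside clang. Below is a minimal sketch built on a toy action graph. `ToyAction`, `FileType`, and `offloadDeps` are hypothetical names and are not clang's `Action` / `OffloadAction::DeviceDependences` API. The sketch shows the idea of the patch: when a CUDA device action produces an object in non-RDC mode, its assembler input (the PTX) is forwarded to the same offload action, so it can reach the fatbinary.

```cpp
#include <string>
#include <vector>

// Hypothetical toy model, not clang's Action classes: each device action
// has an output type and a list of input actions.
enum class FileType { Assembler, Object };

struct ToyAction {
  std::string name;                      // e.g. "ptx:sm_52"
  FileType type;                         // output type of this action
  std::vector<const ToyAction *> inputs; // actions feeding this one
};

// Returns the device actions bundled into one per-arch offload action.
// The object (SASS) is always included. In CUDA non-RDC mode, its inputs
// (the PTX produced by the backend step) are added as well.
std::vector<const ToyAction *> offloadDeps(const ToyAction &a, bool isCuda,
                                           bool gpuRdc) {
  std::vector<const ToyAction *> deps{&a};
  // The condition does not depend on the input, so this sketch checks it
  // once instead of inside the loop.
  if (isCuda && !gpuRdc && a.type == FileType::Object)
    for (const ToyAction *input : a.inputs)
      deps.push_back(input);
  return deps;
}
```

For an object built from a PTX input, non-RDC mode returns both actions. This mirrors the `{7}, {6}` pair in the updated `cuda-phases.cu` checks. With RDC enabled, only the object is returned.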

llvmbot · 2024-03-07T19:46:59Z

@llvm/pr-subscribers-clang

@llvm/pr-subscribers-clang-driver

Author: Joseph Huber (jhuber6)

Full diff: https://github.com/llvm/llvm-project/pull/84367.diff

2 Files Affected:

(modified) clang/lib/Driver/Driver.cpp (+8)
(modified) clang/test/Driver/cuda-phases.cu (+13-12)

diff --git a/clang/lib/Driver/Driver.cpp b/clang/lib/Driver/Driver.cpp
index cecd34acbc92c0..96e6ad77f5e50d 100644
--- a/clang/lib/Driver/Driver.cpp
+++ b/clang/lib/Driver/Driver.cpp
@@ -4625,7 +4625,15 @@ Action *Driver::BuildOffloadingActions(Compilation &C,
       DDeps.add(*A, *TCAndArch->first, TCAndArch->second.data(), Kind);
       OffloadAction::DeviceDependences DDep;
       DDep.add(*A, *TCAndArch->first, TCAndArch->second.data(), Kind);
+
+      // Compiling CUDA in non-RDC mode uses the PTX output if available.
+      for (Action *Input : A->getInputs())
+        if (Kind == Action::OFK_Cuda && A->getType() == types::TY_Object &&
+            !Args.hasFlag(options::OPT_fgpu_rdc, options::OPT_fno_gpu_rdc,
+                          false))
+          DDep.add(*Input, *TCAndArch->first, TCAndArch->second.data(), Kind);
       OffloadActions.push_back(C.MakeAction<OffloadAction>(DDep, A->getType()));
+
       ++TCAndArch;
     }
   }
diff --git a/clang/test/Driver/cuda-phases.cu b/clang/test/Driver/cuda-phases.cu
index 9a231091de2bdc..a1c3c9b51b1e41 100644
--- a/clang/test/Driver/cuda-phases.cu
+++ b/clang/test/Driver/cuda-phases.cu
@@ -244,31 +244,32 @@
 // NEW-DRIVER-RDC-NEXT: 18: assembler, {17}, object, (host-cuda)
 // NEW-DRIVER-RDC-NEXT: 19: clang-linker-wrapper, {18}, image, (host-cuda)
 
-// RUN: %clang -### -target powerpc64le-ibm-linux-gnu -ccc-print-phases --offload-new-driver -fgpu-rdc \
+// RUN: %clang -### -target powerpc64le-ibm-linux-gnu -ccc-print-phases --offload-new-driver \
 // RUN:   --offload-arch=sm_52 --offload-arch=sm_70 %s 2>&1 | FileCheck --check-prefix=NEW-DRIVER %s
-//      NEW-DRIVER: 0: input, "[[INPUT:.+]]", cuda
-// NEW-DRIVER-NEXT: 1: preprocessor, {0}, cuda-cpp-output
-// NEW-DRIVER-NEXT: 2: compiler, {1}, ir
-// NEW-DRIVER-NEXT: 3: input, "[[INPUT]]", cuda, (device-cuda, sm_52)
+//      NEW-DRIVER: 0: input, "[[CUDA:.+]]", cuda, (host-cuda)
+// NEW-DRIVER-NEXT: 1: preprocessor, {0}, cuda-cpp-output, (host-cuda)
+// NEW-DRIVER-NEXT: 2: compiler, {1}, ir, (host-cuda)
+// NEW-DRIVER-NEXT: 3: input, "[[CUDA]]", cuda, (device-cuda, sm_52)
 // NEW-DRIVER-NEXT: 4: preprocessor, {3}, cuda-cpp-output, (device-cuda, sm_52)
 // NEW-DRIVER-NEXT: 5: compiler, {4}, ir, (device-cuda, sm_52)
 // NEW-DRIVER-NEXT: 6: backend, {5}, assembler, (device-cuda, sm_52)
 // NEW-DRIVER-NEXT: 7: assembler, {6}, object, (device-cuda, sm_52)
-// NEW-DRIVER-NEXT: 8: offload, "device-cuda (nvptx64-nvidia-cuda:sm_52)" {7}, object
-// NEW-DRIVER-NEXT: 9: input, "[[INPUT]]", cuda, (device-cuda, sm_70)
+// NEW-DRIVER-NEXT: 8: offload, "device-cuda (nvptx64-nvidia-cuda:sm_52)" {7}, "device-cuda (nvptx64-nvidia-cuda:sm_52)" {6}, object
+// NEW-DRIVER-NEXT: 9: input, "[[CUDA]]", cuda, (device-cuda, sm_70)
 // NEW-DRIVER-NEXT: 10: preprocessor, {9}, cuda-cpp-output, (device-cuda, sm_70)
 // NEW-DRIVER-NEXT: 11: compiler, {10}, ir, (device-cuda, sm_70)
 // NEW-DRIVER-NEXT: 12: backend, {11}, assembler, (device-cuda, sm_70)
 // NEW-DRIVER-NEXT: 13: assembler, {12}, object, (device-cuda, sm_70)
-// NEW-DRIVER-NEXT: 14: offload, "device-cuda (nvptx64-nvidia-cuda:sm_70)" {13}, object
-// NEW-DRIVER-NEXT: 15: clang-offload-packager, {8, 14}, image
-// NEW-DRIVER-NEXT: 16: offload, "host-cuda (powerpc64le-ibm-linux-gnu)" {2}, "device-cuda (powerpc64le-ibm-linux-gnu)" {15}, ir
+// NEW-DRIVER-NEXT: 14: offload, "device-cuda (nvptx64-nvidia-cuda:sm_70)" {13}, "device-cuda (nvptx64-nvidia-cuda:sm_70)" {12}, object
+// NEW-DRIVER-NEXT: 15: linker, {8, 14}, cuda-fatbin, (device-cuda)
+// NEW-DRIVER-NEXT: 16: offload, "host-cuda (powerpc64le-ibm-linux-gnu)" {2}, "device-cuda (nvptx64-nvidia-cuda)" {15}, ir
 // NEW-DRIVER-NEXT: 17: backend, {16}, assembler, (host-cuda)
 // NEW-DRIVER-NEXT: 18: assembler, {17}, object, (host-cuda)
 // NEW-DRIVER-NEXT: 19: clang-linker-wrapper, {18}, image, (host-cuda)
 
 // RUN: %clang -### --target=powerpc64le-ibm-linux-gnu -ccc-print-phases --offload-new-driver \
 // RUN:   --offload-arch=sm_52 --offload-arch=sm_70 %s %S/Inputs/empty.cpp 2>&1 | FileCheck --check-prefix=NON-CUDA-INPUT %s
+
 //      NON-CUDA-INPUT: 0: input, "[[CUDA:.+]]", cuda, (host-cuda)
 // NON-CUDA-INPUT-NEXT: 1: preprocessor, {0}, cuda-cpp-output, (host-cuda)
 // NON-CUDA-INPUT-NEXT: 2: compiler, {1}, ir, (host-cuda)
@@ -277,13 +278,13 @@
 // NON-CUDA-INPUT-NEXT: 5: compiler, {4}, ir, (device-cuda, sm_52)
 // NON-CUDA-INPUT-NEXT: 6: backend, {5}, assembler, (device-cuda, sm_52)
 // NON-CUDA-INPUT-NEXT: 7: assembler, {6}, object, (device-cuda, sm_52)
-// NON-CUDA-INPUT-NEXT: 8: offload, "device-cuda (nvptx64-nvidia-cuda:sm_52)" {7}, object
+// NON-CUDA-INPUT-NEXT: 8: offload, "device-cuda (nvptx64-nvidia-cuda:sm_52)" {7}, "device-cuda (nvptx64-nvidia-cuda:sm_52)" {6}, object
 // NON-CUDA-INPUT-NEXT: 9: input, "[[CUDA]]", cuda, (device-cuda, sm_70)
 // NON-CUDA-INPUT-NEXT: 10: preprocessor, {9}, cuda-cpp-output, (device-cuda, sm_70)
 // NON-CUDA-INPUT-NEXT: 11: compiler, {10}, ir, (device-cuda, sm_70)
 // NON-CUDA-INPUT-NEXT: 12: backend, {11}, assembler, (device-cuda, sm_70)
 // NON-CUDA-INPUT-NEXT: 13: assembler, {12}, object, (device-cuda, sm_70)
-// NON-CUDA-INPUT-NEXT: 14: offload, "device-cuda (nvptx64-nvidia-cuda:sm_70)" {13}, object
+// NON-CUDA-INPUT-NEXT: 14: offload, "device-cuda (nvptx64-nvidia-cuda:sm_70)" {13}, "device-cuda (nvptx64-nvidia-cuda:sm_70)" {12}, object
 // NON-CUDA-INPUT-NEXT: 15: linker, {8, 14}, cuda-fatbin, (device-cuda)
 // NON-CUDA-INPUT-NEXT: 16: offload, "host-cuda (powerpc64le-ibm-linux-gnu)" {2}, "device-cuda (nvptx64-nvidia-cuda)" {15}, ir
 // NON-CUDA-INPUT-NEXT: 17: backend, {16}, assembler, (host-cuda)

Artem-B · 2024-03-07T19:56:25Z

clang/lib/Driver/Driver.cpp

@@ -4625,7 +4625,15 @@ Action *Driver::BuildOffloadingActions(Compilation &C,
      DDeps.add(*A, *TCAndArch->first, TCAndArch->second.data(), Kind);
      OffloadAction::DeviceDependences DDep;
      DDep.add(*A, *TCAndArch->first, TCAndArch->second.data(), Kind);
+
+      // Compiling CUDA in non-RDC mode uses the PTX output if available.


Do we still respect --cuda-include-ptx=... ?

So, the current behavior is that it will "always" set the PTX in the job and optionally include it in the fatbinary job depending on those settings.

Artem-B · 2024-03-07T20:01:42Z

clang/lib/Driver/Driver.cpp

+      // Compiling CUDA in non-RDC mode uses the PTX output if available.
+      for (Action *Input : A->getInputs())
+        if (Kind == Action::OFK_Cuda && A->getType() == types::TY_Object &&
+            !Args.hasFlag(options::OPT_fgpu_rdc, options::OPT_fno_gpu_rdc,


I'm not quite sure why we would need to include PTX for RDC compilation.

In retrospect, including PTX by default with all compilations turned out to be a wrong default choice.
It's just a waste of space for most users, and it allows problems to go unnoticed for longer than they should (e.g. something compiled for the wrong GPU).

Switching to the new driver is a good point to make a better choice. I would argue that we should not be including PTX by default or, if we do deem that it may be useful, only add it for the most recent chosen GPU variant, to provide some forward compatibility, not for all of them.
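The forward-compatibility option above can be made concrete with a minimal sketch that picks a single arch to receive PTX. It assumes arch names of the plain `sm_<digits>` form and treats a larger number as newer. `smVersion` and `newestArchForPtx` are hypothetical helpers, not part of clang.

```cpp
#include <cstddef>
#include <optional>
#include <string>
#include <vector>

// Hypothetical helper: parse a plain "sm_<digits>" arch into its numeric
// version. Anything else (e.g. "sm_90a", "gfx90a") is rejected to keep
// the sketch simple.
std::optional<int> smVersion(const std::string &arch) {
  if (arch.size() <= 3 || arch.compare(0, 3, "sm_") != 0)
    return std::nullopt;
  int version = 0;
  for (std::size_t i = 3; i < arch.size(); ++i) {
    if (arch[i] < '0' || arch[i] > '9')
      return std::nullopt;
    version = version * 10 + (arch[i] - '0');
  }
  return version;
}

// Picks the newest requested arch, i.e. the only one that would get PTX
// under a "newest arch only" policy. Returns "" if no arch parses.
std::string newestArchForPtx(const std::vector<std::string> &archs) {
  std::string best;
  int bestVersion = -1;
  for (const std::string &arch : archs) {
    std::optional<int> v = smVersion(arch);
    if (v && *v > bestVersion) {
      bestVersion = *v;
      best = arch;
    }
  }
  return best;
}
```

For `--offload-arch=sm_52 --offload-arch=sm_70`, this picks `sm_70`, regardless of the order in which the arches were given.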

Yeah, I don't have my finger on the pulse of the CUDA users here. I think we want this patch to match the current behavior with --cuda-include-ptx as it seems to make the decision whether or not to include it at job creation time. We could then potentially change the default of --cuda-include-ptx if that's the preferred solution.

jhuber6 · 2024-03-07T20:54:12Z

Should I make shouldIncludePTX default to false for the new driver?

Artem-B · 2024-03-07T21:35:42Z

Yes, I think that's a better default.

jhuber6 · 2024-03-07T21:51:31Z

Done, now requires --cuda-include-ptx=.

Artem-B · 2024-03-07T22:39:25Z

This may be worth adding to the release notes.

Artem-B · 2024-03-07T22:41:10Z

clang/lib/Driver/ToolChains/Cuda.cpp

-      continue;
+static bool shouldIncludePTX(const ArgList &Args, StringRef InputArch) {
+  // The new driver does not include PTX by default.
+  bool includePTX = !Args.hasFlag(options::OPT_offload_new_driver,


I'd add a comment on why we're making this decision based on the new vs old driver.
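The review context above shows only the start of `shouldIncludePTX`. Below is a self-contained model of the per-arch decision as discussed in this thread, modeled on clang's `--cuda-include-ptx=` and `--no-cuda-include-ptx=` options. Several parts are assumptions: the flags are already parsed into (kind, value) pairs in command-line order, a later matching flag overrides an earlier one, and `all` matches every arch. `PtxFlag`, `PtxArg`, and this signature are hypothetical and do not reproduce clang's `ArgList`-based code.

```cpp
#include <string>
#include <vector>

// Hypothetical stand-ins for --cuda-include-ptx=<arch|all> and
// --no-cuda-include-ptx=<arch|all>, already parsed in command-line order.
enum class PtxFlag { Include, Exclude };

struct PtxArg {
  PtxFlag kind;
  std::string value; // an arch such as "sm_70", or "all"
};

// Decides whether PTX for `inputArch` goes into the fatbinary.
bool shouldIncludePTX(bool newDriver, const std::vector<PtxArg> &args,
                      const std::string &inputArch) {
  // Default agreed in this review: the old driver embeds PTX, and the new
  // driver does not unless asked to.
  bool includePTX = !newDriver;
  // Assumption: a later matching flag overrides an earlier one.
  for (const PtxArg &arg : args)
    if (arg.value == "all" || arg.value == inputArch)
      includePTX = (arg.kind == PtxFlag::Include);
  return includePTX;
}
```

With the new driver and no flags, nothing is embedded. `--cuda-include-ptx=sm_70` turns PTX on for `sm_70` only, while `--no-cuda-include-ptx=all --cuda-include-ptx=sm_52` on the old driver keeps PTX for `sm_52` alone.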

Artem-B

LGTM overall, with docs/comment nits.

jhuber6 · 2024-03-07T22:46:17Z

Done, thanks for the review.

jhuber6 requested review from Artem-B, jdoerfert, jlebar, shiltian and yxsamliu March 7, 2024 19:46

llvmbot added the clang and clang:driver labels Mar 7, 2024

Artem-B reviewed Mar 7, 2024

jhuber6 force-pushed the IncludePTX branch from fdd40f7 to 6d1b325 March 7, 2024 21:51

jhuber6 force-pushed the IncludePTX branch from 6d1b325 to afac731 March 7, 2024 22:18

Artem-B reviewed Mar 7, 2024

Artem-B approved these changes Mar 7, 2024

jhuber6 force-pushed the IncludePTX branch from afac731 to 47d3605 March 7, 2024 22:43

jhuber6 merged commit 3a56b5a into llvm:main Mar 7, 2024

mkuron mentioned this pull request Mar 20, 2025

[clang][CUDA] No --no-cuda-include-sass option available to include only PTX code in fatbin #132204

Open
