Skip to content

Conversation

@jdenny-ornl
Copy link
Collaborator

@jdenny-ornl jdenny-ornl commented Aug 8, 2024

Make it possible to do things like the following, regardless of whether the offload target is nvptx or amdgpu:

$ clang -O1 -g -fopenmp --offload-arch=native test.c                       \
    -Xoffload-linker -mllvm=-pass-remarks=inline                           \
    -Xoffload-linker -mllvm=-force-remove-attribute=g.internalized:noinline\
    -Xoffload-linker --lto-newpm-passes='forceattrs,default<O1>'           \
    -Xoffload-linker --lto-debug-pass-manager                              \
    -foffload-lto

To accomplish that:

  • In clang-linker-wrapper, do not forward options via -Wl if they might have literal commas. Use -Xlinker instead.
  • In clang-nvlink-wrapper, accept --lto-debug-pass-manager and --lto-newpm-passes.
  • In clang-nvlink-wrapper, drop -passes because it's inconsistent with the interface of lld, which is used instead of clang-nvlink-wrapper when the target is amdgpu. Without this patch, -passes is passed to nvlink, producing an error anyway.

Make it possible to do things like the following, regardless of
whether `$GPU_ARCH` is for nvptx or amdgpu:

```
$ clang -O1 -g -fopenmp -fopenmp-targets=$GPU_ARCH test.c                  \
    -Xoffload-linker -mllvm=-pass-remarks=inline                           \
    -Xoffload-linker -mllvm=-force-remove-attribute=g.internalized:noinline\
    -Xoffload-linker --lto-newpm-passes='forceattrs,default<O1>'           \
    -Xoffload-linker --lto-debug-pass-manager                              \
    -foffload-lto
```

To accomplish that:

- In clang-linker-wrapper, do not forward options via `-Wl` if they
  might have literal commas.  Use `-Xlinker` instead.
- In clang-nvlink-wrapper, accept `--lto-debug-pass-manager` and
  `--lto-newpm-passes`.
- In clang-nvlink-wrapper, drop `-passes` because it's inconsistent
  with the interface of `lld`, which is used instead of
  clang-nvlink-wrapper when the target is amdgpu.  Without this patch,
  `-passes` is passed to `nvlink`, producing an error anyway.
@jdenny-ornl jdenny-ornl requested a review from jhuber6 August 8, 2024 14:50
@llvmbot llvmbot added clang Clang issues not falling into any other category clang:driver 'clang' and 'clang++' user-facing binaries. Not 'clang-cl' labels Aug 8, 2024
@llvmbot
Copy link
Member

llvmbot commented Aug 8, 2024

@llvm/pr-subscribers-clang-driver

@llvm/pr-subscribers-clang

Author: Joel E. Denny (jdenny-ornl)

Changes

Make it possible to do things like the following, regardless of whether $GPU_ARCH is for nvptx or amdgpu:

$ clang -O1 -g -fopenmp -fopenmp-targets=$GPU_ARCH test.c                  \
    -Xoffload-linker -mllvm=-pass-remarks=inline                           \
    -Xoffload-linker -mllvm=-force-remove-attribute=g.internalized:noinline\
    -Xoffload-linker --lto-newpm-passes='forceattrs,default&lt;O1&gt;'           \
    -Xoffload-linker --lto-debug-pass-manager                              \
    -foffload-lto

To accomplish that:

  • In clang-linker-wrapper, do not forward options via -Wl if they might have literal commas. Use -Xlinker instead.
  • In clang-nvlink-wrapper, accept --lto-debug-pass-manager and --lto-newpm-passes.
  • In clang-nvlink-wrapper, drop -passes because it's inconsistent with the interface of lld, which is used instead of clang-nvlink-wrapper when the target is amdgpu. Without this patch, -passes is passed to nvlink, producing an error anyway.

Full diff: https://github.com/llvm/llvm-project/pull/102483.diff

5 Files Affected:

  • (modified) clang/test/Driver/linker-wrapper.c (+12-6)
  • (modified) clang/test/Driver/nvlink-wrapper.c (+8)
  • (modified) clang/tools/clang-linker-wrapper/ClangLinkerWrapper.cpp (+8-4)
  • (modified) clang/tools/clang-nvlink-wrapper/ClangNVLinkWrapper.cpp (+2-13)
  • (modified) clang/tools/clang-nvlink-wrapper/NVLinkOpts.td (+8)
diff --git a/clang/test/Driver/linker-wrapper.c b/clang/test/Driver/linker-wrapper.c
index 99cfdb9ebfc7c..e70715d2a9bd7 100644
--- a/clang/test/Driver/linker-wrapper.c
+++ b/clang/test/Driver/linker-wrapper.c
@@ -238,9 +238,15 @@ __attribute__((visibility("protected"), used)) int x;
 // RUN: clang-offload-packager -o %t.out \
 // RUN:   --image=file=%t.elf.o,kind=openmp,triple=amdgcn-amd-amdhsa,arch=gfx908
 // RUN: %clang -cc1 %s -triple x86_64-unknown-linux-gnu -emit-obj -o %t.o -fembed-offload-object=%t.out
-// RUN: clang-linker-wrapper --host-triple=x86_64-unknown-linux-gnu --dry-run --offload-opt=-pass-remarks=foo \
-// RUN:   --linker-path=/usr/bin/ld %t.o -o a.out 2>&1 | FileCheck %s --check-prefix=OFFLOAD-OPT
-// RUN: clang-linker-wrapper --host-triple=x86_64-unknown-linux-gnu --dry-run -mllvm -pass-remarks=foo \
-// RUN:   --linker-path=/usr/bin/ld %t.o -o a.out 2>&1 | FileCheck %s --check-prefix=OFFLOAD-OPT
-
-// OFFLOAD-OPT: clang{{.*}}-Wl,--plugin-opt=-pass-remarks=foo
+// RUN: clang-linker-wrapper --host-triple=x86_64-unknown-linux-gnu --dry-run \
+// RUN:   --offload-opt=-pass-remarks=foo,bar --linker-path=/usr/bin/ld \
+// RUN:   %t.o -o a.out 2>&1 | FileCheck %s --check-prefix=OFFLOAD-OPT
+// RUN: clang-linker-wrapper --host-triple=x86_64-unknown-linux-gnu --dry-run \
+// RUN:   -mllvm -pass-remarks=foo,bar --linker-path=/usr/bin/ld \
+// RUN:   %t.o -o a.out 2>&1 | FileCheck %s --check-prefix=MLLVM
+
+//            MLLVM: clang{{.*}}-Xlinker --plugin-opt=-pass-remarks=foo,bar
+//      OFFLOAD-OPT: clang{{.*}}-Xlinker --plugin-opt=-pass-remarks=foo,bar
+//       MLLVM-SAME: -Xlinker -mllvm=-pass-remarks=foo,bar
+//  OFFLOAD-OPT-NOT: -Xlinker -mllvm=-pass-remarks=foo,bar
+// OFFLOAD-OPT-SAME: {{$}}
diff --git a/clang/test/Driver/nvlink-wrapper.c b/clang/test/Driver/nvlink-wrapper.c
index 318315ddaca34..5d835d8d6cb2a 100644
--- a/clang/test/Driver/nvlink-wrapper.c
+++ b/clang/test/Driver/nvlink-wrapper.c
@@ -70,3 +70,11 @@ int baz() { return y + x; }
 // RUN: clang-nvlink-wrapper --dry-run %t.o %t-u.o %t-y.a \
 // RUN:   -arch sm_52 --cuda-path/opt/cuda -o a.out 2>&1 | FileCheck %s --check-prefix=PATH
 // PATH-NOT: --cuda-path=/opt/cuda
+
+//
+// Check that passes can be specified and debugged.
+//
+// RUN: clang-nvlink-wrapper --dry-run %t.o %t-u.o %t-y.a \
+// RUN:   --lto-debug-pass-manager --lto-newpm-passes=forceattrs \
+// RUN:   -arch sm_52 -o a.out 2>&1 | FileCheck %s --check-prefix=PASSES
+// PASSES: Running pass: ForceFunctionAttrsPass
diff --git a/clang/tools/clang-linker-wrapper/ClangLinkerWrapper.cpp b/clang/tools/clang-linker-wrapper/ClangLinkerWrapper.cpp
index 24cc4f0eeadf9..7030556ce01b1 100644
--- a/clang/tools/clang-linker-wrapper/ClangLinkerWrapper.cpp
+++ b/clang/tools/clang-linker-wrapper/ClangLinkerWrapper.cpp
@@ -527,9 +527,11 @@ Expected<StringRef> clang(ArrayRef<StringRef> InputFiles, const ArgList &Args) {
 
   // Forward all of the `--offload-opt` and similar options to the device.
   if (linkerSupportsLTO(Args)) {
-    for (auto &Arg : Args.filtered(OPT_offload_opt_eq_minus, OPT_mllvm))
+    for (auto &Arg : Args.filtered(OPT_offload_opt_eq_minus, OPT_mllvm)) {
+      CmdArgs.push_back(Args.MakeArgString("-Xlinker"));
       CmdArgs.push_back(
-          Args.MakeArgString("-Wl,--plugin-opt=" + StringRef(Arg->getValue())));
+          Args.MakeArgString("--plugin-opt=" + StringRef(Arg->getValue())));
+    }
   }
 
   if (!Triple.isNVPTX())
@@ -571,9 +573,11 @@ Expected<StringRef> clang(ArrayRef<StringRef> InputFiles, const ArgList &Args) {
   }
 
   // Pass on -mllvm options to the linker invocation.
-  for (const opt::Arg *Arg : Args.filtered(OPT_mllvm))
+  for (const opt::Arg *Arg : Args.filtered(OPT_mllvm)) {
+    CmdArgs.push_back(Args.MakeArgString("-Xlinker"));
     CmdArgs.push_back(
-        Args.MakeArgString("-Wl,-mllvm=" + StringRef(Arg->getValue())));
+        Args.MakeArgString("-mllvm=" + StringRef(Arg->getValue())));
+  }
 
   if (Args.hasArg(OPT_debug))
     CmdArgs.push_back("-g");
diff --git a/clang/tools/clang-nvlink-wrapper/ClangNVLinkWrapper.cpp b/clang/tools/clang-nvlink-wrapper/ClangNVLinkWrapper.cpp
index 7851414ba7f4d..871fe5e4553cc 100644
--- a/clang/tools/clang-nvlink-wrapper/ClangNVLinkWrapper.cpp
+++ b/clang/tools/clang-nvlink-wrapper/ClangNVLinkWrapper.cpp
@@ -84,18 +84,6 @@ static cl::list<std::string>
     PassPlugins("load-pass-plugin",
                 cl::desc("Load passes from plugin library"));
 
-static cl::opt<std::string> PassPipeline(
-    "passes",
-    cl::desc(
-        "A textual description of the pass pipeline. To have analysis passes "
-        "available before a certain pass, add 'require<foo-analysis>'. "
-        "'-passes' overrides the pass pipeline (but not all effects) from "
-        "specifying '--opt-level=O?' (O2 is the default) to "
-        "clang-linker-wrapper.  Be sure to include the corresponding "
-        "'default<O?>' in '-passes'."));
-static cl::alias PassPipeline2("p", cl::aliasopt(PassPipeline),
-                               cl::desc("Alias for -passes"));
-
 static void printVersion(raw_ostream &OS) {
   OS << clang::getClangToolFullVersion("clang-nvlink-wrapper") << '\n';
 }
@@ -365,8 +353,9 @@ Expected<std::unique_ptr<lto::LTO>> createLTO(const ArgList &Args) {
   Conf.OptLevel = Args.getLastArgValue(OPT_O, "2")[0] - '0';
   Conf.DefaultTriple = Triple.getTriple();
 
-  Conf.OptPipeline = PassPipeline;
+  Conf.OptPipeline = Args.getLastArgValue(OPT_lto_newpm_passes, "");
   Conf.PassPlugins = PassPlugins;
+  Conf.DebugPassManager = Args.hasArg(OPT_lto_debug_pass_manager);
 
   Conf.DiagHandler = diagnosticHandler;
   Conf.CGFileType = CodeGenFileType::AssemblyFile;
diff --git a/clang/tools/clang-nvlink-wrapper/NVLinkOpts.td b/clang/tools/clang-nvlink-wrapper/NVLinkOpts.td
index 01bd0f85b1a33..a97190b7a614c 100644
--- a/clang/tools/clang-nvlink-wrapper/NVLinkOpts.td
+++ b/clang/tools/clang-nvlink-wrapper/NVLinkOpts.td
@@ -1,3 +1,6 @@
+// We try to create options similar to lld's.  That way, options passed to clang
+// -Xoffload-linker can be the same whether offloading to nvptx or amdgpu.
+
 include "llvm/Option/OptParser.td"
 
 def WrapperOnlyOption : OptionFlag;
@@ -70,6 +73,11 @@ def plugin_opt : Joined<["--", "-"], "plugin-opt=">, Flags<[WrapperOnlyOption]>,
   HelpText<"Options passed to LLVM, not including the Clang invocation. Use "
            "'--plugin-opt=--help' for a list of options.">;
 
+def lto_newpm_passes : Joined<["--"], "lto-newpm-passes=">,
+  Flags<[WrapperOnlyOption]>, HelpText<"Passes to run during LTO">;
+def lto_debug_pass_manager : Flag<["--"], "lto-debug-pass-manager">,
+  Flags<[WrapperOnlyOption]>,   HelpText<"Debug new pass manager">;
+
 def save_temps : Flag<["--", "-"], "save-temps">,
   Flags<[WrapperOnlyOption]>, HelpText<"Save intermediate results">;
 

Copy link
Contributor

@jhuber6 jhuber6 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! One nit. Also, you could use native instead of GPU_ARCH in the above ;).

@jdenny-ornl
Copy link
Collaborator Author

Also, you could use native instead of GPU_ARCH in the above ;).

Good point. I should have gotten in that habit before now.

Copy link
Contributor

@jhuber6 jhuber6 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LG, thanks

Co-authored-by: Joseph Huber <huberjn@outlook.com>
@jdenny-ornl jdenny-ornl merged commit 3c639b8 into llvm:main Aug 9, 2024
@jdenny-ornl
Copy link
Collaborator Author

Thanks for the review.

kutemeikito added a commit to kutemeikito/llvm-project that referenced this pull request Aug 10, 2024
* 'main' of https://github.com/llvm/llvm-project: (700 commits)
  [SandboxIR][NFC] SingleLLVMInstructionImpl class (llvm#102687)
  [ThinLTO]Clean up 'import-assume-unique-local' flag. (llvm#102424)
  [nsan] Make #include more conventional
  [SandboxIR][NFC] Use Tracker.emplaceIfTracking()
  [libc]  Moved range_reduction_double ifdef statement (llvm#102659)
  [libc] Fix CFP long double and add tests (llvm#102660)
  [TargetLowering] Handle vector types in expandFixedPointMul (llvm#102635)
  [compiler-rt][NFC] Replace environment variable with %t (llvm#102197)
  [UnitTests] Convert a test to use opaque pointers (llvm#102668)
  [CodeGen][NFCI] Don't re-implement parts of ASTContext::getIntWidth (llvm#101765)
  [SandboxIR] Clean up tracking code with the help of emplaceIfTracking() (llvm#102406)
  [mlir][bazel] remove extra blanks in mlir-tblgen test
  [NVPTX][NFC] Update tests to use bfloat type (llvm#101493)
  [mlir] Add support for parsing nested PassPipelineOptions (llvm#101118)
  [mlir][bazel] add missing td dependency in mlir-tblgen test
  [flang][cuda] Fix lib dependency
  [libc] Clean up remaining use of *_WIDTH macros in printf (llvm#102679)
  [flang][cuda] Convert cuf.alloc for box to fir.alloca in device context (llvm#102662)
  [SandboxIR] Implement the InsertElementInst class (llvm#102404)
  [libc] Fix use of cpp::numeric_limits<...>::digits (llvm#102674)
  [mlir][ODS] Verify type constraints in Types and Attributes (llvm#102326)
  [LTO] enable `ObjCARCContractPass` only on optimized build  (llvm#101114)
  [mlir][ODS] Consistent `cppType` / `cppClassName` usage (llvm#102657)
  [lldb] Move definition of SBSaveCoreOptions dtor out of header (llvm#102539)
  [libc] Use cpp::numeric_limits in preference to C23 <limits.h> macros (llvm#102665)
  [clang] Implement -fptrauth-auth-traps. (llvm#102417)
  [LLVM][rtsan] rtsan transform to preserve CFGAnalyses (llvm#102651)
  Revert "[AMDGPU] Move `AMDGPUAttributorPass` to full LTO post link stage (llvm#102086)"
  [RISCV][GISel] Add missing tests for G_CTLZ/CTTZ instruction selection. NFC
  Return available function types for BindingDecls. (llvm#102196)
  [clang] Wire -fptrauth-returns to "ptrauth-returns" fn attribute. (llvm#102416)
  [RISCV] Remove riscv-experimental-rv64-legal-i32. (llvm#102509)
  [RISCV] Move PseudoVSET(I)VLI expansion to use PseudoInstExpansion. (llvm#102496)
  [NVPTX] support switch statement with brx.idx (reland) (llvm#102550)
  [libc][newhdrgen]sorted function names in yaml (llvm#102544)
  [GlobalIsel] Combine G_ADD and G_SUB with constants (llvm#97771)
  Suppress spurious warnings due to R_RISCV_SET_ULEB128
  [scudo] Separated committed and decommitted entries. (llvm#101409)
  [MIPS] Fix missing ANDI optimization (llvm#97689)
  [Clang] Add env var for nvptx-arch/amdgpu-arch timeout (llvm#102521)
  [asan] Switch allocator to dynamic base address (llvm#98511)
  [AMDGPU] Move `AMDGPUAttributorPass` to full LTO post link stage (llvm#102086)
  [libc][math][c23] Add fadd{l,f128} C23 math functions (llvm#102531)
  [mlir][bazel] revert bazel rule change for DLTITransformOps
  [msan] Support vst{2,3,4}_lane instructions (llvm#101215)
  Revert "[MLIR][DLTI][Transform] Introduce transform.dlti.query (llvm#101561)"
  [X86] pr57673.ll - generate MIR test checks
  [mlir][vector][test] Split tests from vector-transfer-flatten.mlir (llvm#102584)
  [mlir][bazel] add bazel rule for DLTITransformOps
  OpenMPOpt: Remove dead include
  [IR] Add method to GlobalVariable to change type of initializer. (llvm#102553)
  [flang][cuda] Force default allocator in device code (llvm#102238)
  [llvm] Construct SmallVector<SDValue> with ArrayRef (NFC) (llvm#102578)
  [MLIR][DLTI][Transform] Introduce transform.dlti.query (llvm#101561)
  [AMDGPU][AsmParser][NFC] Remove a misleading comment. (llvm#102604)
  [Arm][AArch64][Clang] Respect function's branch protection attributes. (llvm#101978)
  [mlir] Verifier: steal bit to track seen instead of set. (llvm#102626)
  [Clang] Fix Handling of Init Capture with Parameter Packs in LambdaScopeForCallOperatorInstantiationRAII (llvm#100766)
  [X86] Convert truncsat clamping patterns to use SDPatternMatch. NFC.
  [gn] Give two scripts argparse.RawDescriptionHelpFormatter
  [bazel] Add missing dep for the SPIRVToLLVM target
  [Clang] Simplify specifying passes via -Xoffload-linker (llvm#102483)
  [bazel] Port for d45de80
  [SelectionDAG] Use unaligned store/load to move AVX registers onto stack for `insertelement` (llvm#82130)
  [Clang][OMPX] Add the code generation for multi-dim `num_teams` (llvm#101407)
  [ARM] Regenerate big-endian-vmov.ll. NFC
  [AMDGPU][AsmParser][NFCI] All NamedIntOperands to be of the i32 type. (llvm#102616)
  [libc][math][c23] Add totalorderl function. (llvm#102564)
  [mlir][spirv] Support `memref` in `convert-to-spirv` pass (llvm#102534)
  [MLIR][GPU-LLVM] Convert `gpu.func` to `llvm.func` (llvm#101664)
  Fix a unit test input file (llvm#102567)
  [llvm-readobj][COFF] Dump hybrid objects for ARM64X files. (llvm#102245)
  AMDGPU/NewPM: Port SIFixSGPRCopies to new pass manager (llvm#102614)
  [MemoryBuiltins] Simplify getCalledFunction() helper (NFC)
  [AArch64] Add invalid 1 x vscale costs for reductions and reduction-operations. (llvm#102105)
  [MemoryBuiltins] Handle allocator attributes on call-site
  LSV/test/AArch64: add missing lit.local.cfg; fix build (llvm#102607)
  Revert "Enable logf128 constant folding for hosts with 128bit floats (llvm#96287)"
  [RISCV] Add Syntacore SCR5 RV32/64 processors definition (llvm#102285)
  [InstCombine] Remove unnecessary RUN line from test (NFC)
  [flang][OpenMP] Handle multiple ranges in `num_teams` clause (llvm#102535)
  [mlir][vector] Add tests for scalable vectors in one-shot-bufferize.mlir (llvm#102361)
  [mlir][vector] Disable `vector.matrix_multiply` for scalable vectors (llvm#102573)
  [clang] Implement CWG2627 Bit-fields and narrowing conversions (llvm#78112)
  [NFC] Use references to avoid copying (llvm#99863)
  Revert "[mlir][ArmSME] Pattern to swap shape_cast(tranpose) with transpose(shape_cast) (llvm#100731)" (llvm#102457)
  [IRBuilder] Generate nuw GEPs for struct member accesses (llvm#99538)
  [bazel] Port for 9b06e25
  [CodeGen][NewPM] Improve start/stop pass error message CodeGenPassBuilder (llvm#102591)
  [AArch64] Implement TRBMPAM_EL1 system register (llvm#102485)
  [InstCombine] Fixing wrong select folding in vectors with undef elements (llvm#102244)
  [AArch64] Sink operands to fmuladd. (llvm#102297)
  LSV: document hang reported in llvm#37865 (llvm#102479)
  Enable logf128 constant folding for hosts with 128bit floats (llvm#96287)
  [RISCV][clang] Remove bfloat base type in non-zvfbfmin vcreate (llvm#102146)
  [RISCV][clang] Add missing `zvfbfmin` to `vget_v` intrinsic (llvm#102149)
  [mlir][vector] Add mask elimination transform (llvm#99314)
  [Clang][Interp] Fix display of syntactically-invalid note for member function calls (llvm#102170)
  [bazel] Port for 3fffa6d
  [DebugInfo][RemoveDIs] Use iterator-inserters in clang (llvm#102006)
  ...

Signed-off-by: Edwiin Kusuma Jaya <kutemeikito0905@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

clang:driver 'clang' and 'clang++' user-facing binaries. Not 'clang-cl' clang Clang issues not falling into any other category

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants