intel · bader · Jul 30, 2021 · Jul 29, 2021 · Jul 30, 2021
@@ -738,7 +738,7 @@ class SYCLPostLinkJobAction : public JobAction {
   void anchor() override;
 
 public:
-  // The tempfiletable management relies on a shadowing the main file type by
+  // The tempfiletable management relies on shadowing the main file type by
   // types::TY_Tempfiletable. The problem of shadowing is it prevents its
   // integration with clang tools that relies on the file type to properly set
   // args.

@@ -548,13 +548,15 @@ down to the NVPTX Back End. All produced bitcode depends on two libraries,
 
 During the "PTX target processing" in the device linking step [Device
 code post-link step](#device-code-post-link-step), the llvm bitcode
-objects for the CUDA target are linked together alongside
-`libspirv-nvptx64--nvidiacl.bc` and `libdevice.bc`, compiled to PTX
-using the NVPTX backend and assembled into a cubin using the `ptxas`
-tool (part of the CUDA SDK). The PTX file and cubin are assembled
-together using `fatbinary` to produce a CUDA fatbin. The CUDA fatbin
-then replaces the llvm bitcode file in the file table generated by
-`sycl-post-link`. The resulting table is passed to the offload wrapper tool.
+objects for the CUDA target are linked together during the common
+`llvm-link` step and then split using the `sycl-post-link` tool.
+For each temporary bitcode file, clang is invoked to link
+`libspirv-nvptx64--nvidiacl.bc` and `libdevice.bc` and compile the resulting
+module to PTX using the NVPTX backend. The resulting PTX file is assembled
+into a cubin using the `ptxas` tool (part of the CUDA SDK). The PTX file and
+cubin are assembled together using `fatbinary` to produce a CUDA fatbin.
+The produced CUDA fatbins then replace the llvm bitcode files in the file table generated
+by `sycl-post-link`. The resulting table is passed to the offload wrapper tool.
 
 ![NVPTX AOT build](images/DevicePTXProcessing.svg)
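
The per-module flow described in the added paragraph can be sketched as a small model. This is only an illustration of the data flow, not the driver's implementation: `TableRow`, `withExtension`, `describePipeline`, `replaceBitcodeWithFatbins`, and the `.ptx`/`.cubin`/`.fatbin` file extensions are all hypothetical names chosen for this sketch, and the tools are represented by log strings rather than real invocations.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

// Hypothetical model of one row of the file table written by sycl-post-link.
// Only the code column is modelled; the real table may carry more columns.
struct TableRow {
  std::string CodeFile; // starts out as the path of a split llvm bitcode file
};

// Hypothetical per-module step: takes a bitcode path, returns a fatbin path.
using ModuleStep = std::function<std::string(const std::string &)>;

// Replace the extension of Path with Ext, ignoring dots in directory names.
std::string withExtension(const std::string &Path, const std::string &Ext) {
  const std::size_t Dot = Path.rfind('.');
  const std::size_t Slash = Path.find_last_of('/');
  const bool DotInFileName =
      Dot != std::string::npos && (Slash == std::string::npos || Dot > Slash);
  return (DotInFileName ? Path.substr(0, Dot) : Path) + Ext;
}

// Stand-in for the tool chain: records what each tool would do for one module
// instead of running it, and returns the name of the resulting fatbin.
std::string describePipeline(const std::string &Bitcode,
                             std::vector<std::string> &Log) {
  const std::string Ptx = withExtension(Bitcode, ".ptx");       // assumed name
  const std::string Cubin = withExtension(Bitcode, ".cubin");   // assumed name
  const std::string Fatbin = withExtension(Bitcode, ".fatbin"); // assumed name
  Log.push_back("clang: link libspirv-nvptx64--nvidiacl.bc and libdevice.bc "
                "into " + Bitcode + ", compile to " + Ptx);
  Log.push_back("ptxas: " + Ptx + " -> " + Cubin);
  Log.push_back("fatbinary: " + Ptx + " + " + Cubin + " -> " + Fatbin);
  return Fatbin;
}

// Run the step once per table row and store the result in place, so the table
// passed on to the offload wrapper refers to fatbins instead of bitcode.
void replaceBitcodeWithFatbins(std::vector<TableRow> &Table,
                               const ModuleStep &Step) {
  assert(Step && "a per-module step is required");
  for (TableRow &Row : Table)
    Row.CodeFile = Step(Row.CodeFile);
}
```

Calling `replaceBitcodeWithFatbins` on a two-row table with `describePipeline` as the step leaves two fatbin paths in the table and six log entries, three per module. Passing the step as a callback keeps the table rewrite separate from the tool invocations, which mirrors the split between `sycl-post-link` output handling and the per-file clang, `ptxas`, and `fatbinary` jobs.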