[Doc] Update SYCL CUDA documentation #4214
Merged
merged 2 commits into from
Jul 30, 2021
2 changes: 1 addition & 1 deletion clang/include/clang/Driver/Action.h
@@ -738,7 +738,7 @@ class SYCLPostLinkJobAction : public JobAction {
void anchor() override;

public:
- // The tempfiletable management relies on a shadowing the main file type by
+ // The tempfiletable management relies on shadowing the main file type by
// types::TY_Tempfiletable. The problem of shadowing is it prevents its
// integration with clang tools that relies on the file type to properly set
// args.
16 changes: 9 additions & 7 deletions sycl/doc/CompilerAndRuntimeDesign.md
@@ -548,13 +548,15 @@ down to the NVPTX Back End. All produced bitcode depends on two libraries,

During the "PTX target processing" in the device linking step [Device
code post-link step](#device-code-post-link-step), the llvm bitcode
- objects for the CUDA target are linked together alongside
- `libspirv-nvptx64--nvidiacl.bc` and `libdevice.bc`, compiled to PTX
- using the NVPTX backend and assembled into a cubin using the `ptxas`
- tool (part of the CUDA SDK). The PTX file and cubin are assembled
- together using `fatbinary` to produce a CUDA fatbin. The CUDA fatbin
- then replaces the llvm bitcode file in the file table generated by
- `sycl-post-link`. The resulting table is passed to the offload wrapper tool.
+ objects for the CUDA target are linked together during the common
+ `llvm-link` step and then split using the `sycl-post-link` tool.
+ For each temporary bitcode file, clang is invoked for the temporary file to link
+ `libspirv-nvptx64--nvidiacl.bc` and `libdevice.bc` and compile the resulting
+ module to PTX using the NVPTX backend. The resulting PTX file is assembled
+ into a cubin using the `ptxas` tool (part of the CUDA SDK). The PTX file and
+ cubin are assembled together using `fatbinary` to produce a CUDA fatbin.
+ The produced CUDA fatbins then replace the llvm bitcode files in the file table generated
+ by `sycl-post-link`. The resulting table is passed to the offload wrapper tool.

![NVPTX AOT build](images/DevicePTXProcessing.svg)
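
The per-file flow described in the updated paragraph can be sketched as a small planning model. This is an illustrative sketch only, not the driver's implementation: the `Step` type, the function names, and the `.ptx`/`.cubin`/`.fatbin` file naming are hypothetical, and no real command-line flags are modeled. Only the tool names and the two library names come from the text.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Step:
    tool: str                 # tool name as given in the documentation text
    inputs: Tuple[str, ...]   # input file names
    output: str               # output file name (hypothetical naming scheme)


# Libraries the text says are linked into every split module.
LIBS = ("libspirv-nvptx64--nvidiacl.bc", "libdevice.bc")


def plan_ptx_processing(split_bitcode: List[str]) -> List[Step]:
    """Plan the steps run for each bitcode file produced by sycl-post-link."""
    steps = []
    for bc in split_bitcode:
        stem = bc.rsplit(".", 1)[0]
        ptx, cubin, fatbin = f"{stem}.ptx", f"{stem}.cubin", f"{stem}.fatbin"
        # clang links the two libraries and compiles the module to PTX.
        steps.append(Step("clang", (bc,) + LIBS, ptx))
        # ptxas assembles the PTX file into a cubin.
        steps.append(Step("ptxas", (ptx,), cubin))
        # fatbinary bundles the PTX file and the cubin into a CUDA fatbin.
        steps.append(Step("fatbinary", (ptx, cubin), fatbin))
    return steps


def fatbin_table(split_bitcode: List[str]) -> List[str]:
    """Replace each bitcode entry of the file table with its fatbin."""
    fatbins = {s.inputs[0].rsplit(".", 1)[0]: s.output
               for s in plan_ptx_processing(split_bitcode)
               if s.tool == "fatbinary"}
    return [fatbins[bc.rsplit(".", 1)[0]] for bc in split_bitcode]
```

The point of the model is the change this pull request documents: the libraries are linked and the PTX is generated once per split module, after `sycl-post-link`, rather than once for the whole linked bitcode.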
