[SYCL] Support per-object file compilation #7595

sarnex · 2022-11-30T20:26:06Z

This change adds per-object compilation support for SYCL, also called non-relocatable device code mode. This is already supported in clang for HIP and CUDA.

It adds a new option, -f[no-]sycl-rdc. The default is -fsycl-rdc, which compiles code as it does today. Passing -fno-sycl-rdc activates the new mode. This is just an alias for the existing flag used by AMD/CUDA, -f[no-]gpu-rdc.

The main implication is that we no longer link all device code together into one big module before post link.
Instead, we execute all jobs after device linking on a per-object file basis.
This means sycl-post-link and the later jobs execute multiple times, since we no longer have one big module.

This can significantly improve compiler run time and memory usage. For QUDA built with -g, we see peak memory usage drop from over 250GB to 4GB, along with a large compiler run time improvement.

Error cases:

Cross-object dependencies. Since we don't link device code together, each object file must be independent. I added an error in Sema that fires if the user passes this flag and has cross-object dependencies.
Invalid architecture in fat object. We currently warn gracefully about this, but in per-object-file mode llvm-foreach throws an error customers won't understand, so we error out in that case instead of warning.
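To illustrate why cross-object dependencies have to be rejected, here is a minimal Python sketch. It is not the Sema check added in this PR; it only models each object file as a set of device symbols it defines and a set it references, and reports references that cannot be resolved inside the same object. All names (`ObjectFile`, `unresolved_device_refs`, and the symbol names) are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class ObjectFile:
    """Toy model of one object file's device code (hypothetical)."""
    name: str
    defined: set = field(default_factory=set)     # device symbols defined here
    referenced: set = field(default_factory=set)  # device symbols used here


def unresolved_device_refs(objects):
    """Return {object name: sorted symbols it uses but does not define}.

    When all device code is linked together, another object could satisfy
    such a reference. In per-object mode no other user object is linked in,
    so every entry returned here would have to be an error.
    (SYCL device library symbols are ignored in this toy model.)
    """
    problems = {}
    for obj in objects:
        missing = sorted(obj.referenced - obj.defined)
        if missing:
            problems[obj.name] = missing
    return problems


a = ObjectFile("a.o", defined={"kernel_a", "helper"}, referenced={"helper"})
# b.o calls helper, which only exists in a.o: a cross-object dependency.
b = ObjectFile("b.o", defined={"kernel_b"}, referenced={"helper"})

print(unresolved_device_refs([a, b]))  # {'b.o': ['helper']}
```

In this model `a.o` is self-contained, while `b.o` depends on device code from another object and would not be usable with -fno-sycl-rdc.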

Signed-off-by: Sarnie, Nick <nick.sarnie@intel.com>

Today, the basic flow of the SYCL device-side driver is as follows:

1) Link all device code from all compiler inputs together
2) Link 1) against SYCL device libraries
3) Run sycl-post-link on 2)
4) Run llvm-spirv on 3)
5) Run clang-offload-wrapper on 4)

Step 1 can create a performance bottleneck when you have a huge number of kernels. If none of the kernels use globals or SYCL_EXTERNAL functions, we can actually split it up to be the following:

For each object file:
1) Link this object file's device code with SYCL device libraries
2) Run sycl-post-link on 1)
3) Run llvm-spirv on 2)
4) Run clang-offload-wrapper on 3)

Since we don't link all device code together, each step runs on smaller IR, which results in compiler runtime and compiler memory usage benefits.

Note that in order to do the above per object file, we need to break up fat static archives. We do this by using the ForEachWrappingAction action. This allows us to run commands on each item inside the fat static archive.

The driver flow when a static archive is involved is the most complex case and looks like the below:

1) spirv-to-ir-wrapper on fat static archive
2) Link all SYCL device libraries together without any user device code into a single device library BC
3) llvm-foreach:
   3a) llvm-link current object file with 2)
4) file-table-tform replacing tempfilelist from 1) with output of 3)
5) llvm-foreach:
   5a) sycl-post-link on 4)
6) llvm-foreach:
   6a) Extract BC file column from 5) output table
7) file-table-tform on 6) merging all BC file columns into a single big column
8) llvm-spirv on 7) (does llvm-foreach internally)
9) file-table-tform on 5) output table, replacing BC column with spirv column from 8)
10) clang-offload-wrapper with 9)

Signed-off-by: Sarnie, Nick <nick.sarnie@intel.com>
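The difference between the two basic flows described in this commit message can be sketched as a toy job planner. This is only a Python model of the step lists, not the driver's actual action graph: `plan_whole_program`, `plan_per_object`, and the job strings are made up for illustration, and the static-archive case is left out.

```python
def plan_whole_program(objects):
    """Default flow: link everything first, then run each later tool once."""
    module = "+".join(objects)
    return [
        f"link-device-code({module})",       # 1) all inputs together
        f"link-device-libs({module})",       # 2) against SYCL device libraries
        f"sycl-post-link({module})",         # 3)
        f"llvm-spirv({module})",             # 4)
        f"clang-offload-wrapper({module})",  # 5)
    ]


def plan_per_object(objects):
    """Per-object flow: no global link step, every tool runs once per object."""
    jobs = []
    for obj in objects:
        jobs += [
            f"link-device-libs({obj})",        # 1) this object's device code only
            f"sycl-post-link({obj})",          # 2)
            f"llvm-spirv({obj})",              # 3)
            f"clang-offload-wrapper({obj})",   # 4)
        ]
    return jobs


objs = ["a.o", "b.o", "c.o"]
whole = plan_whole_program(objs)
per_object = plan_per_object(objs)

print(len(whole), len(per_object))  # 5 12
print(sum(job.startswith("sycl-post-link") for job in per_object))  # 3
```

The per-object plan contains more jobs, but each one only sees a single object's device code, which is where the smaller IR and the lower peak memory usage come from.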

I don't really like this commit, but I see the following requirements:

1) Keep the default for GPURelocatableDeviceCode to false
2) For SYCL, if no cc1 option is specified, GPURelocatableDeviceCode should be true

Let me know if anyone has any better ideas.

Signed-off-by: Sarnie, Nick <nick.sarnie@intel.com>
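One way to read these two requirements is as a three-state setting: explicitly on, explicitly off, or unspecified, with the fallback depending on whether SYCL is being compiled. The sketch below is a hypothetical Python model of that rule, not the code in this commit. `effective_rdc` and its parameters are invented names, and it assumes that an explicitly passed option always takes precedence.

```python
from typing import Optional


def effective_rdc(is_sycl: bool, explicit: Optional[bool] = None) -> bool:
    """Resolve the relocatable-device-code setting in this toy model.

    explicit is True or False when an option was given, None when it was not.
    """
    if explicit is not None:
        return explicit  # assumption: an explicit option always wins
    # Unspecified: the general default stays false, but SYCL falls back to true.
    return is_sycl


print(effective_rdc(is_sycl=True))                  # True
print(effective_rdc(is_sycl=False))                 # False
print(effective_rdc(is_sycl=True, explicit=False))  # False
```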

llvm/test/tools/file-table-tform/Inputs/a1.txt

clang/include/clang/Basic/DiagnosticDriverKinds.td

clang/lib/Driver/ToolChains/SYCL.cpp

clang/include/clang/Basic/DiagnosticSemaKinds.td

clang/test/SemaSYCL/sycl-no-rdc.cpp

premanandrao

FE changes LGTM.

clang/include/clang/Basic/DiagnosticDriverKinds.td

clang/lib/Driver/Driver.cpp

llvm/lib/Support/SimpleTable.cpp

sarnex · 2022-12-08T15:57:23Z

@AlexeySachkov @mdtoguchi Any more feedback on this bad boy? I think I addressed all feedback. Thanks!

AlexeySachkov

A couple more questions here and minor comments.

clang/lib/Driver/ToolChains/Clang.cpp

clang/include/clang/Basic/DiagnosticSemaKinds.td

llvm/test/tools/file-table-tform/file-table-tform-merge.test

llvm/tools/file-table-tform/file-table-tform.cpp

llvm/tools/sycl-post-link/sycl-post-link.cpp

sarnex · 2022-12-09T16:44:33Z

@premanandrao Do you mind re-checking the FE changes? I had to rework them

AlexeySachkov

I have no further concerns/comments, thanks

sarnex · 2022-12-09T21:38:18Z

Thanks, with the CFE re-review complete we are now ready to merge!

premanandrao · 2022-12-12T14:55:19Z

> @premanandrao Do you mind re-checking the FE changes? I had to rework them

Thanks, I have no further concerns.

sarnex · 2022-12-12T16:00:04Z

@intel/llvm-gatekeepers Mind merging this one? Thanks!

sarnex added 7 commits November 30, 2022 14:50

Add merge option to file-table-form

5ff8b8b

Signed-off-by: Sarnie, Nick <nick.sarnie@intel.com>

Add serial action capability to ForEachWrappingAction

26d710c

Signed-off-by: Sarnie, Nick <nick.sarnie@intel.com>

Add -fsycl-rdc and -fno-sycl-rdc

041ccd0

Signed-off-by: Sarnie, Nick <nick.sarnie@intel.com>

Honor -fsycl-max-parallel-link-jobs for ForEachWrapperAction

7f045b0

Add function to detect if we should do per object linking

50c19cc

Add error with -fsycl-no-rdc and missing offload arch

0cef24c

Signed-off-by: Sarnie, Nick <nick.sarnie@intel.com>

Add error for unsupported device code split mode

9fc955c

Signed-off-by: Sarnie, Nick <nick.sarnie@intel.com>

sarnex changed the title ~~[SYCL] Support non-relocatable device code~~ [SYCL] Support per-object file compilation Nov 30, 2022

sarnex added 7 commits December 1, 2022 11:42

comment

9cc4687

Signed-off-by: Sarnie, Nick <nick.sarnie@intel.com>

Make llvm-link command work with -fno-sycl-rdc

4bba623

lit tests

a5466a4

Signed-off-by: Sarnie, Nick <nick.sarnie@intel.com>

Add error when calling unsupported function

a7e0464

clang format

efa7ff4

Signed-off-by: Sarnie, Nick <nick.sarnie@intel.com>

sarnex marked this pull request as ready for review December 1, 2022 21:41

sarnex requested review from a team as code owners December 1, 2022 21:41

sarnex requested review from mdtoguchi and bader December 1, 2022 21:41

mdtoguchi reviewed Dec 2, 2022

premanandrao reviewed Dec 2, 2022

address feedback

0bd8080

sarnex requested a review from premanandrao December 2, 2022 20:31

premanandrao approved these changes Dec 2, 2022

AlexeySachkov reviewed Dec 2, 2022

clang/include/clang/Basic/DiagnosticDriverKinds.td

mdtoguchi reviewed Dec 2, 2022

clang/lib/Driver/Driver.cpp

sarnex added 2 commits December 5, 2022 13:53

remove useless if

30e7c5d

Signed-off-by: Sarnie, Nick <nick.sarnie@intel.com>

allow all device code split modes

53dc433

Signed-off-by: Sarnie, Nick <nick.sarnie@intel.com>

remove incorrect assert

bec3954

Signed-off-by: Sarnie, Nick <nick.sarnie@intel.com>

bader reviewed Dec 6, 2022

llvm/lib/Support/SimpleTable.cpp

sarnex added 3 commits December 6, 2022 09:57

error for SYCL_EXTERNAL declaration

cfceeef

Signed-off-by: Sarnie, Nick <nick.sarnie@intel.com>

simplify loop

8468af2

Signed-off-by: Sarnie, Nick <nick.sarnie@intel.com>

formatting

a803d1e

Signed-off-by: Sarnie, Nick <nick.sarnie@intel.com>

sarnex requested a review from AlexeySachkov December 6, 2022 16:50

mdtoguchi approved these changes Dec 8, 2022

AlexeySachkov reviewed Dec 9, 2022

sarnex added 2 commits December 9, 2022 11:13

improve error

e893bc6

Signed-off-by: Sarnie, Nick <nick.sarnie@intel.com>

address comments

691adcd

Signed-off-by: Sarnie, Nick <nick.sarnie@intel.com>

sarnex requested a review from premanandrao December 9, 2022 16:44

sarnex requested a review from AlexeySachkov December 9, 2022 16:49

AlexeySachkov approved these changes Dec 9, 2022

smanna12 approved these changes Dec 9, 2022

sarnex removed the request for review from premanandrao December 9, 2022 21:37

AlexeySachkov merged commit f884993 into intel:sycl Dec 12, 2022

tdavidcl mentioned this pull request Feb 5, 2023

[Patch] Variant storage of the fields Shamrock-code/Shamrock#59

Merged

al42and mentioned this pull request Feb 15, 2023

Very slow linking with -fno-sycl-rdc #8353

Open
