[SYCL] Always inline kernel lambda operator in entry point #6977

npmiller · 2022-10-06T09:31:46Z

This patch marks the operator() of the kernel lambda as always_inline so that it gets inlined into the kernel entry point.

Kernel entry points are functions that take the captured variables as parameters, create a lambda object from them, set up the index structs, and then call operator() on the lambda. Inlining the operator into the entry point should be beneficial in most cases, as it allows the compiler to optimize out the lambda creation, which can be very important for kernels capturing a lot of variables.
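For readers unfamiliar with the generated code, here is a rough host-only model of the entry point described above, written in plain C++ rather than SYCL. All names (`Item`, `KernelLambda`, `kernel_entry_point`, `run_kernel`) are hypothetical and chosen for illustration; the real entry point is generated by the compiler and looks different.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the SYCL index struct passed to the kernel body.
struct Item {
  std::size_t id;
};

// Hypothetical stand-in for the closure type generated for a kernel lambda
// such as [=](Item it) { out[it.id] = a * in[it.id] + b; }.
struct KernelLambda {
  const float *in;
  float *out;
  float a;
  float b;

  // Models the effect of the patch: operator() carries always_inline.
  // This is the GNU-style spelling accepted by Clang and GCC.
  __attribute__((always_inline)) void operator()(Item it) const {
    out[it.id] = a * in[it.id] + b;
  }
};

// Model of the entry point: the captures arrive as plain parameters, the
// lambda object is rebuilt from them, the index struct is set up, and
// operator() is called.
void kernel_entry_point(const float *in, float *out, float a, float b,
                        std::size_t global_id) {
  KernelLambda lambda{in, out, a, b};
  Item it{global_id};
  lambda(it);
}

// Host loop standing in for a device launch over in.size() work-items.
std::vector<float> run_kernel(const std::vector<float> &in, float a, float b) {
  std::vector<float> out(in.size());
  for (std::size_t i = 0; i < in.size(); ++i)
    kernel_entry_point(in.data(), out.data(), a, b, i);
  return out;
}
```

Once `operator()` is inlined into `kernel_entry_point`, the temporary `lambda` object only exists to forward the parameters, so the optimizer has a chance to remove it entirely.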

In a lot of cases the inliner will already do it, but when it doesn't, the performance implications can be very confusing, since the kernel entry point isn't directly visible to users.

Because the always-inliner runs very early, this patch broke a number of lit tests that were checking for the operator function. I believe I've managed to fix most of them while maintaining the spirit of each test, but reviews and/or suggestions on these would be appreciated.

Fznamznon

The patch overall looks OK, but I'm not an expert in optimization matters. We actually have a bunch of early optimizations enabled for device code in clang, they should be able to deal with the inline, shouldn't they?

premanandrao · 2022-10-06T15:56:59Z

I too am okay with the patch. If we want to examine the kernel body function in the future for some tests, is there anything you would suggest as a command-line option to get (close to) the previous behavior?

npmiller · 2022-10-06T17:57:50Z

> We actually have a bunch of early optimizations enabled for device code in clang, they should be able to deal with the inline, shouldn't they?

Yes, and they do in a lot of cases, but not always; it just comes down to the regular inlining heuristics.

The specific issue I ran into was a kernel that was capturing a lot of variables and just calling a device function with all these variables as parameters, which is a very common pattern in SYCL code.

Because of the number of parameters, both the creation of the lambda object and the function call itself were pretty expensive, and in this scenario inlining everything gave better performance.

However, because the device function is pretty large, the inliner decided not to inline it (I'm also currently looking into tweaking the inlining heuristics for that). As a workaround, I simply added the always_inline attribute to the device function to force its inlining. But then the device function got inlined into operator(), which meant that the operator was now as large as the device function, so the inliner decided not to inline the operator either. As a result, with or without the always_inline attribute on my device function I would get the exact same performance, because it would never get inlined all the way into the kernel entry point.
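The call chain in this scenario can be sketched in plain C++ as follows. This is only an illustration of the shape of the problem, not the actual kernel: `big_device_function`, `ForwardingKernel`, `forwarding_entry_point`, and `run_forwarding_kernel` are hypothetical names, and the function body is deliberately tiny, whereas the real one was large.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the large device function. The always_inline
// attribute mirrors the workaround described above (GNU-style spelling,
// accepted by Clang and GCC).
__attribute__((always_inline)) inline float
big_device_function(const float *in, std::size_t i, float a, float b, float c,
                    float d) {
  const float x = in[i];
  return a * x + b * x * x + c - d;
}

// Stand-in for a kernel lambda that captures many variables and only forwards
// them to the device function.
struct ForwardingKernel {
  const float *in;
  float *out;
  float a, b, c, d;

  // No attribute here: this models the situation before the patch. After
  // big_device_function is inlined, operator() is as large as that function,
  // and the regular heuristics may refuse to inline it any further.
  void operator()(std::size_t i) const {
    out[i] = big_device_function(in, i, a, b, c, d);
  }
};

// Model of the entry point that rebuilds the lambda object and calls it.
void forwarding_entry_point(const float *in, float *out, float a, float b,
                            float c, float d, std::size_t i) {
  ForwardingKernel kernel{in, out, a, b, c, d};
  kernel(i);
}

// Host loop standing in for a device launch over in.size() work-items.
std::vector<float> run_forwarding_kernel(const std::vector<float> &in) {
  std::vector<float> out(in.size());
  for (std::size_t i = 0; i < in.size(); ++i)
    forwarding_entry_point(in.data(), out.data(), 1.0f, 2.0f, 3.0f, 4.0f, i);
  return out;
}
```

In this model the attribute on `big_device_function` only moves the large body one level up, into `operator()`. Marking `operator()` itself as always_inline, as the patch does, is what carries it into the entry point.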

> If we want to examine the kernel body function in the future for some tests, is there anything you would suggest as a command-line option to get (close to) the previous behavior?

As far as I know there's no way to disable the always-inliner, but that's a really good point. I could add a flag to disable this, and that way I could also just add it to the existing lit tests that need it rather than having to change them.

premanandrao · 2022-10-07T01:53:43Z

> As far as I know there's no way to disable the always-inliner, but that's a really good point. I could add a flag to disable this, and that way I could also just add it to the existing lit tests that need it rather than having to change them.

Thanks, would appreciate that.

npmiller · 2022-10-11T16:58:03Z

I've updated the patch as follows:

* Add the -f[no-]sycl-force-inline-kernel-lambda option, enabled by default
* Revert the modifications to the lit tests and fix them by using the -fno- variant of the option instead
* Add a test for the option

In addition, I've investigated the ESIMD failures from the CI, and it seems that inlining this early causes issues with the ESIMD attributes. Currently the attribute is propagated to the kernel in the IR passes. I attempted to propagate it in SemaSYCL so that it's handled before the inlining, but that causes issues with the ESIMD validator. Since the ESIMD IR passes already seem to force-inline the entire kernel call tree, I've decided to simply not take the new flag into account for ESIMD, which should fix all the issues from the CI.

steffenlarsen · 2022-10-11T17:03:49Z

sycl/doc/UsersManual.md

@@ -107,6 +107,12 @@ and not recommended to use in production environment.
    * nd_item class get_global_id()/get_global_linear_id() member functions
    Enabled by default.

+**`-f[no]sycl-force-inline-kernel-lambda`**


Suggested change: replace **`-f[no]sycl-force-inline-kernel-lambda`** with **`-f[no-]sycl-force-inline-kernel-lambda`**.

sycl/doc/UsersManual.md

Co-authored-by: Steffen Larsen <steffen.larsen@intel.com>

steffenlarsen

SYCL docs LGTM!

pvchupin · 2022-10-12T20:06:43Z

@npmiller, please look into the post-commit issue on Windows: https://github.com/intel/llvm/actions/runs/3236643294/jobs/5302734155

Failed Tests (1):
  Clang :: SemaSYCL/sycl-force-inline-kernel-lambda.cpp

Follow-up PR #7046: Without the target flag, the names were mangled differently on Windows, which broke the check; the test now always generates IR for the SPIR target instead. This fixes the post-commit issue on Windows reported after #6977.

premanandrao · 2022-11-17T18:34:19Z

@npmiller, we have internal reports that this change perceivably affects debugging at -O0 levels. What do you think of disabling this inlining at -O0?

npmiller · 2022-11-18T09:54:15Z

> @npmiller, we have internal reports that this change perceivably affects debugging at -O0 levels. What do you think of disabling this inlining at -O0?

@premanandrao That seems reasonable; disabling it at -O0 shouldn't cause any issues.

Follow-up PR #7578: PR #6977 enabled always-inlining of kernel lambda operators. The follow-up disables this at -O0, as it was leading to a poor debugging experience.
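Putting the thread together, the conditions under which the kernel lambda operator ends up force-inlined can be summarized with a small sketch. This is one reading of the discussion, not the compiler's implementation: `KernelCompileInfo` and `should_force_inline_kernel_lambda` are hypothetical names, and the field defaults are chosen only for the example.

```cpp
// Hypothetical summary of the inputs discussed in this thread. These are not
// the compiler's actual data structures.
struct KernelCompileInfo {
  // State of -f[no-]sycl-force-inline-kernel-lambda; the option is enabled by
  // default according to the thread.
  bool force_inline_flag = true;
  // ESIMD kernels do not take the flag into account.
  bool is_esimd_kernel = false;
  // Optimization level; 2 is an arbitrary value picked for this sketch.
  unsigned opt_level = 2;
};

// Returns true when, under this reading of the thread, operator() of the
// kernel lambda would be marked always_inline.
bool should_force_inline_kernel_lambda(const KernelCompileInfo &info) {
  if (!info.force_inline_flag)
    return false; // -fno-sycl-force-inline-kernel-lambda was passed
  if (info.is_esimd_kernel)
    return false; // the ESIMD passes already inline the kernel call tree
  if (info.opt_level == 0)
    return false; // disabled at -O0 to keep debugging usable
  return true;
}
```

For example, a default-constructed `KernelCompileInfo` yields `true`, while setting `opt_level` to 0 yields `false`.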

npmiller requested a review from a team as a code owner October 6, 2022 09:31

Fznamznon reviewed Oct 6, 2022

npmiller requested review from a team as code owners October 11, 2022 16:53

steffenlarsen reviewed Oct 11, 2022

mdtoguchi reviewed Oct 11, 2022

clang/include/clang/Driver/Options.td (resolved)

npmiller added 4 commits October 12, 2022 09:58

[SYCL] Introduce flag to disable force inlining of kernel lambda

416405b

[SYCL] Accept kernel lambda as clang argument

e987074

[SYCL] Fix new failing tests with lambda inlining

bc64500

npmiller force-pushed the inline-operator branch from 72e4ae9 to bc64500 Compare October 12, 2022 09:34

steffenlarsen reviewed Oct 12, 2022

sycl/doc/UsersManual.md (outdated, resolved)

Update sycl/doc/UsersManual.md

0d3ff38

Co-authored-by: Steffen Larsen <steffen.larsen@intel.com>

steffenlarsen approved these changes Oct 12, 2022

bader requested review from mdtoguchi and Fznamznon October 12, 2022 09:55

Fznamznon approved these changes Oct 12, 2022

smanna12 approved these changes Oct 12, 2022

mdtoguchi reviewed Oct 12, 2022

clang/lib/Driver/ToolChains/Clang.cpp (resolved)

[SYCL] Add driver test for -fno-sycl-force-inline-kernel-lambda

b611501

mdtoguchi approved these changes Oct 12, 2022

premanandrao approved these changes Oct 12, 2022

pvchupin merged commit b91b732 into intel:sycl Oct 12, 2022

npmiller mentioned this pull request Oct 13, 2022

[SYCL] Fix force inline kernel lambda test #7046

Merged

premanandrao mentioned this pull request Nov 29, 2022

[SYCL] Disable inlining kernel lambda operator at -O0 #7578

Merged

pvchupin pushed a commit that referenced this pull request Dec 2, 2022

[SYCL] Disable inlining kernel lambda operator at -O0 (#7578)

2359d94

PR #6977 enabled always inlining kernel lambda operators. This PR disables this at -O0 as it was leading to a poor debugging experience.
