-
Notifications
You must be signed in to change notification settings - Fork 772
[SYCL] Always inline kernel lambda operator in entry point #6977
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The patch overall looks ok, but I'm not an expert in optimizations matter. We actually have a bunch of early optimizations enabled for device code in clang, they should be able to deal with the inline, don't they?
I too am okay with the patch. If we want to examine the kernel body function in the future for some tests, is there anything you would suggest as a command-line option to get (close to) the previous behavior? |
Yes, and they do in a lot of cases but not always, it just comes down to the regular inlining heuristics. The specific issue I ran into was a kernel that was capturing a lot of variables and just calling a device function with all these variables as parameters, which is a very common pattern in SYCL code. And so because of the number of parameters the creation of the lambda object was pretty expensive, and the function call itself was also pretty expensive, and in this scenario inlining everything gave better performance. However the device function being pretty large the inliner decided not to inline it (I'm also currently looking into tweaking the inlining heuristics for that), but as a workaround I simply added the
As far as I know there's no way to disable the always inliner, but that's a really good point, I could add a flag to disable this, and that way I could also just add it to existing lit tests that need it rather than having to change them. |
Thanks, would appreciate that. |
I've updated the patch as follows:
In addition I've also investigated the ESIMD failures from the CI and it seems inlining that early causes issues with the ESIMD attributes. Currently the attribute is propagated to the kernel in the IR passes, I've attempted to propagate it in |
sycl/doc/UsersManual.md
Outdated
@@ -107,6 +107,12 @@ and not recommended to use in production environment. | |||
* nd_item class get_global_id()/get_global_linear_id() member functions | |||
Enabled by default. | |||
|
|||
**`-f[no]sycl-force-inline-kernel-lambda`** |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
**`-f[no]sycl-force-inline-kernel-lambda`** | |
**`-f[no-]sycl-force-inline-kernel-lambda`** |
This patch marks the `operator()` of the kernel lambda as `always_inline` so that it gets inlined into the kernel entry point. Kernel entry point are functions that take the captured variables as parameters, create a lambda object from that, setup the index structs and then call `operator()` on the lambda. Inlining the operator into the entry point should be beneficial in most cases as it allows the compiler to optimize out the lambda creation, which can be very important for kernels capturing a lot of variables. In a lot of cases the inliner will already do it, but when it doesn't it can lead to very confusing performance implications since the kernel entry point isn't directly visible to users.
72e4ae9
to
bc64500
Compare
Co-authored-by: Steffen Larsen <steffen.larsen@intel.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
SYCL docs LGTM!
@npmiller, please look into post commit issue on windows: https://github.com/intel/llvm/actions/runs/3236643294/jobs/5302734155
|
Without the target flag it was mangling the names differently on windows, and so breaking the check, simply always generate IR for SPIR target instead. This patch fixes the post-commit issue on Windows reported after: * #6977
@npmiller, we have internal reports that this change perceivably affects debugging at -O0 levels. What do you think of disabling this inlining at -O0? |
@premanandrao That seems reasonable, disabling it at -O0 shouldn't cause any issues |
PR #6977 enabled always inlining kernel lambda operators. This PR disables this at -O0 as it was leading to a poor debugging experience.
This patch marks the
operator()
of the kernel lambda asalways_inline
so that it gets inlined into the kernel entry point.Kernel entry point are functions that take the captured variables as parameters, create a lambda object from that, setup the index structs and then call
operator()
on the lambda. Inlining the operator into the entry point should be beneficial in most cases as it allows the compiler to optimize out the lambda creation, which can be very important for kernels capturing a lot of variables.In a lot of cases the inliner will already do it, but when it doesn't it can lead to very confusing performance implications since the kernel entry point isn't directly visible to users.
Because the always inliner runs very early this patch broke a number of lit tests that were checking for the operator function, I believe I've managed to fix most of them while maintaining the spirit of the test, but some reviews and/or suggestions on these would be appreciated.