Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SYCL][Experimental] Reduce the set of optimizations for SYCL device #1550

Merged
merged 6 commits into from
Apr 21, 2020

Conversation

bader
Copy link
Contributor

@bader bader commented Apr 18, 2020

This is patch limits the set of optimizations aiming to reduce the size of
generated device module.

Optimizations are currently disabled by default as they cause multiple
sorts of issues. Some of the issues are addressed within this patch, but
not all of them.

Optimizations can be enabled with -fsycl-enable-optimizaions front-end
option (or -Xclang -fsycl-enable-optimizaions driver option).

Signed-off-by: Alexey Bader alexey.bader@intel.com

bader added 2 commits April 18, 2020 09:33
This is patch limits the set of optimizations aiming to reduce the size of
generated device module.

Optimizations are currently disabled by default as they cause multiple
sorts of issues. Some of the issues are addressed within this patch, but
not all of them.

Optimizations can be enabled with `-fsycl-enable-optimizaions` front-end
option (or `-Xclang -fsycl-enable-optimizaions` driver option).

Signed-off-by: Alexey Bader <alexey.bader@intel.com>
Signed-off-by: Alexey Bader <alexey.bader@intel.com>
bader added 2 commits April 18, 2020 14:19
Signed-off-by: Alexey Bader <alexey.bader@intel.com>
Signed-off-by: Alexey Bader <alexey.bader@intel.com>
Copy link
Contributor

@erichkeane erichkeane left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this needs driver work to make sure it rejects the opt settings for SYCL? Or what do we expect -O3 to do here?

clang/lib/CodeGen/BackendUtil.cpp Outdated Show resolved Hide resolved
clang/lib/CodeGen/BackendUtil.cpp Outdated Show resolved Hide resolved
clang/lib/CodeGen/BackendUtil.cpp Outdated Show resolved Hide resolved
clang/lib/CodeGen/BackendUtil.cpp Outdated Show resolved Hide resolved
@bader
Copy link
Contributor Author

bader commented Apr 18, 2020

I think this needs driver work to make sure it rejects the opt settings for SYCL? Or what do we expect -O3 to do here?

At this moment it's not clear to me whether we should have a single optimization pipeline for SYCL device compiler or allow users to configure it with -O* options. As of now, this is not exposed to the users, so I can configure the pipeline with O1//2/3 with -Xclang -fsycl-enable-optimizations. But I see a lot issues with using O2 and O3 as is. These modes run a lot of "potentially harmful" aggressive loop optimizations. Also I did a comparison of a few configurations and found that if we target just to reduce the size of the module O2/3 inliner + O1 pipeline gives very similar results in most cases as O2/O3, but O2/O3 loop optimizations introduce a lot of constructs not supported in the SPIR-V translator, so I decided to commit this configuration to provide this option to the users and gather the feedback. Meanwhile I'm going to continue working on fixing issues with unmodified pipelines and configuring it through driver options.

erichkeane
erichkeane previously approved these changes Apr 18, 2020
Signed-off-by: Alexey Bader <alexey.bader@intel.com>
clang/lib/Basic/Targets/SPIR.h Show resolved Hide resolved
clang/lib/CodeGen/BackendUtil.cpp Outdated Show resolved Hide resolved
clang/test/CodeGenOpenCL/convergent.cl Show resolved Hide resolved
Fznamznon
Fznamznon previously approved these changes Apr 20, 2020
AlexeySotkin
AlexeySotkin previously approved these changes Apr 20, 2020
Signed-off-by: Alexey Bader <alexey.bader@intel.com>
@@ -865,14 +897,15 @@ void EmitAssemblyHelper::EmitAssembly(BackendAction Action,

std::unique_ptr<llvm::ToolOutputFile> ThinLinkOS, DwoOS;

// Clean-up SYCL device code if LLVM passes are disabled
if (LangOpts.SYCLIsDevice && CodeGenOpts.DisableLLVMPasses)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

More a suggestion for upstreaming purposes: DisableLLVMPasses is supposed to be a mode to retrieve LLVM IR as generated by clang before any LLVM optimizations (helpful for debugging). Perhaps it would be better to have a specific mode which would not run optimizations yet would run some clean-up passes.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change is just a refactoring of existing functionality.
@Naghasan, are you okay with addressing it in a separate PR?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, it was more informative. As you refactored it, this was highlighted, but that's clearly out of scope.

@bader bader dismissed stale reviews from AlexeySotkin and Fznamznon via 7c8bf8f April 20, 2020 13:58
Copy link
Contributor

@erichkeane erichkeane left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CFE stuff looks fine.

@bader
Copy link
Contributor Author

bader commented Apr 21, 2020

@AlexeySotkin, @AlexeySachkov, ping.

@bader bader merged commit 988d8cd into intel:sycl Apr 21, 2020
@bader bader deleted the enable-sycl-optimizations branch April 21, 2020 17:17
@keryell
Copy link
Contributor

keryell commented Apr 22, 2020

I guess that for FPGA when you know you have only 1 work-item you can have -O3 even with SPIR-V.

@bader
Copy link
Contributor Author

bader commented Apr 22, 2020

I guess that for FPGA when you know you have only 1 work-item you can have -O3 even with SPIR-V.

Unfortunately, it's not related to # of work-items, but rather issues are caused by the kernel code. I see problems with vectors & loops, where O3 passes use constructs unsupported either by SPIR-V translator or (I suspect) exposes bugs in OpenCL implementations consuming optimized SPIR-V.
Final goal is to address all theses issues, but it might take some time.

@bader bader added the performance Performance related issues label Oct 30, 2020
bader pushed a commit to bader/llvm that referenced this pull request Jul 11, 2024
FP power operation is legalized using the identity ``pow(x,y) = pow(2, y
* log2(x))``.

This fixes the latest WL compilation failure for ll->XeISA path.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
performance Performance related issues
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants