[CUDA] Make PTXAS optimisation default to -O3 #5188

Merged (1 commit) on Jan 30, 2022

hdelan
Contributor

@hdelan hdelan commented Dec 20, 2021

Previously, the ptxas optimization level defaulted to -O0.

The ptxjitcompiler defaults to -O3, so this change gives ahead-of-time ptxas compilation and JIT compilation the same default optimization level.

@hdelan hdelan changed the title from "Make PTXAS optimisation default to -O3" to "[CUDA] Make PTXAS optimisation default to -O3" Dec 20, 2021
@hdelan hdelan force-pushed the hugh/ptxas_optimisation branch from 740563f to f9737be December 20, 2021 22:17
@bader bader removed their request for review December 21, 2021 08:29
@bader bader added the cuda CUDA back-end label Dec 21, 2021
@hdelan hdelan force-pushed the hugh/ptxas_optimisation branch from f9737be to af0b942 December 21, 2021 10:29
Contributor

@AGindinson AGindinson left a comment

The changes LGTM as such; that said, would it be possible to commit the patch directly to LLORG?

@hdelan
Contributor Author

hdelan commented Jan 4, 2022

Review requested in LLORG https://reviews.llvm.org/D116583

@hdelan
Contributor Author

hdelan commented Jan 11, 2022

@tra in LLORG has said:

> I think this would be contrary to the expectation that lack of -O in clang means - do not optimize and it generally implies the whole compilation chain, including assembler. Matching whatever nvidia tools do is an insufficient reason for breaking this assumption, IMO.

I have provided our rationale for changing it in the review, but it seems that this needs a bit more discussion if we want the change to be made in LLORG. Let me know your thoughts. Feel free to give your opinion in https://reviews.llvm.org/D116583

@bader
Contributor

bader commented Jan 24, 2022

> @tra in LLORG has said:
>
> > I think this would be contrary to the expectation that lack of -O in clang means - do not optimize and it generally implies the whole compilation chain, including assembler. Matching whatever nvidia tools do is an insufficient reason for breaking this assumption, IMO.
>
> I have provided our rationale for changing it in the review, but it seems that this needs a bit more discussion if we want the change to be made in LLORG. Let me know your thoughts. Feel free to give your opinion in https://reviews.llvm.org/D116583

@intel/dpcpp-clang-driver-reviewers, what is your opinion?
I don't understand the issues caused by the optimization-level mismatch well enough to give any suggestions.

@mdtoguchi
Contributor

> > @tra in LLORG has said:
> >
> > > I think this would be contrary to the expectation that lack of -O in clang means - do not optimize and it generally implies the whole compilation chain, including assembler. Matching whatever nvidia tools do is an insufficient reason for breaking this assumption, IMO.
> >
> > I have provided our rationale for changing it in the review, but it seems that this needs a bit more discussion if we want the change to be made in LLORG. Let me know your thoughts. Feel free to give your opinion in https://reviews.llvm.org/D116583
>
> @intel/dpcpp-clang-driver-reviewers, what is your opinion? I don't understand the issues caused by the optimization-level mismatch well enough to give any suggestions.

In general, I'm of the opinion that default optimization levels can be set to whatever we feel is proper for our product. Use of any disabling option like -O0 should inherently disable any default optimization levels and even use of a different level like -O1 or -O2 should override defaults as well.

@hdelan
Contributor Author

hdelan commented Jan 25, 2022

> Use of any disabling option like -O0 should inherently disable any default optimization levels and even use of a different level like -O1 or -O2 should override defaults as well.

This behavior is unchanged by this PR. The only difference is the default level of ptxas when no -O level is specified.

We thought this change would be beneficial because of a few bugs we encountered in ptxas/ptxjitcompiler that only appear at certain opt levels. When no opt level was provided, these caused JIT errors but no offline ptxas errors. These kinds of errors are more common than you'd think, and having different default opt levels for ptxjitcompiler and ptxas makes them harder to track down.
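
To make the rule above concrete, here is a minimal sketch of the selection logic. It is an illustration only, not the code from this patch: `ptxasOptFlag` is a hypothetical helper, and passing the user's level straight through is a simplifying assumption (a real driver may map levels such as -Os to something ptxas accepts).

```cpp
#include <optional>
#include <string>

// Hypothetical helper for illustration; not the driver code from this PR.
// userOptLevel holds the level from an explicit -O flag ("0", "1", "2", "3"),
// or std::nullopt when the user passed no -O flag at all.
std::string ptxasOptFlag(const std::optional<std::string> &userOptLevel) {
  if (userOptLevel) {
    // An explicit level always wins, so -O0 still disables optimization.
    // Assumption: the level is forwarded unchanged.
    return "-O" + *userOptLevel;
  }
  // No -O given: use -O3 so offline ptxas matches the JIT default
  // described in this PR. Before this change the fallback was -O0.
  return "-O3";
}
```

With this sketch, `ptxasOptFlag(std::nullopt)` yields `-O3`, while `ptxasOptFlag(std::string("0"))` still yields `-O0`, which is the unchanged part of the behavior.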

@bader bader merged commit c89a914 into intel:sycl Jan 30, 2022
Labels
cuda CUDA back-end