[CUDA] Make PTXAS optimisation default to -O3 #5188

hdelan · 2021-12-20T19:02:43Z

Previously the PTX optimization defaulted to -O0.

The ptxjitcompiler defaults to -O3, so this change makes ahead-of-time ptxas compilation and JIT compilation use the same optimization level.

AGindinson

The changes LGTM as such; that said, would it be possible to commit the patch directly to LLORG?

hdelan · 2022-01-04T10:28:10Z

Review requested in LLORG https://reviews.llvm.org/D116583

hdelan · 2022-01-11T17:27:43Z

@tra in LLORG has said:

> I think this would be contrary to the expectation that lack of -O in clang means - do not optimize and it generally implies the whole compilation chain, including assembler. Matching whatever nvidia tools do is an insufficient reason for breaking this assumption, IMO.

I have provided our rationale for changing it in the review, but it seems that this needs a bit more discussion if we want the change to be made in LLORG. Let me know your thoughts. Feel free to give your opinion in https://reviews.llvm.org/D116583

bader · 2022-01-24T13:51:00Z


@intel/dpcpp-clang-driver-reviewers, what is your opinion?
I don't understand the issues caused by the optimization-level mismatch well enough to give any suggestions.

mdtoguchi · 2022-01-24T23:12:52Z


In general, I'm of the opinion that default optimization levels can be set to whatever we feel is proper for our product. Use of any disabling option like -O0 should inherently disable any default optimization levels and even use of a different level like -O1 or -O2 should override defaults as well.

hdelan · 2022-01-25T16:06:43Z

> Use of any disabling option like -O0 should inherently disable any default optimization levels and even use of a different level like -O1 or -O2 should override defaults as well.

This behavior is unchanged by this PR. The only difference is the default level of ptxas when no -O level is specified.

We thought this change would be beneficial because we encountered a few bugs in ptxas/ptxjitcompiler at different opt levels. When no opt value was provided, these caused JIT errors but not offline ptxas errors. These kinds of errors are more common than you'd think, and having different default opt levels for ptxjitcompiler and ptxas makes them harder to track down.
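The behavior described above can be illustrated with a minimal sketch. This is not the actual clang driver code: `ptxasOptFlag` and its parameters are hypothetical names introduced only for illustration, and the sketch assumes the user's level arrives as a plain string such as "0" or "2". It shows that an explicit -O level is always forwarded to ptxas, and that only the fallback used when no -O is given differs before and after this PR.

```cpp
#include <optional>
#include <string>

// Hypothetical helper (an assumption for illustration, not the real driver
// implementation): pick the -O flag to forward to ptxas.
//   userLevel    - the level the user passed via -O<N>, if any
//   defaultLevel - the level to use when no -O was passed
//                  ("0" before this PR, "3" after it)
std::string ptxasOptFlag(const std::optional<std::string> &userLevel,
                         const std::string &defaultLevel) {
  // An explicit level, including -O0, always overrides the default.
  if (userLevel)
    return "-O" + *userLevel;
  // No -O on the command line: fall back to the default level.
  return "-O" + defaultLevel;
}
```

Under these assumptions, `ptxasOptFlag(std::nullopt, "3")` yields `-O3`, matching the ptxjitcompiler default mentioned in the PR description, while `ptxasOptFlag("0", "3")` still yields `-O0`.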

hdelan requested review from AGindinson, bader, hchilama and mdtoguchi as code owners December 20, 2021 19:02

hdelan changed the title ~~Make PTXAS optimisation default to -O3~~ [CUDA] Make PTXAS optimisation default to -O3 Dec 20, 2021

hdelan force-pushed the hugh/ptxas_optimisation branch from 740563f to f9737be Compare December 20, 2021 22:17

bader removed their request for review December 21, 2021 08:29

bader added the cuda CUDA back-end label Dec 21, 2021

Changing the default optimisation of ptxas to agree with ptxjitcompiler

af0b942

hdelan force-pushed the hugh/ptxas_optimisation branch from f9737be to af0b942 Compare December 21, 2021 10:29

AGindinson reviewed Dec 31, 2021


mdtoguchi approved these changes Jan 28, 2022


bader merged commit c89a914 into intel:sycl Jan 30, 2022
