# refine fp32 precision api #125888
## Conversation
Based on the [conversation](#121791), we plan to drop "highest, high, medium" as the way to represent fp32 internal computation data types. Instead, we will directly use the algorithm name to represent it.

### Design Choice: Directly use algorithm names like "TF32", "BF16"

#### Pros
- The names are more informative: 'tf32' is more informative than a simple "HIGH".
- Easier to extend to new algorithms like `tf32x3`.

#### Cons
- "HIGHEST, HIGH, MEDIUM" indicated the relative precision between different algorithms. However, we can add more documentation to discuss this.

### We provide a layered structure for backends/operators
('f32' is short for 'fp32_precision')



### We provide 4 fp32 compute precisions that can be set:
- **"ieee"**: computation happens at pure FP32 precision; BF16 and TF32 are not allowed.
- **"tf32"**: TF32 is allowed as the internal computation data type.
- **"bf16"**: BF16 is allowed as the internal computation data type.
- **"default"**: no specific precision is set; we search its parent's precision in the layered structure.

### Examples
```python
# change the top-level fp32_precision from the default value "ieee" to "tf32"
>>> torch.backends.fp32_precision
"ieee"
>>> torch.backends.fp32_precision = "tf32"
>>> torch.backends.fp32_precision
"tf32"
```

### Backward Compatibility
Since the new API lets the user control precision at the operator level, there can be conflicts with the old flags. For example, the previous `torch.backends.cudnn.allow_tf32` controls both conv and rnn, while we now provide `torch.backends.cudnn.conv.fp32_precision` and `torch.backends.cudnn.rnn.fp32_precision`. The "set" method of `torch.backends.cudnn.allow_tf32 = xyz` can work in a BC way, but for the "get" method a single flag `torch.backends.cudnn.allow_tf32` is not enough to represent the status of two operators, so we raise a warning there.

```python
# When the user uses torch.backends.cudnn.allow_tf32 to "set", we set both
>>> torch.backends.cudnn.conv.fp32_precision
'tf32'
>>> torch.backends.cudnn.rnn.fp32_precision
'tf32'
>>> torch.backends.cudnn.allow_tf32 = False
>>> torch.backends.cudnn.conv.fp32_precision
'ieee'
>>> torch.backends.cudnn.rnn.fp32_precision
'ieee'

# When the user uses torch.backends.cudnn.allow_tf32 to "get", we return True only when both
# fp32_precision settings are `tf32`. If the settings for `conv` and `rnn` differ, we warn the
# user about the actual situation.
>>> torch.backends.cudnn.allow_tf32 = True
>>> torch.backends.cudnn.rnn.fp32_precision = "ieee"
>>> torch.backends.cudnn.allow_tf32
[W511 16:22:07.017584786 Context.cpp:152] Warning: We allow to set different float32 precision for conv and rnn but your are querying float32 precision without a specific op.The current float32 precision for conv is tf32 and for rnn is ieee (function allowTF32CuDNN)
False
```

We have a similar situation between `torch.float32_matmul_precision` and `torch.backends.cuda.matmul.fp32_precision` / `torch.backends.mkldnn.matmul.fp32_precision`. The `set` method of `torch.float32_matmul_precision` works in a BC way, and we raise a warning for the `get` method of `torch.float32_matmul_precision`.
```
# set method
>>> torch.backends.cuda.matmul.fp32_precision
'ieee'
>>> torch.get_float32_matmul_precision()
'highest'
>>> torch.backends.cuda.matmul.fp32_precision
'ieee'
>>> torch.backends.mkldnn.matmul.fp32_precision
'ieee'
>>> torch.set_float32_matmul_precision("medium")
>>> torch.backends.cuda.matmul.fp32_precision
'tf32'
>>> torch.backends.mkldnn.matmul.fp32_precision
'bf16'

# get method
>>> torch.set_float32_matmul_precision("highest")
>>> torch.backends.cuda.matmul.fp32_precision = "tf32"
>>> torch.get_float32_matmul_precision()
[W511 18:14:56.441053716 Context.cpp:289] Warning: We allow to set different float32 matmul precision for mkldnn and cuda but you are querying float32 matmul precision without a specific backend.The current float32 matmul precision for cuda is tf32 and for mkldnn is ieee (function operator())
'highest'
```
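To make the "default falls back to the parent" lookup concrete, here is a toy sketch of the layered resolution described above. It is illustrative pseudocode with a made-up `settings`/`parent` mapping, not PyTorch's actual implementation; only the hierarchy idea and the precision names come from this PR.

```python
# Illustrative only: a toy model of the layered fp32_precision lookup,
# not the actual PyTorch implementation.
settings = {
    "generic": "ieee",           # stands in for torch.backends.fp32_precision
    "mkldnn": "default",         # stands in for torch.backends.mkldnn.fp32_precision
    "mkldnn.matmul": "default",  # stands in for torch.backends.mkldnn.matmul.fp32_precision
}
parent = {"mkldnn.matmul": "mkldnn", "mkldnn": "generic"}

def effective_precision(node: str) -> str:
    """Resolve "default" by walking up the layered structure to the parent."""
    while settings[node] == "default" and node in parent:
        node = parent[node]
    return settings[node]

print(effective_precision("mkldnn.matmul"))  # 'ieee'  (inherited from the generic level)
settings["mkldnn"] = "bf16"
print(effective_precision("mkldnn.matmul"))  # 'bf16'  (inherited from the mkldnn level)
```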
Hi @atalman, @jithunnair-amd and @jeffdaily, since I'm not familiar with the HIP/hipBLASLt side, I just set float32_matmul_precision to "highest" on HIP, as it is on the main branch. I'm not sure whether HIPBLASLT_ALLOW_TF32=1 and the MI300-specific logic are meant as a workaround. Please correct me if I misunderstand.
@pytorchbot merge
Merge started. Your change will be merged once all checks pass (ETA 0-4 hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
ghstack-source-id: 63962ec Pull Request resolved: pytorch#125888
@zhuhaozhe @albanD this PR is causing:
Which seems bad:
Hi @jansel, thanks for pointing this out! I drafted #158209 to address this. Now the warning is updated to
Please let me know if #158209 helps. Thanks!
I left some comments on that PR. Perhaps we should just remove the warning for now, then roll it out as follows:
We want to deprecate the old APIs. I have updated the docs to direct people to the new APIs in #158209. Please take a look again!
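For readers migrating, here is a rough mapping from the old flags to the new fine-grained settings, pieced together from the examples earlier in this thread; the exact deprecation plan lives in #158209 and may differ, so treat the right-hand comments as approximate equivalents rather than the official mapping.

```python
import torch

# Old global flags and their approximate new equivalents (from the examples above):
torch.backends.cudnn.allow_tf32 = True          # ~ torch.backends.cudnn.conv.fp32_precision = "tf32"
                                                #   torch.backends.cudnn.rnn.fp32_precision  = "tf32"
torch.set_float32_matmul_precision("medium")    # ~ torch.backends.cuda.matmul.fp32_precision   = "tf32"
                                                #   torch.backends.mkldnn.matmul.fp32_precision = "bf16"
torch.set_float32_matmul_precision("highest")   # ~ torch.backends.cuda.matmul.fp32_precision   = "ieee"
                                                #   torch.backends.mkldnn.matmul.fp32_precision = "ieee"
```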
### Description
This PR enables TF32 as the fp32 internal precision for matmul/linear/conv in the `mkldnn backend`. Since we have refined the fp32 precision API in #125888, we can easily extend the API to support TF32 for the `mkldnn backend`.
```
torch.backends.mkldnn.matmul.fp32_precision = 'tf32'
torch.backends.mkldnn.conv.fp32_precision = "tf32"
```
The related kernel updates and UT updates are done. The wrapper `bf32_on_and_off` is updated to `reduced_f32_on_and_off`, which can run tests 3 times: once with reduced_f32 OFF and twice with reduced_f32 ON (`bf32 ON` and `tf32 ON`).
Pull Request resolved: #157520
Approved by: https://github.com/mingfeima, https://github.com/jansel
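As a usage sketch of the settings above: this assumes a PyTorch build that includes this PR; the tensor shapes are arbitrary and any speed/accuracy trade-off depends on the CPU. Only the `fp32_precision` attribute names and values come from the PR.

```python
import torch

# Allow TF32 as the internal compute type for fp32 matmul/linear and conv in the mkldnn backend.
torch.backends.mkldnn.matmul.fp32_precision = "tf32"
torch.backends.mkldnn.conv.fp32_precision = "tf32"

a = torch.randn(512, 512)   # fp32 inputs
b = torch.randn(512, 512)
c = a @ b                   # may be computed with TF32 internally

# Switch back to strict IEEE fp32 computation.
torch.backends.mkldnn.matmul.fp32_precision = "ieee"
torch.backends.mkldnn.conv.fp32_precision = "ieee"
```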
…l.fp32_precision` (#161102)
For #161022. The warning says the old API will be deprecated in 2.9+ anyway, leaving it up to the author of #125888 to decide on the initialization behavior then.
Pull Request resolved: #161102
Approved by: https://github.com/ngimel, https://github.com/drisspg, https://github.com/BoyuanFeng
Based on the conversation, we plan to drop "highest, high, medium" as the way to represent fp32 internal computation data types. Instead, we will directly use the algorithm name to represent it.

### Design Choice: Directly use algorithm names like "TF32", "BF16"

#### Pros
- The names are more informative: 'tf32' is more informative than a simple "HIGH".
- Easier to extend to new algorithms like `tf32x3`.

#### Cons
- "HIGHEST, HIGH, MEDIUM" indicated the relative precision between different algorithms. However, we can add more documentation to discuss this.

### We provide a layered structure for backends/operators
('f32' is short for 'fp32_precision')



### We provide 3 fp32 compute precisions that can be set:
- **"ieee"**: computation happens at pure FP32 precision; BF16 and TF32 are not allowed.
- **"tf32"**: TF32 is allowed as the internal computation data type.
- **"bf16"**: BF16 is allowed as the internal computation data type.
### Overriding Precision Settings
A child node can be overridden by its parent node if the child is set to "default". Starting from the current default settings (see the sketch after this list):
- If `torch.backends.mkldnn.fp32_precision = "bf16"`, its child nodes `torch.backends.mkldnn.matmul.fp32_precision` / `torch.backends.mkldnn.conv.fp32_precision` / `torch.backends.mkldnn.rnn.fp32_precision` will also be overridden to "bf16".
- If `torch.backends.fp32_precision = "bf16"`, then `torch.backends.mkldnn.fp32_precision` and its child nodes will also be overridden to "bf16".
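For illustration, a minimal interactive sketch of the override behavior in the list above. The attribute names and values come from this description; the printed outputs assume the child settings were still at their defaults before the parent was changed.

```python
>>> torch.backends.mkldnn.fp32_precision = "bf16"   # set the parent
>>> torch.backends.mkldnn.matmul.fp32_precision     # children follow the parent
'bf16'
>>> torch.backends.mkldnn.conv.fp32_precision
'bf16'
>>> torch.backends.mkldnn.rnn.fp32_precision
'bf16'
```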
### Backward Compatibility
Since the new API allows the user more fine-grained control, there will be some conflicts. For example, the previous `torch.backends.cudnn.allow_tf32` is not enough to represent the status of `torch.backends.cudnn.rnn.fp32_precision = "ieee"` together with `torch.backends.cudnn.conv.fp32_precision = "tf32"`. Therefore, our goal for backward compatibility is:

### Test Plan
Stack from ghstack (oldest at bottom):
cc @jgong5 @mingfeima @XiaobingSuper @sanchitintel @ashokei @jingxu10 @jerryzh168 @gujinghui @PenghuiCheng @jianyuh @min-jean-cho @yanbing-j @Guobing-Chen @Xia-Weiwen @snadampal @voznesenskym @penguinwu @EikanWang @blzheng @wenzhe-nrv @jiayisunx @ipiszy @chenyang78 @kadeng @muchulee8 @amjames @chauhang @aakhundov @rec