
refine fp32 precision api #125888


Closed
wants to merge 83 commits

Conversation

zhuhaozhe
Collaborator

@zhuhaozhe zhuhaozhe commented May 10, 2024

Based on the conversation, we plan to drop "highest, high, medium" as the way to represent fp32 internal computation data types. Instead, we will use the algorithm name directly.

Design Choice: Directly use algorithm names like "TF32" and "BF16".

Pros

  • The names are more informative: 'tf32' says more than a generic "high".
  • It is easier to extend to new algorithms such as tf32x3.

Cons

  • "HIGHEST, HIGH, MEDIUM" indicated the relative precision between different algorithms. However, we can have more documents to discuss them.

We provide a layered structure for backends/operators.

('f32' is short for 'fp32_precision')
(image: diagram of the layered fp32_precision structure for backends/operators)

We provide four fp32 compute precisions that can be set:

  • "ieee": no other internal computation data types are allowed.
  • "tf32": tf32 is allowed as the internal computation data type.
  • "bf16": bf16 is allowed as the internal computation data type.
  • "none": the precision is not set and can be overridden by its parent node.

Overriding Precision Settings

A child node is overridden by its parent node if the child is left at the default.
The current default settings are:

backend = generic, op = all, precision setting = none
    backend = cuda, op = all, precision setting = none
        backend = cuda, op = conv, precision setting = tf32
        backend = cuda, op = rnn, precision setting = tf32
        backend = cuda, op = matmul, precision setting = none
    backend = mkldnn, op = all, precision setting = none
        backend = mkldnn, op = conv, precision setting = none
        backend = mkldnn, op = rnn, precision setting = none
        backend = mkldnn, op = matmul, precision setting = none
  • If the user sets torch.backends.mkldnn.fp32_precision="bf16", its child nodes torch.backends.mkldnn.matmul.fp32_precision / torch.backends.mkldnn.conv.fp32_precision / torch.backends.mkldnn.rnn.fp32_precision are also overridden to "bf16" (see the sketch below).
  • If the user sets torch.backends.fp32_precision="bf16", torch.backends.mkldnn.fp32_precision and its child nodes are also overridden to "bf16".
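
A minimal sketch of this override behavior, assuming the attribute names used in this description; the exact values returned by the getters are illustrative:

```python
>>> import torch
>>> torch.backends.mkldnn.matmul.fp32_precision    # child left at its default
'none'
>>> torch.backends.mkldnn.fp32_precision = "bf16"  # set the parent (backend-level) precision
>>> torch.backends.mkldnn.matmul.fp32_precision    # the child is now overridden to the parent's value
'bf16'
```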

Backward Compatibility

Since the new API allows more fine-grained control, there can be conflicts with the old API. For example, the previous torch.backends.cudnn.allow_tf32 flag is not enough to represent a state such as torch.backends.cudnn.rnn.fp32_precision="ieee" combined with torch.backends.cudnn.conv.fp32_precision="tf32". Therefore, our goals for backward compatibility are:

  • If the user only uses the previous APIs, they will work as before.
  • If the user uses the new API to move into a state that the old API cannot represent, and then queries that state through the old API, we raise a RuntimeError and point the user to the documentation (see the illustration below).
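
The following hypothetical snippet illustrates such an un-representable state, using the attribute names from this description; how the conflict is surfaced (error or warning) follows the policy above and the later discussion in this thread:

```python
import torch

# Mixed per-op settings that a single boolean flag cannot represent:
torch.backends.cudnn.conv.fp32_precision = "tf32"
torch.backends.cudnn.rnn.fp32_precision = "ieee"

# Reading the legacy flag now has no single correct answer, so per the
# goal above the old getter surfaces the conflict and points the user to
# the documentation rather than silently guessing.
torch.backends.cudnn.allow_tf32
```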

Test Plan

python test/test_cuda.py -k test_fp32_precision_with_tf32
python test/test_cuda.py -k test_fp32_precision_with_float32_matmul_precision
python test/test_cuda.py -k test_invalid_status_for_legacy_api
python test/test_mkldnn.py -k test_mlkdnn_get_set
python test/test_mkldnn.py -k test_generic_precision
python test/test_mkldnn.py -k test_invalid
python test/test_mkldnn.py -k test_default_use_parent

Stack from ghstack (oldest at bottom):

cc @jgong5 @mingfeima @XiaobingSuper @sanchitintel @ashokei @jingxu10 @jerryzh168 @gujinghui @PenghuiCheng @jianyuh @min-jean-cho @yanbing-j @Guobing-Chen @Xia-Weiwen @snadampal @voznesenskym @penguinwu @EikanWang @blzheng @wenzhe-nrv @jiayisunx @ipiszy @chenyang78 @kadeng @muchulee8 @amjames @chauhang @aakhundov @rec

@zhuhaozhe zhuhaozhe requested a review from eqy as a code owner May 10, 2024 01:20

pytorch-bot bot commented May 10, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/125888

Note: Links to docs will display an error until the docs builds have been completed.

✅ You can merge normally! (6 Unrelated Failures)

As of commit 467fe0e with merge base 78ee2ee:

FLAKY - The following jobs failed but were likely due to flakiness present on trunk:

BROKEN TRUNK - The following jobs failed but were present on the merge base:

👉 Rebase onto the `viable/strict` branch to avoid these failures

UNSTABLE - The following job is marked as unstable, possibly due to flakiness on trunk:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@pytorch-bot pytorch-bot bot added the module: cpu CPU specific problem (e.g., perf, algorithm) label May 10, 2024
@zhuhaozhe zhuhaozhe changed the title refine fp32 precision api [WIP] refine fp32 precision api May 10, 2024
@zhuhaozhe zhuhaozhe marked this pull request as draft May 10, 2024 01:21
@zhuhaozhe zhuhaozhe added the ciflow/trunk Trigger trunk jobs on your pull request label May 10, 2024
[ghstack-poisoned]
zhuhaozhe added a commit that referenced this pull request May 10, 2024
ghstack-source-id: e9d5141
Pull Request resolved: #125888
cc jgong5 mingfeima XiaobingSuper sanchitintel ashokei jingxu10

[ghstack-poisoned]
zhuhaozhe added a commit that referenced this pull request May 10, 2024
ghstack-source-id: 73f3cfd
Pull Request resolved: #125888
cc jgong5 mingfeima XiaobingSuper sanchitintel ashokei jingxu10

[ghstack-poisoned]
zhuhaozhe added a commit that referenced this pull request May 10, 2024
ghstack-source-id: a4c02dc
Pull Request resolved: #125888
cc jgong5 mingfeima XiaobingSuper sanchitintel ashokei jingxu10

[ghstack-poisoned]
@eqy
Collaborator

eqy commented May 10, 2024

CC @mruberry who authored #76440 , @ptrblck
as it seems like this PR abandons "medium, high, highest"

@zhuhaozhe
Collaborator Author

zhuhaozhe commented May 11, 2024

CC @mruberry who authored #76440 , @ptrblck as it seems like this PR abandons "medium, high, highest"

Hi, @eqy. This is a WIP draft PR based on the conversation here: #121791. It requested your review automatically.
I will summarize the design options asap, thanks.

@zhuhaozhe zhuhaozhe removed the request for review from eqy May 11, 2024 01:52
Based on the [conversation](#121791), we plan to drop "highest, high, medium" as the way to represent fp32 internal computation data types. Instead, we will use the algorithm name directly.

### Design Choice: Directly use algorithm names like "TF32", "BF16".
#### Pros
 - The names are more informative: 'tf32' says more than a generic "HIGH".
 - It is easier to extend to new algorithms such as `tf32x3`.
#### Cons
 - "HIGHEST, HIGH, MEDIUM" indicated the relative precision of the different algorithms; however, we can add more documentation to discuss this.

### We provide a layered structure for backends/operators.
('f32' is short for 'fp32_precision')
![image](https://github.com/pytorch/pytorch/assets/54701539/9cddf275-071c-4f69-a5ee-1540f78ac7f4)

### We provide 4 fp32 compute precisions that can be set:
 - **"ieee"**: computation happens in pure FP32; BF16 and TF32 are not allowed.
 - **"tf32"**: tf32 is allowed as the internal computation data type.
 - **"bf16"**: bf16 is allowed as the internal computation data type.
 - **"default"**: no specific precision is set; we look up its parent's precision in the layered structure.

### Examples
```python
# change top level fp32_precision from default value "ieee" to "tf32"
>>> torch.backends.fp32_precision
"ieee"
>>> torch.backends.fp32_precision="tf32"
>>> torch.backends.fp32_precision
"tf32"
```
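
The layered structure also allows changing a single op on a single backend. A small sketch assuming the same attribute paths as above; the return values shown are illustrative:

```python
# set only the cuda matmul op, leaving everything else at its default
>>> torch.backends.cuda.matmul.fp32_precision = "tf32"
>>> torch.backends.cuda.matmul.fp32_precision
'tf32'
>>> torch.backends.mkldnn.matmul.fp32_precision   # other backends are unaffected
'ieee'
```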

### Backward Compatibility
Since the new API allows control at the op level, there can be conflicts with the old API. For example, the previous `torch.backends.cudnn.allow_tf32` controls both conv and rnn, while we now provide `torch.backends.cudnn.conv.fp32_precision` and `torch.backends.cudnn.rnn.fp32_precision`. The "set" path of `torch.backends.cudnn.allow_tf32 = xyz` can work in a BC way, but for the "get" path a single `torch.backends.cudnn.allow_tf32` flag is not enough to represent the status of two operators, so we raise a warning there.
```python
# When the user uses torch.backends.cudnn.allow_tf32 to "set", we set both
>>> torch.backends.cudnn.conv.fp32_precision
'tf32'
>>> torch.backends.cudnn.rnn.fp32_precision
'tf32'
>>> torch.backends.cudnn.allow_tf32 = False
>>> torch.backends.cudnn.conv.fp32_precision
'ieee'
>>> torch.backends.cudnn.rnn.fp32_precision
'ieee'
# When the user uses torch.backends.cudnn.allow_tf32 to "get", we return True only when both
# fp32_precision settings are `tf32`. If the settings for `conv` and `rnn` differ, we warn the
# user about the actual situation.
>>> torch.backends.cudnn.allow_tf32 = True
>>> torch.backends.cudnn.rnn.fp32_precision = "ieee"
>>> torch.backends.cudnn.allow_tf32
[W511 16:22:07.017584786 Context.cpp:152] Warning: We allow to set different float32 precision for conv and rnn but your are querying float32 precision without a specific op.The current float32 precision for conv is tf32 and for rnn is ieee (function allowTF32CuDNN)
False
```
We have a similar situation between `torch.float32_matmul_precision` and `torch.backends.cuda.matmul.fp32_precision` / `torch.backends.mkldnn.matmul.fp32_precision`. The `set` method of `torch.set_float32_matmul_precision` works in a BC way, and we raise a warning for the `get` method `torch.get_float32_matmul_precision`.
```python
# set method
>>> torch.backends.cuda.matmul.fp32_precision
'ieee'
>>> torch.get_float32_matmul_precision()
'highest'
>>> torch.backends.cuda.matmul.fp32_precision
'ieee'
>>> torch.backends.mkldnn.matmul.fp32_precision
'ieee'
>>> torch.set_float32_matmul_precision("medium")
>>> torch.backends.cuda.matmul.fp32_precision
'tf32'
>>> torch.backends.mkldnn.matmul.fp32_precision
'bf16'
# get method
>>> torch.set_float32_matmul_precision("highest")
>>> torch.backends.cuda.matmul.fp32_precision = "tf32"
>>> torch.get_float32_matmul_precision()
[W511 18:14:56.441053716 Context.cpp:289] Warning: We allow to set different float32 matmul precision for mkldnn and cuda but you are querying float32 matmul precision without a specific backend.The current float32 matmul precision for cuda is tf32 and for mkldnn is ieee (function operator())
'highest'
```





cc jgong5 mingfeima XiaobingSuper sanchitintel ashokei jingxu10

[ghstack-poisoned]
zhuhaozhe added a commit that referenced this pull request May 13, 2024
ghstack-source-id: 4634f6b
Pull Request resolved: #125888
@yanbing-j
Collaborator

Hi @atalman, @jithunnair-amd and @jeffdaily,

The direct root cause of the PYTORCH_TEST_WITH_ROCM=1 python test/inductor/test_flex_decoding.py TestFlexDecodingCUDA.test_non_sparse_mulitple_block_size_cuda failure is a mismatch between the logic in allowTF32CuBLAS() and MI300 TF32 support, which uses HIPBLASLT_ALLOW_TF32=1 to indicate that tf32 is allowed and enabled for cublas matmul on MI300, instead of using float32_matmul_precision to decide whether tf32 is allowed. The UT only uses torch.set_float32_matmul_precision("high") to set float32_matmul_precision, but the env var is not set.

Since I'm not familiar with HIP cuBLAS internals, I just set float32_matmul_precision to "highest", as it is in the main branch, when running on HIP.

I'm not sure whether HIPBLASLT_ALLOW_TF32=1 and the MI300-specific logic are a workaround to bypass "torch.backends.cuda.matmul.allow_tf32 is not supported on ROCm by default", as mentioned in setAllowTF32CuBLAS. Will AMD support TF32 on MI300 without this specific logic in the future? Thanks!

Please correct me if I misunderstand.

@yanbing-j
Collaborator

@pytorchbot merge

@pytorchmergebot
Collaborator

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging
Check the merge workflow status here

yanbing-j pushed a commit to yanbing-j/pytorch that referenced this pull request Jun 30, 2025
ghstack-source-id: 63962ec
Pull Request resolved: pytorch#125888
@jansel
Contributor

jansel commented Jul 12, 2025

@zhuhaozhe @albanD this PR is causing:

>>> import torch
>>> torch.backends.cudnn.allow_tf32 = True
/home/jansel/pytorch/torch/backends/__init__.py:46: UserWarning: This API is going to be deprecated, please see https://pytorch.org/docs/main/notes/cuda.html#tensorfloat-32-tf32-on-ampere-and-later-devices (Triggered internally at /home/jansel/pytorch/aten/src/ATen/Context.cpp:78.)
  self.setter(val)
>>> 

Which seems bad:

  1. The error message doesn't say which API is being deprecated, so if I get this printout from a large model it is hard to figure out what "This API" means. I was only able to figure out what the error was talking about by grepping the PyTorch source code. I think this will confuse users.
  2. The webpage the error message links to tells me to use torch.backends.cudnn.allow_tf32 = True (the exact thing causing the error) with no message about deprecations.

@yanbing-j
Collaborator

yanbing-j commented Jul 14, 2025

@zhuhaozhe @albanD this PR is causing:

>>> import torch
>>> torch.backends.cudnn.allow_tf32 = True
/home/jansel/pytorch/torch/backends/__init__.py:46: UserWarning: This API is going to be deprecated, please see https://pytorch.org/docs/main/notes/cuda.html#tensorfloat-32-tf32-on-ampere-and-later-devices (Triggered internally at /home/jansel/pytorch/aten/src/ATen/Context.cpp:78.)
  self.setter(val)
>>> 

Which seems bad:

  1. The error message doesn't say which API is being deprecated, so if I get this printout from a large model it is hard to figure out what "This API" means. I was only able to figure out what the error was talking about by grepping the PyTorch source code. I think this will confuse users.
  2. The webpage the error message links to tells me to use torch.backends.cudnn.allow_tf32 = True (the exact thing causing the error) with no message about deprecations.

Hi @jansel, Thanks for pointing this out!

I drafted #158209 to complete the API warning and to update the webpage content so it more clearly suggests that users adopt the new API settings.

Now the warning is updated to

>>> import torch
>>> torch.backends.cudnn.allow_tf32 = True
/home/yanbingj/projects/pytorch/torch/backends/__init__.py:46: UserWarning: Suggest to use a new setting of API control of a more fine-grained TF32 behavior, e.g, torch.backends.cudnn.conv.fp32_precision = 'tf32' or torch.backends.cuda.matmul.fp32_precision = 'ieee'. Old setting, e.g, torch.backends.cuda.matmul.allow_tf32 = True, torch.backends.cudnn.allow_tf32 = True, allowTF32CuDNN() and allowTF32CuBLAS() are still supported, and is going to be deprecated. Please see https://pytorch.org/docs/main/notes/cuda.html#tensorfloat-32-tf32-on-ampere-and-later-devices (Triggered internally at /home/yanbingj/projects/pytorch/aten/src/ATen/Context.cpp:78.)
  self.setter(val)

Please let me know if #158209 can help. Thanks!

@jansel
Contributor

jansel commented Jul 14, 2025

I left some comments on that PR. Perhaps we should just remove the warning for now, then roll the changes out as follows:

  1. Add the new APIs
  2. Update the docs to direct people to the new APIs (currently the docs still tell people to use the "deprecated" APIs)
  3. If we plan to keep the old API forever, don't make old API emit a warning
  4. If we plan to delete the old API, emit a warning with a schedule for when we will delete the old API

@yanbing-j
Collaborator

We want to deprecate old APIs. I have updated the docs to direct people to the new APIs in #158209. Please take a look again!

pytorchmergebot pushed a commit that referenced this pull request Jul 17, 2025
### Description

This PR is to enable TF32 as fp32 internal precision for matmul/linear/conv in `mkldnn backend`. Since we have refined fp32 precision API in #125888, we can easily extend the API to support TF32 for `mkldnn backend`.

```
torch.backends.mkldnn.matmul.fp32_precision = 'tf32'
torch.backends.mkldnn.conv.fp32_precision = "tf32"
```

Related kernel updates and UT updates are done. The wrapper `bf32_on_and_off` is updated to `reduced_f32_on_and_off`, and it can run tests 3 times: one with reduced_f32 OFF and the other two with reduced_f32 ON (covering `bf32 ON` and `tf32 ON`). A sketch of the idea follows.
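
A minimal sketch of such a run-three-times wrapper, not the actual `reduced_f32_on_and_off` helper from this PR; the decorator name and structure here are illustrative, while the fp32_precision knobs are the ones introduced by this stack:

```python
import functools
import torch

def reduced_f32_on_and_off_sketch(test_fn):
    """Illustrative only: run a test once per mkldnn fp32 precision mode."""
    @functools.wraps(test_fn)
    def wrapper(*args, **kwargs):
        # one run with reduced_f32 OFF ("ieee"), two runs with it ON ("bf16", "tf32")
        for precision in ("ieee", "bf16", "tf32"):
            torch.backends.mkldnn.matmul.fp32_precision = precision
            torch.backends.mkldnn.conv.fp32_precision = precision
            test_fn(*args, **kwargs)
    return wrapper
```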

Pull Request resolved: #157520
Approved by: https://github.com/mingfeima, https://github.com/jansel
@github-actions github-actions bot deleted the gh/zhuhaozhe/28/head branch August 14, 2025 02:19
pytorchmergebot pushed a commit that referenced this pull request Aug 21, 2025
…l.fp32_precision` (#161102)

For #161022
The warning says the old API will be deprecated in 2.9+ anyway, leaving it up to the author of #125888 to decide on initialization behavior then

Pull Request resolved: #161102
Approved by: https://github.com/ngimel, https://github.com/drisspg, https://github.com/BoyuanFeng
Labels
ci-no-td Do not run TD on this PR ciflow/inductor ciflow/linux-aarch64 linux aarch64 CI workflow ciflow/rocm Trigger "default" config CI on ROCm ciflow/rocm-mi300 Trigger "default" config CI on ROCm MI300 ciflow/trunk Trigger trunk jobs on your pull request Merged module: cpu CPU specific problem (e.g., perf, algorithm) module: dynamo module: inductor module: mkldnn Related to Intel IDEEP or oneDNN (a.k.a. mkldnn) integration open source release notes: python_frontend python frontend release notes category Reverted
Projects
Status: Done