Commit a66cef8

jwfromm authored and facebook-github-bot committed
Fix FP8 Rowwise Gemm Compilation with Auto-functionalize V2
Summary: X-link: facebookresearch/FBGEMM#541

Torch recently introduced auto_functionalized_v2, which makes custom functions pickier about how they are defined. Specifically, torch no longer allows optional preallocated outputs: a custom function must either allocate a tensor and return it, or write directly to a preallocated output and return nothing. This conflicts with our implementation of f8f8bf16_rowwise and could cause confusing behavior or errors when compiled. The only solution is to split it into two functions with correct signatures. This diff adds `f8f8bf16_rowwise_out`, a very thin wrapper that allows preallocated outputs. It's a bit annoying, but this should allow both versions of the function to compile correctly.

Differential Revision: D66795225
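The split described above can be sketched in plain Python. This is an illustrative example only, not the FBGEMM API: the names `gemm_rowwise` and `gemm_rowwise_out` are hypothetical stand-ins showing the allocate-and-return versus write-to-preallocated-output pattern that auto_functionalized_v2 requires.

```python
# Hypothetical sketch of the two-signature pattern. A single function with an
# optional `out=None` argument is no longer allowed for custom ops; instead,
# one variant allocates and returns, the other writes into a caller-owned
# buffer and returns nothing.

def gemm_rowwise(a, b):
    """Allocating variant: creates a fresh output and returns it."""
    rows, cols = len(a), len(b[0])
    out = [[0.0] * cols for _ in range(rows)]
    gemm_rowwise_out(a, b, out)  # thin delegation, mirroring the diff
    return out

def gemm_rowwise_out(a, b, out):
    """Out variant: writes into the preallocated `out`, returns None."""
    inner = len(b)
    for i, row in enumerate(a):
        for j in range(len(b[0])):
            out[i][j] = sum(row[t] * b[t][j] for t in range(inner))
```

Keeping the out-variant as the worker and the allocating variant as a thin wrapper means both signatures satisfy the functionalization rules while sharing one kernel implementation.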
1 parent 264f946 commit a66cef8

File tree: 4 files changed, +267 −195 lines

0 commit comments