Commit a66cef8

jwfromm authored and facebook-github-bot committed
Fix FP8 Rowwise Gemm Compilation with Auto-functionalize V2
Summary: X-link: facebookresearch/FBGEMM#541

Torch recently introduced auto_functionalized_v2, which makes custom functions pickier about how they are defined. Specifically, torch no longer allows optional preallocated outputs: a custom function must either allocate a tensor and return it, or write directly to a preallocated output and return nothing. This conflicts with our implementation of f8f8bf16_rowwise and could cause confusing behavior or errors when compiled. The only solution is to split it into two functions with correct signatures. This diff adds `f8f8bf16_rowwise_out`, a very thin wrapper that allows preallocated outputs. It's a bit annoying, but this should allow both versions of the function to compile correctly.

Differential Revision: D66795225
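The split described above can be sketched in plain Python. This is an illustrative example only, not the FBGEMM API: the names `gemm_rowwise` and `gemm_rowwise_out` are hypothetical stand-ins showing the allocate-and-return versus write-to-preallocated-output pattern that auto_functionalized_v2 requires.

```python
# Hypothetical sketch of the two-signature pattern. A single function with an
# optional `out=None` argument is no longer allowed for custom ops; instead,
# one variant allocates and returns, the other writes into a caller-owned
# buffer and returns nothing.

def gemm_rowwise(a, b):
    """Allocating variant: creates a fresh output and returns it."""
    rows, cols = len(a), len(b[0])
    out = [[0.0] * cols for _ in range(rows)]
    gemm_rowwise_out(a, b, out)  # thin delegation, mirroring the diff
    return out

def gemm_rowwise_out(a, b, out):
    """Out variant: writes into the preallocated `out`, returns None."""
    inner = len(b)
    for i, row in enumerate(a):
        for j in range(len(b[0])):
            out[i][j] = sum(row[t] * b[t][j] for t in range(inner))
```

Keeping the out-variant as the worker and the allocating variant as a thin wrapper means both signatures satisfy the functionalization rules while sharing one kernel implementation.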
1 parent 264f946 commit a66cef8

File tree: 4 files changed, +267 −195 lines

0 commit comments