Description
from @drisspg
Summary
There are two components to this: non-saturated casting and saturated casting.
Non-Saturated Casting
- We are currently using bit logic to cast from fp32 to fp8, whereas there exist intrinsics that perform the same conversion; see Nikita's comment below and the sketch after this list.
- Currently, for fp16 -> fp8 casting we actually first cast the fp16 values up to fp32 and then recast down to fp8.
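For illustration, here is a minimal CUDA sketch of what an intrinsic-based path could look like using the `cuda_fp8.h` conversion helpers (available since CUDA 11.8). The wrapper names are made up for this example and nothing below is taken from this codebase:

```cuda
// Sketch only: uses the fp8 conversion intrinsics from cuda_fp8.h (CUDA 11.8+)
// instead of hand-rolled bit manipulation. Function names are illustrative.
#include <cuda_fp16.h>
#include <cuda_fp8.h>

// fp32 -> fp8 (e4m3), unsaturated, via the library intrinsic.
__device__ __nv_fp8_storage_t f32_to_e4m3(float x) {
    return __nv_cvt_float_to_fp8(x, __NV_NOSAT, __NV_E4M3);
}

// fp16 -> fp8 (e4m3) directly, avoiding the fp16 -> fp32 -> fp8 round trip.
__device__ __nv_fp8_storage_t f16_to_e4m3(__half x) {
    return __nv_cvt_halfraw_to_fp8(static_cast<__half_raw>(x), __NV_NOSAT, __NV_E4M3);
}
```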
Saturated Casting
- In this codebase we write out the saturated cast logic explicitly by clamping prior to conversion: https://github.com/pytorch-labs/float8_experimental/blob/cdcadb57c5f4736d1a78a794da98afd398571942/float8_experimental/float8_utils.py#L19
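For comparison, a rough CUDA rendering of that clamp-before-convert approach (the linked code is Python/PyTorch; this sketch only mirrors the idea, and 448 is the max finite e4m3 value):

```cuda
// Sketch of the clamp-before-convert approach: manually saturate to the fp8
// range, then do an unsaturated conversion. NaN handling is elided here.
#include <cuda_fp8.h>

// Max finite value representable in e4m3 (the "fn" variant).
#define E4M3_MAX 448.0f

__device__ __nv_fp8_storage_t f32_to_e4m3_clamped(float x) {
    float clamped = fminf(fmaxf(x, -E4M3_MAX), E4M3_MAX);
    return __nv_cvt_float_to_fp8(clamped, __NV_NOSAT, __NV_E4M3);
}
```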
There do appear to be PTX intrinsics for doing saturated casts; see: https://github.com/openai/triton/blob/10f59d8ce04052521c1bc0cb3a3f8b98918fc7e3/lib/Conversion/TritonGPUToLLVM/ElementwiseOpToLLVM.cpp#L182
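A minimal sketch of what leaning on those saturating conversions could look like from CUDA C++, assuming the `cuda_fp8.h` helpers with `__NV_SATFINITE`; on sm_89+ these should map onto the same `cvt.rn.satfinite.e4m3x2.f32` PTX path the Triton code above emits:

```cuda
// Sketch: let the conversion itself saturate via __NV_SATFINITE instead of
// clamping first.
#include <cuda_fp8.h>

__device__ __nv_fp8_storage_t f32_to_e4m3_sat(float x) {
    return __nv_cvt_float_to_fp8(x, __NV_SATFINITE, __NV_E4M3);
}

// Packed variant: converts two fp32 values to two e4m3 values at once,
// matching the two-at-a-time shape of the PTX instruction.
__device__ __nv_fp8x2_storage_t f32x2_to_e4m3_sat(float2 xy) {
    return __nv_cvt_float2_to_fp8x2(xy, __NV_SATFINITE, __NV_E4M3);
}
```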
copied from pytorch-labs/float8_experimental#83