Integrate marlin fp16/bf16-float8 matrix multiplication kernel #238
Closed
Description
Since the introduction of mixed-precision fp16-int4 MARLIN (Mixed Auto-Regressive Linear) kernels by IST-DASLab, new mixed-precision MARLIN kernels have been introduced for other data types.
In particular, mixed-precision fp16/bf16-float8 kernels have been contributed to TGI and could be integrated in optimum-quanto as well, with a companion FP8MarlinQBytesTensor to pack the weights.
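
To make the weight-packing idea concrete, here is a minimal sketch of the simplest part of such a step: storing four 8-bit codes (for example raw float8 bit patterns) in one 32-bit word. The function names and the little-endian byte order are assumptions for illustration only. This is not the optimum-quanto or TGI implementation, and it does not reproduce any tile reordering the actual kernel may expect.

```python
def pack_bytes_as_int32(values):
    """Pack groups of four 8-bit codes into 32-bit words.

    Hypothetical helper: the first code of each group lands in the
    lowest byte of the word (assumed little-endian order).
    """
    if len(values) % 4 != 0:
        raise ValueError("number of 8-bit values must be a multiple of 4")
    packed = []
    for i in range(0, len(values), 4):
        word = 0
        for j, byte in enumerate(values[i:i + 4]):
            if not 0 <= byte <= 0xFF:
                raise ValueError("each value must fit in 8 bits")
            word |= byte << (8 * j)  # place the code in byte j of the word
        packed.append(word)
    return packed


def unpack_int32_as_bytes(words):
    """Inverse of pack_bytes_as_int32: split each word into four 8-bit codes."""
    return [(w >> (8 * j)) & 0xFF for w in words for j in range(4)]


codes = [0x01, 0x02, 0x03, 0x04, 0xFF, 0x00, 0x7E, 0x80]
words = pack_bytes_as_int32(codes)
assert unpack_int32_as_bytes(words) == codes
```

A tensor subclass such as the proposed FP8MarlinQBytesTensor would presumably hide this kind of packing behind its constructor, so callers only ever see a float8 weight and its scale.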