Integrate MARLIN fp16/bf16-float8 matrix multiplication kernel

Since IST-DASLab introduced the mixed-precision fp16-int4 [MARLIN](https://github.com/IST-DASLab/marlin) (Mixed Auto-Regressive Linear) kernels, new mixed-precision MARLIN kernels have been developed for other data types.

In particular, mixed-precision fp16/bf16-float8 kernels have been [contributed to TGI](https://github.com/huggingface/text-generation-inference/pull/2213). They could also be integrated into `optimum-quanto`, together with a companion `FP8MarlinQBytesTensor` that packs the float8 weights into the layout the kernel expects.
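
To make the proposal concrete, here is a minimal pure-Python sketch of what a mixed-precision float8 linear layer computes and what "packing the weights" can mean. It assumes symmetric per-output-channel scaling (each row's max |w| mapped to 448, the largest finite e4m3fn value) and packs four float8 codes into one 32-bit word, lowest byte first. All function names are hypothetical. This is not the MARLIN kernel, the TGI code, or the `FP8MarlinQBytesTensor` API: the real kernel also reorders weights into its own tiled layout, keeps activations in fp16/bf16 rather than Python floats, and runs on the GPU.

```python
import math

FP8_E4M3_MAX = 448.0  # largest finite float8 e4m3fn value


def round_to_e4m3(v):
    """Round a float to the nearest float8 e4m3fn value, saturating at +/-448."""
    if v == 0.0:
        return 0.0
    a = min(abs(v), FP8_E4M3_MAX)
    _, exp = math.frexp(a)       # a = m * 2**exp with 0.5 <= m < 1
    e = max(exp - 1, -6)         # unbiased exponent, clamped at -6 for subnormals
    step = 2.0 ** (e - 3)        # 3 mantissa bits -> 8 representable steps per binade
    return math.copysign(round(a / step) * step, v)  # round half to even


def e4m3_to_byte(q):
    """Bit pattern of an e4m3fn grid value: sign | 4-bit exponent (bias 7) | 3-bit mantissa."""
    sign = 0x80 if math.copysign(1.0, q) < 0 else 0x00
    a = abs(q)
    if a == 0.0:
        return sign
    _, exp = math.frexp(a)
    e = exp - 1
    if e < -6:                   # subnormal: value = mantissa * 2**-9
        return sign | int(a / 2.0 ** -9)
    mantissa = int((a / 2.0 ** e - 1.0) * 8)
    return sign | ((e + 7) << 3) | mantissa


def pack_bytes_to_uint32(codes):
    """Pack four 8-bit codes per 32-bit word, lowest byte first (illustrative layout only)."""
    if len(codes) % 4:
        raise ValueError("number of codes must be a multiple of 4")
    return [
        codes[i] | codes[i + 1] << 8 | codes[i + 2] << 16 | codes[i + 3] << 24
        for i in range(0, len(codes), 4)
    ]


def quantize_fp8_per_channel(w):
    """Scale each output row so its max |w| maps to 448, then round to e4m3."""
    q, scales = [], []
    for row in w:
        scale = max(abs(v) for v in row) / FP8_E4M3_MAX or 1.0  # avoid 0 for all-zero rows
        q.append([round_to_e4m3(v / scale) for v in row])
        scales.append(scale)
    return q, scales


def fp8_linear(x, q, scales):
    """Reference y = x @ (q * scales[:, None]).T, i.e. dequantize each row, then matmul."""
    return [
        [s * sum(xk * wk for xk, wk in zip(x_row, q_row)) for q_row, s in zip(q, scales)]
        for x_row in x
    ]


# Usage: two output channels (rows of w), four input features.
w = [[1.0, -2.0, 0.5, 0.25], [0.1, 0.2, 0.3, 0.4]]
x = [[1.0, 1.0, 1.0, 1.0], [0.5, -1.0, 2.0, 0.0]]
q, scales = quantize_fp8_per_channel(w)
packed = [pack_bytes_to_uint32([e4m3_to_byte(v) for v in row]) for row in q]
y = fp8_linear(x, q, scales)
```

Doing the rounding by hand only serves to make the bit layout explicit; in `optimum-quanto` the cast would more naturally rely on PyTorch's `torch.float8_e4m3fn` dtype, with the kernel-specific packing done by the companion tensor class.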
 

Integrate marlin fp16/bf16-float8 matrix multiplication kernel #238

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development