Integrate marlin fp16/bf16-float8 matrix multiplication kernel #238

Closed
@dacorvo

Description

Since IST-DASLab introduced the mixed-precision fp16-int4 MARLIN (Mixed Auto-Regressive LINear) kernels, new mixed-precision MARLIN kernels have been released for other data types.

In particular, mixed-precision fp16/bf16-float8 kernels have been contributed to TGI, and could be integrated into optimum-quanto as well, with a companion FP8MarlinQBytesTensor to pack the weights.
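As background for what such a kernel computes, below is a minimal, dependency-free sketch of fp16-style weights being quantized to the float8 e4m3fn format (4 exponent bits, 3 mantissa bits, max finite value 448, no infinities) with a per-tensor scale, then dequantized for a reference matmul. This is purely illustrative: the function names and the per-tensor scaling scheme are assumptions for the example, not the optimum-quanto or MARLIN API, and a real kernel would keep the weights packed in fp8 and fuse the dequantization into the GEMM.

```python
import math

E4M3_MAX = 448.0  # largest finite value representable in float8 e4m3fn


def round_to_e4m3(x: float) -> float:
    """Round x to the nearest float8 e4m3fn value (sketch; no NaN handling)."""
    if x == 0.0:
        return 0.0
    sign = -1.0 if x < 0.0 else 1.0
    a = min(abs(x), E4M3_MAX)
    # Exponent of the value, floored at -6 so subnormals (step 2^-9) work too.
    e = max(math.floor(math.log2(a)), -6)
    step = 2.0 ** (e - 3)  # 3 mantissa bits => 8 steps per binade
    return sign * min(round(a / step) * step, E4M3_MAX)


def quantize_per_tensor(weights):
    """Scale weights into the e4m3 range and round each one (illustrative)."""
    scale = max(abs(w) for row in weights for w in row) / E4M3_MAX
    q = [[round_to_e4m3(w / scale) for w in row] for row in weights]
    return q, scale


def matmul_dequant(x, q, scale):
    """Reference fp16xfloat8 matmul: dequantize then multiply-accumulate."""
    return [
        [sum(xi * (qj * scale) for xi, qj in zip(row, col))
         for col in zip(*q)]
        for row in x
    ]
```

The per-tensor scale maps the largest weight magnitude onto 448, the e4m3 maximum; finer-grained (per-channel or per-group) scales would reduce quantization error at the cost of more metadata, which is one of the design choices a packed-tensor class like the proposed FP8MarlinQBytesTensor would have to encode.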


Metadata

Labels

enhancement (New feature or request)
