
[Feature]: Vectorize scaled_int8_quant #18866

Closed
@mgoin


🚀 The feature, motivation and pitch

Similar to the recent discoveries in #18844, vectorizing our quantization methods can have a large impact on end-to-end performance.

Currently we only use vectorization.h in csrc/quantization/fp8/common.cuh and csrc/quantization/fused_kernels/layernorm_utils.cuh. We should expand it to more implementations, such as csrc/quantization/compressed_tensors/int8_quant_kernels.cu, for faster INT8 activation quantization.

Alternatives

No response

Additional context

No response

