🚀 The feature, motivation and pitch
Similar to the recent discoveries in #18844, vectorizing our quantization methods can have a huge impact on e2e performance. Currently we only use vectorization.h in csrc/quantization/fp8/common.cuh and csrc/quantization/fused_kernels/layernorm_utils.cuh, so we should expand it to more implementations, such as csrc/quantization/compressed_tensors/int8_quant_kernels.cu, for faster INT8 activation quantization.
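To illustrate the idea, here is a minimal host-side C++ sketch of what a vectorized INT8 quantization loop looks like: each step loads four floats and stores four packed int8 values, so on the GPU the same pattern turns scalar loads/stores into wide 128-bit transactions. The `Float4`/`Char4` structs and `quantize_int8_vec4` are hypothetical stand-ins for CUDA's `float4`/`char4` and the helpers in vectorization.h, not the actual kernel code.

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>

// Hypothetical 4-wide vector types standing in for CUDA's float4/char4.
struct Float4 { float x, y, z, w; };
struct Char4  { int8_t x, y, z, w; };

// Round-to-nearest quantization of one element, clamped to int8 range.
static inline int8_t quant_one(float v, float inv_scale) {
  float q = std::nearbyint(v * inv_scale);
  q = std::min(127.0f, std::max(-128.0f, q));
  return static_cast<int8_t>(q);
}

// Vectorized INT8 activation quantization: process 4 elements per step
// (on the GPU this body would sit inside a grid-stride loop), then handle
// the n % 4 tail with scalar code.
void quantize_int8_vec4(const float* in, int8_t* out, int n, float scale) {
  const float inv_scale = 1.0f / scale;
  const Float4* in4 = reinterpret_cast<const Float4*>(in);
  Char4* out4 = reinterpret_cast<Char4*>(out);
  const int n4 = n / 4;
  for (int i = 0; i < n4; ++i) {
    Float4 v = in4[i];  // one wide load instead of four scalar loads
    out4[i] = {quant_one(v.x, inv_scale), quant_one(v.y, inv_scale),
               quant_one(v.z, inv_scale), quant_one(v.w, inv_scale)};
  }
  for (int i = n4 * 4; i < n; ++i) {  // scalar tail
    out[i] = quant_one(in[i], inv_scale);
  }
}
```

The math per element is unchanged; the win comes purely from widening the memory accesses, which is why the same vectorization.h helpers should transfer cleanly to the INT8 kernels.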
Alternatives
No response
Additional context
No response