[Feature]: Add moe_wna16 kernel as a backend for CompressedTensorsWNA16MoEMethod #13575

Closed · opened by @mgoin

Description

🚀 The feature, motivation and pitch

A Triton implementation supporting MoE layers quantized with GPTQ or AWQ was added in #12185.

It is more performant than the current Marlin MoE kernel when there are many small experts, which is why I made it the default for num_experts > 32 in the AWQ and GPTQMarlin configs in #13236.

We should also propagate the usage of this kernel to compressed-tensors models with mixed precision, i.e. add it as a backend for CompressedTensorsWNA16MoEMethod.
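For illustration, here is a minimal Python sketch of what that dispatch could look like. Every name in it (MoEBackend, USE_MOE_WNA16_THRESHOLD, select_wna16_moe_backend) is hypothetical rather than vLLM's actual API; only the num_experts > 32 threshold comes from #13236.

```python
from enum import Enum


class MoEBackend(Enum):
    # Hypothetical enum, for illustration only.
    MARLIN_MOE = "marlin_moe"
    MOE_WNA16 = "moe_wna16"  # Triton kernel from #12185


# Heuristic from #13236: the Triton kernel wins when there are
# many small experts, so prefer it above this expert count.
USE_MOE_WNA16_THRESHOLD = 32


def select_wna16_moe_backend(num_experts: int) -> MoEBackend:
    """Pick a kernel backend for a WNA16-quantized MoE layer.

    Hypothetical helper: CompressedTensorsWNA16MoEMethod could apply
    the same heuristic the AWQ and GPTQMarlin configs already use.
    """
    if num_experts > USE_MOE_WNA16_THRESHOLD:
        return MoEBackend.MOE_WNA16
    return MoEBackend.MARLIN_MOE


print(select_wna16_moe_backend(num_experts=64))  # MoEBackend.MOE_WNA16
```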

Alternatives

No response

Additional context

No response

Labels

feature request (New feature or request), unstale (Received activity after being labelled stale)