🚀 The feature, motivation and pitch
From #19452
We temporarily force the input to be contiguous, which can incur significant copy overhead when the input is not already contiguous.
I'm working on adding support for non-contiguous input in the quant kernels.
Scope:
- dynamic_scaled_int8_quant
- dynamic_per_token_scaled_fp8_quant
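To illustrate the idea, here is a minimal pure-Python sketch of per-token dynamic int8 quantization that reads each row through a row stride instead of first copying the input into a packed buffer. The function name `dynamic_int8_quant_strided`, its parameters, and the symmetric `absmax / 127` scale are assumptions made for illustration; this is not the actual vLLM kernel or its signature.

```python
from typing import List, Sequence, Tuple


def dynamic_int8_quant_strided(
    buf: Sequence[float],
    num_tokens: int,
    hidden_size: int,
    row_stride: int,
) -> Tuple[List[int], List[float]]:
    """Hypothetical reference: quantize each token row to int8 in place of a copy.

    `buf` is a flat storage buffer. Row t starts at t * row_stride, so
    row_stride > hidden_size models a non-contiguous view (for example a
    column slice of a wider tensor). The output is densely packed.
    """
    if row_stride < hidden_size:
        raise ValueError("row_stride must be >= hidden_size")
    if num_tokens > 0 and (num_tokens - 1) * row_stride + hidden_size > len(buf):
        raise ValueError("buffer too small for the given shape and stride")

    quantized: List[int] = []
    scales: List[float] = []
    for t in range(num_tokens):
        base = t * row_stride  # stride-aware row offset, no packing copy
        row = buf[base:base + hidden_size]
        absmax = max((abs(v) for v in row), default=0.0)
        # Assumed symmetric scheme: one scale per token, mapped to [-127, 127].
        scale = absmax / 127.0 if absmax > 0 else 1.0
        scales.append(scale)
        quantized.extend(max(-128, min(127, round(v / scale))) for v in row)
    return quantized, scales


# Two tokens, hidden_size=2, stored with row_stride=4 (padding values are skipped).
storage = [127.0, -63.0, 99.0, 99.0, -254.0, 2.0, 99.0, 99.0]
q, s = dynamic_int8_quant_strided(storage, num_tokens=2, hidden_size=2, row_stride=4)
```

In a real kernel the same indexing change would mean passing the input's row stride to the kernel and computing the row base from it, so that the caller no longer needs to make the input contiguous beforehand.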