Exllama v1/v2 + Marlin have (or will have) auto padding of in_features and out_features, so many models can run even when their shapes do not exactly match kernel restrictions.
However, if the padding is applied dynamically at runtime and we pack the padded tensors, the resulting weights may not be compatible with vllm/sglang, since the tensor shapes are off due to padding.
This is a theory at the moment and needs investigation. If confirmed, we need to trim the tensors during or after packing back to the original in_features/out_features shape.
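A minimal sketch of the pad-then-trim idea, using NumPy and hypothetical helper names (the block size of 128 is an illustrative assumption for Marlin-style divisibility restrictions, not the actual kernel constant):

```python
import numpy as np

# Assumed kernel divisibility requirement, for illustration only.
KERNEL_BLOCK = 128

def pad_to_multiple(n: int, block: int = KERNEL_BLOCK) -> int:
    """Round n up to the nearest multiple of the kernel block size."""
    return ((n + block - 1) // block) * block

def pad_weight(w: np.ndarray, block: int = KERNEL_BLOCK) -> np.ndarray:
    """Zero-pad a (out_features, in_features) weight so both dims
    satisfy the kernel's divisibility restriction."""
    out_f, in_f = w.shape
    padded = np.zeros(
        (pad_to_multiple(out_f, block), pad_to_multiple(in_f, block)),
        dtype=w.dtype,
    )
    padded[:out_f, :in_f] = w
    return padded

def trim_weight(w: np.ndarray, out_features: int, in_features: int) -> np.ndarray:
    """Trim a padded weight back to the model's original shape so the
    saved checkpoint stays shape-compatible with external loaders."""
    return w[:out_features, :in_features]
```

If padding happens only transiently at runtime and `trim_weight` (or its equivalent) runs before serialization, the checkpoint keeps the original shapes that vllm/sglang expect.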