Exllama v1/v2 + Marlin have (or will have) auto padding of in_features and out_features, so many models can run even when their shapes do not exactly match kernel restrictions.
However, if the padding is applied dynamically at runtime and we pack the padded tensors, the resulting weights may not be compatible with vllm/sglang, since the tensor shapes are off due to padding.
This is a theory at the moment and needs investigation. If confirmed, we need to trim the tensors during or after packing back to the original in_features/out_features shape.
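A minimal sketch of the pad-then-trim idea, using NumPy and hypothetical helper names (the block size of 128 is an illustrative assumption for Marlin-style divisibility restrictions, not the actual kernel constant):

```python
import numpy as np

# Assumed kernel divisibility requirement, for illustration only.
KERNEL_BLOCK = 128

def pad_to_multiple(n: int, block: int = KERNEL_BLOCK) -> int:
    """Round n up to the nearest multiple of the kernel block size."""
    return ((n + block - 1) // block) * block

def pad_weight(w: np.ndarray, block: int = KERNEL_BLOCK) -> np.ndarray:
    """Zero-pad a (out_features, in_features) weight so both dims
    satisfy the kernel's divisibility restriction."""
    out_f, in_f = w.shape
    padded = np.zeros(
        (pad_to_multiple(out_f, block), pad_to_multiple(in_f, block)),
        dtype=w.dtype,
    )
    padded[:out_f, :in_f] = w
    return padded

def trim_weight(w: np.ndarray, out_features: int, in_features: int) -> np.ndarray:
    """Trim a padded weight back to the model's original shape so the
    saved checkpoint stays shape-compatible with external loaders."""
    return w[:out_features, :in_features]
```

If padding happens only transiently at runtime and `trim_weight` (or its equivalent) runs before serialization, the checkpoint keeps the original shapes that vllm/sglang expect.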