🚀 The feature, motivation and pitch
#9160 first introduced `AutoWeightsLoader` to recursively call `load_weights` on sub-modules. This lets composite models (most notably multi-modal models) use language backbones (`*Model` classes such as `LlamaModel`) without having to repeat their weight loading logic.
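For context, a minimal sketch of that delegation pattern is below; the composite model class and its sub-module layout are illustrative placeholders, not code from vLLM:

```python
from typing import Iterable, Set, Tuple

import torch
import torch.nn as nn

from vllm.model_executor.models.utils import AutoWeightsLoader


class MyMultiModalForConditionalGeneration(nn.Module):
    # Hypothetical composite model: `vision_tower` and `language_model`
    # (a *Model backbone such as LlamaModel) would be defined in __init__.

    def load_weights(self,
                     weights: Iterable[Tuple[str, torch.Tensor]]) -> Set[str]:
        # AutoWeightsLoader matches checkpoint names against sub-module
        # prefixes and recursively calls each sub-module's own load_weights,
        # so the backbone's loading logic is reused rather than copied here.
        loader = AutoWeightsLoader(self)
        return loader.load_weights(weights)
```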
Currently, `load_weights` is only implemented in a few language backbones. It would be great to standardize this approach and apply it to all language backbones in vLLM. The steps to do this are pretty straightforward:
- Move the existing `load_weights` function from `*ForCausalLM` to `*Model`.
- Create a new `load_weights` function in `*ForCausalLM` that loads the weights using `AutoWeightsLoader`.
- Move any logic in `*Model.load_weights` that only applies to `*ForCausalLM` back to `*ForCausalLM.load_weights`. Usually, this involves `lm_head` (see the sketch after this list).
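A rough sketch of the resulting `*ForCausalLM.load_weights`, modeled on the Llama implementation; the class name here is a placeholder, and the `tie_word_embeddings` check assumes the usual tied-embedding configuration:

```python
from typing import Iterable, Set, Tuple

import torch
import torch.nn as nn

from vllm.model_executor.models.utils import AutoWeightsLoader


class MyForCausalLM(nn.Module):
    # Hypothetical wrapper: `self.model` is the *Model backbone that now
    # owns the per-layer loading logic, and `self.config` is the HF config.

    def load_weights(self,
                     weights: Iterable[Tuple[str, torch.Tensor]]) -> Set[str]:
        # lm_head only exists at the *ForCausalLM level, so its handling
        # stays here: with tied word embeddings, the checkpoint's lm_head
        # entry can simply be skipped.
        loader = AutoWeightsLoader(
            self,
            skip_prefixes=(["lm_head."]
                           if self.config.tie_word_embeddings else None),
        )
        return loader.load_weights(weights)
```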
For reference, you can look at the implementation for models such as Llama, Gemma2/3, Qwen2 and ChatGLM.
To avoid scope creep, I suggest opening a PR for updating only a few models at a time.
Alternatives
No response
Additional context
No response