
[Feature]: Composite model loading using AutoWeightsLoader for all models #15697

Open
@DarkLight1337


🚀 The feature, motivation and pitch

#9160 first introduced AutoWeightsLoader to recursively call load_weights on sub-modules. This lets composite models (most notably multi-modal models) use language backbones (*Model classes such as LlamaModel) without having to repeat their weight loading logic.

Currently, load_weights is implemented in only a few language backbones. It would be great to standardize this approach and apply it to all language backbones in vLLM. The steps to do this are pretty straightforward:

  1. Move the existing load_weights function from *ForCausalLM to *Model.
  2. Create a new load_weights function in *ForCausalLM that loads the weights using AutoWeightsLoader.
  3. Move any logic in *Model.load_weights that only applies to *ForCausalLM back to *ForCausalLM.load_weights. Usually, this involves lm_head.
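The steps above can be sketched with simplified stand-ins (the `Toy*` classes below are illustrative, not vLLM code; the real `AutoWeightsLoader` and the exact shape of its `skip_prefixes` handling may differ, so check the actual implementation in vLLM before copying this pattern):

```python
from typing import Iterable, Set, Tuple


class ToyAutoWeightsLoader:
    """Simplified stand-in for AutoWeightsLoader: routes each weight to the
    sub-module that owns its name prefix and recursively calls that
    sub-module's load_weights."""

    def __init__(self, module, skip_prefixes: Iterable[str] = ()):
        self.module = module
        self.skip_prefixes = tuple(skip_prefixes)

    def load_weights(self, weights: Iterable[Tuple[str, object]]) -> Set[str]:
        grouped: dict = {}
        for name, tensor in weights:
            if any(name.startswith(p) for p in self.skip_prefixes):
                continue  # e.g. skip lm_head.* when it needs special handling
            child, _, rest = name.partition(".")
            grouped.setdefault(child, []).append((rest, tensor))
        loaded: Set[str] = set()
        for child, child_weights in grouped.items():
            sub = getattr(self.module, child)
            # Recursive delegation: the sub-module loads its own weights.
            for rest in sub.load_weights(child_weights):
                loaded.add(f"{child}.{rest}")
        return loaded


class ToyModel:
    """Step 1: the backbone (*Model) owns the real loading logic."""

    def __init__(self):
        self.params = {}

    def load_weights(self, weights):
        loaded = set()
        for name, tensor in weights:
            self.params[name] = tensor  # real code maps stacked params, etc.
            loaded.add(name)
        return loaded


class ToyForCausalLM:
    """Steps 2-3: *ForCausalLM just delegates via the loader, keeping only
    lm_head-specific logic (here simply skipped, for brevity)."""

    def __init__(self):
        self.model = ToyModel()

    def load_weights(self, weights):
        loader = ToyAutoWeightsLoader(self, skip_prefixes=["lm_head."])
        return loader.load_weights(weights)


lm = ToyForCausalLM()
loaded = lm.load_weights([
    ("model.embed_tokens.weight", "w0"),
    ("model.layers.0.qkv.weight", "w1"),
    ("lm_head.weight", "w2"),  # filtered out by skip_prefixes
])
```

With this split, a multi-modal model can hold the same `ToyModel`-style backbone as a sub-module and get its weight loading for free through the loader's recursion.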

For reference, you can look at the implementation for models such as Llama, Gemma2/3, Qwen2 and ChatGLM.

To avoid scope creep, I suggest opening a PR that updates only a few models at a time.

Alternatives

No response

Additional context

No response

Before submitting a new issue...

  • Make sure you already searched for relevant issues, and asked the chatbot living at the bottom right corner of the documentation page, which can answer lots of frequently asked questions.
