[Speculative Decoding] Enable arbitrary model inputs #5101


Closed

Conversation

@abhigoyal1997 (Contributor) commented May 29, 2024

This PR changes the ModelRunner to support models whose input signature differs from the default. This mainly benefits speculative decoding methods whose draft models are not standard Transformer-based LMs, e.g., Medusa, EAGLE, RNN-based models, etc. (and, in the future, any LLM with a non-default input signature).

For this, we need support for two things:

  • When ModelRunner calls the model's forward method, pass only the inputs the model expects.
    • Match the prepared inputs against the signature of the model's forward method.
    • Skip preparing unnecessary inputs (a nice-to-have, since it may reduce some overhead).
  • Add support for models that require additional inputs beyond the default ones, e.g., hidden_states in Medusa.
    • Allow models to specify an optional config (shape and dtype) for the additional inputs (needed to capture CUDA graphs).
    • Prepare these additional inputs in ModelRunner (or in the model itself?) and pass them to the forward call.
      • Support inputs that come from the sequence (via seq_group_metadata_list)
      • Support inputs that live inside the Worker/ModelRunner as state preserved from the previous iteration
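The first bullet (matching prepared inputs against the model's forward signature) can be sketched with the standard-library inspect module. This is an illustrative sketch, not vLLM's actual implementation; filter_model_inputs and MedusaDraft are hypothetical names:

```python
import inspect
from typing import Any, Callable, Dict


def filter_model_inputs(forward: Callable[..., Any],
                        inputs: Dict[str, Any]) -> Dict[str, Any]:
    """Keep only the kwargs that the forward callable accepts.

    If forward declares **kwargs, pass everything through unchanged.
    """
    params = inspect.signature(forward).parameters
    if any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values()):
        return dict(inputs)
    return {k: v for k, v in inputs.items() if k in params}


class MedusaDraft:
    # A Medusa-style draft head: consumes hidden_states rather than
    # the default input_ids/positions of a standard Transformer LM.
    def forward(self, hidden_states):
        return [h * 2 for h in hidden_states]


model = MedusaDraft()
# Everything the ModelRunner prepared for this step:
prepared = {
    "input_ids": [1, 2, 3],
    "positions": [0, 1, 2],
    "hidden_states": [1.0, 2.0],
}
# Only hidden_states survives the filter and reaches forward().
kept = filter_model_inputs(model.forward, prepared)
out = model.forward(**kept)
```

A natural refinement (the second sub-bullet) is to run this filter before input preparation, so unneeded tensors are never built at all.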

Part of refactoring #4978

@abhigoyal1997 abhigoyal1997 marked this pull request as draft May 29, 2024 10:58
@abhigoyal1997 abhigoyal1997 changed the title [Misc] [Speculative Decoding] Enable arbitrary model inputs [Speculative Decoding] Enable arbitrary model inputs May 29, 2024
@DarkLight1337 (Member) commented

This would also be great for multi-modal LLMs which accept inputs from other modalities.

I am currently working on #4197, which enables additional inputs to be passed in by decorating the model class with input processors, but it assumes that the inputs are tied to specific modalities. Perhaps we can further generalize that idea in your PR?

@abhigoyal1997 (Contributor, Author) commented

> This would also be great for multi-modal LLMs which accept inputs from other modalities.
>
> I am currently working on #4197 which enables additional inputs to be passed in via decorating the model class with input processors, but it assumes that the inputs are tied to specific modalities. Perhaps we can further generalize that idea in your PR?

Hi @DarkLight1337

This looks like a reasonable idea to try out. We could generalize the MultiModalRegistry (maybe as an InputRegistry) and register all additional inputs (including multi-modal ones) using the same mechanism.

However, I opened this PR just to refactor and simplify the additional inputs in the Medusa implementation (#4978), and I am more focused on that for now. Once that PR is closed, I can look more into this.
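The generalized InputRegistry floated above might combine both requirements from the PR description: a shape/dtype spec per additional input (so CUDA graph buffers can be sized) plus a prepare hook. A minimal sketch under those assumptions; InputSpec, InputRegistry, and the hidden size of 4096 are all hypothetical:

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, Tuple


@dataclass
class InputSpec:
    """Declares one additional model input."""
    shape: Tuple[int, ...]                       # per-token shape, e.g. (hidden_size,)
    dtype: str                                   # e.g. "float16", for graph capture
    prepare: Callable[[Dict[str, Any]], Any]     # builds the input from runner state


class InputRegistry:
    """Generalizes a modality-specific registry to arbitrary extra inputs."""

    def __init__(self) -> None:
        self._specs: Dict[type, Dict[str, InputSpec]] = {}

    def register(self, model_cls: type, name: str, spec: InputSpec) -> None:
        self._specs.setdefault(model_cls, {})[name] = spec

    def prepare(self, model_cls: type, ctx: Dict[str, Any]) -> Dict[str, Any]:
        specs = self._specs.get(model_cls, {})
        return {name: spec.prepare(ctx) for name, spec in specs.items()}


class MedusaModel:
    pass


registry = InputRegistry()
# hidden_states is carried over from the previous iteration's runner state.
registry.register(
    MedusaModel, "hidden_states",
    InputSpec(shape=(4096,), dtype="float16",
              prepare=lambda ctx: ctx["prev_hidden_states"]),
)
extra = registry.prepare(MedusaModel, {"prev_hidden_states": [0.0] * 4})
```

With this shape/dtype metadata, the ModelRunner could pre-allocate static buffers for each registered input before capturing a CUDA graph.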
