
[Model] Implement merged input processor for LLaVA model #10676

Merged · 24 commits · Dec 7, 2024
Changes from 1 commit

Commits (24)
7b6c4f1
Add `get_dummy_data` to `MultiModalProcessor`; fix and test `iter_pla…
DarkLight1337 Nov 26, 2024
de8332a
Use merged processor for llava model
DarkLight1337 Nov 26, 2024
8b6804e
format
DarkLight1337 Nov 26, 2024
26e3fdf
Fix typo
DarkLight1337 Nov 26, 2024
93d27bc
Enable the test to pass on V1
DarkLight1337 Nov 26, 2024
d697241
Handle embedding inputs
DarkLight1337 Nov 26, 2024
ca11cc9
format
DarkLight1337 Nov 26, 2024
c32cba9
Merge branch 'main' into llava-mm-processor
DarkLight1337 Nov 27, 2024
6c5c9ca
Fix wrong ndim
DarkLight1337 Nov 27, 2024
0194324
Factor out `merge_placeholders`
DarkLight1337 Nov 27, 2024
09618d0
Fix placeholder maps handling on V0
DarkLight1337 Nov 27, 2024
5501458
Remove unused dummy data code
DarkLight1337 Nov 27, 2024
f3673c7
Update dummy model
DarkLight1337 Nov 27, 2024
37bc008
Enable overriding hf processor and tokenizer; fix `_apply_prompt_repl…
DarkLight1337 Nov 27, 2024
4805a9e
Improve error handling in `_resolve_matches`; merge matches directly
DarkLight1337 Nov 27, 2024
8539008
Avoid hashing
DarkLight1337 Nov 27, 2024
00244c7
Update mapper tests
DarkLight1337 Nov 27, 2024
a00f541
Merge branch 'main' into llava-mm-processor
DarkLight1337 Dec 4, 2024
b31f8d4
Avoid calling input mapper in the first place
DarkLight1337 Dec 4, 2024
711cd38
Fix missing `multi_modal_kwargs` in dummy data
DarkLight1337 Dec 5, 2024
a11c6b2
Update dummy model
DarkLight1337 Dec 5, 2024
1d5a4d4
proper processing
ywang96 Dec 6, 2024
000736b
Patch pixtral processor
DarkLight1337 Dec 6, 2024
1485c05
Fix double counting of `mm_counts`
DarkLight1337 Dec 6, 2024
Avoid calling input mapper in the first place
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
DarkLight1337 committed Dec 4, 2024
commit b31f8d4d6e7b307c1f63ce1bf634c5b4dbd34258
4 changes: 0 additions & 4 deletions vllm/v1/engine/mm_input_mapper.py
@@ -23,10 +23,6 @@ def process_inputs(
         mm_data: MultiModalDataDict,
         mm_processor_kwargs: Optional[Dict[str, Any]],
     ) -> List[MultiModalKwargs]:
-        # Skip this redundant step if merged processor has been applied
-        if isinstance(mm_data, MultiModalKwargs):
-            return [mm_data]
-
         image_inputs = mm_data["image"]
         if not isinstance(image_inputs, list):
             image_inputs = [image_inputs]
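The passthrough removed above becomes dead code because, as of this commit, the caller checks for pre-processed inputs before invoking the mapper at all (see the `processor.py` change below). A minimal sketch of the contract change, using illustrative stand-ins rather than the real `vllm.multimodal` types:

```python
from typing import Any, Dict, List, Union

# Illustrative stand-ins only; the real definitions live in vllm.multimodal.
MultiModalDataDict = Dict[str, Any]

class MultiModalKwargs(dict):
    """Placeholder for vllm.multimodal.MultiModalKwargs."""

def run_mapper(mm_data: MultiModalDataDict) -> List[MultiModalKwargs]:
    """Stub for the actual mapping step (HF image processor, etc.)."""
    return [MultiModalKwargs(mm_data)]

def process_inputs_old(
        mm_data: Union[MultiModalDataDict, MultiModalKwargs],
) -> List[MultiModalKwargs]:
    # Before: the mapper itself detected already-processed inputs.
    if isinstance(mm_data, MultiModalKwargs):
        return [mm_data]
    return run_mapper(mm_data)

def process_inputs_new(mm_data: MultiModalDataDict) -> List[MultiModalKwargs]:
    # After: the caller guarantees mm_data is raw multi-modal data,
    # so the check is unnecessary and the mapper runs unconditionally.
    return run_mapper(mm_data)
```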
16 changes: 11 additions & 5 deletions vllm/v1/engine/processor.py
@@ -7,7 +7,8 @@
 from vllm.inputs.parse import is_encoder_decoder_inputs
 from vllm.inputs.preprocess import InputPreprocessor
 from vllm.lora.request import LoRARequest
-from vllm.multimodal import MULTIMODAL_REGISTRY, MultiModalRegistry
+from vllm.multimodal import (MULTIMODAL_REGISTRY, MultiModalKwargs,
+                             MultiModalRegistry)
 from vllm.pooling_params import PoolingParams
 from vllm.prompt_adapter.request import PromptAdapterRequest
 from vllm.sampling_params import SamplingParams
@@ -101,10 +102,15 @@ def process_inputs(
             self.generation_config_fields, eos_token_id)

         # Preprocess multi-modal data
-        mm_inputs = self.mm_input_mapper.process_inputs(
-            decoder_inputs.multi_modal_data,
-            decoder_inputs.mm_processor_kwargs) if len(
-                decoder_inputs.multi_modal_data) > 0 else None
+        if len(decoder_inputs.multi_modal_data) == 0:
+            mm_inputs = None
+        elif isinstance(decoder_inputs.multi_modal_data, MultiModalKwargs):
+            mm_inputs = [decoder_inputs.multi_modal_data]
+        else:
+            mm_inputs = self.mm_input_mapper.process_inputs(
+                decoder_inputs.multi_modal_data,
+                decoder_inputs.mm_processor_kwargs,
+            )

         # Make Request for Detokenizer.
         detokenizer_request = DetokenizerRequest(
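The replacement logic above is a three-way dispatch. A minimal sketch of the same control flow in isolation (the `preprocess_mm` helper and `mapper` parameter are hypothetical; only the branch structure mirrors the diff):

```python
from typing import Any, Dict, List, Optional

# Illustrative stand-in; the real class lives in vllm.multimodal.
class MultiModalKwargs(dict):
    """Placeholder for vllm.multimodal.MultiModalKwargs."""

def preprocess_mm(
    multi_modal_data: Dict[str, Any],
    mm_processor_kwargs: Optional[Dict[str, Any]],
    mapper: Any,
) -> Optional[List[MultiModalKwargs]]:
    # Case 1: the request carries no multi-modal data.
    if len(multi_modal_data) == 0:
        return None
    # Case 2: a merged input processor already produced MultiModalKwargs,
    # so the legacy input mapper is never called.
    if isinstance(multi_modal_data, MultiModalKwargs):
        return [multi_modal_data]
    # Case 3: raw data (e.g. PIL images) still goes through the mapper.
    return mapper.process_inputs(multi_modal_data, mm_processor_kwargs)
```

Compared with the previous single conditional expression, the explicit branches make the merged-processor short-circuit visible and keep the mapper call entirely out of the path for pre-processed inputs.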