
[Model] New model support for Phi-4-multimodal-instruct #14119


Merged
Changes from all commits
28 commits
1da44c5
Add support for Phi-4-multimodal-instruct
congcongchen123 Feb 5, 2025
d0fa70f
Fix error related to interface changes from latest main
congcongchen123 Feb 6, 2025
85ef077
rename and clean up code
congcongchen123 Feb 7, 2025
d8f40d8
Minor clean-up
jrplatin Feb 7, 2025
a569e4e
Do not support Tensor Parallel and Pipeline Parallel
congcongchen123 Feb 7, 2025
f7b8579
clean up code
congcongchen123 Feb 7, 2025
375e246
code cleaning / renaming for the vision part
ChenRocks Feb 7, 2025
f8a3373
rename phi4o with phi4mm
congcongchen123 Feb 8, 2025
7b73371
refactor change phi4o to phi4mm
congcongchen123 Feb 8, 2025
30a70d5
refactor change phi4o to phi4mm continued
congcongchen123 Feb 8, 2025
99a636d
final update to change phio to phi4mm
congcongchen123 Feb 8, 2025
5dcf783
Fix errors after rebasing to the top of the main
congcongchen123 Feb 25, 2025
707dfe1
Refactor phimm_utils
vmazalov Feb 26, 2025
40011bb
remove flash_attn from requirements-common.txt
congcongchen123 Mar 3, 2025
1200132
Add more max LoRA rank support
congcongchen123 Mar 3, 2025
4b11f10
format code
congcongchen123 Mar 3, 2025
86295b3
restore requirements-test.txt
congcongchen123 Mar 3, 2025
bbdcfb7
remove hard dependency on flash-attn
congcongchen123 Mar 4, 2025
28de545
Register test and add model info to supported_model.md
congcongchen123 Mar 4, 2025
5327e89
restore requirements-test.txt
congcongchen123 Mar 4, 2025
454d4a7
restore requirements-test.txt
congcongchen123 Mar 4, 2025
1bb5750
Add text-only version of Phi-4-mini to the supported_models page per…
congcongchen123 Mar 4, 2025
a2cc774
Minor update to supported_models.md
congcongchen123 Mar 4, 2025
0460912
Update supported_models.md
congcongchen123 Mar 4, 2025
08c845a
delete the testing script
congcongchen123 Mar 4, 2025
1f468f4
Merge branch 'main' into congcongchen/phi-4-multimodal-instruct
congcongchen123 Mar 4, 2025
72ddbb4
Print errors instead of throwing the RuntimeError since CI environmen…
congcongchen123 Mar 4, 2025
77f4edc
update attn_backend detection
ywang96 Mar 4, 2025
9 changes: 8 additions & 1 deletion docs/source/models/supported_models.md
@@ -410,7 +410,7 @@ See [this page](#generative-models) for more information on how to use generative models.
* ✅︎
- * `Phi3ForCausalLM`
* Phi-4, Phi-3
-* `microsoft/Phi-4`, `microsoft/Phi-3-mini-4k-instruct`, `microsoft/Phi-3-mini-128k-instruct`, `microsoft/Phi-3-medium-128k-instruct`, etc.
+* `microsoft/Phi-4-mini-instruct`, `microsoft/Phi-4`, `microsoft/Phi-3-mini-4k-instruct`, `microsoft/Phi-3-mini-128k-instruct`, `microsoft/Phi-3-medium-128k-instruct`, etc.
* ✅︎
* ✅︎
- * `Phi3SmallForCausalLM`
@@ -856,6 +856,13 @@ See [this page](#generative-models) for more information on how to use generative models.
*
* ✅︎
* ✅︎
+- * `Phi4MMForCausalLM`
+* Phi-4-multimodal
+* T + I<sup>+</sup> / T + A<sup>+</sup> / I<sup>+</sup> + A<sup>+</sup>
+* `microsoft/Phi-4-multimodal-instruct`, etc.
+* ✅︎
+*
+*
- * `PixtralForConditionalGeneration`
* Pixtral
* T + I<sup>+</sup>
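
For context, this file now lists both the text-only `microsoft/Phi-4-mini-instruct` (served by the existing `Phi3ForCausalLM` path) and the new `Phi4MMForCausalLM` multimodal entry (in this table's notation, T = text, I = image, A = audio, and <sup>+</sup> marks support for multiple items per prompt). A minimal text-only sketch using vLLM's offline API — `trust_remote_code=True` is an assumption here, not something the diff states:

```python
# A minimal sketch, not from the PR: text-only generation with the newly
# listed Phi-4-mini entry. trust_remote_code=True is an assumption; drop
# it if the checkpoint loads natively with your transformers version.
from vllm import LLM, SamplingParams

llm = LLM(model="microsoft/Phi-4-mini-instruct", trust_remote_code=True)
outputs = llm.generate(["The capital of France is"],
                       SamplingParams(max_tokens=16))
print(outputs[0].outputs[0].text)
```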
1 change: 1 addition & 0 deletions requirements-common.txt
@@ -38,3 +38,4 @@ depyf==0.18.0 # required for profiling and debugging with compilation config
cloudpickle # allows pickling lambda functions in model_executor/models/registry.py
watchfiles # required for http server to monitor the updates of TLS files
python-json-logger # Used by logging as per examples/other/logging_configuration.md
+scipy # Required for phi-4-multimodal-instruct
Member:
We should try to remove the scipy dep if possible. It is okay if we do it in a follow-up PR. It seems to be used only for a single function, scipy.signal.resample_poly, in phi4mm.py:

        # Resample to 16000 or 8000 if needed
        if fs > 16000:
            wav = scipy.signal.resample_poly(wav, 1, fs // 16000)
            fs = 16000
        elif 8000 < fs < 16000:
            wav = scipy.signal.resample_poly(wav, 1, fs // 8000)
            fs = 8000
        elif fs < 8000:
            raise RuntimeError(f"Unsupported sample rate {fs}")

        if fs == 8000:
            if self._eightk_method == "resample":
                # Input audio is 8 kHz. Convert to 16 kHz before feature
                # extraction
                wav = scipy.signal.resample_poly(wav, 2, 1)
                fs = 16000
            # Do nothing here for fillzero method
        elif fs != 16000:
            # Input audio is not a supported sample rate.
            raise RuntimeError(
                f"Input data using an unsupported sample rate: {fs}"
            )

Contributor (Author):
Sure, will do that in a follow-up PR.
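
For reference, the quoted code uses scipy only for integer-factor polyphase resampling. One possible numpy-only direction for that follow-up, sketched here as an assumption rather than the plan of record, is plain linear interpolation via `np.interp` — simpler and dependency-free, but without the anti-aliasing filter of `scipy.signal.resample_poly`:

```python
# Hypothetical scipy-free replacement: resample by linear interpolation.
# This is a sketch, not the follow-up PR's actual plan; np.interp trades
# resample_poly's filtered polyphase resampling for simplicity.
import numpy as np

def resample_linear(wav: np.ndarray, orig_sr: int, target_sr: int) -> np.ndarray:
    """Resample a 1-D waveform from orig_sr to target_sr."""
    if orig_sr == target_sr:
        return wav
    n_target = int(round(len(wav) * target_sr / orig_sr))
    t_orig = np.arange(len(wav)) / orig_sr      # original sample times (s)
    t_target = np.arange(n_target) / target_sr  # resampled sample times (s)
    return np.interp(t_target, t_orig, wav)

# e.g. wav16k = resample_linear(wav, 48000, 16000)
```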

2 changes: 2 additions & 0 deletions tests/models/registry.py
@@ -272,6 +272,8 @@ def check_available_online(
extras={"v2": "google/paligemma2-3b-ft-docci-448"}), # noqa: E501
"Phi3VForCausalLM": _HfExamplesInfo("microsoft/Phi-3-vision-128k-instruct",
trust_remote_code=True),
"Phi4MMForCausalLM": _HfExamplesInfo("microsoft/Phi-4-multimodal-instruct",
trust_remote_code=True),
"PixtralForConditionalGeneration": _HfExamplesInfo("mistralai/Pixtral-12B-2409", # noqa: E501
tokenizer_mode="mistral"),
"QwenVLForConditionalGeneration": _HfExamplesInfo("Qwen/Qwen-VL",
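
As an aside, `trust_remote_code=True` in the registry entry reflects that the checkpoint ships custom modeling code. Loading it directly with transformers would look roughly like the sketch below; the `AutoProcessor` call is an assumption about the checkpoint's layout:

```python
# Rough illustration of the trust_remote_code requirement recorded above;
# transformers must be allowed to execute the model's bundled Python code.
from transformers import AutoModelForCausalLM, AutoProcessor

model_id = "microsoft/Phi-4-multimodal-instruct"
processor = AutoProcessor.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True)
```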
4 changes: 2 additions & 2 deletions vllm/config.py
@@ -2284,9 +2284,9 @@ def compute_hash(self) -> str:
return hash_str

def __post_init__(self):
-# Setting the maximum rank to 256 should be able to satisfy the vast
+# Setting the maximum rank to 512 should be able to satisfy the vast
# majority of applications.
-possible_max_ranks = (8, 16, 32, 64, 128, 256)
+possible_max_ranks = (8, 16, 32, 64, 128, 256, 320, 512)
possible_lora_extra_vocab_size = (0, 256, 512)
if self.max_lora_rank not in possible_max_ranks:
raise ValueError(
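
The widened allow-list (adding 320 and 512) presumably exists because Phi-4-multimodal's vision and speech LoRA adapters use a rank above 256 — an inference from the "Add more max LoRA rank support" commit, not something the diff states. A usage sketch:

```python
# Sketch: with 320 in possible_max_ranks, this configuration now passes
# the __post_init__ validation instead of raising ValueError. The rank
# value is illustrative; check the model's adapter configs for the real
# number.
from vllm import LLM

llm = LLM(model="microsoft/Phi-4-multimodal-instruct",
          trust_remote_code=True,
          enable_lora=True,
          max_lora_rank=320)
```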
4 changes: 4 additions & 0 deletions vllm/entrypoints/chat_utils.py
@@ -395,6 +395,8 @@ def _placeholder_str(self, modality: ModalityStr,
if model_type == "phi3_v":
# Workaround since this token is not defined in the tokenizer
return f"<|image_{current_count}|>"
+if model_type == "phi4mm":
+    return "<|endoftext10|>"  # 200010 (see vocab.json in hf model)
if model_type in ("minicpmo", "minicpmv"):
return "(<image>./</image>)"
if model_type in ("blip-2", "chatglm", "fuyu", "paligemma",
@@ -424,6 +426,8 @@ def _placeholder_str(self, modality: ModalityStr,
elif modality == "audio":
if model_type == "ultravox":
return "<|audio|>"
+if model_type == "phi4mm":
+    return "<|endoftext11|>"  # 200011 (see vocab.json in hf model)
if model_type == "qwen2_audio":
return (f"Audio {current_count}: "
f"<|audio_bos|><|AUDIO|><|audio_eos|>")
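
To show where these placeholders end up: the OpenAI-style chat endpoint injects them automatically via `_placeholder_str`, while raw prompts must spell them out. A sketch of the latter — the surrounding `<|user|>`/`<|end|>`/`<|assistant|>` chat markup is an assumption about this model's template, not taken from the diff:

```python
# Offline multimodal sketch using the image placeholder added above.
# The chat framing tokens and the local file name are assumed.
from PIL import Image
from vllm import LLM, SamplingParams

llm = LLM(model="microsoft/Phi-4-multimodal-instruct",
          trust_remote_code=True)
image = Image.open("example.jpg")  # hypothetical local file
prompt = "<|user|><|endoftext10|>Describe this image.<|end|><|assistant|>"
outputs = llm.generate(
    {"prompt": prompt, "multi_modal_data": {"image": image}},
    SamplingParams(max_tokens=64),
)
print(outputs[0].outputs[0].text)
```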