Add Phi4 multimodal #36939

Cyrilvallez · 2025-03-24T17:00:42Z

What does this PR do?

github-actions · 2025-03-24T17:00:53Z

Hi 👋, thank you for opening this pull request! The pull request is converted to draft by default. The CI will be paused while the PR is in draft mode. When it is ready for review, please click the Ready for review button (at the bottom of the PR page). This will assign reviewers.

ArthurZucker

Already approved on the fork we have! 🤗

HuggingFaceDocBuilderDev · 2025-03-24T18:40:12Z

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

* raw start * update * update * add to imports * update * up * simplify configs * clean configs * style * typos * Update convert_phi4_multimodal_weights_to_hf.py * Update convert_phi4_multimodal_weights_to_hf.py * fix * up * up * up * Update convert_phi4_multimodal_weights_to_hf.py * Update convert_phi4_multimodal_weights_to_hf.py * up * up * up * Update feature_extraction_phi4_multimodal.py * up * up * up * up * up * simplify configs * typo * cut code * typo * typo * typo * re * typo * up * up * up * add tests * fix * fix * Update test_modeling_phi4_multimodal.py * up * Update test_modeling_phi4_multimodal.py * doc * fix * up * up * up * up * up * up * simplify * up * simplify * config docstrings * cleanup * clean * typo * typo * fix * Update phi4_multimodal.md * fix * fix * Update test_modeling_phi4_multimodal.py * update * simplify reshapes and permutes * up * simplify special tokens * simplify processor a lot * Update processing_phi4_multimodal.py * Update processing_phi4_multimodal.py * switch to fast processor * image processor * Update image_processing_phi4_multimodal_fast.py * add lora extraction to converter * Update convert_phi4_multimodal_weights_to_hf.py * Update __init__.py * add AudioInput type in audio_utils * rewrite feature_extraction: support torch batched FFT * input_audio_embeds -> audio_input_features, input_image_embeds -> image_pixel_values * test update * not mono channel warning update * remove auto maps from processor * kargs dispatch in processor * simplify kwargs dispatch * simplify merging * remove default sampling rate * style * Update test_modeling_phi4_multimodal.py * update doc * doc * torch only feature extractor * make fake tokens adjustable * Update feature_extraction_phi4_multimodal.py * fix * Update processing_phi4_multimodal.py * simplify mask * last touch * fix copies * style * Update audio_utils.py * style * Update feature_extraction_phi4_multimodal.py * Update __init__.py * docstrings * copies * fix all checks * back to fix-copies * trigger CIs * Update feature_extraction_phi4_multimodal.py * improve tests with multimodal inputs * trigger CIs --------- Co-authored-by: Eustache Le Bihan <eulebihan@gmail.com>

github-actions bot marked this pull request as draft March 24, 2025 17:00

Cyrilvallez marked this pull request as ready for review March 24, 2025 17:01

github-actions bot requested review from ArthurZucker and Rocketknight1 March 24, 2025 17:01

ArthurZucker approved these changes Mar 24, 2025

Cyrilvallez added 24 commits March 24, 2025 18:05

raw start

00bcfd4

update

aef5f66

update

60595b3

add to imports

ddfe10a

update

88f473e

up

5012749

simplify configs

bc1d197

clean configs

e56e7b0

style

8d35ac9

typos

f490482

Update convert_phi4_multimodal_weights_to_hf.py

c0e1da4

Update convert_phi4_multimodal_weights_to_hf.py

c435c22

fix

98b393c

up

0bd29a3

up

52bf0e8

up

a37b084

Update convert_phi4_multimodal_weights_to_hf.py

ce4735b

Update convert_phi4_multimodal_weights_to_hf.py

fe9fed1

up

5fffe53

up

c102b46

up

dbbad21

Update feature_extraction_phi4_multimodal.py

67cad7f

up

cc4cd0e

up

da8b0aa

Cyrilvallez added 10 commits March 24, 2025 18:09

update doc

37b3dbe

doc

bc6d6a5

torch only feature extractor

b241377

make fake tokens adjustable

9c752b2

Update feature_extraction_phi4_multimodal.py

47664e1

fix

d9beef2

Update processing_phi4_multimodal.py

17985f9

simplify mask

c169f36

last touch

067edbf

fix copies

9bee9f3

Cyrilvallez force-pushed the phi4 branch from 1ed11d6 to 9bee9f3 March 24, 2025 17:09

Cyrilvallez added 9 commits March 24, 2025 18:11

style

653b8ec

Update audio_utils.py

4213e97

style

2439003

Update feature_extraction_phi4_multimodal.py

16f5ca8

Update __init__.py

5b773c8

docstrings

a70f307

copies

ac699b1

fix all checks

aa6664b

back to fix-copies

c3a1a89

Cyrilvallez added 4 commits March 24, 2025 20:39

trigger CIs

095bb8a

Update feature_extraction_phi4_multimodal.py

bdc8e38

improve tests with multimodal inputs

4f52195

trigger CIs

ec726d7

Cyrilvallez merged commit 4303d88 into main Mar 25, 2025
24 checks passed

Cyrilvallez deleted the phi4 branch March 25, 2025 08:55

Isotr0py mentioned this pull request Apr 24, 2025

[VLM] Support HF format Phi-4-MM model vllm-project/vllm#17121

Merged
