
[VLM] Merged multi-modal processor for Pixtral #12211


Merged
merged 49 commits into vllm-project:main from Flechman:pixtral-mm-processor on Mar 15, 2025

Conversation

@Flechman (Contributor) commented Jan 20, 2025

This PR implements the merged multi-modal processor for Pixtral, as part of the effort to contribute to the V1 re-arch for multi-modal models.
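For readers new to the V1 re-arch, here is a rough sketch of the shape such a processor takes. This is illustrative only and not the code in this PR: the base class and method names are a best recollection of vLLM's merged-processor interface at the time, and the attributes and helpers marked as hypothetical may not exist under these names.

# Illustrative sketch only (not the code in this PR); base-class, method, and
# helper names are assumptions about the merged multi-modal processor interface.
from vllm.multimodal.inputs import MultiModalFieldConfig
from vllm.multimodal.processing import BaseMultiModalProcessor, PromptReplacement


class PixtralMultiModalProcessorSketch(BaseMultiModalProcessor):
    """One place that runs HF preprocessing and expands image placeholders."""

    def _get_mm_fields_config(self, hf_inputs, hf_processor_mm_kwargs):
        # Map processor outputs to multi-modal items: one pixel_values entry per image.
        return dict(pixel_values=MultiModalFieldConfig.batched("image"))

    def _get_prompt_replacements(self, mm_items, hf_processor_mm_kwargs, out_mm_kwargs):
        # Expand each single [IMG] placeholder into the full per-image layout that
        # Pixtral uses: ncols [IMG] tokens per row, [IMG_BREAK] between rows, and a
        # final [IMG_END], so V1 knows exactly which positions hold image embeddings.
        img, brk, end = self.image_token_id, self.image_break_id, self.image_end_id  # hypothetical attributes

        def expand(item_idx: int) -> list[int]:
            ncols, nrows = self._grid_size(mm_items, item_idx)  # hypothetical helper
            tokens: list[int] = []
            for row in range(nrows):
                tokens += [img] * ncols
                tokens += [brk] if row < nrows - 1 else [end]
            return tokens

        return [PromptReplacement(modality="image", target=[img], replacement=expand)]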

Additional changes (by @DarkLight1337 ):

  • Simplify mask construction for Pixtral-HF
  • Update the type annotation of flatten_2d_lists to avoid unnecessary iteration (see the sketch after this list)
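As a sketch of what the flatten_2d_lists change might look like (an assumption, not the exact diff), widening the parameter annotation from list[list[T]] to Iterable[Iterable[T]] lets callers pass generators directly instead of materializing intermediate lists first:

# Hedged sketch of the flatten_2d_lists tweak; not necessarily the utility's exact code.
from collections.abc import Iterable
from typing import TypeVar

T = TypeVar("T")


def flatten_2d_lists(lists: Iterable[Iterable[T]]) -> list[T]:
    """Flatten one level of nesting without requiring concrete lists."""
    return [item for sublist in lists for item in sublist]


# Callers can now feed a generator expression without an extra list() pass:
flat = flatten_2d_lists(range(i, i + 3) for i in (0, 10, 20))
assert flat == [0, 1, 2, 10, 11, 12, 20, 21, 22]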


👋 Hi! Thank you for contributing to the vLLM project.
Just a reminder: PRs do not trigger a full CI run by default. Instead, only the fastcheck CI runs, covering a small and essential subset of CI tests to quickly catch errors. You can run additional CI tests on top of those by going to your fastcheck build in the Buildkite UI (linked in the PR checks section) and unblocking them. If you do not have permission to unblock, ping simon-mo or khluu to add you to our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can do one of these:

  • Add the ready label to the PR
  • Enable auto-merge

🚀

@Flechman Flechman force-pushed the pixtral-mm-processor branch from 41c423a to 4af1716 on January 26, 2025 12:19
@DarkLight1337 (Member)

#12767 should make it easier to pass the image token ID


mergify bot commented Feb 13, 2025

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @Flechman.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

@mergify mergify bot added the needs-rebase label Feb 13, 2025
@mergify mergify bot removed the needs-rebase label Feb 14, 2025
@DarkLight1337 (Member)

Any update on this?

@mergify mergify bot added the multi-modality label Mar 8, 2025

mergify bot commented Mar 8, 2025

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @Flechman.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

@mergify mergify bot added the needs-rebase label Mar 8, 2025
@mergify mergify bot removed the needs-rebase label Mar 9, 2025
Flechman added 2 commits March 9, 2025 21:57
@Flechman Flechman marked this pull request as ready for review March 9, 2025 22:51
@DarkLight1337 (Member)

Evaluation against mmmu_val using lmms_eval:

[PR #14806]
V0:
- mistralai/Pixtral-12B-2409: 0.5044
- mistral-community/pixtral-12b: 0.5056

[PR #12211]
V0:
- mistralai/Pixtral-12B-2409: 0.5044
- mistral-community/pixtral-12b: 0.5056

I'm unable to run the eval on V1 because of a CUDA re-initialization error. @youkaichao do you have any idea why this happens?

@ywang96 (Member) commented Mar 14, 2025

@DarkLight1337 It's probably related to spawn. I will eval it with mistral-evals.
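For context on the spawn hypothesis: CUDA cannot be re-initialized in a process created with fork once the parent has already touched the GPU. A common workaround, sketched below under the assumption that this is what the eval script hits (not a confirmed fix for this particular run), is to force vLLM to spawn its workers before the engine is created:

# Hedged workaround sketch, assuming the failure is the usual
# "Cannot re-initialize CUDA in forked subprocess" multiprocessing error.
import os

# Ask vLLM to spawn worker processes instead of forking them; this must be set
# before the engine (and ideally before anything CUDA-related) is initialized.
os.environ["VLLM_WORKER_MULTIPROC_METHOD"] = "spawn"

from vllm import LLM

if __name__ == "__main__":  # required when the start method is spawn
    llm = LLM(model="mistralai/Pixtral-12B-2409", tokenizer_mode="mistral")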

@ywang96 (Member) commented Mar 14, 2025

Running into issues on V1 - I'll debug

 ValueError: Attempted to assign 0 + 0 = 0 multimodal tokens to 2 placeholders

@ywang96 (Member) left a comment

python -m eval.run eval_vllm \
        --model_name mistralai/Pixtral-12B-2409 \
        --url http://0.0.0.0:8000/ \
        --output_dir ~/tmp \
        --eval_name docvqa

V0 main

{
    "anls": 0.8834957820937713
}

V1 main

{
    "anls": 0.8837985481702968
}

V0 this PR

{
    "anls": 0.883626647676123
}

V1 this PR (by overriding in arg_utils)

{
    "anls": 0.8837702579252248
}

Given the results are all close enough, this PR should be good to go! Thanks for the work!
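For reproducibility: the eval_vllm command above assumes an OpenAI-compatible vLLM server is already listening on port 8000. The exact launch command for these runs isn't shown in the thread; a typical invocation (the flags here are an assumption) would be along the lines of:

vllm serve mistralai/Pixtral-12B-2409 \
    --tokenizer-mode mistral \
    --port 8000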

@ywang96 ywang96 added the ready label Mar 15, 2025

mergify bot commented Mar 15, 2025

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @Flechman.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

@mergify mergify bot added the needs-rebase label Mar 15, 2025
@DarkLight1337 DarkLight1337 enabled auto-merge (squash) March 15, 2025 09:54
@mergify mergify bot removed the needs-rebase label Mar 15, 2025
@DarkLight1337 (Member) commented Mar 15, 2025

python -m eval.run eval_vllm \
    --model_name mistral-community/pixtral-12b \
    --url http://0.0.0.0:8000 \
    --output_dir ~/tmp \
    --eval_name docvqa

V0 main

{
    "anls": 0.8941889853837489
}

V1 main

{
    "anls": 0.8943141800499299
}

V0 this PR

{
    "anls": 0.8938789942826822
}

V1 this PR

{
    "anls": 0.8938797419262041
}

@vllm-bot vllm-bot merged commit 61c6a5a into vllm-project:main Mar 15, 2025
8 of 12 checks passed
lulmer pushed a commit to lulmer/vllm that referenced this pull request Apr 7, 2025
shreyankg pushed a commit to shreyankg/vllm that referenced this pull request May 3, 2025
RichardoMrMu pushed a commit to RichardoMrMu/vllm that referenced this pull request May 12, 2025
Labels: documentation, frontend, multi-modality, ready