
[VLM] Merged multi-modal processor for Pixtral #12211


Merged
merged 49 commits into vllm-project:main from Flechman:pixtral-mm-processor on Mar 15, 2025

Conversation

@Flechman (Contributor) commented Jan 20, 2025

This PR implements the merged multi-modal processor for Pixtral, as part of the effort to contribute to the V1 re-arch for multi-modal models.
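For readers new to the V1 re-arch, here is a rough sketch of the shape such a processor takes. This is illustrative only and not the code in this PR: the base class and method names are a best recollection of vLLM's merged-processor interface at the time, and the attributes and helpers marked as hypothetical may not exist under these names.

# Illustrative sketch only (not the code in this PR); base-class, method, and
# helper names are assumptions about the merged multi-modal processor interface.
from vllm.multimodal.inputs import MultiModalFieldConfig
from vllm.multimodal.processing import BaseMultiModalProcessor, PromptReplacement


class PixtralMultiModalProcessorSketch(BaseMultiModalProcessor):
    """One place that runs HF preprocessing and expands image placeholders."""

    def _get_mm_fields_config(self, hf_inputs, hf_processor_mm_kwargs):
        # Map processor outputs to multi-modal items: one pixel_values entry per image.
        return dict(pixel_values=MultiModalFieldConfig.batched("image"))

    def _get_prompt_replacements(self, mm_items, hf_processor_mm_kwargs, out_mm_kwargs):
        # Expand each single [IMG] placeholder into the full per-image layout that
        # Pixtral uses: ncols [IMG] tokens per row, [IMG_BREAK] between rows, and a
        # final [IMG_END], so V1 knows exactly which positions hold image embeddings.
        img, brk, end = self.image_token_id, self.image_break_id, self.image_end_id  # hypothetical attributes

        def expand(item_idx: int) -> list[int]:
            ncols, nrows = self._grid_size(mm_items, item_idx)  # hypothetical helper
            tokens: list[int] = []
            for row in range(nrows):
                tokens += [img] * ncols
                tokens += [brk] if row < nrows - 1 else [end]
            return tokens

        return [PromptReplacement(modality="image", target=[img], replacement=expand)]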

Additional changes (by @DarkLight1337 ):

  • Simplify mask construction for Pixtral-HF
  • Update the type annotation of flatten_2d_lists to avoid unnecessary iteration (see the sketch after this list)
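As a sketch of what the flatten_2d_lists change might look like (an assumption, not the exact diff), widening the parameter annotation from list[list[T]] to Iterable[Iterable[T]] lets callers pass generators directly instead of materializing intermediate lists first:

# Hedged sketch of the flatten_2d_lists tweak; not necessarily the utility's exact code.
from collections.abc import Iterable
from typing import TypeVar

T = TypeVar("T")


def flatten_2d_lists(lists: Iterable[Iterable[T]]) -> list[T]:
    """Flatten one level of nesting without requiring concrete lists."""
    return [item for sublist in lists for item in sublist]


# Callers can now feed a generator expression without an extra list() pass:
flat = flatten_2d_lists(range(i, i + 3) for i in (0, 10, 20))
assert flat == [0, 1, 2, 10, 11, 12, 20, 21, 22]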


👋 Hi! Thank you for contributing to the vLLM project.
Just a reminder: PRs do not trigger a full CI run by default. Instead, only the fastcheck CI runs, covering a small and essential subset of CI tests to quickly catch errors. You can run additional CI tests on top of those by going to your fastcheck build in the Buildkite UI (linked in the PR checks section) and unblocking them. If you do not have permission to unblock, ping simon-mo or khluu to add you to our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can do one of these:

  • Add the ready label to the PR
  • Enable auto-merge

🚀

@Flechman Flechman force-pushed the pixtral-mm-processor branch from 41c423a to 4af1716 on January 26, 2025 12:19
@DarkLight1337 (Member)

#12767 should make it easier to pass the image token ID


mergify bot commented Feb 13, 2025

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @Flechman.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

@mergify mergify bot added the needs-rebase label Feb 13, 2025
@mergify mergify bot removed the needs-rebase label Feb 14, 2025
@DarkLight1337 (Member)

Any update on this?

@mergify mergify bot added the multi-modality label Mar 8, 2025

mergify bot commented Mar 8, 2025

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @Flechman.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

@mergify mergify bot added the needs-rebase label Mar 8, 2025
@mergify mergify bot removed the needs-rebase label Mar 9, 2025
Flechman added 2 commits March 9, 2025 21:57
@Flechman Flechman marked this pull request as ready for review March 9, 2025 22:51
@DarkLight1337 (Member)

Evaluation against mmmu_val using lmms_eval:

[PR #14806]
V0:
- mistralai/Pixtral-12B-2409: 0.5044
- mistral-community/pixtral-12b: 0.5056

[PR #12211]
V0:
- mistralai/Pixtral-12B-2409: 0.5044
- mistral-community/pixtral-12b: 0.5056

I'm unable to run the eval on V1 because of a CUDA re-initialization error. @youkaichao do you have any idea why this happens?

@ywang96 (Member) commented Mar 14, 2025

@DarkLight1337 It's probably related to spawn. I will eval it with mistral-evals.
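For context on the spawn hypothesis: CUDA cannot be re-initialized in a process created with fork once the parent has already touched the GPU. A common workaround, sketched below under the assumption that this is what the eval script hits (not a confirmed fix for this particular run), is to force vLLM to spawn its workers before the engine is created:

# Hedged workaround sketch, assuming the failure is the usual
# "Cannot re-initialize CUDA in forked subprocess" multiprocessing error.
import os

# Ask vLLM to spawn worker processes instead of forking them; this must be set
# before the engine (and ideally before anything CUDA-related) is initialized.
os.environ["VLLM_WORKER_MULTIPROC_METHOD"] = "spawn"

from vllm import LLM

if __name__ == "__main__":  # required when the start method is spawn
    llm = LLM(model="mistralai/Pixtral-12B-2409", tokenizer_mode="mistral")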

@ywang96 (Member) commented Mar 14, 2025

Running into issues on V1 - I'll debug

 ValueError: Attempted to assign 0 + 0 = 0 multimodal tokens to 2 placeholders

@ywang96 (Member) left a comment

python -m eval.run eval_vllm \
        --model_name mistralai/Pixtral-12B-2409 \
        --url http://0.0.0.0:8000/ \
        --output_dir ~/tmp \
        --eval_name docvqa

V0 main

{
    "anls": 0.8834957820937713
}

V1 main

{
    "anls": 0.8837985481702968
}

V0 this PR

{
    "anls": 0.883626647676123
}

V1 this PR (by overriding in arg_utils)

{
    "anls": 0.8837702579252248
}

Given the results are all close enough, this PR should be good to go! Thanks for the work!
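For reproducibility: the eval_vllm command above assumes an OpenAI-compatible vLLM server is already listening on port 8000. The exact launch command for these runs isn't shown in the thread; a typical invocation (the flags here are an assumption) would be along the lines of:

vllm serve mistralai/Pixtral-12B-2409 \
    --tokenizer-mode mistral \
    --port 8000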

@ywang96 ywang96 added the ready label Mar 15, 2025

mergify bot commented Mar 15, 2025

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @Flechman.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

@mergify mergify bot added the needs-rebase label Mar 15, 2025
@DarkLight1337 DarkLight1337 enabled auto-merge (squash) March 15, 2025 09:54
@mergify mergify bot removed the needs-rebase label Mar 15, 2025
@DarkLight1337 (Member) commented Mar 15, 2025

python -m eval.run eval_vllm \
    --model_name mistral-community/pixtral-12b \
    --url http://0.0.0.0:8000 \
    --output_dir ~/tmp \
    --eval_name docvqa

V0 main

{
    "anls": 0.8941889853837489
}

V1 main

{
    "anls": 0.8943141800499299
}

V0 this PR

{
    "anls": 0.8938789942826822
}

V1 this PR

{
    "anls": 0.8938797419262041
}

@vllm-bot vllm-bot merged commit 61c6a5a into vllm-project:main Mar 15, 2025
8 of 12 checks passed
lulmer pushed a commit to lulmer/vllm that referenced this pull request Apr 7, 2025
shreyankg pushed a commit to shreyankg/vllm that referenced this pull request May 3, 2025
RichardoMrMu pushed a commit to RichardoMrMu/vllm that referenced this pull request May 12, 2025
Labels: documentation, frontend, multi-modality, ready