[VLM] Disallow overflowing max_model_len for multimodal models #7998
Conversation
👋 Hi! Thank you for contributing to the vLLM project. Once the PR is approved and ready to go, please make sure to run full CI as it is required to merge (or just use auto-merge). To run full CI, you can do one of these:
Left a nit but otherwise LGTM!
"number of text tokens plus multimodal tokens. For image " | ||
"inputs, the number of image tokens depends on the number " | ||
"of images, and possibly their aspect ratios as well.") |
Nit: we can probably generalize the wording here to cover all multimodal data items instead of calling out images in particular, but we can make that change in a later PR.
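For illustration, a generalized wording along the lines of this suggestion might read as in the sketch below; the constant name and exact phrasing are hypothetical and not taken from this PR.

```python
# Hypothetical rewording (not from this PR): describe multimodal data items
# in general instead of calling out images specifically.
PROMPT_TOO_LONG_MSG = (
    "The prompt (total length {total_len}) is too long to fit into the "
    "model (context length {max_model_len}). Note that the total length "
    "includes the text tokens plus the tokens reserved for multimodal "
    "data items, which varies with the number and kind of items provided.")
```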
Currently, the placeholder tokens are silently truncated if they exceed the context length, which causes confusing errors later when the multimodal features are assigned to the placeholder tokens inside the model (e.g. #6176).
This PR avoids such problems by checking the length of the processed prompt beforehand.
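A minimal sketch of such a check, assuming a hypothetical helper name and a simplified signature (the actual vLLM implementation differs), might look like this:

```python
from typing import Sequence


def validate_prompt_length(prompt_token_ids: Sequence[int],
                           max_model_len: int) -> None:
    """Reject over-long prompts instead of silently truncating them.

    After multimodal placeholder expansion, the processed prompt may exceed
    the model's context length; raising here surfaces the problem up front
    rather than as a confusing feature-assignment error inside the model.
    """
    num_tokens = len(prompt_token_ids)
    if num_tokens > max_model_len:
        raise ValueError(
            f"The prompt (total length {num_tokens}) is too long to fit "
            f"into the model (context length {max_model_len}). The total "
            "length is the number of text tokens plus multimodal tokens.")
```

With a check like this in place, a request whose expanded prompt overflows max_model_len fails fast with a clear message instead of being silently truncated.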