Update vLLM to 0.10.1 #142
Conversation
nilai-models/pyproject.toml (Outdated)
dependencies = [
    "httpx>=0.27.2",
    "nilai-common",
    "vllm>=0.10.1",
Is this necessary? vLLM should already be installed in the Docker container, since we're using vllm-openai as the base image, right? vLLM is a heavy dependency; if it's not needed, I would remove it.
Not needed; I've removed it.
tests/e2e/test_openai.py (Outdated)
# Convert to dict to access the data safely
first_call_dict = first_call.model_dump()

# Extract function name and arguments from the tool call
if "function" in first_call_dict:
    function_name = first_call_dict["function"]["name"]
    function_args = first_call_dict["function"]["arguments"]
else:
    # Fallback for different structure
    function_name = first_call_dict.get("name", "")
    function_args = first_call_dict.get("arguments", "")

assert function_name == "get_weather", "Function name should be get_weather"
Just out of curiosity: why is this change necessary? Is there a fundamental change in how vLLM behaves now that makes this no longer work?
vLLM ≥0.10 aligned its tool-calling objects with OpenAI's tool_calls[].function = {name, arguments} schema; in 0.7.x the same information was exposed flat as {name, arguments} on the tool call itself. I have removed the fallback condition, since we only use vLLM ≥0.10.
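For reference, a minimal sketch of the two shapes (the dictionaries below are illustrative placeholders, not actual vLLM responses):

```python
# vLLM >= 0.10 follows the OpenAI schema: name/arguments nested under "function".
new_style_call = {
    "id": "call_0",
    "type": "function",
    "function": {"name": "get_weather", "arguments": '{"city": "Madrid"}'},
}

# vLLM 0.7.x exposed the same fields flat on the tool call itself.
old_style_call = {
    "id": "call_0",
    "name": "get_weather",
    "arguments": '{"city": "Madrid"}',
}

# With only vLLM >= 0.10 supported, the test can read the nested form directly:
function_name = new_style_call["function"]["name"]
function_args = new_style_call["function"]["arguments"]
assert function_name == "get_weather"
```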
jcabrero left a comment:
Check https://docs.vllm.ai/en/latest/configuration/engine_args.html#-max-num-batched-tokens.
- It is wrongly specified: you write max_num_batched_tokens, but the flag supported by vLLM is max-num-batched-tokens, so the setting has no effect (see the sketch after this list).
- I am not sure we need it at all, since by default it is None and derived from the usage context (i.e., max-model-len).
- I can understand the Llama 1B GPU file, where no max-model-len is specified, having this parameter.
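As a hedged illustration of the naming difference (assuming the GPU file passes these as vLLM engine/CLI arguments; the model id below is a placeholder): the Python-side EngineArgs uses underscores and defaults the value to None, while the CLI flag is hyphenated.

```python
# Minimal sketch, assuming vLLM >= 0.10 is installed; the model id is a placeholder.
from vllm import EngineArgs

args = EngineArgs(model="meta-llama/Llama-3.2-1B-Instruct")
print(args.max_num_batched_tokens)  # None -> resolved later from the usage context

# On the CLI, the equivalent override would use the hyphenated flag, e.g.:
#   vllm serve <model> --max-num-batched-tokens 8192 --max-model-len 8192
```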
jcabrero left a comment:
Great job 🥇
I think once you merge one of the two PRs it will conflict with the changes to docker/api.Dockerfile, but resolving that should just be a matter of dropping those changes in whichever PR is merged second.