Conversation

@blefo (Member) commented Aug 21, 2025

No description provided.

@blefo force-pushed the feat-update-vllm-0.10.1 branch from fb1e1a3 to b985220 on August 21, 2025 11:01
@blefo requested a review from jcabrero on August 21, 2025 11:27
dependencies = [
"httpx>=0.27.2",
"nilai-common",
"vllm>=0.10.1",
Member
Is this necessary? vLLM should already be installed in the Docker container, since we're using vllm-openai as the base image, right? vLLM is a heavy dependency; if it's not needed, I would remove it.

Member Author

Not needed, I've removed it.
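
For reference, this amounts to dropping the vllm pin from the dependency list quoted above, roughly as follows (a sketch; any entries outside the quoted window are assumed unchanged):

dependencies = [
    "httpx>=0.27.2",
    "nilai-common",
]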

Comment on lines 333 to 345
# Convert to dict to access the data safely
first_call_dict = first_call.model_dump()

# Extract function name and arguments from the tool call
if "function" in first_call_dict:
function_name = first_call_dict["function"]["name"]
function_args = first_call_dict["function"]["arguments"]
else:
# Fallback for different structure
function_name = first_call_dict.get("name", "")
function_args = first_call_dict.get("arguments", "")

assert function_name == "get_weather", "Function name should be get_weather"
Member
Just out of curiosity: why is this change necessary? Is there a fundamental change in how vLLM behaves now that makes this no longer work?

@blefo (Member Author) commented Aug 22, 2025

vLLM ≥0.10 aligned its tool-calling objects with OpenAI's tool_calls[].function = {name, arguments} schema. In 0.7.x the same information was exposed flat, as {name, arguments} directly on the tool call.

I have removed the fallback condition, since we only use vLLM ≥0.10.
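
For illustration, a minimal sketch of the simplified extraction once the fallback is dropped; only the names from the test snippet above are taken as given, and the response object shape is an assumption following OpenAI client conventions:

# Assumes vLLM >= 0.10, which exposes tool calls in the OpenAI-style nested
# shape: tool_calls[].function == {"name": ..., "arguments": ...}.
first_call = response.choices[0].message.tool_calls[0]  # hypothetical response object
first_call_dict = first_call.model_dump()

function_name = first_call_dict["function"]["name"]
function_args = first_call_dict["function"]["arguments"]  # typically a JSON-encoded string

assert function_name == "get_weather", "Function name should be get_weather"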

@jcabrero (Member) left a comment

Check https://docs.vllm.ai/en/latest/configuration/engine_args.html#-max-num-batched-tokens.

  1. It is wrongly specified. You write max_num_batched_tokens, but the parameter supported by vLLM is max-num-batched-tokens, so the setting has no effect (see the sketch after this list).
  2. I am not sure we need it, as by default it is None and derived from the usage context (i.e., max-model-len).
  3. I understand having this parameter in the Llama 1B GPU file, where no max-model-len is specified.
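
A minimal sketch of the naming distinction, assuming vLLM ≥0.10 is importable; the model name and values are placeholders, not taken from this repo:

from vllm.engine.arg_utils import EngineArgs

# The Python-side field is snake_case, while the flag accepted by `vllm serve`
# is dash-separated: --max-num-batched-tokens.
args = EngineArgs(
    model="meta-llama/Llama-3.2-1B-Instruct",  # placeholder model
    max_model_len=8192,                        # placeholder value
    max_num_batched_tokens=8192,               # CLI form: --max-num-batched-tokens 8192
)
print(args.max_num_batched_tokens)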

@blefo requested a review from jcabrero on August 22, 2025 11:06
@jcabrero (Member) left a comment

Great job 🥇

I think once you merge one of the two PRs, it's going to conflict with the changes to docker/api.Dockerfile, but it should just be a matter of dropping those changes in the second PR to be merged.

@blefo merged commit f8ff93d into main on Aug 22, 2025
9 of 10 checks passed