
Add DeepSeek-R1-0528 function call chat template #18874


Merged

Conversation

Xu-Wenqing
Contributor

@Xu-Wenqing Xu-Wenqing commented May 29, 2025

The DeepSeek-R1-0528 model supports function calling; this PR adds a function call chat template for it.

Usage:

vllm serve ... --enable-auto-tool-choice --tool-call-parser deepseek_v3 --chat-template examples/tool_chat_template_deepseekr1.jinja
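For reference, a minimal client-side sketch of exercising the template through the OpenAI-compatible API (assumptions: the server started above listens on the default port 8000, is served under the model name deepseek-ai/DeepSeek-R1-0528, and the get_weather tool is purely illustrative):

# Sketch only: query the vLLM OpenAI-compatible server started above with a tool definition.
# Assumes http://localhost:8000/v1 and the model name below; adjust to your setup.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",  # illustrative tool, not part of this PR
        "description": "Get the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

response = client.chat.completions.create(
    model="deepseek-ai/DeepSeek-R1-0528",
    messages=[{"role": "user", "content": "What's the weather in Paris?"}],
    tools=tools,
)
print(response.choices[0].message.tool_calls)

With --enable-auto-tool-choice and the deepseek_v3 parser, tool_calls should be populated whenever the model decides to call the tool.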

Function Call Test
Used the Berkeley Function Calling Leaderboard (BFCL) to evaluate the function call template.

Evaluation Result:
🦍 Model: DeepSeek-R1-0528
🔍 Running test: simple
✅ Test completed: simple. 🎯 Accuracy: 0.9325
Number of models evaluated: 100%|███████████████████████████████████████████| 1/1 [00:00<00:00, 41.24it/s]
📈 Aggregating data to generate leaderboard score table...
🏁 Evaluation completed. See /Users/xuwenqing/function_call_eval/score/data_overall.csv for overall evaluation results on BFCL V3.

Signed-off-by: 许文卿 <xwq391974@alibaba-inc.com>
@Xu-Wenqing Xu-Wenqing changed the title Add DeepSeekR1-0528 function call chat template Add DeepSeek-R1-0528 function call chat template May 29, 2025
@mergify mergify bot added documentation Improvements or additions to documentation tool-calling labels May 29, 2025

👋 Hi! Thank you for contributing to the vLLM project.

💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.

Just a reminder: PRs do not trigger a full CI run by default. Instead, only the fastcheck CI runs, covering a small and essential subset of tests to quickly catch errors. You can run other CI tests on top of those by going to your fastcheck build on the Buildkite UI (linked in the PR checks section) and unblocking them. If you do not have permission to unblock, ping simon-mo or khluu to add you to our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can either add the ready label to the PR or enable auto-merge.

🚀

@Xu-Wenqing Xu-Wenqing marked this pull request as ready for review May 29, 2025 04:57
@Xu-Wenqing
Contributor Author

@aarnphm @DarkLight1337

Signed-off-by: 许文卿 <xwq391974@alibaba-inc.com>
@Xu-Wenqing Xu-Wenqing requested a review from hmellor as a code owner May 29, 2025 05:06
@Xu-Wenqing
Contributor Author

#18931

Signed-off-by: 许文卿 <xwq391974@alibaba-inc.com>
Signed-off-by: 许文卿 <xwq391974@alibaba-inc.com>
@markluofd

I use the following command to start the server:

vllm serve /model     --tensor-parallel-size 8     --pipeline-parallel-size 2     --trust-remote-code     --gpu-memory-utilization 0.92     --enable-auto-tool-choice     --tool-call-parser deepseek_v3                     --chat-template /home/work/easyedge/llm/tool_chat_template_deepseekr1.jinja     --max-model-len 98304     --host 0.0.0.0     --port 8669     --served-model-name DeepSeek-R1     --uvicorn-log-level info

and curl the server with the body

{
    "messages": [
        {
            "content": "",
            "role": "system"
        },
        {
            "content": "what's the magic function of 5?",
            "role": "user"
        }
    ],
    "model": "",
    "max_tokens": 2560,
    "stream": false,
    "temperature": 0.7,
    "tools": [
        {
            "type": "function",
            "function": {
                "name": "magic_function",
                "description": "Applies a magic function to an input.",
                "parameters": {
                    "type": "object",
                    "properties": {
                        "input": {
                            "type": "integer"
                        }
                    },
                    "required": [
                        "input"
                    ]
                }
            }
        }
    ]
}

but I got the result as follows:

{
    "id": "chatcmpl-83f63afeb0644f1990e2c865cd08f7f2",
    "object": "chat.completion",
    "created": 1748572697,
    "model": "DeepSeek-R1",
    "choices": [
        {
            "index": 0,
            "message": {
                "role": "assistant",
                "reasoning_content": null,
                "content": "<think>\nWe are given a function called \"magic_function\" that takes an integer input.\n The user query is: \"what's the magic function of 5?\"\n We should call the magic_function with input 5.\n We will output the function call in the required format.\n</think>\n",
                "tool_calls": []
            },
            "logprobs": null,
            "finish_reason": "tool_calls",
            "stop_reason": null
        }
    ],
    "usage": {
        "prompt_tokens": 160,
        "total_tokens": 238,
        "completion_tokens": 78,
        "prompt_tokens_details": null
    },
    "prompt_logprobs": null,
    "kv_transfer_params": null
}

It seems like the tool parser failed to construct the tool_calls parameters. Have I used the wrong command?

@Xu-Wenqing
Contributor Author

> (quoting @markluofd's report above)

@markluofd you can add tool_choice="required" in your request.
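
As a concrete sketch, the same request via the OpenAI Python client with only the tool_choice field added (assuming the server started above on port 8669 with served model name DeepSeek-R1):

# Sketch only: same request as the body above, plus tool_choice="required",
# which asks vLLM to constrain the output to a list of tool calls via guided decoding.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8669/v1", api_key="EMPTY")

response = client.chat.completions.create(
    model="DeepSeek-R1",
    messages=[{"role": "user", "content": "what's the magic function of 5?"}],
    tools=[{
        "type": "function",
        "function": {
            "name": "magic_function",
            "description": "Applies a magic function to an input.",
            "parameters": {
                "type": "object",
                "properties": {"input": {"type": "integer"}},
                "required": ["input"],
            },
        },
    }],
    tool_choice="required",
    max_tokens=2560,
    temperature=0.7,
)
print(response.choices[0].message.tool_calls)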

@DarkLight1337 DarkLight1337 requested a review from aarnphm May 30, 2025 06:13
@markluofd

Failed too; it seems the response is not in JSON format 😂
request as:

{
    "messages": [
        {
            "content": "",
            "role": "system"
        },
        {
            "content": "what's the magic function of 5?",
            "role": "user"
        }
    ],
    "model": "",
    "max_tokens": 2560,
    "stream": false,
    "temperature": 0.7,
    "tool_choice": "required",
    "tools": [
        {
            "type": "function",
            "function": {
                "name": "magic_function",
                "description": "Applies a magic function to an input.",
                "parameters": {
                    "type": "object",
                    "properties": {
                        "input": {
                            "type": "integer"
                        }
                    },
                    "required": [
                        "input"
                    ]
                }
            }
        }
    ]
}

response as:

{
    "object": "error",
    "message": "1 validation error for list[function-wrap[__log_extra_fields__()]]\n  Invalid JSON: EOF while parsing a string at line 1 column 3 [type=json_invalid, input_value='[{\"', input_type=str]\n    For further information visit https://errors.pydantic.dev/2.11/v/json_invalid",
    "type": "BadRequestError",
    "param": null,
    "code": 400
}

I extracted the prompt from the vLLM log:

Received request chatcmpl-95a8d630d8f64a8cad7548c5057c59d5: prompt: '<|begin▁of▁sentence|>\n    You may call one or more functions to assist with the user query.\n\n    Here are the available functions:\n{"type": "function", "function": {"name": "magic_function", "description": "Applies a magic function to an input.", "parameters": {"type": "object", "properties": {"input": {"type": "integer"}}, "required": ["input"]}}}\n    For function call returns, you should first print <|tool▁calls▁begin|>\n    For each function call, you should return object like:\n\n    <|tool▁call▁begin|>function<|tool▁sep|><function_name>\n```json\n<function_arguments_in_json_format>\n```<|tool▁call▁end|>\n    At the end of function call returns, you should print <|tool▁calls▁end|><|end▁of▁sentence|>\n<|User|>what\'s the magic function of 5?    <|Assistant|>\n', params: SamplingParams(n=1, presence_penalty=0.0, frequency_penalty=0.0, repetition_penalty=1.0, temperature=0.7, top_p=0.95, top_k=0, min_p=0.0, seed=None, stop=[], stop_token_ids=[], bad_words=[], include_stop_str_in_output=False, ignore_eos=False, max_tokens=2560, min_tokens=0, logprobs=None, prompt_logprobs=None, skip_special_tokens=True, spaces_between_special_tokens=True, truncate_prompt_tokens=None, guided_decoding=GuidedDecodingParams(json={'type': 'array', 'minItems': 1, 'items': {'type': 'object', 'anyOf': [{'properties': {'name': {'type': 'string', 'enum': ['magic_function']}, 'parameters': {'type': 'object', 'properties': {'input': {'type': 'integer'}}, 'required': ['input']}}, 'required': ['name', 'parameters']}]}}, regex=None, choice=None, grammar=None, json_object=None, backend=None, backend_was_auto=False, disable_fallback=False, disable_any_whitespace=False, disable_additional_properties=False, whitespace_pattern=None, structural_tag=None), extra_args=None), prompt_token_ids: None, prompt_embeds shape: None, lora_request: None, prompt_adapter_request: None

Member

@mgoin mgoin left a comment

Seems reasonable to me, thanks

@github-project-automation github-project-automation bot moved this from Backlog to In progress in DeepSeek V3/R1 May 30, 2025
@mgoin mgoin added the ready ONLY add when PR is ready to merge/full CI is needed label May 30, 2025
@NaiveYan

NaiveYan commented May 31, 2025

I'm encountering an issue when trying to use the DeepSeek-R1-0528-Qwen3-8B model. It appears unsupported, returning Error 400:

{'object': 'error', 
'message': 'DeepSeek-V3 Tool parser could not locate tool call start/end tokens in the tokenizer! None', 
'type': 'BadRequestError', 
'param': None, 
'code': 400}
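
For context, that error means the deepseek_v3 tool parser could not find DeepSeek's tool-call marker tokens in the model's tokenizer; DeepSeek-R1-0528-Qwen3-8B is a Qwen3-based distill, so its tokenizer presumably does not define them. A quick diagnostic sketch (the marker strings are the ones visible in the rendered prompt earlier in this thread; this is not part of vLLM):

# Sketch only: check whether a tokenizer defines DeepSeek's tool-call marker tokens.
from transformers import AutoTokenizer

markers = ["<|tool▁calls▁begin|>", "<|tool▁call▁begin|>",
           "<|tool▁call▁end|>", "<|tool▁calls▁end|>"]

for name in ["deepseek-ai/DeepSeek-R1-0528-Qwen3-8B", "deepseek-ai/DeepSeek-R1-0528"]:
    tok = AutoTokenizer.from_pretrained(name, trust_remote_code=True)
    vocab = tok.get_vocab()
    missing = [m for m in markers if m not in vocab]
    print(name, "missing markers:", missing or "none")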

@houseroad
Collaborator

Thanks for adding the support, @Xu-Wenqing. Btw, could you paste the test results in the PR description? Also do you want to include the updated version in this PR? It's fine to have another one to include the updated template.

@alllexx88

I'm experiencing the same issue with DeepSeek-R1-0528-Qwen3-8B as @NaiveYan. I don't know if it helps, but here's an ollama chat template that has tool calling working with this model: https://ollama.com/okamototk/deepseek-r1:8b/blobs/e94a8ecb9327
Thanks

@houseroad
Collaborator

@alllexx88 my understanding is that DeepSeek-R1-0528-Qwen3-8B is actually not a DeepSeek R1 model but a Qwen3-8B model. Could you try Qwen3's function call chat template?

@alllexx88

@houseroad Thanks for the reply! I assume you're right, but it doesn't work for me with Qwen3's function call template:

python -m vllm.entrypoints.openai.api_server \
        --port=5003 \
        --model deepseek-ai/DeepSeek-R1-0528-Qwen3-8B \
        --enable-auto-tool-choice \
        --tool-call-parser hermes

I'm getting a response without any tool calls.

My test code and output
from langchain.chat_models import init_chat_model
from pydantic import BaseModel, Field

llm = init_chat_model(
    **{
        "model": "deepseek-ai/DeepSeek-R1-0528-Qwen3-8B",
        "base_url": "http://localhost:5003/v1",
        "api_key": "NONE",
        "model_provider": "openai",
        "temperature": 0,
    }
)


class GetWeather(BaseModel):
    """Get weather for a location"""

    location: str = Field(description="The location to get the weather for")


llm_with_tools = llm.bind_tools([GetWeather])

res = llm_with_tools.invoke(
    [{"role": "user", "content": "What's the weather like in Paris?"}]
)

print(res)

Output:

content='<think>\nOkay, user is asking about the weather in Paris. Hmm, this is a pretty common question, but also super important if they\'re planning a trip or just curious. \n\nFirst thought: weather in Paris changes a lot depending on the time of year. Should I just give a generic answer or break it down by seasons? Breaking it down seems better because it gives more useful info. \n\nWait, maybe they\'re actually planning a visit? If so, they\'d need more than just seasonal averages. Like, "Paris in spring is beautiful but rainy" is useful, but if they\'re deciding when to go, they might need specifics. But the question is vague... \n\nI should cover the main seasons but also mention that weather can be unpredictable. And maybe add a tip about checking forecasts closer to the date. \n\nOh! Important to note that Paris has a maritime climate - mild, but can be changeable. That explains why it\'s not just hot summers like some places. \n\nShould I include temperature ranges? Yes, but keep it simple since the user didn\'t ask for super detailed meteorology. Just general comfort levels ("cool to mild" etc). \n\nAlso, should mention rain - Paris is known for that. And maybe a tiny bit about sunshine hours since that affects how comfortable the weather feels. \n\nFinal thought: end with a practical tip about checking current conditions. People often forget that even if they know seasonal averages, the actual day-to-day weather can vary wildly. \n\nThis seems like a straightforward query from someone who might be planning travel or just curious. Not urgent, but still deserves a thorough but clear answer. No need to overcomplicate it unless they follow up with more specific questions.\n</think>\nThe weather in Paris can vary quite a bit depending on the time of year. Here\'s a general overview:\n\n*   **Spring (March - May):** Generally mild and pleasant. Temperatures usually range from the mid-teens to low twenties Celsius (around 60-70°F). It can be cool, especially in the mornings and evenings, and rain is common. Spring is often considered one of the best times to visit due to beautiful blooming gardens and comfortable temperatures.\n\n*   **Summer (June - August):** Warm and sunny, but not always extremely hot. Average highs are often around 25°C (77°F), but heatwaves can push temperatures higher. It\'s usually the sunniest time of year, but humidity can be high. Rain is less frequent than in spring, but thunderstorms can still occur.\n\n*   **Autumn (September - November):** Similar to spring in terms of temperature range, often starting warm but cooling down as the season progresses. September and early October are usually still quite pleasant, while November can be quite cool, even cold at night, with frequent rain. It\'s a great time to see Paris in a different light, with fewer tourists.\n\n*   **Winter (December - February):** Cold, with frequent rain or snow (though snow isn\'t guaranteed every year). Daytime temperatures often hover around 2-5°C (36-41°F), and nights can be much colder. It\'s the wettest season, and daylight hours are shorter.\n\n**In summary:** Expect mild, changeable weather in Paris. It\'s rarely extremely hot or cold, but be prepared for rain and cool temperatures, especially outside of the peak summer months. Always check the current forecast for your specific travel dates!' 
additional_kwargs={'refusal': None} response_metadata={'token_usage': {'completion_tokens': 708, 'prompt_tokens': 11, 'total_tokens': 719, 'completion_tokens_details': None, 'prompt_tokens_details': None}, 'model_name': 'deepseek-ai/DeepSeek-R1-0528-Qwen3-8B', 'system_fingerprint': None, 'id': 'chatcmpl-7d9dbdbbeca441ddac943dfd0a652ab9', 'service_tier': None, 'finish_reason': 'stop', 'logprobs': None} id='run--8f47f53e-28bb-4ae8-8fc4-651875db60ad-0' usage_metadata={'input_tokens': 11, 'output_tokens': 708, 'total_tokens': 719, 'input_token_details': {}, 'output_token_details': {}}

The same code works fine with the ollama model:

content='' additional_kwargs={'tool_calls': [{'id': 'call_mt24encn', 'function': {'arguments': '{"location":"Paris"}', 'name': 'GetWeather'}, 'type': 'function', 'index': 0}, {'id': 'call_4j04q5hw', 'function': {'arguments': '{"location":"Paris"}', 'name': 'GetWeather'}, 'type': 'function', 'index': 0}], 'refusal': None} response_metadata={'token_usage': {'completion_tokens': 87, 'prompt_tokens': 158, 'total_tokens': 245, 'completion_tokens_details': None, 'prompt_tokens_details': None}, 'model_name': 'okamototk/deepseek-r1:8b', 'system_fingerprint': 'fp_ollama', 'id': 'chatcmpl-375', 'service_tier': None, 'finish_reason': 'tool_calls', 'logprobs': None} id='run--7d782d16-b8b2-4f3c-9c40-fabcea5bc045-0' tool_calls=[{'name': 'GetWeather', 'args': {'location': 'Paris'}, 'id': 'call_mt24encn', 'type': 'tool_call'}, {'name': 'GetWeather', 'args': {'location': 'Paris'}, 'id': 'call_4j04q5hw', 'type': 'tool_call'}] usage_metadata={'input_tokens': 158, 'output_tokens': 87, 'total_tokens': 245, 'input_token_details': {}, 'output_token_details': {}}

P.S. Since it's probably a different model, should I open an issue for it instead of writing in this PR?

Thanks!

@houseroad
Collaborator

@alllexx88, yeah, I think creating another issue is a better choice, since we consider it a separate issue.

@wukaixingxp
Contributor

Not sure if it is intentional, but this chat template has some spaces at the beginning of some lines. Should we clean them up? The current version looks like this and here is my cleaned version. Let me know if it is helpful.

@wukaixingxp
Contributor

@Xu-Wenqing I just tested your PR with llama-stack-eval and it seems the non_streaming tool call fails as well. Can you double-check whether it is a model problem or a chat_template problem? CC: @houseroad

_____________ test_chat_non_streaming_tool_calling[deepseek-ai/DeepSeek-R1-0528-basic] ______________
request = <FixtureRequest for <Function test_chat_non_streaming_tool_calling[deepseek-ai/DeepSeek-R1-0528-basic]>>
openai_client = <openai.OpenAI object at 0x7f86edc81610>, model = 'deepseek-ai/DeepSeek-R1-0528'
provider = 'vllm'
verification_config = {'cerebras': ProviderConfig(provider='cerebras', base_url='https://api.cerebras.ai/v1', api_key_var='CEREBRAS_API_KEY'...4-maverick-17b-128e-instruct', canonical_id='Llama-4-Maverick-Instruct')], test_exclusions={}, self_hosted=False), ...}
case = {'case_id': 'basic', 'expected': {'tool_arguments': {'location': 'San Francisco'}, 'tool_name': 'get_weather'}, 'input...rs': {'additionalProperties': False, 'properties': {...}, 'required': [...], 'type': 'object'}}, 'type': 'function'}]}}
    @pytest.mark.parametrize(
        "case",
        chat_completion_test_cases["test_tool_calling"]["test_params"]["case"],
        ids=case_id_generator,
    )
    def test_chat_non_streaming_tool_calling(request, openai_client, model, provider, verification_config, case):
        if "xfail" in case and case["xfail"]:
            pytest.xfail(case["xfail"]["reason"])
    
        test_name_base = get_base_test_name(request)
        if should_skip_test(verification_config, provider, model, test_name_base):
            pytest.skip(f"Skipping {test_name_base} for model {model} on provider {provider} based on config.")
    
        response = openai_client.chat.completions.create(
            model=model,
            messages=case["input"]["messages"],
            tools=case["input"]["tools"],
            stream=False,
        )
    
        assert response.choices[0].message.role == "assistant"
>       assert len(response.choices[0].message.tool_calls) == 1
E       assert 0 == 1
E        +  where 0 = len([])
E        +    where [] = ChatCompletionMessage(content='<think>\nWe are given a user query: "What\'s the weather like in San Francisco?"\n We have one function available: get_weather, which requires a location (city and country).\n The user mentioned "San Francisco", but note that the function expects a city and country.\n Since the user did not specify the country, we can assume it\'s San Francisco, USA because it\'s a well-known city.\n However, note that the function description says "City and country e.g. Bogotá, Colombia". \n We should provide the location in the format: "San Francisco, USA".\n\n But note: the example format uses a comma and then the country. We\'ll use the same.\n\n However, let\'s consider: the function might be designed to handle just the city if the country is obvious? \n But the specification says it requires a location in the format of "City and country". So we must provide both.\n\n Since the user didn\'t specify, we\'ll use "San Francisco, USA" as the most common one.\n\n Therefore, we will call the function get_weather with the argument: {"location": "San Francisco, USA"}\n\n We must format the function call as:\n\nfunction\x1eget_weather\n```json\n{"location":"San Francisc...ser query? \n\n Yes, because we are calling the function.\n\n We do not provide any other text.\n\n So the assistant\'s response is exactly:\n\n   \\x1cfunction\\x1fget_weather\n   ```json\n   {"location": "San Francisco, USA"}\n   ```\\x1e\\x1d\n\n But note: the GS (U+001D) is after the RS (U+001E). And we have no newline between the RS and GS? That\'s acceptable.\n\n However, the instruction says "At the end of function call returns, you should print \x1d", meaning that the GS comes immediately after the function call block (which ends with RS).\n\n So we do not put a newline between RS and GS.\n\n Therefore, we output the string as:\n\n   [FS] + "function" + [US] + "get_weather" + "\\n```json\\n" + "{\\"location\\": \\"San Francisco, USA\\"}" + "\\n```" + [RS] + [GS]\n\n But note: the JSON part: we are writing a string with a comma and space. That is acceptable.\n\n We are done.\n</think>\nI need to get the current temperature for San Francisco. I\'ll use the provided function.\n\nfunction\x1eget_weather\n```json\n{"location": "San Francisco, USA"}\n```\x1e', refusal=None, role='assistant', annotations=None, audio=None, function_call=None, tool_calls=[], reasoning_content=None).tool_calls
E        +      where ChatCompletionMessage(content='<think>\nWe are given a user query: "What\'s the weather like in San Francisco?"\n We have one function available: get_weather, which requires a location (city and country).\n The user mentioned "San Francisco", but note that the function expects a city and country.\n Since the user did not specify the country, we can assume it\'s San Francisco, USA because it\'s a well-known city.\n However, note that the function description says "City and country e.g. Bogotá, Colombia". \n We should provide the location in the format: "San Francisco, USA".\n\n But note: the example format uses a comma and then the country. We\'ll use the same.\n\n However, let\'s consider: the function might be designed to handle just the city if the country is obvious? \n But the specification says it requires a location in the format of "City and country". So we must provide both.\n\n Since the user didn\'t specify, we\'ll use "San Francisco, USA" as the most common one.\n\n Therefore, we will call the function get_weather with the argument: {"location": "San Francisco, USA"}\n\n We must format the function call as:\n\nfunction\x1eget_weather\n```json\n{"location":"San Francisc...ser query? \n\n Yes, because we are calling the function.\n\n We do not provide any other text.\n\n So the assistant\'s response is exactly:\n\n   \\x1cfunction\\x1fget_weather\n   ```json\n   {"location": "San Francisco, USA"}\n   ```\\x1e\\x1d\n\n But note: the GS (U+001D) is after the RS (U+001E). And we have no newline between the RS and GS? That\'s acceptable.\n\n However, the instruction says "At the end of function call returns, you should print \x1d", meaning that the GS comes immediately after the function call block (which ends with RS).\n\n So we do not put a newline between RS and GS.\n\n Therefore, we output the string as:\n\n   [FS] + "function" + [US] + "get_weather" + "\\n```json\\n" + "{\\"location\\": \\"San Francisco, USA\\"}" + "\\n```" + [RS] + [GS]\n\n But note: the JSON part: we are writing a string with a comma and space. That is acceptable.\n\n We are done.\n</think>\nI need to get the current temperature for San Francisco. I\'ll use the provided function.\n\nfunction\x1eget_weather\n```json\n{"location": "San Francisco, USA"}\n```\x1e', refusal=None, role='assistant', annotations=None, audio=None, function_call=None, tool_calls=[], reasoning_content=None) = Choice(finish_reason='stop', index=0, logprobs=None, message=ChatCompletionMessage(content='<think>\nWe are given a user query: "What\'s the weather like in San Francisco?"\n We have one function available: get_weather, which requires a location (city and country).\n The user mentioned "San Francisco", but note that the function expects a city and country.\n Since the user did not specify the country, we can assume it\'s San Francisco, USA because it\'s a well-known city.\n However, note that the function description says "City and country e.g. Bogotá, Colombia". \n We should provide the location in the format: "San Francisco, USA".\n\n But note: the example format uses a comma and then the country. We\'ll use the same.\n\n However, let\'s consider: the function might be designed to handle just the city if the country is obvious? \n But the specification says it requires a location in the format of "City and country". 
So we must provide both.\n\n Since the user didn\'t specify, we\'ll use "San Francisco, USA" as the most common one.\n\n Therefore, we will call the function get_weather with the argument: {"location": "San Francisco, USA"}\n\n We must format the function call as:\n..., because we are calling the function.\n\n We do not provide any other text.\n\n So the assistant\'s response is exactly:\n\n   \\x1cfunction\\x1fget_weather\n   ```json\n   {"location": "San Francisco, USA"}\n   ```\\x1e\\x1d\n\n But note: the GS (U+001D) is after the RS (U+001E). And we have no newline between the RS and GS? That\'s acceptable.\n\n However, the instruction says "At the end of function call returns, you should print \x1d", meaning that the GS comes immediately after the function call block (which ends with RS).\n\n So we do not put a newline between RS and GS.\n\n Therefore, we output the string as:\n\n   [FS] + "function" + [US] + "get_weather" + "\\n```json\\n" + "{\\"location\\": \\"San Francisco, USA\\"}" + "\\n```" + [RS] + [GS]\n\n But note: the JSON part: we are writing a string with a comma and space. That is acceptable.\n\n We are done.\n</think>\nI need to get the current temperature for San Francisco. I\'ll use the provided function.\n\nfunction\x1eget_weather\n```json\n{"location": "San Francisco, USA"}\n```\x1e', refusal=None, role='assistant', annotations=None, audio=None, function_call=None, tool_calls=[], reasoning_content=None), stop_reason=None).message
llama_stack_evals/functional_tests/openai_api/test_chat_completion.py:250: AssertionError
___ test_chat_non_streaming_multi_turn_tool_calling[deepseek-ai/DeepSeek-R1-0528-text_then_tool] ____
request = <FixtureRequest for <Function test_chat_non_streaming_multi_turn_tool_calling[deepseek-ai/DeepSeek-R1-0528-text_then_tool]>>
openai_client = <openai.OpenAI object at 0x7f86edb0d190>, model = 'deepseek-ai/DeepSeek-R1-0528'
provider = 'vllm'
verification_config = {'cerebras': ProviderConfig(provider='cerebras', base_url='https://api.cerebras.ai/v1', api_key_var='CEREBRAS_API_KEY'...4-maverick-17b-128e-instruct', canonical_id='Llama-4-Maverick-Instruct')], test_exclusions={}, self_hosted=False), ...}
case = {'case_id': 'text_then_tool', 'expected': [{'answer': ['sol'], 'num_tool_calls': 0}, {'num_tool_calls': 1, 'tool_argum...], 'type': 'object'}}, 'type': 'function'}]}, 'tool_responses': [{'response': "{'response': '70 degrees and foggy'}"}]}
    @pytest.mark.parametrize(
        "case",
        chat_completion_test_cases.get("test_chat_multi_turn_tool_calling", {}).get("test_params", {}).get("case", []),
        ids=case_id_generator,
    )
    def test_chat_non_streaming_multi_turn_tool_calling(request, openai_client, model, provider, verification_config, case):
        """
        Test cases for multi-turn tool calling.
        Tool calls are asserted.
        Tool responses are provided in the test case.
        Final response is asserted.
        """
    
        test_name_base = get_base_test_name(request)
        if should_skip_test(verification_config, provider, model, test_name_base):
            pytest.skip(f"Skipping {test_name_base} for model {model} on provider {provider} based on config.")
    
        # Create a copy of the messages list to avoid modifying the original
        messages = []
        tools = case["input"]["tools"]
        # Use deepcopy to prevent modification across runs/parametrization
        expected_results = copy.deepcopy(case["expected"])
        tool_responses = copy.deepcopy(case.get("tool_responses", []))
        input_messages_turns = copy.deepcopy(case["input"]["messages"])
    
        # keep going until either
        # 1. we have messages to test in multi-turn
        # 2. no messages but last message is tool response
        while len(input_messages_turns) > 0 or (len(messages) > 0 and messages[-1]["role"] == "tool"):
            # do not take new messages if last message is tool response
            if len(messages) == 0 or messages[-1]["role"] != "tool":
                new_messages = input_messages_turns.pop(0)
                # Ensure new_messages is a list of message objects
                if isinstance(new_messages, list):
                    messages.extend(new_messages)
                else:
                    # If it's a single message object, add it directly
                    messages.append(new_messages)
    
            # --- API Call ---
            response = openai_client.chat.completions.create(
                model=model,
                messages=messages,
                tools=tools,
                stream=False,
            )
    
            # --- Process Response ---
            assistant_message = response.choices[0].message
            messages.append(assistant_message.model_dump(exclude_unset=True))
    
            assert assistant_message.role == "assistant"
    
            # Get the expected result data
            expected = expected_results.pop(0)
            num_tool_calls = expected["num_tool_calls"]
    
            # --- Assertions based on expected result ---
>           assert len(assistant_message.tool_calls or []) == num_tool_calls, (
                f"Expected {num_tool_calls} tool calls, but got {len(assistant_message.tool_calls or [])}"
            )
E           AssertionError: Expected 1 tool calls, but got 0
E           assert 0 == 1
E            +  where 0 = len(([] or []))
E            +    where [] = ChatCompletionMessage(content='<think>\nWe are given a user query: "What\'s the weather like in San Francisco?"\n We have a function called "get_weather" that requires a "location" parameter.\n The location should be the city and state (both required). Since the user only said "San Francisco", we should assume the state is California (CA) because that\'s the most famous one.\n However, note that the function description says "The city and state (both required)". So we must provide both.\n We can use "San Francisco, CA" as the location.\n\n Steps:\n 1. We are going to call the function "get_weather" with the argument: {"location": "San Francisco, CA"}\n 2. We must format the function call exactly as specified.\n\n Format:\n   - Start with: `function_call`\n   - Then on the next line: ```json\n     { ... }\n     ```\n   - End with: `end_function_call`\n\n But note: the example format provided was:\n      •#function_call•#\n      ```json\n      { ... }\n      ```\n      •#end_function_call•#\n\n However, the actual tokens we are to use are:\n      •#function_call•#   -> at the beginning of the function call\n      •#end_function_call•# -> at the end\n\n But note: the example in the ...nside the JSON. However, the function call format we are to use is defined as:\n\n      •#function_call•#\n      ```json\n      {\n        "name": "function_name",\n        "arguments": {\n          "arg1": "value1",\n          ...\n        }\n      }\n      ```\n      •#end_function_call•#\n\n Actually, the instructions did not specify the exact JSON structure? But the common practice in function calling is to have a JSON object with two keys: "name" and "arguments". And "arguments" is another object with the parameters.\n\n Therefore, we output:\n\n •#function_call•#\n```json\n{\n  "name": "get_weather",\n  "arguments": {\n    "location": "San Francisco, CA"\n  }\n}\n```\n •#end_function_call•#\n</think>\nTo retrieve the weather information for San Francisco, I\'ll call the `get_weather` function with the required location parameter. Since the function requires both city and state, I\'ll use "San Francisco, CA" as the location.\n\n •#function_call•#\n```json\n{"name": "get_weather", "arguments": {"location": "San Francisco, CA"}}\n```\n •#end_function_call•#', refusal=None, role='assistant', annotations=None, audio=None, function_call=None, tool_calls=[], reasoning_content=None).tool_calls
llama_stack_evals/functional_tests/openai_api/test_chat_completion.py:462: AssertionError

@Xu-Wenqing
Contributor Author

@houseroad @wukaixingxp @markluofd @NaiveYan @alllexx88 Sorry for the late reply. The past few days were the Chinese Dragon Boat Festival, so I didn't check messages. I'll try out the chat template on some test datasets again. Meanwhile, it seems DeepSeek updated the DeepSeek-R1-0528 chat template: https://huggingface.co/deepseek-ai/DeepSeek-R1-0528/commit/4236a6af538feda4548eca9ab308586007567f52#d2h-846292 I will also update the template here.

@Zongru-Wang

If I use a LangChain tool-calling agent, I get the following error: Error code: 400 - {'object': 'error', 'message': 'DeepSeek-V3 Tool parser could not locate tool call start/end tokens in the tokenizer! None', 'type': 'BadRequestError', 'param': None, 'code': 400}

vllm serve /deepseek-ai/DeepSeek-R1-0528-Qwen3-8B --tensor-parallel-size 8 --host 0.0.0.0 --port 10001 --api-key none --rope-scaling '{"factor": 2.0, "original_max_position_embeddings": 32768, "rope_type": "yarn"}' --gpu-memory-utilization 0.9 --enable-reasoning --reasoning-parser deepseek_r1 --guided_decoding_backend guidance --enable-auto-tool-choice --tool-call-parser deepseek_v3 --chat-template /home/ubuntu/wzr/LLM-MODELS/deepseek-ai/DeepSeek-R1-0528-Qwen3-8B/tool_chat_template_deepseekr1.jinja --served-model-name DeepSeek-R1

@Zongru-Wang

> (quoting the previous comment)

I tried --tool-call-parser deepseek_v3 --chat-template examples/tool_chat_template_deepseekr1.jinja and got Error code: 400 - {'object': 'error', 'message': 'DeepSeek-V3 Tool parser could not locate tool call start/end tokens in the tokenizer! None', 'type': 'BadRequestError', 'param': None, 'code': 400}. If I use --tool-call-parser hermes, the vLLM backend shows: The following fields were present in the request but ignored: {'function_call'}.

I am using a LangChain agent to make tool calls; QwQ-32B and the Qwen3 series work fine for me.

Comment on lines 242 to 243
* `deepseek-ai/DeepSeek-V3-0324`
* `deepseek-ai/DeepSeek-R1-0528`
Member

How would this look in the docs?

Suggested change
* `deepseek-ai/DeepSeek-V3-0324`
* `deepseek-ai/DeepSeek-R1-0528`
* `deepseek-ai/DeepSeek-V3-0324` (`--tool-call-parser deepseek_v3 --chat-template examples/tool_chat_template_deepseekv3.jinja`)
* `deepseek-ai/DeepSeek-R1-0528` (`--tool-call-parser deepseek_v3 --chat-template examples/tool_chat_template_deepseekr1.jinja`)

Contributor Author

@Xu-Wenqing Xu-Wenqing Jun 4, 2025

@hmellor Updated the markdown file.

Signed-off-by: 许文卿 <xwq391974@alibaba-inc.com>
Signed-off-by: 许文卿 <xwq391974@alibaba-inc.com>
Signed-off-by: 许文卿 <xwq391974@alibaba-inc.com>
@Xu-Wenqing
Contributor Author

@houseroad Updated the chat template and added test results to the description.

@Xu-Wenqing Xu-Wenqing requested a review from hmellor June 4, 2025 11:04
Collaborator

@houseroad houseroad left a comment

Thanks!

@houseroad houseroad enabled auto-merge (squash) June 4, 2025 11:31
@houseroad houseroad merged commit 02658c2 into vllm-project:main Jun 4, 2025
49 checks passed
@github-project-automation github-project-automation bot moved this from In progress to Done in DeepSeek V3/R1 Jun 4, 2025
@menardorama

Hi,
I have tested this PR with https://huggingface.co/deepseek-ai/DeepSeek-R1-0528 and tool calling never works.

 vllm serve deepseek-ai/DeepSeek-R1-0528  --port 8000 --trust-remote-code --tensor-parallel-size 8 --enable-reasoning --reasoning-parser deepseek_r1 --tool-call-parser deepseek_v3 --enable-auto-tool-choice --chat-template /vllm-templates/tool_chat_template_deepseekr1_v2.jinja

Has anybody had success using this model with this PR?

Labels
documentation (Improvements or additions to documentation), ready (ONLY add when PR is ready to merge/full CI is needed), tool-calling
Projects
Status: Done