OpenAI Tools / function calling v2 #3237

FlorianJoncour · 2024-03-06T15:25:52Z

This PR follows #2488

The implementation has been updated to use the new guided generation.

If during a query, the user sets tool_choice to auto, the server will use the template system used in #2488.
However, if a specific function is defined, guided generation will be used to only generate the parameters.

Everything is detailed in the openai_tools_calls.py example.

mn9891 · 2024-03-07T19:20:32Z

Testing the tools/function calls feature using the example provided (in openai_tools_calls.py) it works fine as is, but when you go further in the chat and ask another question that needs a function calling, it never goes to tools but the assistant would reply 'call_get_current_weather_0 was called with arguments .... ' but not using tool_calls.
See the gist here
Tested it in other setups/functions, and it is always the case, the second function call would fail (tool_calls=None) and the assistant would reply 'call_function_name_0 called with ... '

simon-mo · 2024-03-08T06:00:21Z

Can we go further on reducing the templating to purely JSON schema? I believe it is possible by framing it as

{
"tool_choice": one of the tool name
"tool_params": constrained schema
}

Uhao-P · 2024-03-13T08:20:06Z

thank you for your work.
I successfully installed vllm using your branch. When I start openai.api_server and call it, the error AttributeError: 'ActorHandle' object has no attribute 'tokenizer' occurs, #3301 solves the above error.
After openai.api_server is started, you can use the /v1/chat/completions interface to call services and function calling functions normally. However, when I call the /v1/completions interface, an error will appear. AttributeError: 'CompletionRequest' object has no attribute 'tool_choice'
Modify vllm/entrypoints/openai/protocol.py. Adding tools and tool_choice in class CompletionRequest to solve this problem.

FlorianJoncour · 2024-03-16T02:21:39Z

@simon-mo : I so removed everything about the jinja template.
I think that was not such a great idea on my side.
The template can now be passed as an argument in requests (tool_params).
But if this param is None, a generic template defined in protocol.py is used.
See the dict TOOLS_TEMPLATE in the sample script openai_tools_calls.py.

@mn9891 : I suspect a bad model for this use case. I started my development with NeuralHermes-7B, but there's a recent model based on Mistral 7B developed by NousResearch that has been trained to call functions which is really great:
https://huggingface.co/FoxEngineAi/Hermes-2-Pro-Mistral-7B_AWQ-GEMM_4_128_True

@Uhao-P : CompletionRequest is not supposed to call functions.
Maybe there are some missing type checking, it'll check this soon.

RonanKMcGovern · 2024-03-18T10:40:41Z

@FlorianJoncour could you share a sample of a fully formatted prompt containing tools? Say, for a Mistral model?

FlorianJoncour · 2024-03-18T23:51:00Z

Thank you, there's an edge case I hadn't considered. It's possible that the model generates a list of function calls in JSON rather than making the calls one after the other.

Now, I was able to make the example script work with Mistral-7B-Instruct-v0.2 despite the fact that it's not trained for this.

Edit: I forgot something. The default chat template in the Mistral-7B model enforces an alternating user/assistant text pattern. Since there can be consecutive function calls, the template will raise an error.
Here is a working chat template:
{{ bos_token }}{% for message in messages %}{% if message['role'] == 'user' %}{{ '\n[INST] ' + message['content'] + ' [/INST]' }}{% else %}{{ '\n' + message['content'] + eos_token}}{% endif %}{% endfor %}

Uhao-P · 2024-03-24T18:38:50Z

When I use a Python script to make a function call, it succeeds regardless of whether the stream is set to true or false. However, when I integrate with the frontend, and the stream is set to true, the frontend doesn't parse the received results well, and the arguments are always truncated. Comparing ChatGPT with VLLM, ChatGPT returns the arguments in a streaming fashion, whereas VLLM returns a complete set of arguments.
I have successfully simulated ChatGPT's streaming response on the web, which can be parsed successfully. Is there any plan to modify VLLM's streaming calls to match the format of ChatGPT? Can I contribute by submitting a pull request to help accomplish this?

jamestwhedbee · 2024-03-27T14:24:46Z

I am eagerly awaiting this too. Is there any area where contributions would be welcomed to help merge this?

kausky · 2024-04-09T05:49:29Z

hi, is this PR being worked on?
Pretty excited for this to merge, ready to extend any help if needed.

weiminw · 2024-04-13T03:19:32Z

hope this feature release as soon as possible.

RonanKMcGovern · 2024-04-13T10:17:49Z

@FlorianJoncour could you share a sample of a fully formatted prompt containing tools? Say, for a Mistral model?

Is it possible to share a fully formatted prompt sample?

That helps a lot anyone fine tuning models...

AaronFriel

@FlorianJoncour left some comments, but at a more meta level than any individual line, could you describe how this PR enables responding with multiple function calls (like OpenAI's API)?

OpenAI's API has a weakness where it behaves poorly in streaming mode when it uses the "parallel.multi-tool-use" function as a wrapper, which breaks streaming behavior. I want to make sure as a consumer of the tool API there, that:

This implementation doesn't duplicate that behavior or breaking streaming.
I understood what the response chunks look like when the model makes multiple calls - I didn't see that exactly, but I did just skim the PR.

Could you point me to what multi calling looks like?

AaronFriel · 2024-04-13T23:54:15Z

vllm/entrypoints/openai/api_server.py

@@ -125,6 +133,12 @@ def parse_args():
                        type=str,
                        default=None,
                        help="The file path to the SSL cert file")
+    parser.add_argument(
+        "--privileged",


I think this engine arg is not sufficiently descriptive. To me, "privileged" evokes a notion of requiring root or extra capabilities (a la docker and containers, or user account control in Windows, or admin elevation in macOS.)

This is probably more aptly described as --enable-debug-reload-api?

AaronFriel · 2024-04-13T23:55:51Z

vllm/entrypoints/openai/api_server.py

@@ -163,6 +206,16 @@ async def health() -> Response:
    return Response(status_code=200)


+if "--privileged" in sys.argv:
+
+    @app.get("/privileged")


Likewise here, I don't think /privileged describes what this request route does.

And should this be a POST, not a GET, as it is an effectful operation?

AaronFriel · 2024-04-13T23:56:16Z

vllm/entrypoints/openai/api_server.py

+        logger.warning(
+            "\n"
+            "##########################################################################\n"
+            "privileged mode enabled. This should only be used for development purpose.\n"


Here as well - "privileged" just doesn't describe to me what is happening.

darkacorn · 2024-06-18T13:12:14Z

is the holdup the naming convention or what am i missing ?

K-Mistele · 2024-06-18T15:42:23Z

is the holdup the naming convention or what am i missing ?

I think it's more that there are other PRs open on this (e.g. #4656 ) but all of them have shortcomings and are pretty opinionated in one way or another. Tool choice for non-auto tool calling is now supported via guided decoding, but "auto" tool choice is a lot harder to get right because different models use different tool choice prompt templates, use different tokens for indicating tool calls, etc. all of which have to be parsed based on the model-specific format and (ideally) streamed back to the client in an OpenAI API-compatible way, which none of the current PRs fully support

darkacorn · 2024-06-18T18:20:45Z

https://github.com/mistralai/mistral-common/tree/main/src/mistral_common/protocol/instruct

should be a decent starting ground / if noone can agree

K-Mistele · 2024-06-18T19:57:14Z

https://github.com/mistralai/mistral-common/tree/main/src/mistral_common/protocol/instruct

should be a decent starting ground / if noone can agree

I'm trying to work on a PR for an implementation that's less opinionated and would work with Mistral 7B Instruct v0.3, as well as the Hermes 2 Pro models by Nous Research & other tool-calling-capable open models in #5649

FlorianJoncour added 4 commits March 6, 2024 01:17

Update tu vllm 0.3.3

2dabe7a

New implementation including guided tools calls

de43c00

Bugfixes in the example

dca852a

Format

fc6f98b

FlorianJoncour mentioned this pull request Mar 6, 2024

OpenAI Tools / function calling #2488

Closed

simon-mo self-assigned this Mar 6, 2024

FlorianJoncour added 2 commits March 6, 2024 21:30

Minor fixes

e9e79c9

Minor fixes

2aa5048

tool template moved in requests. jinja template removed

f8a5423

FlorianJoncour added 2 commits March 16, 2024 16:01

Bugfix

2f4eb47

Format

cc5a28a

Better calls list parsing

cf211fe

Better error handling & bugfixes

6010cca

simon-mo mentioned this pull request Apr 4, 2024

[Roadmap] vLLM Roadmap Q2 2024 #3861

Closed

65 tasks

esmeetu mentioned this pull request Apr 10, 2024

[Frontend] Support Tool and RAG #3971

Closed

2 tasks

AaronFriel reviewed Apr 14, 2024

View reviewed changes

br3no mentioned this pull request Apr 16, 2024

[Usage]: how to use tools/tool_choice while using open api endpoints #4080

Closed

hmellor mentioned this pull request Apr 20, 2024

Support tools and tool_choice parameter in OpenAI compatible service #1869

Closed

Xwdit mentioned this pull request May 7, 2024

[Frontend][OpenAI] Add support for OpenAI tools calling #4656

Closed

K-Mistele mentioned this pull request Jun 18, 2024

[Feature] OpenAI-Compatible Tools API + Streaming for Hermes & Mistral models #5649

Merged

20 tasks

TimPietrusky mentioned this pull request Jul 19, 2024

Support for tools / tool_choice="auto" in OpenAI-compatible API runpod-workers/worker-vllm#85

Closed

2 tasks

simon-mo closed this in #5649 Sep 4, 2024

Uh oh!

OpenAI Tools / function calling v2 #3237

OpenAI Tools / function calling v2 #3237

Uh oh!

Conversation

FlorianJoncour commented Mar 6, 2024

Uh oh!

mn9891 commented Mar 7, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

simon-mo commented Mar 8, 2024

Uh oh!

Uhao-P commented Mar 13, 2024

Uh oh!

FlorianJoncour commented Mar 16, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

RonanKMcGovern commented Mar 18, 2024

Uh oh!

FlorianJoncour commented Mar 18, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Uhao-P commented Mar 24, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

jamestwhedbee commented Mar 27, 2024

Uh oh!

kausky commented Apr 9, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

weiminw commented Apr 13, 2024

Uh oh!

RonanKMcGovern commented Apr 13, 2024

Uh oh!

AaronFriel left a comment

Choose a reason for hiding this comment

Uh oh!

AaronFriel Apr 13, 2024

Choose a reason for hiding this comment

Uh oh!

AaronFriel Apr 13, 2024

Choose a reason for hiding this comment

Uh oh!

AaronFriel Apr 13, 2024

Choose a reason for hiding this comment

Uh oh!

darkacorn commented Jun 18, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

K-Mistele commented Jun 18, 2024

Uh oh!

darkacorn commented Jun 18, 2024

Uh oh!

K-Mistele commented Jun 18, 2024

Uh oh!

Uh oh!

mn9891 commented Mar 7, 2024 •

edited

Loading

FlorianJoncour commented Mar 16, 2024 •

edited

Loading

FlorianJoncour commented Mar 18, 2024 •

edited

Loading

Uhao-P commented Mar 24, 2024 •

edited

Loading

kausky commented Apr 9, 2024 •

edited

Loading

darkacorn commented Jun 18, 2024 •

edited

Loading