[Model] tool calling support for ibm-granite/granite-20b-functioncalling #8339
Conversation
{% set sys_prompt = 'You are a helpful assistant with access to the following function calls. Your task is to produce a sequence of function calls necessary to generate response to the user utterance. Use the following function calls as required.' %}
According to the paper, the system prompt used for the end-to-end scenario where the model has to explain the tool output is
You are a helpful assistant with access to the following function calls. Your task is to understand the given conversation with function calls and responses and generate natural language response as the ASSISTANT to continue the conversation. You may use the following function calls to understand how to respond to the user query.
In my experiments with the model it worked well, so perhaps it would be a nice default in this example file.
Good callout! In my evals, I have mostly been doing zero-shot single turn calls.
Let me update the example template with the more general conversational one since that is probably what most people want.
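As a hedged illustration of the difference in practice, here is how a client could supply the conversational system prompt from the paper explicitly per request, instead of relying on the chat template's default. Field names follow the OpenAI chat-completions schema; the `get_weather` tool and the user query are hypothetical examples, not part of this PR:

```python
# Sketch: pass the end-to-end/conversational system prompt from the Granite FC
# paper explicitly in the request, overriding the template default. The tool
# definition below is a made-up example for illustration.
SYSTEM_PROMPT = (
    "You are a helpful assistant with access to the following function calls. "
    "Your task is to understand the given conversation with function calls and "
    "responses and generate natural language response as the ASSISTANT to "
    "continue the conversation. You may use the following function calls to "
    "understand how to respond to the user query."
)

request = {
    "model": "ibm-granite/granite-20b-functioncalling",
    "messages": [
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": "What's the weather in Boston?"},
    ],
    "tools": [
        {
            "type": "function",
            "function": {
                "name": "get_weather",
                "description": "Get the current weather for a city",
                "parameters": {
                    "type": "object",
                    "properties": {"city": {"type": "string"}},
                    "required": ["city"],
                },
            },
        }
    ],
}
```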
@wseaton one of the tests is failing and it looks most likely related: https://buildkite.com/vllm/fastcheck/builds/3985#0191dd9c-724a-4fbe-99d3-a1e1c94a2106
@maxdebayser are you still interested in contributing the streaming JSON parser for granite support? I have rebased off of main and might have some bandwidth to work on it, just let me know :)
Hi @wseaton, yes, I've been working on this. There is only one unit test that isn't passing, but I know what the cause is.
This commit builds on previous work by Will Eaton and adds support for streaming. It also adds the model to the tool use unit tests. In this commit the tool parser is renamed from simply granite to granite-20b-fc to differentiate it from other granite models. Another minor change is that in the chat template the function description using the function signature is now optional.
Co-authored-by: Will Eaton <me@wseaton.com>
Signed-off-by: Max de Bayser <mbayser@br.ibm.com>
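The hard part of streaming support is that the tool-call JSON arrives token by token, so the parser must decide on each chunk whether a complete call can be emitted yet. A minimal sketch of the underlying idea, buffering chunks and retrying a full parse until the object completes; vLLM's actual parser additionally streams partial argument deltas and is considerably more involved:

```python
# Hedged sketch of incremental tool-call parsing: keep appending model output
# to a buffer and attempt a full JSON parse after each chunk. The parse only
# succeeds once the object is complete. The chunk boundaries below are made up.
import json

def try_parse(buffer: str):
    """Return the parsed tool call if the buffered text is valid JSON,
    else None while the object is still incomplete."""
    try:
        return json.loads(buffer)
    except json.JSONDecodeError:
        return None

chunks = ['{"name": "get_w', 'eather", "arguments"', ': {"city": "Boston"}}']
buffer = ""
parsed = None
for chunk in chunks:
    buffer += chunk
    parsed = try_parse(buffer)
# parsed holds the complete tool call only after the final chunk arrives
```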
Signed-off-by: Max de Bayser <mbayser@br.ibm.com>
@maxdebayser I've cherry picked the commits from your PR, please review!
Co-authored-by: Maximilien de Bayser <maxdebayser@gmail.com>
@maxdebayser working on fixing merge conflicts w/ upstream now
@njhill Happy to rebuild my fork of vLLM and re-verify the PR manually on an A100; I've disabled the test for now.
@wseaton I verified a couple of hours ago. The tool_use tests run successfully on an A100. I have a quantized version of the model that passes the tests, and I'm testing whether it can fit in the CI environment.
@maxdebayser yes that's what I thought too, since none of the other model sub-tests require it |
Thanks @wseaton! Could you try merging in the latest main branch again? That should hopefully help with the failing CI tests. |
@wseaton just a minor point of clarification, this PR is for Granite 20B functioncalling -- is this the same function calling format as the granite 3.0 models? I saw that these were released recently:
I'm not as familiar with IBM/granite as with some other families of models. If they are compatible, that's great, and we should indicate that in the docs as well as name the tool parser accordingly.
@K-Mistele , they are not compatible. We have another PR for those models but we're waiting for this one to be merged first because there are some code dependencies. |
@njhill, @wseaton my tests with a quantized version of the model are now passing. With quantization the model size was reduced to ~20GB.
Because of the nature of this branch and how long it's been running (with many merge commits from main), I don't feel comfortable doing a rebase to fix the DCO issue. Can it be bypassed to get this merged? |
Sounds good! Let me know if you'd like me to take a look at it whenever it's ready :)
Thanks @wseaton @maxdebayser for all of the work on this and thanks @K-Mistele for also reviewing.
@wseaton I'll merge this to unblock the other granite PR, could consider re-enabling the test with @maxdebayser's quantized model as a follow-on update... |
…ing (vllm-project#8339) Signed-off-by: Max de Bayser <mbayser@br.ibm.com> Co-authored-by: Max de Bayser <mbayser@br.ibm.com> Co-authored-by: Maximilien de Bayser <maxdebayser@gmail.com> Signed-off-by: Randall Smith <Randall.Smith@amd.com>
Add tool calling parser support for ibm-granite/granite-20b-functioncalling. Also adds an example chat template that is based off of the granite fc paper.
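Assuming the parser name introduced in this PR (`granite-20b-fc`) and an example template path in the vLLM repo, serving the model with tool calling enabled might look like the following; the flags are from vLLM's OpenAI-compatible server, and the template filename is an assumption to verify against your checkout:

```shell
# Launch vLLM's OpenAI-compatible server with the tool parser from this PR.
# The chat template path below is assumed; adjust it to your checkout.
vllm serve ibm-granite/granite-20b-functioncalling \
    --enable-auto-tool-choice \
    --tool-call-parser granite-20b-fc \
    --chat-template examples/tool_chat_template_granite_20b_fc.jinja
```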