
[Model] tool calling support for ibm-granite/granite-20b-functioncalling #8339

Merged
merged 35 commits into vllm-project:main from wseaton:granite-fc
Oct 29, 2024

Conversation

wseaton
Contributor

@wseaton wseaton commented Sep 10, 2024

Add tool calling parser support for ibm-granite/granite-20b-functioncalling.

Also adds an example chat template based on the Granite function-calling paper.
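
For reference, here is a sketch of how the new parser can be exercised once merged: start the server with auto tool choice enabled and send a tools request through any OpenAI-compatible client. This is illustrative rather than part of the PR; the chat template path, port, and the get_weather tool are assumptions.

# Launch (flag names per vLLM at the time of this PR; the template path is an assumption):
#   vllm serve ibm-granite/granite-20b-functioncalling \
#     --enable-auto-tool-choice \
#     --tool-call-parser granite-20b-fc \
#     --chat-template examples/tool_chat_template_granite_20b_fc.jinja

from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

# Hypothetical tool definition, used only for illustration.
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

resp = client.chat.completions.create(
    model="ibm-granite/granite-20b-functioncalling",
    messages=[{"role": "user", "content": "What's the weather in Boston?"}],
    tools=tools,
)
print(resp.choices[0].message.tool_calls)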




{% set sys_prompt = 'You are a helpful assistant with access to the following function calls. Your task is to produce a sequence of function calls necessary to generate response to the user utterance. Use the following function calls as required.' %}
Contributor

According to the paper, the system prompt used for the end-to-end scenario where the model has to explain the tool output is

You are a helpful assistant with access to the following function calls. Your task is to understand the given conversation with function calls and responses and generate natural language response as the ASSISTANT to continue the conversation. You may use the following function calls to understand how to respond to the user query.

In my experiments with the model it worked well, so perhaps it would be a nice default in this example file.

Contributor Author

@wseaton wseaton commented Sep 10, 2024

Good callout! In my evals, I have mostly been doing zero-shot, single-turn calls.

Let me update the example template with the more general conversational one since that is probably what most people want.

@njhill
Member

njhill commented Sep 23, 2024

@wseaton one of the tests is failing, and it looks most likely related: https://buildkite.com/vllm/fastcheck/builds/3985#0191dd9c-724a-4fbe-99d3-a1e1c94a2106

@wseaton
Contributor Author

wseaton commented Sep 27, 2024

@maxdebayser are you still interested in contributing the streaming JSON parser for granite support? I have rebased off of main and might have some bandwidth to work on it; just let me know :)

@maxdebayser
Contributor

Hi @wseaton, yes, I've been working on this. There is only one unit test that isn't passing, but I know what the cause is.

@maxdebayser
Contributor

@wseaton, now all tests are passing. I've pushed a squashed merge of everything to this draft PR so that you can take a look and perhaps merge with yours: #8915
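
Since the streaming parser is the main addition here, a minimal sketch of the idea (not the PR's actual implementation, which streams incremental argument deltas): buffer the streamed text, look for the function-call marker token, and emit each JSON object once it parses completely. The <function_call> marker is my assumption about the model's output format.

import json

BOT_TOKEN = "<function_call>"  # assumed marker preceding each call

class StreamingFCExtractor:
    """Buffers streamed model output and yields complete tool calls."""

    def __init__(self):
        self.buffer = ""
        self.decoder = json.JSONDecoder()

    def feed(self, chunk: str) -> list:
        self.buffer += chunk
        calls = []
        while True:
            start = self.buffer.find(BOT_TOKEN)
            if start == -1:
                break
            rest = self.buffer[start + len(BOT_TOKEN):].lstrip()
            try:
                obj, consumed = self.decoder.raw_decode(rest)
            except json.JSONDecodeError:
                break  # JSON still incomplete; wait for more chunks
            calls.append(obj)
            self.buffer = rest[consumed:]
        return calls

extractor = StreamingFCExtractor()
chunks = ['<function_call> {"name": "get_wea',
          'ther", "arguments": {"city": "Boston"}}']
for chunk in chunks:
    for call in extractor.feed(chunk):
        print(call["name"], call["arguments"])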

maxdebayser and others added 5 commits October 2, 2024 12:14
This commit builds on previous work by Will Eaton
and adds support for streaming. It also adds the
model to the tool use unit tests.

In this commit the tool parser is renamed from
simply granite to granite-20b-fc to differentiate
from other granite models.

Another minor change is that in the chat template,
the function description based on the function
signature is now optional.

Co-authored-by: Will Eaton <me@wseaton.com>

Signed-off-by: Max de Bayser <mbayser@br.ibm.com>
@wseaton
Contributor Author

wseaton commented Oct 2, 2024

@maxdebayser I've cherry-picked the commits from your PR, please review!

Contributor

@maxdebayser maxdebayser left a comment

Thanks a lot @wseaton. There is an extra entry in the unit test model dict, but other than that it looks good to me.

cc @njhill

tests/tool_use/utils.py (review thread; outdated, resolved)
Co-authored-by: Maximilien de Bayser <maxdebayser@gmail.com>
@wseaton
Contributor Author

wseaton commented Oct 7, 2024

@maxdebayser working on fixing merge conflicts with upstream now

@wseaton
Contributor Author

wseaton commented Oct 25, 2024

@njhill Happy to rebuild my fork of vLLM and re-verify the PR manually on an A100; I've disabled the test for now.

@maxdebayser
Contributor

@wseaton I verified it a couple of hours ago; the tool_use tests run successfully on an A100. I have a quantized version of the model that passes the tests, and I'm testing whether it can fit in the CI environment.

@njhill
Member

njhill commented Oct 25, 2024

> The test could also be run on a larger GPU, although that's probably a bit wasteful

@maxdebayser yes that's what I thought too, since none of the other model sub-tests require it

@njhill
Member

njhill commented Oct 25, 2024

Thanks @wseaton! Could you try merging in the latest main branch again? That should hopefully help with the failing CI tests.

@K-Mistele
Contributor

@wseaton just a minor point of clarification: this PR is for Granite 20B functioncalling. Is this the same function calling format as the Granite 3.0 models? I saw that these were released recently.

I'm not as familiar with IBM/Granite as with some other families of models. If they are compatible, then that's great, and we should indicate it in the docs, as well as most likely name the tool parser granite. If not (and this is what I suspect is the case, based on a brief reading of their docs & GitHub code for agentic stuff), then it might be worth adding a call-out in the docs that this is NOT compatible with Granite 3.0.

@maxdebayser
Contributor

@K-Mistele, they are not compatible. We have another PR for those models, but we're waiting for this one to be merged first because there are some code dependencies.
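
To make the incompatibility concrete, here is my understanding of the two output conventions; both strings are assumptions based on the respective model docs, not taken from this PR:

# granite-20b-functioncalling: a marker token followed by one JSON object per call (assumed)
fc_20b_output = '<function_call> {"name": "get_weather", "arguments": {"city": "Boston"}}'

# Granite 3.0: a different token followed by a JSON list of calls (assumed)
granite_3_output = '<|tool_call|>[{"name": "get_weather", "arguments": {"city": "Boston"}}]'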

@maxdebayser
Contributor

maxdebayser commented Oct 28, 2024

@njhill, @wseaton my tests with a quantized version of the model are now passing. With quantization, the model size was reduced to ~20GB (mbayser/granite-20b-functioncalling-FP8-KV), but I still had to add a bunch of flags to reduce memory usage:
"--max_num_seqs", "1", "--max-model-len", "1024", "--enforce-eager", "--cpu-offload-gb", "20"

@mergify mergify bot added the documentation and frontend labels Oct 28, 2024
@wseaton
Contributor Author

wseaton commented Oct 29, 2024

Because of the nature of this branch and how long it's been running (with many merge commits from main), I don't feel comfortable doing a rebase to fix the DCO issue. Can it be bypassed to get this merged?

@K-Mistele
Contributor

> @K-Mistele, they are not compatible. We have another PR for those models, but we're waiting for this one to be merged first because there are some code dependencies.

sounds good! Let me know if you'd like for me to take a look at it whenever it's ready :)

Member

@njhill njhill left a comment

Thanks @wseaton @maxdebayser for all of the work on this and thanks @K-Mistele for also reviewing.

@njhill
Member

njhill commented Oct 29, 2024

@wseaton I'll merge this to unblock the other granite PR; we could consider re-enabling the test with @maxdebayser's quantized model as a follow-on update...

@njhill njhill merged commit 882a1ad into vllm-project:main Oct 29, 2024
56 checks passed
@wseaton wseaton deleted the granite-fc branch October 29, 2024 23:18
Several downstream forks (rasmith, NickLucche, lk-chen, ROCm, JC1DA, sumitd2, KuntaiDu, mfournioux, neuralmagic, sleepwalker2017) subsequently cherry-picked this commit between Oct 30 and Dec 13, 2024.

@russellb russellb mentioned this pull request Nov 7, 2024
Labels
documentation, frontend, ready

4 participants