[Model] tool calling support for ibm-granite/granite-20b-functioncalling #8339

Merged: 35 commits, Oct 29, 2024. The diff below shows the changes from one commit (113fbb6).
Commits (35):

- 58e468d: initial commit (wseaton, Sep 10, 2024)
- d4cc66b: remove original part of template (wseaton, Sep 10, 2024)
- 742704f: clean up debug logging (wseaton, Sep 10, 2024)
- 410ff88: update docs; raise not implemented (wseaton, Sep 10, 2024)
- a5e9a1f: fix lints (wseaton, Sep 10, 2024)
- 3d28b6d: sort imports (wseaton, Sep 10, 2024)
- 74c8cc7: yapf fixes (wseaton, Sep 10, 2024)
- 23a4ca3: another format change (wseaton, Sep 10, 2024)
- 1659236: update example prompt to be conversational instead of single turn (wseaton, Sep 10, 2024)
- b1e09a8: update docs for template; link paper (wseaton, Sep 10, 2024)
- e82b2a6: Merge remote-tracking branch 'upstream/main' into granite-fc (wseaton, Sep 27, 2024)
- 6b0eebb: add granite to test config (wseaton, Sep 27, 2024)
- 346d554: fixup json (wseaton, Sep 27, 2024)
- 24e49b8: Add stream support for Granite 20b Tool Use (maxdebayser, Sep 27, 2024)
- 86dead8: fix docs (maxdebayser, Sep 27, 2024)
- 113fbb6: more robust whispace handling (maxdebayser, Sep 28, 2024)
- acecb6d: remove reference to defunct granite parser (wseaton, Oct 2, 2024)
- 86e8466: remove old template (wseaton, Oct 2, 2024)
- 43c8078: Update tests/tool_use/utils.py to remove dupe (wseaton, Oct 7, 2024)
- 6bf4a41: Merge remote-tracking branch 'upstream/main' into granite-fc (wseaton, Oct 7, 2024)
- 2e969c7: fix double import (wseaton, Oct 7, 2024)
- e18219c: add completion request arg to abstract method (wseaton, Oct 7, 2024)
- 9a0321b: formatting fixes (wseaton, Oct 7, 2024)
- 078ab85: import sorts (wseaton, Oct 7, 2024)
- 0a031bf: appease yapf (wseaton, Oct 7, 2024)
- c6a6b56: Apply suggestions from code review (wseaton, Oct 16, 2024)
- defed52: remove redudant indents; add type hints to utils (wseaton, Oct 17, 2024)
- 5b78cea: Merge branch 'granite-fc' of github.com:wseaton/vllm into granite-fc (wseaton, Oct 17, 2024)
- 2d3b8fe: formatting churn (wseaton, Oct 17, 2024)
- fe13b72: Merge branch 'main' into granite-fc (wseaton, Oct 21, 2024)
- 84e93bf: change to old style type aliasing (wseaton, Oct 21, 2024)
- ae55760: Merge branch 'granite-fc' of github.com:wseaton/vllm into granite-fc (wseaton, Oct 21, 2024)
- 1277f0b: Doc reformat, add back missing line (wseaton, Oct 25, 2024)
- 738d003: Temporarily disable the granite20b-fc test task (wseaton, Oct 25, 2024)
- a6e1bf9: Merge branch 'vllm-project:main' into granite-fc (wseaton, Oct 28, 2024)
Commit 113fbb6f3f2fd208acbbc177d8061bc0207a2c4d: more robust whispace handling
Signed-off-by: Max de Bayser <mbayser@br.ibm.com>
maxdebayser authored and wseaton committed Oct 2, 2024
12 changes: 1 addition & 11 deletions docs/source/serving/openai_compatible_server.md

@@ -227,14 +227,4 @@ Supported models:
 * `ibm-granite/granite-20b-functioncalling`
 
 Flags: `--tool-call-parser granite-20b-fc`
-`examples/tool_chat_template_granite20b_fc.jinja`: this is a modified chat template from the original on Huggingface, which is not vLLM compatible. It blends function description elements from the Hermes template and follows the same system prompt as "Response Generation" mode from [the paper](https://arxiv.org/abs/2407.00121). Parallel function calls are supported.
-
-Supported models:
-* `ibm-granite/granite-20b-functioncalling`
-
-Flags: `--tool-call-parser granite`
-
-Known issues:
-1. Tool call parsing is not yet supported in streaming mode.
-
-* `examples/tool_chat_template_granite_response.jinja` - this is a modified chat template from the original on Huggingface, which is not vLLM compatible. It blends function description elements from the Hermes template and follows the same system prompt as "Response Generation" mode from [the paper](https://arxiv.org/abs/2407.00121). Parallel function calls are supported.
+`examples/tool_chat_template_granite_20b_fc.jinja`: this is a modified chat template from the original on Huggingface, which is not vLLM compatible. It blends function description elements from the Hermes template and follows the same system prompt as "Response Generation" mode from [the paper](https://arxiv.org/abs/2407.00121). Parallel function calls are supported.
4 changes: 2 additions & 2 deletions tests/tool_use/utils.py

@@ -100,8 +100,8 @@ def ensure_system_prompt(messages: List[Dict[str, Any]],
         "model":
         "ibm-granite/granite-20b-functioncalling",
         "arguments": [
-            "--tool-call-parser", "granite20b-fc", "--chat-template",
-            str(VLLM_PATH / "examples/tool_chat_template_granite.jinja")
+            "--tool-call-parser", "granite-20b-fc", "--chat-template",
+            str(VLLM_PATH / "examples/tool_chat_template_granite_20b_fc.jinja")
         ],
     }
 }
2 changes: 1 addition & 1 deletion vllm/entrypoints/openai/tool_parsers/__init__.py

@@ -1,5 +1,5 @@
 from .abstract_tool_parser import ToolParser
-from .granite_20bfc_tool_parser import Granite20bFCToolParser
+from .granite_20b_fc_tool_parser import Granite20bFCToolParser
 from .hermes_tool_parser import Hermes2ProToolParser
 from .llama_tool_parser import Llama3JsonToolParser
 from .mistral_tool_parser import MistralToolParser
vllm/entrypoints/openai/tool_parsers/granite_20b_fc_tool_parser.py

@@ -12,7 +12,8 @@
     FunctionCall, ToolCall)
 from vllm.entrypoints.openai.tool_parsers.abstract_tool_parser import (
     ToolParser)
-from vllm.entrypoints.openai.tool_parsers.utils import (find_common_prefix,
+from vllm.entrypoints.openai.tool_parsers.utils import (consume_space,
+                                                        find_common_prefix,
                                                         is_complete_json,
                                                         partial_json_loads)
 from vllm.logger import init_logger

@@ -121,14 +122,19 @@ def extract_tool_calls_streaming(
         is_complete = []
         try:
             start_idx = len(self.bot_token)
+            start_idx = consume_space(start_idx, current_text)
 
             while start_idx < len(current_text):
                 (obj,
                  end_idx) = partial_json_loads(current_text[start_idx:],
                                                flags)
                 is_complete.append(
                     is_complete_json(current_text[start_idx:start_idx +
                                                   end_idx]))
-                start_idx += end_idx + len(self.bot_token) + 1
+                start_idx += end_idx
+                start_idx = consume_space(start_idx, current_text)
+                start_idx += len(self.bot_token)
+                start_idx = consume_space(start_idx, current_text)
                 tool_call_arr.append(obj)
         except partial_json_parser.core.exceptions.MalformedJSON:
             logger.debug('not enough tokens to parse into JSON yet')
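The change above replaces the old fixed-width skip (`end_idx + len(self.bot_token) + 1`) with explicit whitespace consumption around the separator token, so the parser no longer breaks when the model emits extra or missing whitespace between tool calls. The loop can be sketched in a simplified, stdlib-only form. This is a hedged sketch, not the PR's code: `BOT_TOKEN` is an assumed stand-in for the parser's `self.bot_token`, and `json.JSONDecoder.raw_decode` stands in for `partial_json_loads` (so, unlike the real streaming parser, this version only accepts complete JSON objects).

```python
import json

# Assumption: the separator the model emits before each tool call; the real
# parser reads this from self.bot_token.
BOT_TOKEN = "<function_call>"


def consume_space(i: int, s: str) -> int:
    # Advance index i past consecutive whitespace in s (same helper as in
    # the diff above).
    while i < len(s) and s[i].isspace():
        i += 1
    return i


def parse_tool_calls(current_text: str) -> list:
    """Parse a sequence of JSON tool calls separated by BOT_TOKEN,
    tolerating arbitrary whitespace before, between, and after the
    separator token."""
    decoder = json.JSONDecoder()
    tool_call_arr = []
    # Skip the leading bot token and any whitespace after it.
    start_idx = len(BOT_TOKEN)
    start_idx = consume_space(start_idx, current_text)
    while start_idx < len(current_text):
        # raw_decode returns the parsed object and the offset just past it.
        obj, end_idx = decoder.raw_decode(current_text[start_idx:])
        start_idx += end_idx
        # Mirror the patched loop: whitespace, separator, whitespace.
        start_idx = consume_space(start_idx, current_text)
        start_idx += len(BOT_TOKEN)
        start_idx = consume_space(start_idx, current_text)
        tool_call_arr.append(obj)
    return tool_call_arr
```

Because every skip is whitespace-aware, `"<function_call>{...}<function_call> {...}"` and `"<function_call> {...}\n<function_call>\n{...}"` parse identically, which the old `+ 1` offset could not guarantee.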
6 changes: 6 additions & 0 deletions vllm/entrypoints/openai/tool_parsers/utils.py

@@ -112,3 +112,9 @@ def is_complete_json(input_str):
         return True
     except JSONDecodeError:
         return False
+
+
+def consume_space(i, s):
+    while i < len(s) and s[i].isspace():
+        i += 1
+    return i
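The new helper advances an index past any run of whitespace instead of assuming a fixed one-character gap; an index already at a non-space character (or at the end of the string) is returned unchanged. A quick check of its behavior, with the function reproduced from the diff above:

```python
def consume_space(i, s):
    # Advance i past consecutive whitespace characters in s.
    while i < len(s) and s[i].isspace():
        i += 1
    return i

# Leading spaces are skipped; a non-space position and the empty
# string are left where they are.
print(consume_space(0, "   {"))    # → 3
print(consume_space(3, "   {"))    # → 3
print(consume_space(0, ""))        # → 0
print(consume_space(0, " \n\t{"))  # → 3
```

Returning the index (rather than a stripped string) lets the streaming loop keep all offsets relative to the original `current_text`, which it needs for slicing out each partial JSON object.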