Router /v1/chat/completions not compatible with openai spec #1887

Closed · 2 of 4 tasks
phangiabao98 opened this issue May 14, 2024 · 2 comments

@phangiabao98 (Contributor)

System Info

CUDA: 12.1
Python: 3.10
Rust: 1.75.0

Information

  • Docker
  • The CLI directly

Tasks

  • An officially supported command
  • My own modifications

Reproduction

  1. Run the launcher in Docker and mount the socket to the host with the CLI below:
    docker run --gpus all --shm-size 1g -v /tmp:/tmp -v /root/Project/text-generation-inference/ink-tgi/models:/data ghcr.io/huggingface/text-generation-inference:1.4 --model-id TinyLlama/TinyLlama-1.1B-Chat-v1.0
  2. Get the model's tokenizer_config.json and change its chat_template as below (a render sketch of this template appears after this list):
    "chat_template": "{% for message in messages %}\n{% if message['role'] == 'user' %}\n{{ '<|user|>\n' + message['content'] + eos_token }}\n{% elif message['role'] == 'system' %}\n{{ '<|system|>\n' + message['content'] + eos_token }}\n{% elif message['role'] == 'assistant' %}\n{{ '<|assistant|>\n'}}{% if message['tool_calls'] %} {{''}} {% else %} {{message['content'] + eos_token}} {% endif %}\n{% elif message['role'] == 'tool' %}\n{{ '<|tool|>\n' +message['name'] + '\n'+ message['content'] + eos_token }}\n{% endif %}\n{% if loop.last and add_generation_prompt %}\n{{ '<|assistant|>' }}\n{% endif %}\n{% endfor %}",
  3. Run the router on the host with:
cd router
cargo run -- --tokenizer-config-path /root/Project/text-generation-inference/ink-tgi/router/tokenizer_config.json
  4. Call the /v1/chat/completions endpoint with curl, as in the OpenAI function-calling example:
curl --location 'http://localhost:3000/v1/chat/completions' \
--header 'accept: application/json' \
--header 'Content-Type: application/json' \
--data '{
    "messages": [
        {
            "role": "user",
            "content": "What'\''s the weather like in San Francisco, Tokyo, and Paris?"
        },
        {
            "role": "assistant",
            "tool_calls": [
                {
                    "id": "0",
                    "function": {
                        "arguments": {
                            "location": "San Francisco, CA",
                            "unit": "celsius"
                        },
                        "name": "get_current_weather",
                        "description": "null"
                    },
                    "type": "function"
                }
            ]
        },
        {
            "tool_call_id": "0",
            "role": "tool",
            "name": "get_current_weather",
            "content": "{\"location\": \"San Francisco\", \"temperature\": \"72\", \"unit\": \"fahrenheit\"}"
        }
    ],
    "model": "TinyLlama/TinyLlama-1.1B-Chat-v1.0",
    "stop": [],
    "max_tokens": 500,
    "temperature": 0.5
}'
  5. Two errors are found (see the sketch after this list):
  • struct Message in lib.rs doesn't have a tool_calls attribute, as required by the OpenAI spec
  • struct ToolCall in lib.rs has id as u32, which must be a String according to the OpenAI spec
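
To make the mismatch concrete, here is a minimal sketch of what spec-compliant request types could look like with serde. This is illustrative only: the type and field names below mirror the OpenAI spec, not the actual lib.rs definitions.

// Hedged sketch, not the router's real code: assumed names throughout.
use serde::{Deserialize, Serialize};

#[derive(Clone, Debug, Deserialize, Serialize)]
pub struct Message {
    pub role: String,
    // Optional, because an assistant message that only carries tool_calls
    // has no content under the OpenAI spec.
    #[serde(skip_serializing_if = "Option::is_none")]
    pub content: Option<String>,
    // The attribute this issue reports as missing from struct Message.
    #[serde(skip_serializing_if = "Option::is_none")]
    pub tool_calls: Option<Vec<ToolCall>>,
    // Set on `tool` role messages.
    #[serde(skip_serializing_if = "Option::is_none")]
    pub name: Option<String>,
    #[serde(skip_serializing_if = "Option::is_none")]
    pub tool_call_id: Option<String>,
}

#[derive(Clone, Debug, Deserialize, Serialize)]
pub struct ToolCall {
    // A String per the spec (e.g. "call_abc123"), not u32.
    pub id: String,
    #[serde(rename = "type")]
    pub kind: String,
    pub function: FunctionCall,
}

#[derive(Clone, Debug, Deserialize, Serialize)]
pub struct FunctionCall {
    pub name: String,
    // The spec encodes arguments as a JSON string, not an object; note that
    // the curl above sends an object, which a strictly compliant server
    // would reject.
    pub arguments: String,
}

Types shaped like this accept the assistant and tool messages that the openai Python client emits.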
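Separately, the modified chat_template from step 2 can be sanity-checked outside the router. Below is a minimal render sketch using the minijinja crate (a Rust implementation of Jinja templates); the template is abbreviated to the branches relevant here, and the messages are invented for illustration.

// Hedged sketch: renders an abbreviated form of the step-2 template to
// verify the <|tool|> branch; not part of the router.
use minijinja::{context, Environment};

fn main() {
    let template = "{% for message in messages %}\
{% if message['role'] == 'user' %}{{ '<|user|>\n' + message['content'] + eos_token }}\n\
{% elif message['role'] == 'tool' %}{{ '<|tool|>\n' + message['name'] + '\n' + message['content'] + eos_token }}\n\
{% endif %}\
{% if loop.last and add_generation_prompt %}{{ '<|assistant|>' }}\n{% endif %}\
{% endfor %}";

    let mut env = Environment::new();
    env.add_template("chat", template).unwrap();
    let rendered = env
        .get_template("chat")
        .unwrap()
        .render(context! {
            messages => vec![
                context! { role => "user", content => "What's the weather in Paris?" },
                context! {
                    role => "tool",
                    name => "get_current_weather",
                    content => "{\"location\": \"Paris\", \"temperature\": \"22\"}"
                },
            ],
            eos_token => "</s>",
            add_generation_prompt => true
        })
        .unwrap();
    println!("{rendered}");
}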

Expected behavior

The router must serve an interface that supports the function-calling implementations of LangChain and other LLM application frameworks.

You can test it with the Python code below:

import openai
import json


client = openai.OpenAI(
    api_key = "", # can be anything
    base_url = "http://localhost:3000/v1" # NOTE: replace with the IP address and port of your TGI router
)

# Example dummy function hard coded to return the same weather
# In production, this could be your backend API or an external API
def get_current_weather(location, unit="fahrenheit"):
    """Get the current weather in a given location"""
    if "tokyo" in location.lower():
        return json.dumps({"location": "Tokyo", "temperature": "10", "unit": "celsius"})
    elif "san francisco" in location.lower():
        return json.dumps({"location": "San Francisco", "temperature": "72", "unit": "fahrenheit"})
    elif "paris" in location.lower():
        return json.dumps({"location": "Paris", "temperature": "22", "unit": "celsius"})
    else:
        return json.dumps({"location": location, "temperature": "unknown"})

def run_conversation():
    # Step 1: send the conversation and available functions to the model
    messages = [{"role": "user", "content": "What's the weather like in San Francisco, Tokyo, and Paris?"}]
    tools = [
        {
            "type": "function",
            "function": {
                "name": "get_current_weather",
                "description": "Get the current weather in a given location",
                "parameters": {
                    "type": "object",
                    "properties": {
                        "location": {
                            "type": "string",
                            "description": "The city and state, e.g. San Francisco, CA",
                        },
                        "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]},
                    },
                    "required": ["location"],
                },
            },
        }
    ]
    response = client.chat.completions.create(
        model="gpt-3.5-turbo-1106",
        messages=messages,
        tools=tools,
        tool_choice="auto",  # auto is default, but we'll be explicit
    )
    response_message = response.choices[0].message
    tool_calls = response_message.tool_calls
    # Step 2: check if the model wanted to call a function
    if tool_calls:
        # Step 3: call the function
        # Note: the JSON response may not always be valid; be sure to handle errors
        available_functions = {
            "get_current_weather": get_current_weather,
        }  # only one function in this example, but you can have multiple
        messages.append(response_message)  # extend conversation with assistant's reply
        # Step 4: send the info for each function call and function response to the model
        for tool_call in tool_calls:
            function_name = tool_call.function.name
            function_to_call = available_functions[function_name]
            function_args = json.loads(tool_call.function.arguments)  # per the spec, arguments is a JSON-encoded string
            function_response = function_to_call(
                location=function_args.get("location"),
                unit=function_args.get("unit"),
            )
            messages.append(
                {
                    "tool_call_id": tool_call.id,
                    "role": "tool",
                    "name": function_name,
                    "content": function_response,
                }
            )  # extend conversation with function response
        second_response = client.chat.completions.create(
            model="gpt-3.5-turbo-1106",
            messages=messages,
        )  # get a new response from the model where it can see the function response
        return second_response
print(run_conversation())
@phangiabao98 (Contributor, Author)

I created a PR at #1888.

Narsil pushed a commit that referenced this issue May 16, 2024
# What does this PR do?

Fixes #1887

## Before submitting
- [ ] This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
- [x] Did you read the [contributor guideline](https://github.com/huggingface/transformers/blob/main/CONTRIBUTING.md#start-contributing-pull-requests), Pull Request section?
- [x] Was this discussed/approved via a GitHub issue or the [forum](https://discuss.huggingface.co/)? Please add a link to it if that's the case.
- [x] Did you make sure to update the documentation with your changes? Here are the [documentation guidelines](https://github.com/huggingface/transformers/tree/main/docs), and [here are tips on formatting docstrings](https://github.com/huggingface/transformers/tree/main/docs#writing-source-documentation).
- [x] Did you write any new necessary tests?

## Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag members/contributors who may be interested in your PR.

@Narsil

---------

Co-authored-by: Bao Phan <baopg@inter-k.com>

This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 5 days.

@github-actions bot added the Stale label Jun 14, 2024
@github-actions bot closed this as not planned Jun 19, 2024
alfredgui2 pushed a commit to mlsys-io/kv.run that referenced this issue Jul 6, 2024
yuanwu2017 pushed a commit to yuanwu2017/tgi-gaudi that referenced this issue Jul 17, 2024