
Fix client picking embeddings model by default for chat completion #66

Merged
vladimirivic merged 1 commit into main from pr66 on Dec 19, 2024

Conversation

@vladimirivic (Contributor)

Summary:
After we added embeddings support, the client's default model selection may pick an embedding model and return an error. See the example below:

```
llama-stack-client --endpoint http://localhost:$LLAMA_STACK_PORT models list
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ identifier                       ┃ provider_id           ┃ provider_resource_id      ┃ metadata                       ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│ all-MiniLM-L6-v2                 │ sentence-transformers │ all-MiniLM-L6-v2          │ {'embedding_dimension': 384.0} │
│ meta-llama/Llama-3.2-3B-Instruct │ ollama                │ llama3.2:3b-instruct-fp16 │ {}                             │
└──────────────────────────────────┴───────────────────────┴───────────────────────────┴────────────────────────────────┘
llama-stack-client --endpoint http://localhost:$LLAMA_STACK_PORT \
  inference chat-completion \
  --message "hello, what model are you?"
╭──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ Failed to inference chat-completion                                                                                                          │
│                                                                                                                                              │
│ Error Type: BadRequestError                                                                                                                  │
│ Details: Error code: 400 - {'detail': "Invalid value: Model 'all-MiniLM-L6-v2' is an embedding model and does not support chat completions"} │
╰──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
```
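For context, here is a minimal sketch of the default selection that produces this error, assuming the CLI falls back to the first identifier returned by client.models.list() when no model is specified (the helper name pick_default_model is illustrative, not the actual CLI code):

```
import os

from llama_stack_client import LlamaStackClient

client = LlamaStackClient(base_url=f"http://localhost:{os.environ['LLAMA_STACK_PORT']}")

def pick_default_model(model_id: str | None) -> str:
    # Before this fix there is no filtering, so the first listed model wins;
    # here that is the embedding model all-MiniLM-L6-v2.
    if not model_id:
        available_models = [model.identifier for model in client.models.list()]
        model_id = available_models[0]
    return model_id
```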

Test Plan:
Run manually from a source checkout:

```
# Make sure the server is started first, then run this
python3 -m lib.cli.llama_stack_client models list

┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ identifier                       ┃ provider_id           ┃ provider_resource_id      ┃ metadata                       ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│ all-MiniLM-L6-v2                 │ sentence-transformers │ all-MiniLM-L6-v2          │ {'embedding_dimension': 384.0} │
│ meta-llama/Llama-3.2-3B-Instruct │ ollama                │ llama3.2:3b-instruct-fp16 │ {}                             │
└──────────────────────────────────┴───────────────────────┴───────────────────────────┴────────────────────────────────┘

# all-MiniLM-L6-v2 is still listed first; now send a request to confirm the error no longer occurs

python3 -m lib.cli.llama_stack_client inference chat-completion --message "hello, what model are you?"
ChatCompletionResponse(
    completion_message=CompletionMessage(
        content="Hello! I'm an AI assistant, specifically a language model based on the transformer architecture. I was trained on a massive dataset of text from various sources, including
books, articles, and conversations, which enables me to understand and generate human-like language.\n\nMy specific model is a type of transformer-based language model called BERT
(Bidirectional Encoder Representations from Transformers), which is a state-of-the-art model for natural language processing tasks such as question-answering, text classification, and language
translation.\n\nI'm designed to be helpful and informative, so feel free to ask me any questions or have a conversation with me on any topic you'd like!",
        role='assistant',
        stop_reason='end_of_turn',
        tool_calls=[]
    ),
    logprobs=None
)
```
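An equivalent programmatic check of the fix (a sketch reusing the same client; the chat_completion parameter names are assumed from the current Python SDK and are not taken from this PR):

```
import os

from llama_stack_client import LlamaStackClient

client = LlamaStackClient(base_url=f"http://localhost:{os.environ['LLAMA_STACK_PORT']}")

# Mirror the fixed selection: only models of type "llm" are eligible defaults.
llm_ids = [m.identifier for m in client.models.list() if m.model_type == "llm"]
assert llm_ids, "no LLM-type model registered"

response = client.inference.chat_completion(
    model_id=llm_ids[0],
    messages=[{"role": "user", "content": "hello, what model are you?"}],
)
print(response.completion_message.content)
```
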
vladimirivic merged commit 3077093 into main on Dec 19, 2024
3 checks passed
vladimirivic deleted the pr66 branch on December 19, 2024 at 23:07

```
 if not model_id:
-    available_models = [model.identifier for model in client.models.list()]
+    available_models = [model.identifier for model in client.models.list() if model.model_type == "llm"]
```
Don't we have enums for these model types? Can we use them instead of string literals?
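A sketch of the suggested change, assuming a model-type enum is available; the ModelType class below is a local stand-in that only mirrors the string values visible in the diff and may not match the SDK's actual symbol or import path:

```
import os
from enum import Enum

from llama_stack_client import LlamaStackClient

# Local stand-in for the enum the reviewer is asking about; the real SDK
# symbol (if it exists) may live elsewhere and use different member names.
class ModelType(str, Enum):
    llm = "llm"
    embedding = "embedding"

client = LlamaStackClient(base_url=f"http://localhost:{os.environ['LLAMA_STACK_PORT']}")

available_models = [
    model.identifier
    for model in client.models.list()
    if model.model_type == ModelType.llm  # enum member instead of the "llm" literal
]
```

Since ModelType subclasses str, its members compare equal to the plain strings returned in model.model_type.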
