
Fix client picking embeddings model by default for chat completion #66

Merged
vladimirivic merged 1 commit into main from pr66 on Dec 19, 2024

Conversation

@vladimirivic (Contributor)

Summary:
After we added embeddings support, the client's default model selection may pick an embedding model and return an error. See the example below:

```
llama-stack-client --endpoint http://localhost:$LLAMA_STACK_PORT models list
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ identifier                       ┃ provider_id           ┃ provider_resource_id      ┃ metadata                       ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│ all-MiniLM-L6-v2                 │ sentence-transformers │ all-MiniLM-L6-v2          │ {'embedding_dimension': 384.0} │
│ meta-llama/Llama-3.2-3B-Instruct │ ollama                │ llama3.2:3b-instruct-fp16 │ {}                             │
└──────────────────────────────────┴───────────────────────┴───────────────────────────┴────────────────────────────────┘
llama-stack-client --endpoint http://localhost:$LLAMA_STACK_PORT \
  inference chat-completion \
  --message "hello, what model are you?"
╭──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ Failed to inference chat-completion                                                                                                          │
│                                                                                                                                              │
│ Error Type: BadRequestError                                                                                                                  │
│ Details: Error code: 400 - {'detail': "Invalid value: Model 'all-MiniLM-L6-v2' is an embedding model and does not support chat completions"} │
╰──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
```
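For context, here is a minimal sketch of the default selection that produces this error, assuming the CLI falls back to the first identifier returned by client.models.list() when no model is specified (the helper name pick_default_model is illustrative, not the actual CLI code):

```
import os

from llama_stack_client import LlamaStackClient

client = LlamaStackClient(base_url=f"http://localhost:{os.environ['LLAMA_STACK_PORT']}")

def pick_default_model(model_id: str | None) -> str:
    # Before this fix there is no filtering, so the first listed model wins;
    # here that is the embedding model all-MiniLM-L6-v2.
    if not model_id:
        available_models = [model.identifier for model in client.models.list()]
        model_id = available_models[0]
    return model_id
```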

Test Plan:
Run manually from a source checkout:

```
# Make sure the server is started first, then run this
python3 -m lib.cli.llama_stack_client models list

┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ identifier                       ┃ provider_id           ┃ provider_resource_id      ┃ metadata                       ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│ all-MiniLM-L6-v2                 │ sentence-transformers │ all-MiniLM-L6-v2          │ {'embedding_dimension': 384.0} │
│ meta-llama/Llama-3.2-3B-Instruct │ ollama                │ llama3.2:3b-instruct-fp16 │ {}                             │
└──────────────────────────────────┴───────────────────────┴───────────────────────────┴────────────────────────────────┘

# all-MiniLM-L6-v2 is still listed first; now send a request to confirm the error no longer occurs

python3 -m lib.cli.llama_stack_client inference chat-completion --message "hello, what model are you?"
ChatCompletionResponse(
    completion_message=CompletionMessage(
        content="Hello! I'm an AI assistant, specifically a language model based on the transformer architecture. I was trained on a massive dataset of text from various sources, including
books, articles, and conversations, which enables me to understand and generate human-like language.\n\nMy specific model is a type of transformer-based language model called BERT
(Bidirectional Encoder Representations from Transformers), which is a state-of-the-art model for natural language processing tasks such as question-answering, text classification, and language
translation.\n\nI'm designed to be helpful and informative, so feel free to ask me any questions or have a conversation with me on any topic you'd like!",
        role='assistant',
        stop_reason='end_of_turn',
        tool_calls=[]
    ),
    logprobs=None
)
```
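An equivalent programmatic check of the fix (a sketch reusing the same client; the chat_completion parameter names are assumed from the current Python SDK and are not taken from this PR):

```
import os

from llama_stack_client import LlamaStackClient

client = LlamaStackClient(base_url=f"http://localhost:{os.environ['LLAMA_STACK_PORT']}")

# Mirror the fixed selection: only models of type "llm" are eligible defaults.
llm_ids = [m.identifier for m in client.models.list() if m.model_type == "llm"]
assert llm_ids, "no LLM-type model registered"

response = client.inference.chat_completion(
    model_id=llm_ids[0],
    messages=[{"role": "user", "content": "hello, what model are you?"}],
)
print(response.completion_message.content)
```
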
vladimirivic merged commit 3077093 into main on Dec 19, 2024
3 checks passed
vladimirivic deleted the pr66 branch on December 19, 2024 at 23:07

```
 if not model_id:
-    available_models = [model.identifier for model in client.models.list()]
+    available_models = [model.identifier for model in client.models.list() if model.model_type == "llm"]
```
Don't we have enums for these model types? Can we use them instead of string literals?
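A sketch of the suggested change, assuming a model-type enum is available; the ModelType class below is a local stand-in that only mirrors the string values visible in the diff and may not match the SDK's actual symbol or import path:

```
import os
from enum import Enum

from llama_stack_client import LlamaStackClient

# Local stand-in for the enum the reviewer is asking about; the real SDK
# symbol (if it exists) may live elsewhere and use different member names.
class ModelType(str, Enum):
    llm = "llm"
    embedding = "embedding"

client = LlamaStackClient(base_url=f"http://localhost:{os.environ['LLAMA_STACK_PORT']}")

available_models = [
    model.identifier
    for model in client.models.list()
    if model.model_type == ModelType.llm  # enum member instead of the "llm" literal
]
```

Since ModelType subclasses str, its members compare equal to the plain strings returned in model.model_type.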
