Name and Version
When using function calling, recent versions of llama.cpp respond with text about a character named Mistral in Tekken 7 instead of calling the tool. This seems to come from the string mistral-v7-tekken in one of the prompts.
The bug is present at least in versions 5749 and 5756, but not in 5311.
Operating systems
Linux
GGML backends
CUDA
Hardware
NVIDIA 4090, AMD 9950X.
Models
bartowski/mistralai_Mistral-Small-3.2-24B-Instruct-2506-GGUF:Q4_K_M
unsloth/Mistral-Small-3.2-24B-Instruct-2506-GGUF:UD-Q4_K_XL
Problem description & steps to reproduce
How to reproduce
llama-server --log-disable --host 0.0.0.0 --port 8080 \
  -hf bartowski/mistralai_Mistral-Small-3.2-24B-Instruct-2506-GGUF:Q4_K_M --jinja \
  --flash-attn -c 40960 --temp 0.15 --top-k -1 --top-p 1.00 -ngl 99
curl http://forge.vpn:8080/v1/chat/completions -d '{
  "model": "gpt-3.5-turbo",
  "tools": [
    {
      "type":"function",
      "function":{
        "name":"python",
        "description":"Runs code in an ipython interpreter and returns the result of the execution after 60 seconds.",
        "parameters":{
          "type":"object",
          "properties":{
            "code":{
              "type":"string",
              "description":"The code to run in the ipython interpreter."
            }
          },
          "required":["code"]
        }
      }
    }
  ],
  "messages": [
    {
      "role": "user",
      "content": "Print a hello world message with python using a tool."
    }
  ]
}'
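The same request can also be sent from a script, which makes it easier to re-run against several builds. This is only a minimal sketch using the Python standard library: the helper names build_payload and post_chat_completion are made up for illustration, and the URL is simply the one from the curl command above.

```python
import json
import urllib.request

# Endpoint taken from the curl command above; adjust it to your own server.
URL = "http://forge.vpn:8080/v1/chat/completions"


def build_payload() -> dict:
    """Return the same request body as the curl reproduction."""
    return {
        "model": "gpt-3.5-turbo",
        "tools": [
            {
                "type": "function",
                "function": {
                    "name": "python",
                    "description": "Runs code in an ipython interpreter and returns the result of the execution after 60 seconds.",
                    "parameters": {
                        "type": "object",
                        "properties": {
                            "code": {
                                "type": "string",
                                "description": "The code to run in the ipython interpreter.",
                            }
                        },
                        "required": ["code"],
                    },
                },
            }
        ],
        "messages": [
            {
                "role": "user",
                "content": "Print a hello world message with python using a tool.",
            }
        ],
    }


def post_chat_completion(url: str = URL, timeout: float = 120.0) -> dict:
    """POST the payload and return the decoded JSON response (hypothetical helper)."""
    request = urllib.request.Request(
        url,
        data=json.dumps(build_payload()).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

With llama-server running, print(json.dumps(post_chat_completion(), indent=2)) should show the same kind of response as the curl call. Nothing is sent until post_chat_completion is called.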
Output obtained:
{
  "choices": [
    {
      "finish_reason": "stop",
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "In the Tekken series, Mistral is a character who first appeared in Tekken 7. She is ...."
      }
    }
  ],
Expected output
Using release 5311.
{
  "choices": [
    {
      "finish_reason": "tool_calls",
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Sure, I can help with that. I'll use the tool to run a Python code that prints a \"Hello, World!\" message. Here's the code I'll execute:",
        "tool_calls": [
          {
            "type": "function",
            "function": {
              "name": "python",
              "arguments": "{\"code\":\"print(\\\"Hello, World!\\\")\\n \"}"
            },
            "id": "a1a2b3b4c"
          }
        ]
      }
    }
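The difference between the two responses can be checked mechanically: the affected builds return finish_reason "stop" with plain text, while 5311 returns finish_reason "tool_calls" with a tool_calls list. Below is a small sketch of such a check. The function name made_tool_call is hypothetical, and the two sample dictionaries are trimmed copies of the outputs above.

```python
def made_tool_call(response: dict, tool_name: str = "python") -> bool:
    """Return True if the first choice ends in a call to tool_name."""
    choices = response.get("choices") or []
    if not choices:
        return False
    first = choices[0]
    message = first.get("message", {})
    calls = message.get("tool_calls") or []
    names = [call.get("function", {}).get("name") for call in calls]
    return first.get("finish_reason") == "tool_calls" and tool_name in names


# Trimmed copy of the output obtained with the affected builds.
bad = {
    "choices": [
        {
            "finish_reason": "stop",
            "index": 0,
            "message": {
                "role": "assistant",
                "content": "In the Tekken series, Mistral is a character who first appeared in Tekken 7. She is ....",
            },
        }
    ]
}

# Trimmed copy of the expected output from release 5311.
good = {
    "choices": [
        {
            "finish_reason": "tool_calls",
            "index": 0,
            "message": {
                "role": "assistant",
                "tool_calls": [
                    {
                        "type": "function",
                        "function": {
                            "name": "python",
                            "arguments": '{"code":"print(\\"Hello, World!\\")\\n "}',
                        },
                        "id": "a1a2b3b4c",
                    }
                ],
            },
        }
    ]
}

print(made_tool_call(bad))   # False
print(made_tool_call(good))  # True
```

A check like this could serve as the pass/fail test when narrowing down the first bad build between 5311 and 5749.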
First Bad Commit
No response
Relevant log output
curl http://forge.vpn:8080/v1/chat/completions -d '{
  "model": "gpt-3.5-turbo",
  "tools": [
    {
      "type":"function",
      "function":{
        "name":"python",
        "description":"Runs code in an ipython interpreter and returns the result of the execution after 60 seconds.",
        "parameters":{
          "type":"object",
          "properties":{
            "code":{
              "type":"string",
              "description":"The code to run in the ipython interpreter."
            }
          },
          "required":["code"]
        }
      }
    }
  ],
  "messages": [
    {
      "role": "user",
      "content": "Print a hello world message with python using a tool."
    }
  ]
}' | jq
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  1924  100  1200  100   724    485    292  0:00:02  0:00:02 --:--:--   777
{
  "choices": [
    {
      "finish_reason": "stop",
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "In Tekken 7, Mistral is a character known for her agility and fluid fighting style. She is a skilled martial artist who combines traditional techniques with acrobatic movements. Mistral's moveset includes a variety of kicks and spins, making her a versatile and unpredictable opponent. Her signature moves often involve rapid footwork and aerial attacks, allowing her to control the pace of the fight. Players who master Mistral can use her mobility to outmaneuver opponents and land precise strikes. Her playstyle is well-suited for those who enjoy fast-paced, dynamic combat. Would you like to know more about her specific moves or strategies?"
      }
    }
  ],
  "created": 1750896479,
  "model": "gpt-3.5-turbo",
  "system_fingerprint": "b0-unknown",
  "object": "chat.completion",
  "usage": {
    "completion_tokens": 138,
    "prompt_tokens": 8,
    "total_tokens": 146
  },
  "id": "chatcmpl-V635AoxdJwPPC0Oxi3qLfaokGLc0Lqs3",
  "timings": {
    "prompt_n": 8,
    "prompt_ms": 36.716,
    "prompt_per_token_ms": 4.5895,
    "prompt_per_second": 217.8886588953045,
    "predicted_n": 138,
    "predicted_ms": 2331.254,
    "predicted_per_token_ms": 16.89314492753623,
    "predicted_per_second": 59.19560888689092
  }
}
---
llama-server --log-disable --host 0.0.0.0 --port 8080 -hf bartowski/mistralai_Mistral-Small-3.2-24B-Instruct-2506-GGUF:Q4_K_M --jinja --flash-attn -c 40960 --temp 0.15 --top-k -1 --top-p 1.00 -ngl 99
ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 CUDA devices:
Device 0: NVIDIA GeForce RTX 4090, compute capability 8.9, VMM: yes
Failed to infer a tool call example (possible template bug)
clip_model_loader: model name: Mistral Small 3.2 24B Instruct 2506
clip_model_loader: description:
clip_model_loader: GGUF version: 3
clip_model_loader: alignment: 32
clip_model_loader: n_tensors: 223
clip_model_loader: n_kv: 31
clip_model_loader: has vision encoder
clip_ctx: CLIP using CUDA0 backend
load_hparams: projector: pixtral
load_hparams: n_embd: 1024
load_hparams: n_head: 16
load_hparams: n_ff: 4096
load_hparams: n_layer: 24
load_hparams: ffn_op: silu
load_hparams: projection_dim: 5120
--- vision hparams ---
load_hparams: image_size: 1024
load_hparams: patch_size: 14
load_hparams: has_llava_proj: 0
load_hparams: minicpmv_version: 0
load_hparams: proj_scale_factor: 0
load_hparams: n_wa_pattern: 0
load_hparams: model size: 837.36 MiB
load_hparams: metadata size: 0.08 MiB
alloc_compute_meta: CUDA0 compute buffer size = 2.97 MiB
alloc_compute_meta: CPU compute buffer size = 0.14 MiB