### System Info
- Cargo version: 1.70.0
- Commit sha: 211b54a
- Docker label: N/A
- nvidia-smi: Driver Version: 525.105.17, CUDA Version: 12.0
### Information
- [ ] Docker
- [x] The CLI directly
### Tasks
- [x] An officially supported command
- [ ] My own modifications
### Reproduction
- Launch TGI:
  ```shell
  text-generation-launcher --model-id OpenAssistant/llama2-70b-oasst-sft-v10 -p 8080 --quantize bitsandbytes-nf4
  ```
- When ready, query with:
  ```shell
  curl localhost:8080/generate -X POST -d $'{\"inputs\":\"<|im_start|>user\\nHello assistant!<|im_end|>\\n<|im_start|>assistant\\n\",\"parameters\":{\"max_new_tokens\":10, \"do_sample\": true, \"decoder_input_details\": true}}' -H 'Content-Type: application/json'
  ```
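The same request can also be sent from Python, which makes it easier to poke at the returned prefill (just a convenience sketch, not part of the template; it assumes the launcher above is listening on localhost:8080):

```python
# Convenience sketch: replicate the curl request above and print how the
# prompt was tokenized (the "prefill" part of the response). Assumes the
# launcher from the previous step is running on localhost:8080.
import requests

payload = {
    "inputs": "<|im_start|>user\nHello assistant!<|im_end|>\n<|im_start|>assistant\n",
    "parameters": {
        "max_new_tokens": 10,
        "do_sample": True,
        "decoder_input_details": True,
    },
}
resp = requests.post("http://localhost:8080/generate", json=payload, timeout=60)
resp.raise_for_status()
for token in resp.json()["details"]["prefill"]:
    print(token["id"], repr(token["text"]))
```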
Problem: the end of the input text sequence, `<|im_start|>assistant\n`, is incorrectly tokenized into
```json
{"id":32005,"text":"<|im_start|>"},
{"id":0,"text":"<unk>"},
{"id":13,"text":"<0x0A>"}
```
The full returned output (it should not start with `assistant`; reformatted for clarity) is:
```json
{
  "generated_text": "assistant\nHello! How can I help you",
  "details": {
    "finish_reason": "length",
    "generated_tokens": 10,
    "seed": 1809714102295104059,
    "prefill": [
      {"id": 32005, "text": "<|im_start|>", "logprob": null},
      {"id": 1404, "text": "user", "logprob": -9.8984375},
      {"id": 13, "text": "<0x0A>", "logprob": -3.8105469},
      {"id": 10994, "text": "Hello", "logprob": -13.703125},
      {"id": 20255, "text": "assistant", "logprob": -4.8320312},
      {"id": 29991, "text": "!", "logprob": -1.2890625},
      {"id": 32006, "text": "<|im_end|>", "logprob": -6.9804688},
      {"id": 32005, "text": "<|im_start|>", "logprob": -11.6171875},
      {"id": 0, "text": "<unk>", "logprob": -26.5},
      {"id": 13, "text": "<0x0A>", "logprob": 0.0}
    ],
    "tokens": [
      {"id": 32005, "text": " <|im_start|>", "logprob": 0.0, "special": true},
      {"id": 20255, "text": " assistant", "logprob": 0.0, "special": false},
      {"id": 13, "text": "\n", "logprob": 0.0, "special": false},
      {"id": 10994, "text": "Hello", "logprob": -0.5229492, "special": false},
      {"id": 29991, "text": "!", "logprob": -0.23022461, "special": false},
      {"id": 1128, "text": " How", "logprob": -0.25, "special": false},
      {"id": 508, "text": " can", "logprob": -0.09472656, "special": false},
      {"id": 306, "text": " I", "logprob": -0.0018663406, "special": false},
      {"id": 1371, "text": " help", "logprob": -0.4296875, "special": false},
      {"id": 366, "text": " you", "logprob": -0.042816162, "special": false}
    ]
  }
}
```
### Expected behavior
The input text `<|im_start|>assistant\n` should be tokenized and processed as:
```json
{"id":32005,"text":"<|im_start|>"},
{"id":20255,"text":"assistant"},
{"id":13,"text":"<0x0A>"}
```