Description
When I try to stream the response, the following error occurs:
File ~/miniconda/envs/fastapi/lib/python3.10/site-packages/llama_cpp/llama.py:482, in Llama._create_completion(self, prompt, suffix, max_tokens, temperature, top_p, logprobs, echo, stop, repeat_penalty, top_k, stream)
473 self._completion_bytes.append(text[start:])
474 ###
475 yield {
476 "id": completion_id,
477 "object": "text_completion",
478 "created": created,
479 "model": self.model_path,
480 "choices": [
481 {
--> 482 "text": text[start:].decode("utf-8"),
483 "index": 0,
...
488 }
490 if len(completion_tokens) >= max_tokens:
491 text = self.detokenize(completion_tokens)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xec in position 0: unexpected end of data
My code snippet was prepared as below, referring to the example:
from llama_cpp import Llama
import json

model_path = "/my/model/path/for/ko_vicuna_7b/ggml-model-q4_0.bin"
prompt = "Tell me about Korea in english"

llm = Llama(model_path=model_path, n_ctx=4096, seed=0)
stream = llm(
    f"Q: {prompt} \nA: ",
    max_tokens=512,
    stop=["Q:", "\n"],
    stream=True,
    temperature=0.1,
)
for output in stream:
    print(output['choices'][0]["text"], end='')
Not only 0xec, but also 0xed and 0xf0 occurred in other trials. I cannot be sure, but it may be caused by the language of the model, which is fine-tuned for Korean from Vicuna 7B.
For your reference, several characters were generated before it stopped suddenly with the above error.
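If it helps, here is a minimal sketch (my own illustration, not the library's code) of what I think is happening: Korean syllables are three bytes in UTF-8, so if a token boundary falls inside a character, decoding the partial byte slice raises exactly this error, whereas an incremental decoder would buffer the incomplete bytes instead of raising:

```python
import codecs

# "한" (U+D55C) encodes to three bytes; 0xed is its lead byte,
# matching one of the bytes reported above.
data = "한".encode("utf-8")          # b'\xed\x95\x9c'

try:
    data[:1].decode("utf-8")         # only the first byte of the character
except UnicodeDecodeError as e:
    print(e)                         # 'utf-8' codec can't decode byte 0xed in position 0: unexpected end of data

# An incremental decoder keeps the incomplete sequence buffered instead of raising.
dec = codecs.getincrementaldecoder("utf-8")()
print(repr(dec.decode(data[:1])))    # '' -- waits for the rest of the character
print(repr(dec.decode(data[1:])))    # '한'
```

So it seems the streamed chunk boundaries can split a multi-byte character, and the plain `.decode("utf-8")` in `_create_completion` then fails on the partial bytes.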