Closed
Description
I wonder whether vLLM supports Chinese or other languages. I can successfully run inference with an English prompt, but when I use a Chinese prompt, the following exception is raised:
```
INFO 06-27 11:11:16 tokenizer_utils.py:30] Using the LLaMA fast tokenizer in 'hf-internal-testing/llama-tokenizer' to avoid potential protobuf errors.
INFO 06-27 11:13:57 llm_engine.py:128] # GPU blocks: 247, # CPU blocks: 327
Processed prompts: 0%| | 0/1 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "/mnt/lustre/sunyuhan/./scripts/testvllm.py", line 38, in <module>
    outputs = llm.generate(prompts, sampling_params)
  File "/mnt/cache/sunyuhan/miniconda3/lib/python3.10/site-packages/vllm/entrypoints/llm.py", line 114, in generate
    return self._run_engine(use_tqdm)
  File "/mnt/cache/sunyuhan/miniconda3/lib/python3.10/site-packages/vllm/entrypoints/llm.py", line 134, in _run_engine
    step_outputs = self.llm_engine.step()
  File "/mnt/cache/sunyuhan/miniconda3/lib/python3.10/site-packages/vllm/engine/llm_engine.py", line 242, in step
    self._decode_sequences(seq_groups)
  File "/mnt/cache/sunyuhan/miniconda3/lib/python3.10/site-packages/vllm/engine/llm_engine.py", line 259, in _decode_sequences
    new_token, new_output_text = detokenize_incrementally(
  File "/mnt/cache/sunyuhan/miniconda3/lib/python3.10/site-packages/vllm/engine/tokenizer_utils.py", line 68, in detokenize_incrementally
    output_text = tokenizer.convert_tokens_to_string(output_tokens)
  File "/mnt/cache/sunyuhan/miniconda3/lib/python3.10/site-packages/transformers/tokenization_utils_fast.py", line 536, in convert_tokens_to_string
    return self.backend_tokenizer.decoder.decode(tokens)
TypeError: argument 'tokens': 'NoneType' object cannot be converted to 'PyString'
```
I would also like to know how to solve this problem.
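For anyone debugging this: the traceback shows `convert_tokens_to_string` receiving a `None` entry in its token list. Below is a minimal diagnostic sketch that reproduces the failing call chain outside of vLLM, using only standard `transformers` calls and the same tokenizer that vLLM reports loading; the Chinese prompt is an arbitrary example.

```python
# Diagnostic sketch: check whether the tokenizer round-trips Chinese text.
# Assumes the 'hf-internal-testing/llama-tokenizer' tokenizer from the log;
# the prompt string is just an example.
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained(
    "hf-internal-testing/llama-tokenizer", use_fast=True
)

text = "你好，世界"  # "Hello, world"
ids = tok.encode(text)
tokens = tok.convert_ids_to_tokens(ids)
print(tokens)  # any None entry here reproduces the TypeError above

# convert_tokens_to_string crashes on None, so filter before decoding.
# Note that fast tokenizers return None for ids outside their vocabulary,
# e.g. when a model with an extended Chinese vocabulary is paired with the
# stock LLaMA tokenizer.
safe_tokens = [t for t in tokens if t is not None]
print(tok.convert_tokens_to_string(safe_tokens))
```

If the first `print` shows `None` entries for ids the model produces, the tokenizer's vocabulary likely does not match the model's, which would explain why only non-English prompts trigger the crash.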