The hf-internal-testing/llama-tokenizer does not support Chinese prompts #270

Closed
@sunyuhan19981208

I wonder whether vLLM supports Chinese or other languages. Inference works with an English prompt, but when I use a Chinese prompt, the following exception is raised:

INFO 06-27 11:11:16 tokenizer_utils.py:30] Using the LLaMA fast tokenizer in 'hf-internal-testing/llama-tokenizer' to avoid potential protobuf errors.
INFO 06-27 11:13:57 llm_engine.py:128] # GPU blocks: 247, # CPU blocks: 327
Processed prompts:   0%|          | 0/1 [00:00<?, ?it/s]Traceback (most recent call last):
  File "/mnt/lustre/sunyuhan/./scripts/testvllm.py", line 38, in <module>
    outputs = llm.generate(prompts, sampling_params)
  File "/mnt/cache/sunyuhan/miniconda3/lib/python3.10/site-packages/vllm/entrypoints/llm.py", line 114, in generate
    return self._run_engine(use_tqdm)
  File "/mnt/cache/sunyuhan/miniconda3/lib/python3.10/site-packages/vllm/entrypoints/llm.py", line 134, in _run_engine
    step_outputs = self.llm_engine.step()
  File "/mnt/cache/sunyuhan/miniconda3/lib/python3.10/site-packages/vllm/engine/llm_engine.py", line 242, in step
    self._decode_sequences(seq_groups)
  File "/mnt/cache/sunyuhan/miniconda3/lib/python3.10/site-packages/vllm/engine/llm_engine.py", line 259, in _decode_sequences
    new_token, new_output_text = detokenize_incrementally(
  File "/mnt/cache/sunyuhan/miniconda3/lib/python3.10/site-packages/vllm/engine/tokenizer_utils.py", line 68, in detokenize_incrementally
    output_text = tokenizer.convert_tokens_to_string(output_tokens)
  File "/mnt/cache/sunyuhan/miniconda3/lib/python3.10/site-packages/transformers/tokenization_utils_fast.py", line 536, in convert_tokens_to_string
    return self.backend_tokenizer.decoder.decode(tokens)
TypeError: argument 'tokens': 'NoneType' object cannot be converted to 'PyString'

I would also like to know how to solve this problem.
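
One possible reading of the traceback, not confirmed in this report: a `None` ended up in the token list passed to `convert_tokens_to_string`, which could happen if the model generated a token id that the tokenizer's vocabulary does not contain (for example, a model with an extended Chinese vocabulary paired with the stock `hf-internal-testing/llama-tokenizer`). The sketch below only illustrates that failure mode with a toy vocabulary; `TOY_VOCAB`, `ids_to_tokens`, `missing_ids`, and `drop_missing` are hypothetical names and are not part of vLLM or transformers.

```python
from typing import Dict, List, Optional, Sequence

# Hypothetical stand-in for a tokenizer vocabulary (id -> token string).
TOY_VOCAB: Dict[int, str] = {0: "<s>", 1: "Hello", 2: "world"}


def ids_to_tokens(ids: Sequence[int], vocab: Dict[int, str]) -> List[Optional[str]]:
    """Mimic an id-to-token lookup that yields None for ids outside the vocabulary."""
    return [vocab.get(i) for i in ids]


def missing_ids(ids: Sequence[int], tokens: Sequence[Optional[str]]) -> List[int]:
    """Return the ids whose lookup produced None."""
    return [i for i, t in zip(ids, tokens) if t is None]


def drop_missing(tokens: Sequence[Optional[str]]) -> List[str]:
    """Filter out None entries so that a string-joining decoder would not fail on them."""
    return [t for t in tokens if t is not None]


# Assumption: id 40000 stands for a token that only exists in an extended vocabulary.
ids = [1, 2, 40000]
tokens = ids_to_tokens(ids, TOY_VOCAB)
print("ids without a token:", missing_ids(ids, tokens))
print("tokens that can be decoded:", drop_missing(tokens))
```

With a real setup, a comparable check would be to compare `len(tokenizer)` against the model's `config.vocab_size` and to inspect the result of `tokenizer.convert_ids_to_tokens(ids)` for `None` entries. If the sizes differ, loading the tokenizer that ships with the model checkpoint, rather than `hf-internal-testing/llama-tokenizer`, would be the first thing to try. Dropping `None` entries only hides the symptom and loses the affected characters.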
