
[Usage]: Llama 3 8B Instruct Inference #4180

Closed
@aliozts

Description

Your current environment

Using the latest version of vLLM on 2 L4 GPUs.

How would you like to use vllm

I was trying to use vLLM to deploy the meta-llama/Meta-Llama-3-8B-Instruct model behind the OpenAI-compatible server with the latest Docker image. With max_tokens=None, generation did not stop for quite a while: the model emits the <|eot_id|> token, which is apparently its actual EOS token, but in tokenizer_config.json and the other configs the eos_token is <|end_of_text|>, so <|eot_id|> is never treated as a stop token.

I can work around this by setting the eos_token parameter in tokenizer_config.json to <|eot_id|>, or by passing stop_token_ids in my request:

response = client.chat.completions.create(
    model="meta-llama/Meta-Llama-3-8B-Instruct",
    messages=[{"role": "user",
               "content": "Write a function for fibonacci sequence. Use LRUCache"}],
    max_tokens=700,
    stream=False,
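    # 128009 is the token id of <|eot_id|> in the Llama 3 tokenizer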
    extra_body={"stop_token_ids":[128009]}
)

This works, but I wanted to ask about the optimal way to solve this problem.
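For completeness, the same stop_token_ids workaround with vLLM's offline API would look roughly like this (just a sketch, with chat-template formatting omitted for brevity):

from vllm import LLM, SamplingParams

llm = LLM(model="meta-llama/Meta-Llama-3-8B-Instruct")

# Stop on <|eot_id|> (token id 128009), same as in the request above.
params = SamplingParams(max_tokens=700, stop_token_ids=[128009])

outputs = llm.generate(
    ["Write a function for fibonacci sequence. Use LRUCache"],
    params,
)
print(outputs[0].outputs[0].text)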

There is an existing discussion/PR in their repo that updates generation_config.json, but unless I clone the repo myself, vLLM does not seem to pick up the generation_config.json file. I tried with this revision, but it still did not stop generating after <|eot_id|>. Moreover, I tried with this revision as well, but it did not stop generating either.
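For reference, passing a specific Hugging Face revision to vLLM can be done roughly like this (a sketch with the offline API; "<pr-revision>" is only a placeholder for the actual ref from that PR, which I have left out here):

from vllm import LLM

# Sketch only: "<pr-revision>" is a placeholder for the actual Hugging Face
# revision/ref; the revision argument is forwarded to the model loader.
llm = LLM(
    model="meta-llama/Meta-Llama-3-8B-Instruct",
    revision="<pr-revision>",
)

If I am not mistaken, the OpenAI-compatible server exposes the same option as the --revision flag.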

tl;dr: the Llama-3-8B-Instruct model does not stop generation because of the eos_token mismatch.

  • Updating generation_config.json does not work.
  • Updating config.json also does not work.
  • Updating tokenizer_config.json works, but it overwrites the existing eos_token (a rough sketch of that edit is below). Is this problematic, or is there a more elegant way to solve this?
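
To be concrete, the tokenizer_config.json edit I mean is roughly the following (a sketch using the transformers API; the output directory name is just a placeholder, and the server would then be started with --tokenizer pointing at it):

from transformers import AutoTokenizer

# Save a local copy of the tokenizer with eos_token switched to <|eot_id|>;
# the output directory is a placeholder and can be anything.
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Meta-Llama-3-8B-Instruct")
tokenizer.eos_token = "<|eot_id|>"
tokenizer.save_pretrained("./llama3-8b-instruct-tokenizer")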

May I ask what the optimal way to solve this issue is?
