Description
Your current environment
Using the latest version of vLLM on 2 L4 GPUs.
How would you like to use vllm
I was trying to use vLLM to deploy the `meta-llama/Meta-Llama-3-8B-Instruct` model with the OpenAI-compatible server and the latest Docker image. When I did, generation would not stop for a long time when `max_tokens=None`. I saw that the model generates the `<|eot_id|>` token, which apparently is its EOS token, but in its `tokenizer_config.json` and other configs the EOS token is `<|end_of_text|>`.
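For context, the mismatch can be checked directly with the Hugging Face tokenizer. This is only a small verification sketch (it assumes `transformers` is installed and that you have access to the gated Llama 3 repo), not part of the fix itself:

```python
from transformers import AutoTokenizer

# Requires access to the gated meta-llama repo on the Hugging Face Hub
tok = AutoTokenizer.from_pretrained("meta-llama/Meta-Llama-3-8B-Instruct")

print(tok.eos_token)                            # <|end_of_text|>, per tokenizer_config.json
print(tok.convert_tokens_to_ids("<|eot_id|>"))  # 128009, the token the model actually emits
```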
I can fix this by setting the `eos_token` field in `tokenizer_config.json` to `<|eot_id|>`, or by passing `stop_token_ids` in my request:
```python
from openai import OpenAI

# Client pointed at the vLLM OpenAI-compatible server (default local address)
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

response = client.chat.completions.create(
    model="meta-llama/Meta-Llama-3-8B-Instruct",
    messages=[{"role": "user",
               "content": "Write a function for fibonacci sequence. Use LRUCache"}],
    max_tokens=700,
    stream=False,
    extra_body={"stop_token_ids": [128009]},  # 128009 == <|eot_id|>
)
```
I wanted to ask what the optimal way to solve this problem is.
There is an existing discussion/PR in their repo that updates `generation_config.json`, but unless I clone the repo myself, it seems vLLM does not load the `generation_config.json` file. I also tried with this revision, but it still did not stop generating after `<|eot_id|>`. I tried with this revision as well, and it did not stop generating either.
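As an aside, one way to sidestep the config files entirely is to pass the stop token id through vLLM's sampling parameters. Below is a minimal offline sketch under my assumptions (the `revision` value is only a placeholder, and the chat template is not applied for brevity); it is not meant as the definitive fix:

```python
from vllm import LLM, SamplingParams

# Load a specific Hugging Face revision; "main" is only a placeholder here
llm = LLM(model="meta-llama/Meta-Llama-3-8B-Instruct", revision="main")

# Stop explicitly on <|eot_id|> (id 128009) without touching any config file
params = SamplingParams(max_tokens=700, stop_token_ids=[128009])

# Plain prompt for brevity; a real chat request should apply the chat template
outputs = llm.generate(["Write a function for fibonacci sequence. Use LRUCache"], params)
print(outputs[0].outputs[0].text)
```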
tldr; the Llama-3-8B-Instruct model does not stop generation because of its `eos_token`.
- Updating `generation_config.json` does not work.
- Updating `config.json` also does not work.
- Updating `tokenizer_config.json` works, but it overwrites the existing `eos_token` (see the sketch after this list). Is this problematic, or is there a more elegant way to solve this?
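For completeness, this is roughly what the `tokenizer_config.json` workaround looks like when applied to a locally downloaded copy of the model (the path below is hypothetical):

```python
import json
from pathlib import Path

# Hypothetical path to a locally cloned copy of the model repo
cfg_path = Path("Meta-Llama-3-8B-Instruct/tokenizer_config.json")

cfg = json.loads(cfg_path.read_text())
cfg["eos_token"] = "<|eot_id|>"  # replaces the original <|end_of_text|>
cfg_path.write_text(json.dumps(cfg, indent=2))
```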
May I ask what the optimal way to solve this issue is?