### The model to consider.
https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct
### The closest model vllm already supports.
Llama 1 & 2
### What's your difficulty of supporting the model you want?
Llama 3 Instruct requires a different stop token than the one specified in the `tokenizer.json` file. The `tokenizer.json` specifies `<|end_of_text|>` as the end-of-string token, which works for the base Llama 3 model, but it is not the right token for the instruct tune. The instruct tune uses `<|eot_id|>`.
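For context, here is roughly what a prompt built with the chat template looks like (a sketch of the Llama 3 Instruct format; the exact text comes from the model's chat template). Note that every turn is terminated with `<|eot_id|>`, not `<|end_of_text|>`:

```
<|begin_of_text|><|start_header_id|>system<|end_header_id|>

You are a pirate chatbot who always responds in pirate speak!<|eot_id|><|start_header_id|>user<|end_header_id|>

Who are you?<|eot_id|><|start_header_id|>assistant<|end_header_id|>
```

So a generation loop that only stops on `<|end_of_text|>` will keep generating past the end of the assistant's turn.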
You can see this in the inference code on the Llama 3 8B Instruct model card, where this token is added to the list of terminators:
```python
import transformers
import torch

model_id = "meta-llama/Meta-Llama-3-8B-Instruct"

pipeline = transformers.pipeline(
    "text-generation",
    model=model_id,
    model_kwargs={"torch_dtype": torch.bfloat16},
    device_map="auto",
)

messages = [
    {"role": "system", "content": "You are a pirate chatbot who always responds in pirate speak!"},
    {"role": "user", "content": "Who are you?"},
]

prompt = pipeline.tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)

# HERE is where they add the `<|eot_id|>` token, which is not the default
# end-of-string token, to the list of terminators.
terminators = [
    pipeline.tokenizer.eos_token_id,
    pipeline.tokenizer.convert_tokens_to_ids("<|eot_id|>")
]

outputs = pipeline(
    prompt,
    max_new_tokens=256,
    eos_token_id=terminators,
    do_sample=True,
    temperature=0.6,
    top_p=0.9,
)
print(outputs[0]["generated_text"][len(prompt):])
```
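Until vLLM handles this out of the box, a minimal workaround sketch with the current offline API is to pass the extra terminator explicitly via `SamplingParams.stop_token_ids` (assuming the installed vLLM version exposes that parameter):

```python
from transformers import AutoTokenizer
from vllm import LLM, SamplingParams

model_id = "meta-llama/Meta-Llama-3-8B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_id)

llm = LLM(model=model_id)
sampling_params = SamplingParams(
    temperature=0.6,
    top_p=0.9,
    max_tokens=256,
    # Stop on both the default EOS and the instruct tune's terminator.
    stop_token_ids=[
        tokenizer.eos_token_id,
        tokenizer.convert_tokens_to_ids("<|eot_id|>"),
    ],
)

messages = [
    {"role": "system", "content": "You are a pirate chatbot who always responds in pirate speak!"},
    {"role": "user", "content": "Who are you?"},
]
prompt = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)

outputs = llm.generate([prompt], sampling_params)
print(outputs[0].outputs[0].text)
```

This only patches individual requests, though; it would be better for vLLM to pick up the correct terminator for the instruct tune automatically.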
Here is a discussion of this topic in the llama.cpp repository: ggml-org/llama.cpp#6751