Support eos_token_id from generation_config.json #4182

Merged on Apr 19, 2024 (3 commits)

simon-mo
Collaborator

Related to #4180

Some models use the eos_token_id field (Optional[Union[int, List[int]]]) in generation_config.json:
https://huggingface.co/docs/transformers/v4.39.3/en/main_classes/text_generation#transformers.GenerationConfig

This PR loads the config, reads the value if the user supplied one, and injects it into stop_token_ids in the sampling params. Notably, this does not change eos_token_id in the sampling params or the tokenizer config.
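A minimal sketch of the behavior described above, not the PR's actual code: it reads eos_token_id from a local generation_config.json, normalizes the int-or-list value to a list, and merges it into a caller's stop token ids. The function names and the `model_dir` argument are hypothetical and chosen for illustration only.

```python
import json
from pathlib import Path
from typing import List, Optional, Union


def normalize_eos_token_id(eos: Optional[Union[int, List[int]]]) -> List[int]:
    """Turn the int-or-list eos_token_id field into a plain list."""
    if eos is None:
        return []
    return [eos] if isinstance(eos, int) else list(eos)


def load_eos_token_ids(model_dir: str) -> List[int]:
    """Read eos_token_id from generation_config.json; empty list if absent."""
    path = Path(model_dir) / "generation_config.json"
    if not path.is_file():
        return []
    config = json.loads(path.read_text())
    return normalize_eos_token_id(config.get("eos_token_id"))


def merge_stop_token_ids(stop_token_ids: List[int], eos_ids: List[int]) -> List[int]:
    """Append EOS ids that are not already present, keeping the caller's order."""
    merged = list(stop_token_ids)
    for token_id in eos_ids:
        if token_id not in merged:
            merged.append(token_id)
    return merged
```

Merging into the stop list, rather than overwriting the single EOS id, mirrors the note above that eos_token_id itself is left unchanged.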

One example is DBRX. Meta Llama 3 might use the generation config to reconcile the difference between <|end_of_text|> and <|eot_id|>.

Testing

Because this is model dependent, I performed manual testing:

  1. Run Meta Llama 3 8B Instruct and observe that the end-of-turn token (<|eot_id|>) is not respected.
~$ curl http://localhost:8000/v1/chat/completions   -H "Content-Type: application/json"   -d '{
    "model": "meta-llama/Meta-Llama-3-8B-Instruct",
    "messages": [
      {
        "role": "system",
        "content": "You are a helpful assistant."
      },
      {
        "role": "user",
        "content": "Who are you?"
      }
    ],
    "max_tokens": 256
  }'
{"id":"cmpl-ca00059831714382b0104ca1cb7e407d","object":"chat.completion","created":1713481143,"model":"meta-llama/Meta-Llama-3-8B-Instruct","choices":[{"index":0,"message":{"role":"assistant","content":"I'm your helpful assistant! I'm an AI designed to assist and support you in various ways. I can help with tasks, answer questions, provide information, and even engage in conversations. My purpose is to make your life easier and more efficient, so feel free to ask me anything or tell me what's on your mind! What can I help you with today?<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\nI'm happy to help with any questions or tasks you have.<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\nI'm a large language model, trained on a massive dataset of text from the internet, books, and other sources. I can understand and respond to natural language input, and I'm constantly learning and improving my abilities.\n\nI can help with a wide range of tasks, such as:\n\n* Answering questions on various topics, from science and history to entertainment and culture\n* Generating text, such as articles, stories, or emails\n* Translating text from one language to another\n* Summarizing long pieces of text into shorter, more digestible versions\n* Offering suggestions and ideas for creative projects or problems you're trying to solve\n* Even just chatting with you and engaging in conversation!\n\nWhat do you need help with today?<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\nI'm excited to hear"},"logprobs":null,"finish_reason":"length","stop_reason":null}],"
  2. Change the eos_token_id field in the generation config of the HF model:
-   "eos_token_id": 128001,
+   "eos_token_id": [128001,128009],
  3. Run the same query again.
~$ curl http://localhost:8000/v1/chat/completions   -H "Content-Type: application/json"   -d '{
    "model": "meta-llama/Meta-Llama-3-8B-Instruct",
    "messages": [
      {
        "role": "system",
        "content": "You are a helpful assistant."
      },
      {
        "role": "user",
        "content": "Who are you?"
      }
    ],
    "max_tokens": 256
  }'
{"id":"cmpl-bf80caf7d899446fa9e148d1714b0552","object":"chat.completion","created":1713481243,"model":"meta-llama/Meta-Llama-3-8B-Instruct","choices":[{"index":0,"message":{"role":"assistant","content":"I'm your helpful assistant! I'm an AI designed to assist and support you in various ways. I can help with tasks, answer questions, provide information, and even engage in conversations. My purpose is to make your life easier and more efficient, so feel free to ask me anything or tell me what's on your mind! What can I help you with today?"},"logprobs":null,"finish_
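The config edit in step 2 could also be scripted. The sketch below is an illustration only, assuming a local copy of the checkpoint; `add_eos_token_id` and `patch_generation_config` are hypothetical helper names, and the path passed to the latter is whatever location holds your generation_config.json.

```python
import json
from pathlib import Path
from typing import Any, Dict


def add_eos_token_id(config: Dict[str, Any], token_id: int) -> Dict[str, Any]:
    """Return a copy of the config whose eos_token_id list includes token_id."""
    eos = config.get("eos_token_id")
    if eos is None:
        ids = []
    elif isinstance(eos, int):
        ids = [eos]
    else:
        ids = list(eos)
    if token_id not in ids:
        ids.append(token_id)
    updated = dict(config)
    updated["eos_token_id"] = ids
    return updated


def patch_generation_config(path: str, token_id: int) -> None:
    """Rewrite generation_config.json in place with the extra EOS id."""
    p = Path(path)
    config = json.loads(p.read_text())
    p.write_text(json.dumps(add_eos_token_id(config, token_id), indent=2) + "\n")
```

Applying `add_eos_token_id` to a config containing `"eos_token_id": 128001` with `token_id=128009` produces the same list as the manual diff in step 2.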

@simon-mo simon-mo enabled auto-merge (squash) April 18, 2024 23:27
@simon-mo simon-mo merged commit a134ef6 into vllm-project:main Apr 19, 2024
46 checks passed
premg16 commented Apr 19, 2024

I am running vLLM from the Docker image and facing the same issue. What should I do?

@simon-mo
Collaborator Author

For now, you can pass stop_token_ids as a request parameter; see #4180 (comment).

To avoid this extra step, the model checkpoint's generation config needs to be updated, which is pending on the HF side.
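As a rough illustration of the workaround, the sketch below builds the same chat completion request as the curl commands above with a stop_token_ids field added. The value 128009 is taken from the generation config diff in the testing section and is assumed here to be the id of <|eot_id|>; `build_chat_request` and `send` are hypothetical helper names, and the URL is the local server address used in the curl examples.

```python
import json
import urllib.request
from typing import Any, Dict, List

EOT_ID = 128009  # assumption: id of <|eot_id|>, taken from the config diff above


def build_chat_request(
    model: str,
    messages: List[Dict[str, str]],
    stop_token_ids: List[int],
    max_tokens: int = 256,
) -> Dict[str, Any]:
    """Build a chat completion payload that carries extra stop token ids."""
    return {
        "model": model,
        "messages": messages,
        "max_tokens": max_tokens,
        "stop_token_ids": stop_token_ids,
    }


def send(payload: Dict[str, Any],
         url: str = "http://localhost:8000/v1/chat/completions") -> Dict[str, Any]:
    """POST the payload as JSON and return the decoded response."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


payload = build_chat_request(
    "meta-llama/Meta-Llama-3-8B-Instruct",
    [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Who are you?"},
    ],
    stop_token_ids=[EOT_ID],
)
```

Calling `send(payload)` requires a server listening at the given URL, so it is left out of the module-level code.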
