Add qwen2 #2495

JustinLin610 · 2024-01-18T16:44:17Z

I recently pushed the new Qwen2 code to Hugging Face Transformers, and I would love to contribute the new model to vLLM as well.

This PR provides an implementation of the Qwen2 model and adds some notes on it.

simon-mo · 2024-01-18T18:49:25Z

Thank you for the contribution! Do you know where I can get the weights for Qwen/Qwen2-7B-beta? I cannot find them on Hugging Face (there's only V1?).

JustinLin610 · 2024-01-19T04:36:52Z

Yes, it is not released yet. Would you mind joining our HF org to access our new models, which are temporarily private? https://huggingface.co/Qwen

BTW, you can contact me by email (junyang.ljy@alibaba-inc.com), or I can join your Slack channel for further discussion (my Slack email: justinlin930319@gmail.com).

simon-mo · 2024-01-19T06:07:40Z

Thanks. Emailed.

esmeetu · 2024-01-19T10:13:32Z

What is the difference between the Qwen2 and Llama2 architectures? They look the same. If so, would it be better to extend LlamaForCausalLM, as DeciLM does?

JustinLin610 · 2024-01-19T12:45:16Z

The Qwen2 code supports the previous Qwen models as well as the next-generation Qwen2. Compared with Llama or Mistral, it adds a QKV bias and mixes sliding-window attention with full attention, which is controlled by the max_window_layers argument.
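To make the mix of sliding-window and full attention concrete, here is a minimal plain-Python sketch of per-layer attention masks. The parameter names are assumed to mirror the Qwen2 config fields use_sliding_window, sliding_window and max_window_layers; which side of the max_window_layers threshold gets the window, and whether the window counts the query token itself, are assumptions to check against the actual Qwen2 code. The QKV bias is not modeled here.

```python
from typing import List, Optional


def causal_mask(seq_len: int, window: Optional[int] = None) -> List[List[bool]]:
    """Return mask where mask[i][j] is True if query i may attend to key j.

    Full causal attention allows every j <= i. With a sliding window of size
    `window` (assumed here to include the query token itself), only the most
    recent `window` positions stay visible, i.e. i - j < window.
    """
    return [
        [j <= i and (window is None or i - j < window) for j in range(seq_len)]
        for i in range(seq_len)
    ]


def layer_window(layer_idx: int, use_sliding_window: bool,
                 sliding_window: int, max_window_layers: int) -> Optional[int]:
    """Pick the attention window for one decoder layer (hypothetical helper).

    ASSUMPTION: layers with index < max_window_layers use sliding-window
    attention and the remaining layers use full attention.
    """
    if use_sliding_window and layer_idx < max_window_layers:
        return sliding_window
    return None  # None means full causal attention
```

Returning None for full-attention layers lets a single mask function serve both layer types, so a model only needs to look up the window per layer.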

simon-mo · 2024-01-19T18:57:08Z

Thanks for the access. I'm testing it with python -m vllm.entrypoints.openai.api_server --model Qwen/Qwen2-1_8B-Chat-beta-temp but hitting this error:

  File "/home/xmo/vllm/vllm/engine/async_llm_engine.py", line 548, in from_engine_args
    engine_configs = engine_args.create_engine_configs()
  File "/home/xmo/vllm/vllm/engine/arg_utils.py", line 218, in create_engine_configs
    model_config = ModelConfig(self.model, self.tokenizer,
  File "/home/xmo/vllm/vllm/config.py", line 101, in __init__
    self.hf_config = get_config(self.model, trust_remote_code, revision)
  File "/home/xmo/vllm/vllm/transformers_utils/config.py", line 23, in get_config
    config = AutoConfig.from_pretrained(
  File "/opt/conda/lib/python3.10/site-packages/transformers/models/auto/configuration_auto.py", line 1098, in from_pretrained
    config_class = CONFIG_MAPPING[config_dict["model_type"]]
  File "/opt/conda/lib/python3.10/site-packages/transformers/models/auto/configuration_auto.py", line 795, in __getitem__
    raise KeyError(key)
KeyError: 'qwen2'

Can you add a Qwen2 config similar to the QwenConfig and add it to the registry?
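The traceback shows AutoConfig looking up config_dict["model_type"] in CONFIG_MAPPING, which in the installed Transformers had no 'qwen2' entry. A minimal plain-Python sketch of the requested fallback: when the Transformers lookup fails, consult a local registry. All class and dictionary names below are hypothetical stand-ins, not vLLM's or Transformers' real code.

```python
from typing import Any, Dict, Type


class LlamaConfigStub:
    """Hypothetical stand-in for a config class Transformers already knows."""

    def __init__(self, **kwargs: Any) -> None:
        self.__dict__.update(kwargs)


class Qwen2ConfigStub:
    """Hypothetical stand-in for a Qwen2 config class shipped inside vLLM."""

    def __init__(self, **kwargs: Any) -> None:
        self.__dict__.update(kwargs)


# Stand-in for the model types the installed Transformers can resolve.
TRANSFORMERS_CONFIGS: Dict[str, Type] = {"llama": LlamaConfigStub}

# Hypothetical vLLM-side registry consulted when Transformers lacks a type.
LOCAL_CONFIG_REGISTRY: Dict[str, Type] = {"qwen2": Qwen2ConfigStub}


def get_config(config_dict: Dict[str, Any]) -> Any:
    model_type = config_dict["model_type"]
    try:
        # On an older Transformers this is where KeyError: 'qwen2' comes from.
        config_class = TRANSFORMERS_CONFIGS[model_type]
    except KeyError:
        if model_type not in LOCAL_CONFIG_REGISTRY:
            raise  # unknown everywhere: keep the original error
        config_class = LOCAL_CONFIG_REGISTRY[model_type]
    return config_class(**config_dict)
```

Checking Transformers first means the local copy only matters until users upgrade to a release that includes the model.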

JustinLin610 · 2024-01-19T19:27:59Z

Did you clone the latest transformers and install it with pip install -e .? I ran the command successfully. Since Qwen2 is already merged into HF Transformers, there is no need to write a separate config.

simon-mo · 2024-01-19T19:45:05Z

I saw the Hugging Face PR. This is tricky because we don't know when Hugging Face will release a new version that includes Qwen2. Can you still include it in vLLM so we can release without people waiting for the Hugging Face nightly/latest release?

JustinLin610 · 2024-01-22T18:16:51Z

Hi, HF Transformers just released a new version that includes Qwen2. Give it a try and see if it works for you.

simon-mo · 2024-01-22T19:23:49Z

curl http://localhost:8000/v1/completions \
    -H "Content-Type: application/json" \
    -d '{
        "model": "Qwen/Qwen2-1_8B-Chat-beta-temp",
        "prompt": "San Francisco is a",
        "max_tokens": 7
    }'


curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
     "model": "Qwen/Qwen2-1_8B-Chat-beta-temp",
     "messages": [{"role": "user", "content": "Say this is a test!"}],
     "temperature": 0.7
   }'

{"id":"cmpl-808bbb1b24f04232b5421f28d4fefd54","object":"text_completion","created":350809,"model":"Qwen/Qwen2-1_8B-Chat-beta-temp","choices":[{"index":0,"text":" ______ to me, so I went","logprobs":null,"finish_reason":"length"}],"usage":{"prompt_tokens":4,"total_tokens":11,"completion_tokens":7}}
{"id":"cmpl-24e3af97038d41d5a53f5e6783cb94f8","object":"chat.completion","created":350809,"model":"Qwen/Qwen2-1_8B-Chat-beta-temp","choices":[{"index":0,"message":{"role":"assistant","content":"Hello! Is there anything specific you would like to test or ask me? I'm here to help with any questions or information you need.<|im_end|>\n"},"finish_reason":"stop"}],"usage":{"prompt_tokens":14,"total_tokens":45,"completion_tokens":31}}
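For clients that prefer Python over curl, here is a minimal standard-library sketch that builds the same two requests shown above. It assumes the server is running on localhost:8000 as in the curl commands; build_request and send are hypothetical helper names, not part of vLLM.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000"  # assumption: same host/port as the curl calls
MODEL = "Qwen/Qwen2-1_8B-Chat-beta-temp"


def build_request(path: str, payload: dict) -> urllib.request.Request:
    """Build a JSON POST request for the OpenAI-compatible server."""
    return urllib.request.Request(
        BASE_URL + path,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


completion_req = build_request("/v1/completions", {
    "model": MODEL,
    "prompt": "San Francisco is a",
    "max_tokens": 7,
})

chat_req = build_request("/v1/chat/completions", {
    "model": MODEL,
    "messages": [{"role": "user", "content": "Say this is a test!"}],
    "temperature": 0.7,
})


def send(req: urllib.request.Request) -> dict:
    """Send the request and decode the JSON body (needs a running server)."""
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

With the server up, send(completion_req)["choices"][0]["text"] and send(chat_req)["choices"][0]["message"]["content"] read the fields that appear in the responses above.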

Verified. Pushing a commit to pin the transformers version, then I will merge.
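A version pin stops pip from installing an older Transformers, but an existing environment can still be behind. A minimal sketch of a runtime guard, assuming 4.37.0 is the first Transformers release that ships Qwen2; the helper names are hypothetical and not part of vLLM.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Tuple

# ASSUMPTION: 4.37.0 is the first Transformers release that includes Qwen2.
MIN_TRANSFORMERS = (4, 37, 0)


def parse_release(ver: str) -> Tuple[int, ...]:
    """Turn '4.37.0' or '4.38.0.dev0' into a 3-tuple of ints.

    Suffixes such as '.dev0' are ignored, so a source build of 4.37.0.dev0
    is treated like 4.37.0.
    """
    parts = []
    for piece in ver.split(".")[:3]:
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        parts.append(int(digits) if digits else 0)
    while len(parts) < 3:  # pad '4.37' to (4, 37, 0) so comparisons work
        parts.append(0)
    return tuple(parts)


def supports_qwen2(ver: str) -> bool:
    return parse_release(ver) >= MIN_TRANSFORMERS


def installed_transformers_supports_qwen2() -> bool:
    """Check the installed transformers package, if any."""
    try:
        return supports_qwen2(version("transformers"))
    except PackageNotFoundError:
        return False
```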

JustinLin610 added 2 commits January 19, 2024 00:41

add qwen2 (59baeb5)

format (714e210)

simon-mo added 2 commits January 22, 2024 19:17

Merge branch 'main' of github.com:vllm-project/vllm into add_qwen2 (586bc6f)

pin transformers (2835e8b)

simon-mo approved these changes Jan 22, 2024


simon-mo merged commit 94b5ede into vllm-project:main Jan 22, 2024
16 checks passed

hongxiayang pushed a commit to hongxiayang/vllm that referenced this pull request Feb 13, 2024

Add qwen2 (vllm-project#2495) (10c2c70)

Jason-CKY mentioned this pull request Feb 19, 2024: Qwen1.5/Qwen2 model additions huggingface/text-generation-inference#1575 (closed, 2 tasks)
