Add support for MPT #334

Merged · merged 29 commits into main on Jul 3, 2023

Conversation

WoosukKwon
Collaborator

Closes #218 and #332

Should be merged after #61

@WoosukKwon WoosukKwon linked an issue Jul 3, 2023 that may be closed by this pull request
@rmihaylov

Hi, great work! When do you think this PR will be merged?

@emsi

emsi commented Jul 3, 2023

@rmihaylov I have this and Bloom (#331) already merged if you want to give it a try:
https://github.com/emsi/vllm

@WoosukKwon WoosukKwon requested a review from zhuohan123 July 3, 2023 20:46
@emsi

emsi commented Jul 3, 2023

@WoosukKwon awesome! Many thanks for this!

While playing with it I've stumbled upon strange behavior that might indicate an issue when beam search is used.
I started the server with: python3 -m vllm.entrypoints.api_server --model mosaicml/mpt-30b

When I request:

curl http://vllm.ai/generate \
    -d '{
        "prompt": "San Francisco is a",
        "max_tokens":64,
        "temperature": 0,
        "n":1
    }'

I get a more or less expected answer:

{"text": ["San Francisco is a city of neighborhoods, and each has its own character. The following is a brief description of the most popular areas.\n\n\u2022 **Downtown** (also called SoMa, for South of Market) is the city's financial district, with a few hotels, restaurants, and shops.\n\n\u2022 **Union Square"]}

However when I use beam_search:

curl http://vllm.ai/generate \
    -d '{
        "prompt": "San Francisco is a",
        "max_tokens":64,
        "use_beam_search": true,
        "temperature": 0,
        "n":4
    }'

I get:

{"text": [
  "San Francisco is a very city, visit each with its own personality and and The\n\n###  Fisherman's Wharf**Fisherman's Wharf is the city's is the  | ###  Top Sights  | ###  Sights  | ###  Eating  | ###  Drinking & Night",
   "San Francisco is a great place to live, but it's character and flavor. TheTheTheFisherman's Wharf\n\nThe city\u00bb  Fisherman's Wharf  is the city's most aSights  | ###  Activities  | ###  Courses  | ###  Festivals  |",
  "San Francisco is a big that for be a and and play distinct character.\n\n###  Neighborhoods at a Glance\n\n###  Name  | ###  Character\n\n---|---\n\nDowntown &\n\n---|---|---\n\nDowntown\n\n---|---|---|---Eating  | ###  Drinking & Nightlife",
 "San Francisco is a city of neighborhoods, and work, and play. charm. The\u2022\u2022 **Downtown in Brief & the Piers\n\nThe, San Francisco's most(Click here ) is theThe Marina  | The city'sEating  | ###  Drinking & Nightlife  | ###  Entertainment"
]}

I'm not sure, but it looks like the answers are corrupted or intermingled after a certain number of tokens (as if coming from different answers?).

Interestingly enough, the problem manifests only with n > 2. I've tested n = 3, n = 4, and n = 5; for n = 2 the output looks correct:

{"text": [
"San Francisco is a great place to live, but it's not a great place to work. It's a city that's beautiful culture, its beautiful architecture, and its many attractions. Whether you're looking for a fun day trip or a longer stay, San Francisco has something for everyone. Here are some of the best things to do in", 
"San Francisco is a city of for be a and it is also a great place to visit. The city is known for its diverse neighborhoods, its unique architecture, and its beautiful natural setting. There are visiting to explore the city's many neighborhoods, or you're just looking for to offer everyone\nThe just a few of the many things to"
]}
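For reference, the curl calls above can be reproduced with a small Python sketch. This is a hypothetical helper, not part of vLLM itself; it assumes the api_server is reachable at the URL used in the curl examples and accepts the same JSON fields (prompt, max_tokens, temperature, n, use_beam_search):

```python
import json
from urllib import request

# Placeholder URL copied from the curl examples above; in a local setup this
# would typically be the host/port the api_server was started on.
API_URL = "http://vllm.ai/generate"

def build_payload(prompt, max_tokens=64, temperature=0.0, n=1,
                  use_beam_search=False):
    """Assemble the JSON body for the /generate endpoint (sketch)."""
    body = {
        "prompt": prompt,
        "max_tokens": max_tokens,
        "temperature": temperature,
        "n": n,
    }
    if use_beam_search:
        body["use_beam_search"] = True
    return body

def generate(prompt, **kwargs):
    """POST the payload and return the parsed response.

    Requires a running server; network errors propagate to the caller.
    """
    data = json.dumps(build_payload(prompt, **kwargs)).encode()
    req = request.Request(
        API_URL, data=data, headers={"Content-Type": "application/json"}
    )
    with request.urlopen(req) as resp:
        return json.loads(resp.read())

if __name__ == "__main__":
    # Greedy request (n=1) vs. the beam-search request that triggers the issue.
    print(build_payload("San Francisco is a"))
    print(build_payload("San Francisco is a", n=4, use_beam_search=True))
```

Sweeping n over 2..5 with use_beam_search=True would make the n > 2 threshold described above easy to confirm against a live server.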

@WoosukKwon
Collaborator Author

@emsi Thanks for reporting it! Your beam search output looks very weird. We'll investigate, but if it really is a bug, I believe it lies in our beam search logic rather than in the MPT model, so we'll look into it in parallel with this PR.

@zhuohan123
Member

LGTM! Left some small comments.

@WoosukKwon WoosukKwon merged commit 404422f into main Jul 3, 2023
@WoosukKwon WoosukKwon deleted the mpt branch July 3, 2023 23:47
hongxiayang pushed a commit to hongxiayang/vllm that referenced this pull request Feb 13, 2024
yukavio pushed a commit to yukavio/vllm that referenced this pull request Jul 3, 2024
…n wheel uploading (vllm-project#334)

1. Added/updated publish docker workflow into nightly/release workflow.
2. Fixed minor bugs in wheel uploading to GCP due to one wheel changes.
3. Removed duplicate upload code.

---------

Co-authored-by: dhuangnm <dhuang@MacBook-Pro-2.local>
Xaenalt referenced this pull request in opendatahub-io/vllm Oct 14, 2024
wuhuikx pushed a commit to wuhuikx/vllm that referenced this pull request Mar 27, 2025
fix vllm-project/vllm-ascend#321
This pr is a temporary solution for long seq percision issue, will
revert when the root cause is fixed
cc @rjg-lyh @wangxiyuan 

Co-authored-by: rjg-lyh <1318825571@qq.com>

---------

Signed-off-by: MengqingCao <cmq0113@163.com>
Co-authored-by: wangxiyuan <wangxiyuan@huawei.com>
Successfully merging this pull request may close these issues.

feature request: support mpt-30b
Support for MPT-7B and MPT-30B
4 participants