Add support for MPT #334

Merged · merged 29 commits into main on Jul 3, 2023

Conversation

WoosukKwon
Collaborator

Closes #218 and #332

Should be merged after #61

@WoosukKwon WoosukKwon linked an issue Jul 3, 2023 that may be closed by this pull request
@rmihaylov

Hi, great work! When do you think this PR will be merged?

@emsi

emsi commented Jul 3, 2023

@rmihaylov I have this and Bloom (#331) already merged if you want to give it a try:
https://github.com/emsi/vllm

@WoosukKwon WoosukKwon requested a review from zhuohan123 July 3, 2023 20:46
@emsi

emsi commented Jul 3, 2023

@WoosukKwon awesome! Many thanks for this!

While playing with it I've stumbled upon strange behavior that might indicate an issue when beam search is used.
I started the server with: python3 -m vllm.entrypoints.api_server --model mosaicml/mpt-30b

When I request:

curl http://vllm.ai/generate \
    -d '{
        "prompt": "San Francisco is a",
        "max_tokens":64,
        "temperature": 0,
        "n":1
    }'

I get a more or less expected answer:

{"text": ["San Francisco is a city of neighborhoods, and each has its own character. The following is a brief description of the most popular areas.\n\n\u2022 **Downtown** (also called SoMa, for South of Market) is the city's financial district, with a few hotels, restaurants, and shops.\n\n\u2022 **Union Square"]}

However when I use beam_search:

curl http://vllm.ai/generate \
    -d '{
        "prompt": "San Francisco is a",
        "max_tokens":64,
        "use_beam_search": true,
        "temperature": 0,
        "n":4
    }'

I get:

{"text": [
  "San Francisco is a very city, visit each with its own personality and and The\n\n###  Fisherman's Wharf**Fisherman's Wharf is the city's is the  | ###  Top Sights  | ###  Sights  | ###  Eating  | ###  Drinking & Night",
   "San Francisco is a great place to live, but it's character and flavor. TheTheTheFisherman's Wharf\n\nThe city\u00bb  Fisherman's Wharf  is the city's most aSights  | ###  Activities  | ###  Courses  | ###  Festivals  |",
  "San Francisco is a big that for be a and and play distinct character.\n\n###  Neighborhoods at a Glance\n\n###  Name  | ###  Character\n\n---|---\n\nDowntown &\n\n---|---|---\n\nDowntown\n\n---|---|---|---Eating  | ###  Drinking & Nightlife",
 "San Francisco is a city of neighborhoods, and work, and play. charm. The\u2022\u2022 **Downtown in Brief & the Piers\n\nThe, San Francisco's most(Click here ) is theThe Marina  | The city'sEating  | ###  Drinking & Nightlife  | ###  Entertainment"
]}

I'm not sure, but it looks like the answers are corrupted or intermingled after a certain number of tokens (as if coming from different answers?).

Interestingly enough, the problem manifests only with n > 2. I've tested n = 3, n = 4, and n = 5; for n = 2 the output looks correct:

{"text": [
"San Francisco is a great place to live, but it's not a great place to work. It's a city that's beautiful culture, its beautiful architecture, and its many attractions. Whether you're looking for a fun day trip or a longer stay, San Francisco has something for everyone. Here are some of the best things to do in", 
"San Francisco is a city of for be a and it is also a great place to visit. The city is known for its diverse neighborhoods, its unique architecture, and its beautiful natural setting. There are visiting to explore the city's many neighborhoods, or you're just looking for to offer everyone\nThe just a few of the many things to"
]}
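For reference, the curl calls above can be reproduced with a small Python sketch. This is a hypothetical helper, not part of vLLM itself; it assumes the api_server is reachable at the URL used in the curl examples and accepts the same JSON fields (prompt, max_tokens, temperature, n, use_beam_search):

```python
import json
from urllib import request

# Placeholder URL copied from the curl examples above; in a local setup this
# would typically be the host/port the api_server was started on.
API_URL = "http://vllm.ai/generate"

def build_payload(prompt, max_tokens=64, temperature=0.0, n=1,
                  use_beam_search=False):
    """Assemble the JSON body for the /generate endpoint (sketch)."""
    body = {
        "prompt": prompt,
        "max_tokens": max_tokens,
        "temperature": temperature,
        "n": n,
    }
    if use_beam_search:
        body["use_beam_search"] = True
    return body

def generate(prompt, **kwargs):
    """POST the payload and return the parsed response.

    Requires a running server; network errors propagate to the caller.
    """
    data = json.dumps(build_payload(prompt, **kwargs)).encode()
    req = request.Request(
        API_URL, data=data, headers={"Content-Type": "application/json"}
    )
    with request.urlopen(req) as resp:
        return json.loads(resp.read())

if __name__ == "__main__":
    # Greedy request (n=1) vs. the beam-search request that triggers the issue.
    print(build_payload("San Francisco is a"))
    print(build_payload("San Francisco is a", n=4, use_beam_search=True))
```

Sweeping n over 2..5 with use_beam_search=True would make the n > 2 threshold described above easy to confirm against a live server.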

@WoosukKwon
Collaborator Author

@emsi Thanks for reporting it! Your beam search output looks very weird. We'll investigate, but if it really is a bug, I believe it lies in our beam search logic rather than in the MPT model, so we'll look into it in parallel with this PR.

@zhuohan123
Member

LGTM! Left some small comments.

@WoosukKwon WoosukKwon merged commit 404422f into main Jul 3, 2023
@WoosukKwon WoosukKwon deleted the mpt branch July 3, 2023 23:47
hongxiayang pushed a commit to hongxiayang/vllm that referenced this pull request Feb 13, 2024
yukavio pushed a commit to yukavio/vllm that referenced this pull request Jul 3, 2024
…n wheel uploading (vllm-project#334)

1. Added/updated publish docker workflow into nightly/release workflow.
2. Fixed minor bugs in wheel uploading to GCP due to one wheel changes.
3. Removed duplicate upload code.

---------

Co-authored-by: dhuangnm <dhuang@MacBook-Pro-2.local>
Xaenalt referenced this pull request in opendatahub-io/vllm Oct 14, 2024
wuhuikx pushed a commit to wuhuikx/vllm that referenced this pull request Mar 27, 2025
fix vllm-project/vllm-ascend#321
This pr is a temporary solution for long seq percision issue, will
revert when the root cause is fixed
cc @rjg-lyh @wangxiyuan 

Co-authored-by: rjg-lyh <1318825571@qq.com>

---------

Signed-off-by: MengqingCao <cmq0113@163.com>
Co-authored-by: wangxiyuan <wangxiyuan@huawei.com>
Successfully merging this pull request may close these issues.

feature request: support mpt-30b
Support for MPT-7B and MPT-30B
4 participants