CTranslate2 #211

Closed
Matthieu-Tinycoaching opened this issue Jun 22, 2023 · 8 comments

Comments

@Matthieu-Tinycoaching

Hello,

Thanks for the great framework for deploying LLMs.

Would it be possible to use an LLM compiled with the CTranslate2 library?

@zhuohan123
Member

Thanks for bringing this up. We will investigate the CTranslate2 library and evaluate the difficulty and the potential benefit of adding it into vLLM.

@anujnayyar1

anujnayyar1 commented Jun 24, 2023

Would love to see this, ct2 would be a great integration! It would give us easy access to fast 8-bit inference, and it plays nicely with HF Transformers. Thank you for the library so far!!

@Matthieu-Tinycoaching
Author

Hi,

Any news regarding this integration? CTranslate2 has already proven its speed within the TitanML framework for local LLM serving.

@manishiitg

Hi,

Any news on this?

@Matthieu-Tinycoaching
Author

+1

@shixianc

+11

@hmellor
Collaborator

hmellor commented May 18, 2024

@zhuohan123 do you see any benefit in adding this to vLLM?

yukavio pushed a commit to yukavio/vllm that referenced this issue Jul 3, 2024
Upstream sync 2024 04 26 (neuralmagic#211)

SUMMARY: Merge commits from vllm-project@a37d815 to vllm-project@b6dcb4d. Note that vllm-project@a37d815 is NOT included in this merge.

@hmellor
Collaborator

hmellor commented Sep 20, 2024

Closing because if there was significant benefit, it would have been discussed more or even implemented by now.

@hmellor closed this as not planned Sep 20, 2024
prarit referenced this issue in prarit/vllm Oct 21, 2024
* extend moe padding to DUMMY weights