Enable more models to inference based on LoRA #3382
Conversation
@Yard1 Thanks for your review, I have added these new dimensions in
Thanks - I am wondering if we can avoid adding the

We'll also need tests for the new LoRA layers/modifications to existing layers - using the existing tests as a reference should be enough.
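(As an illustration of the property such tests verify, here is a self-contained sketch in plain PyTorch, not vLLM's actual test harness: the LoRA layer's output must equal the base layer's output plus the scaled low-rank update, which is equivalent to folding the update into the weight.)

```python
import torch

# Illustration only (plain PyTorch, not vLLM's test harness): the property
# the LoRA-layer tests check is that the layer's output equals the base
# output plus the scaled low-rank update B(A(x)).
torch.manual_seed(0)
in_dim, out_dim, rank, scaling = 32, 64, 8, 0.5
x = torch.randn(4, in_dim)
w = torch.randn(out_dim, in_dim)
lora_a = torch.randn(rank, in_dim)
lora_b = torch.randn(out_dim, rank)

base_out = x @ w.t()
lora_out = base_out + scaling * (x @ lora_a.t() @ lora_b.t())

# Reference: fold the low-rank update into the weight and recompute.
merged_w = w + scaling * (lora_b @ lora_a)
assert torch.allclose(lora_out, x @ merged_w.t(), atol=1e-5)
```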
@Yard1 Thanks for your review and invaluable advice.
@yuanmeng1120 I have finished writing the code for LoRA support of the Baichuan-related models, and it passes testing with baichuan-7B. You can try again.
@Yard1 Can you take a little time to review this PR again? Thanks~
@jeejeelee Hi, at first I made the changes in the two classes BaichuanForCausalLM and BaiChuanForCausalLM, without touching the base class, only "gata_down_proj": [
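(For context: vLLM model classes declare their fused projections through a `packed_modules_mapping` class attribute so the LoRA machinery can map adapter weights onto them. A minimal sketch of how the Baichuan base class might wire this up, assuming the attribute names in use around the time of this PR, is below.)

```python
import torch.nn as nn

# Sketch only - attribute names assumed from vLLM's LoRA support of this era.
class BaiChuanBaseForCausalLM(nn.Module):
    # Maps each fused module to the per-slice names LoRA checkpoints use.
    # "W_pack" is Baichuan's already-merged QKV projection, so it maps to
    # itself; "gate_up_proj" is assembled from two separate projections.
    packed_modules_mapping = {
        "W_pack": ["W_pack"],
        "gate_up_proj": ["gate_proj", "up_proj"],
    }
    # Modules a LoRA adapter is allowed to target.
    supported_lora_modules = ["W_pack", "o_proj", "gate_up_proj", "down_proj"]
```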
@yuanmeng1120 Hi, if you pulled my branch, just recompile; you should not need to modify the Baichuan model code again. If you are only modifying the vLLM main-branch code, you can refer to the changes in this PR.
OK, thanks. I had not pulled the PR before; I modified the source code directly. I will pull the PR and try.
@jeejeelee Hi, I pulled your PR code and then got this error:
@yuanmeng1120 Hi, I actually ran into this problem too when running chatglm3 with TP=4. When adding the dimension to punica's config.h, the compile-time assertion indeed fails because the value does not satisfy the divisible-by-64 requirement. If you are also running chatglm3, try TP=1 or TP=2.
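(The constraint behind that assertion can be checked ahead of time. Below is a hypothetical helper, not part of punica or vLLM, that mirrors the divisibility rule described above; the 13696 used in the example assumes ChatGLM3-6B's ffn_hidden_size.)

```python
def per_partition_dim_ok(dim: int, tp_size: int, align: int = 64) -> bool:
    """True if `dim`, split across `tp_size` ranks, stays a multiple of
    `align`. Hypothetical helper; the real check is a compile-time assert
    in punica's config.h."""
    per_partition, rem = divmod(dim, tp_size)
    return rem == 0 and per_partition % align == 0

# ChatGLM3-6B's ffn_hidden_size (13696) illustrates the failure at TP=4:
print(per_partition_dim_ok(13696, 4))  # False: 13696 / 4 = 3424, not a multiple of 64
print(per_partition_dim_ok(13696, 2))  # True:  13696 / 2 = 6848 = 107 * 64
```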
@jeejeelee can you address the comments I left before? Thanks!
@Yard1 Thanks for your review, I have addressed all of your comments except for the following:
I don't understand how to add a class method based on your comment; I hope you can provide me with more detailed tips. Thank you for your guidance.
From my own usage experience, that is not correct.
@Yard1 I apologize for bothering you. My goal is to merge this PR before the v0.3.4 release, if feasible. Your feedback on the above comment would be greatly appreciated.
Thanks, looks good. Let's merge once CI passes.
@Yard1 Thank you for your great work.
Co-authored-by: Antoni Baum <antoni.baum@protonmail.com>
MOTIVATION

For certain models, such as ChatGLM3 and BaiChuan-7B, the `QKVParallelLinear` layers are already combined in their `transformers` versions. Consequently, when integrating LoRA support for them, the existing `QKVParallelLinearWithLora` cannot be used. The `dense_h_to_4h` layer of ChatGLM3 behaves the same way. New `linear` implementations are required to fulfill the LoRA integration needs of these models. Therefore, I have completed the following implementation (a minimal sketch of the fused layout follows this list):

- Implemented `RefQKVParallelLinear`, identical to `QKVParallelLinear`, to signify that QKV is already merged in the `transformers` versions. Based on `RefQKVParallelLinear`, implemented the corresponding LoRA layer, named `RefQKVParallelLinearWithLora`.
- Implemented `RefMergedColumnParallelLinear`, identical to `MergedColumnParallelLinear`, to signify that `gate_up_proj` is already merged in the `transformers` versions. Based on `RefMergedColumnParallelLinear`, implemented the corresponding LoRA layer, named `RefMergedColumnParallelLinearWithLora`.
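(To make the distinction concrete, here is a self-contained sketch in plain PyTorch, not vLLM's actual classes, with toy dimensions: the difference between separate Q/K/V projections, which `QKVParallelLinear` stacks at weight-loading time, and a checkpoint whose QKV weight ships already fused, which is what the `Ref*` layers signify.)

```python
import torch
import torch.nn as nn

# Toy dimensions; real models use much larger values.
hidden, head_dim, n_heads = 64, 8, 8

# Case 1: separate projections (e.g. Llama). vLLM's QKVParallelLinear
# stacks q_proj/k_proj/v_proj into one weight when loading the checkpoint,
# so LoRA slices can be mapped onto each sub-projection.
q = nn.Linear(hidden, n_heads * head_dim, bias=False)
k = nn.Linear(hidden, n_heads * head_dim, bias=False)
v = nn.Linear(hidden, n_heads * head_dim, bias=False)
stacked = torch.cat([q.weight, k.weight, v.weight], dim=0)

# Case 2: already-fused projection (e.g. ChatGLM3's query_key_value,
# BaiChuan's W_pack). The checkpoint ships one weight, so there are no
# per-slice sub-modules to map, which is why the existing
# QKVParallelLinearWithLora cannot be reused as-is.
qkv = nn.Linear(hidden, 3 * n_heads * head_dim, bias=False)

x = torch.randn(2, hidden)
assert torch.cat([q(x), k(x), v(x)], dim=-1).shape == qkv(x).shape
```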