Enable more models to inference based on LoRA #3382
Conversation
@Yard1 Thanks for your review, I have added these new dimensions in
Thanks - I am wondering if we can avoid adding the  We'll also need tests for the new LoRA layers/modifications to existing layers - using the existing tests as a reference should be enough.
@Yard1 Thanks for your review and invaluable advice.
@yuanmeng1120 I have finished writing the code to support LoRA for the Baichuan-related models, and it passes testing with baichuan-7B. You can try again.
@Yard1 Can you take a little time to review this PR again? Thanks~
@jeejeelee Hello, at first I made the changes in the two classes BaichuanForCausalLM and BaiChuanForCausalLM, without modifying the base; only "gata_down_proj": [
@yuanmeng1120 Hi, if you pulled my branch, you just need to recompile; you shouldn't need to modify the Baichuan model code again. If you are only modifying code on the vllm main branch, you can refer to the changes in this PR to make the corresponding modifications.
OK, thanks. I hadn't pulled the PR before; I was modifying the source directly. I'll pull the PR and try.
@jeejeelee Hello, after pulling your PR code, I got this error:
@yuanmeng1120 Hi, I actually ran into this problem too when running chatglm3 with TP=4. When adding the dimensions to punica's config.h, the compile-time assertion does fail because the value is not divisible by 64. If you are also running chatglm3, try TP=1 or 2.
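For concreteness, the constraint can be checked ahead of time. A minimal sketch, assuming ChatGLM3-6B's FFN hidden size of 13696 (taken from its Hugging Face config) and the divisible-by-64 requirement of the punica kernels mentioned above:

```python
# Hypothetical pre-check for the punica constraint discussed above:
# each per-partition LoRA dimension must be divisible by 64, or the
# static assertion in punica's config.h fails at compile time.
ffn_hidden_size = 13696  # ChatGLM3-6B value (assumed from its HF config)

for tp in (1, 2, 4):
    shard = ffn_hidden_size // tp  # per-GPU slice under tensor parallelism
    print(f"TP={tp}: shard={shard}, divisible_by_64={shard % 64 == 0}")
# TP=4 yields a shard of 3424, which is not divisible by 64 - hence the
# advice to use TP=1 or TP=2.
```

This only illustrates why the build breaks at TP=4 for this model; the actual assertion lives in the CUDA kernel configuration, not in Python.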
@jeejeelee can you address the comments I left before? Thanks!
@Yard1 Thanks for your review. I have addressed all of your comments except for the following: I don't understand how to add a class method according to your comment. I hope you can provide me with more detailed tips. Thank you for your guidance.
Speaking from my own experience, no, that's not right.
@Yard1 I apologize for bothering you. My goal is to merge this PR before the v0.34 release, if feasible. Your feedback on the above comment would be greatly appreciated.
Yard1
left a comment
Thanks, looks good. Let's merge once CI passes.
@Yard1 Thank you for your great work.
Co-authored-by: Antoni Baum <antoni.baum@protonmail.com>


MOTIVATION
For certain models, such as ChatGLM3 and BaiChuan-7B, the `QKVParallelLinear` layers are already combined in their `transformers` versions. Consequently, when integrating LoRA support for them, the existing `QKVParallelLinearWithLora` cannot be used. The `dense_h_to_4h` layer of ChatGLM-3 operates in a similar fashion. New linear implementations are required to fulfill the LoRA integration needs of these models. Therefore, I have completed the following implementation:

- `RefQKVParallelLinear`, identical to `QKVParallelLinear`, to signify that QKV is already merged in these models' `transformers` versions; based on `RefQKVParallelLinear`, implemented the corresponding LoRA layer, named `RefQKVParallelLinearWithLora`.
- `RefMergedColumnParallelLinear`, identical to `MergedColumnParallelLinear`, to signify that `gate_up_proj` is already merged in their `transformers` versions; based on `RefMergedColumnParallelLinear`, implemented the corresponding LoRA layer, named `RefMergedColumnParallelLinearWithLora`.
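The "Ref" marker-class pattern described above can be sketched as follows. This is a hypothetical, simplified illustration, not the actual vLLM code: the stand-in base class and the `can_replace_layer` hook are assumptions mirroring how vLLM's LoRA layers decide which wrapper applies to which linear layer.

```python
# Hypothetical, self-contained sketch of the "Ref" marker-class pattern;
# the real vLLM layers carry weights, sharding logic, and many more methods.

class QKVParallelLinear:
    """Stand-in for the existing vLLM layer (assumed simplified API)."""
    def __init__(self, input_size: int, output_size: int) -> None:
        self.input_size = input_size
        self.output_size = output_size


class RefQKVParallelLinear(QKVParallelLinear):
    """Behaviorally identical to QKVParallelLinear; its only job is to
    signal that QKV is already merged in the model's `transformers`
    checkpoint, so LoRA replacement logic can pick a different wrapper."""


class RefQKVParallelLinearWithLora:
    """LoRA wrapper keyed on the marker type (hypothetical dispatch)."""
    @classmethod
    def can_replace_layer(cls, layer: object) -> bool:
        # Exact-type check: a plain QKVParallelLinear keeps using the
        # existing QKVParallelLinearWithLora wrapper instead.
        return type(layer) is RefQKVParallelLinear


merged = RefQKVParallelLinear(4096, 3 * 4096)
plain = QKVParallelLinear(4096, 3 * 4096)
print(RefQKVParallelLinearWithLora.can_replace_layer(merged))  # True
print(RefQKVParallelLinearWithLora.can_replace_layer(plain))   # False
```

The design choice worth noting is that the marker subclass adds no behavior at all; it exists purely so that type-based dispatch can distinguish "QKV already merged in the checkpoint" from "QKV merged by vLLM at load time".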