
Enable more models to inference based on LoRA #3382

Merged
27 commits merged into vllm-project:main on Mar 26, 2024

Conversation

jeejeelee (Contributor)

MOTIVATION

For certain models, such as ChatGLM3 and BaiChuan-7B, the QKV projections are already combined into a single linear layer in their Transformers checkpoints. Consequently, when adding LoRA support for them, the existing QKVParallelLinearWithLora cannot be used. The dense_h_to_4h layer of ChatGLM3 behaves the same way. New linear implementations are required to meet the LoRA integration needs of these models, so I have completed the following (a short sketch contrasting the merged and split layouts follows this list):

  • Added a RefQKVParallelLinear, identical to QKVParallelLinear, to signify that QKV is already merged in the Transformers checkpoint, and implemented the corresponding LoRA layer on top of it, named RefQKVParallelLinearWithLora.
  • Added a RefMergedColumnParallelLinear, identical to MergedColumnParallelLinear, to signify that gate_up_proj is already merged in the Transformers checkpoint, and implemented the corresponding LoRA layer on top of it, named RefMergedColumnParallelLinearWithLora.
  • Completed the integration of LoRA support for ChatGLM3.
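For concreteness, here is a rough sketch of how the two checkpoint layouts differ from vLLM's point of view. The LLaMA-style entries reflect how vLLM models typically declare projections that vLLM itself fuses at load time; the ChatGLM3-style entries are an assumed illustration of the "already merged in the checkpoint" case this PR targets, not a quote of the final code.

```python
# Illustrative only: module names follow the Hugging Face checkpoints, but the
# exact packed_modules_mapping contents used in this PR are assumed.

# LLaMA-style model: q/k/v and gate/up exist as separate weights in the
# checkpoint, and vLLM fuses them into QKVParallelLinear /
# MergedColumnParallelLinear at load time. Each packed module maps to the
# list of original sub-modules, so per-sub-module LoRA weights can be stacked.
llama_packed_modules_mapping = {
    "qkv_proj": ["q_proj", "k_proj", "v_proj"],
    "gate_up_proj": ["gate_proj", "up_proj"],
}

# ChatGLM3-style model: query_key_value and dense_h_to_4h are already single
# fused weights in the checkpoint, so a LoRA adapter trained on them has one
# A/B pair per fused layer and there is nothing to stack.
chatglm3_packed_modules_mapping = {
    "query_key_value": ["query_key_value"],
    "dense_h_to_4h": ["dense_h_to_4h"],
}
```

Note that the two cases differ simply in the length of the value lists, which is the property the review discussion below builds on.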

Yard1 self-requested a review March 13, 2024 18:36
Yard1 self-requested a review March 13, 2024 23:38
@jeejeelee (Contributor, Author)

@Yard1 Thanks for your review. I have added these new dimensions in test_punica.py; the test results are as follows:

3312 passed, 1872 skipped, 10 warnings

@Yard1 (Collaborator) commented Mar 14, 2024

Thanks - I am wondering if we can avoid adding the Ref layers. It should be possible to use the existing LoRA layers to represent already merged modules - one possibility is to change the logic based on the length of the value list in packed_modules_mapping. Alternatively we could add a new LoRA layer (but not a new "base" layer) - it should then be picked based on packed_modules_mapping.

We'll also need tests for the new LoRA layers/modifications to existing layers - using the existing tests as a reference should be enough.

@jeejeelee (Contributor, Author)

@Yard1 Thanks for your review and your invaluable advice.
I have refactored the code based on your suggestion. I'm currently unsure how to best name the new LoRA layers, so for now I have kept the "RefxxxWithLora" names.
As for the tests, I'd like to confirm the above refactoring with you before proceeding further.

jeejeelee requested a review from Yard1 March 17, 2024 10:45
@yuanmeng1120

Hello, after this PR's changes, Baichuan2-13B-Chat gives the same result with and without LoRA: both reply that it is the Baichuan large model. However, the LoRA adapter was trained for identity correction, so the reply is wrong; it seems the LoRA is not taking effect.

[Screenshot: WechatIMG1180]

@jeejeelee (Contributor, Author)

Thanks for your interest.

  1. Did you modify the Baichuan network structure in vLLM?
  2. Have you verified the LoRA adapter with Transformers?

@jeejeelee (Contributor, Author)

@yuanmeng1120 I have finished writing the code for LoRA support of the Baichuan models and it passes testing with Baichuan-7B. Please try again.

@jeejeelee (Contributor, Author)

@Yard1 Could you take a little time to review this PR again? Thanks!

@yuanmeng1120

@jeejeelee Hello, at first I made the changes in the BaichuanForCausalLM and BaiChuanForCausalLM classes rather than in the base class. The only difference was this entry (because the Baichuan structure goes down first and then up; everything else is the same):

"gata_down_proj": [
    # "gate_proj",
    # "down_proj",
# ],

and I got the following error:

lora/layers.py", line 595, in set_lora
    lora_b_q = lora_b[0][:, self.q_proj_shard_size *
TypeError: list indices must be integers or slices, not tuple

After applying the changes from your PR I still get the same error:

lora/layers.py", line 595, in set_lora
    lora_b_q = lora_b[0][:, self.q_proj_shard_size *
TypeError: list indices must be integers or slices, not tuple

I'll debug and try adjusting the structure. Thank you very much.

@jeejeelee (Contributor, Author)

@yuanmeng1120 Hi, if you pulled my branch, just recompile; you should not need to modify the Baichuan model code again. If you are only modifying the vLLM main branch, you can refer to this PR's changes and make the corresponding modifications.

@yuanmeng1120

OK, thanks. I hadn't pulled the PR before and was modifying the source code directly; I'll pull the PR and give it a try.

@yuanmeng1120

@jeejeelee Hello, I pulled your PR code and then got this error:

@yuanmeng1120

After adding this and recompiling, I got the problem below and the build fails (screenshots: https://github.com/vllm-project/vllm/assets/61146321/4cd11877-215a-47b4-9cd9-24b9c84b81d8 and https://github.com/vllm-project/vllm/assets/61146321/6e1b4078-a6fb-4184-9c71-01621c4678ee): 3242 is not divisible by 64. I am running python -m vllm.entrypoints.openai.api_server.

@jeejeelee (Contributor, Author)

@yuanmeng1120 Hi, I actually hit this problem too when running ChatGLM3 with TP=4. If you add the dimension to punica's config.h, the compile-time assertion does fail because the value does not satisfy divisibility by 64. If you are also running ChatGLM3, I suggest trying TP=1 or TP=2.
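For anyone hitting the same assertion, a quick arithmetic illustration of the divisible-by-64 constraint described above. The 13696 value is ChatGLM3-6B's ffn_hidden_size and is used here only as an assumed example; the actual dimension you need to add to punica's config.h depends on the model and on how the layer is sharded.

```python
# Sketch: per-shard size of a column-parallel dimension under tensor parallelism.
# The thread above notes that punica's config.h asserts at compile time that
# every registered dimension is divisible by 64.
ffn_hidden_size = 13696  # assumed example value (ChatGLM3-6B)

for tp in (1, 2, 4):
    shard = ffn_hidden_size // tp
    print(f"TP={tp}: shard={shard}, divisible by 64: {shard % 64 == 0}")

# TP=1 -> 13696 (ok), TP=2 -> 6848 (ok), TP=4 -> 3424 (3424 % 64 == 32, fails),
# which matches the observation that lower TP degrees avoid the assertion.
```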

@yuanmeng1120

@jeejeelee It works now; it was indeed this problem, thank you very much. I tried TP=2 and it did not work, but TP=1 works.

@Yard1 (Collaborator) commented Mar 20, 2024

@jeejeelee can you address the comments I left before? Thanks!

@jeejeelee (Contributor, Author)

@Yard1 Thanks for your review. I have addressed all of your comments except for the following one:

this is too brittle. how about we add a class method to the lora layers that would act as a condition to be called here? eg. can_replace_src_layer

I don't understand how to add such a class method based on your comment; I hope you can give me more detailed pointers.

Thank you for your guidance.
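To make the quoted suggestion concrete, here is a minimal, hypothetical sketch of the "condition as a class method" idea. The name can_replace_src_layer is taken from the quoted comment; the stub classes, the method signature, and the dispatch loop are assumptions for illustration rather than vLLM's actual API.

```python
from typing import List, Type


class QKVParallelLinear:
    """Stand-in for the base (non-LoRA) fused qkv layer."""


class QKVParallelLinearWithLora:
    """Stand-in for the LoRA wrapper that handles an already-fused qkv weight."""

    def __init__(self, base_layer):
        self.base_layer = base_layer

    @classmethod
    def can_replace_src_layer(cls, source_layer,
                              packed_modules_list: List[str]) -> bool:
        # Only claim layers whose qkv projection is already fused in the
        # checkpoint, i.e. the packed_modules_mapping entry has a single name.
        return (isinstance(source_layer, QKVParallelLinear)
                and len(packed_modules_list) == 1)


def pick_lora_layer(source_layer, packed_modules_list: List[str],
                    candidates: List[Type]):
    """Return the first LoRA wrapper whose condition accepts the source layer."""
    for lora_cls in candidates:
        if lora_cls.can_replace_src_layer(source_layer, packed_modules_list):
            return lora_cls(source_layer)
    return source_layer  # no candidate applies; keep the original layer


# ChatGLM3-style fused qkv maps to a single packed module name, so the
# condition fires and the layer gets wrapped.
layer = pick_lora_layer(QKVParallelLinear(), ["query_key_value"],
                        [QKVParallelLinearWithLora])
```

The replacement logic could then iterate over candidate LoRA classes instead of hard-coding isinstance checks, which is what would make the check less brittle.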

@jeejeelee (Contributor, Author)

Should the following code be changed?

    if 32000 < self.base_layer.vocab_size > 33024:
        raise ValueError("When using LoRA, vocab size must be "
                         "32000 >= vocab_size <= 33024")

In my personal experience, LoRA is not applied to lm_head, so this does not need to be modified either.

@jeejeelee (Contributor, Author)

@Yard1 I apologize for bothering you. My goal is to merge this PR before the v0.3.4 release, if feasible. Your feedback on the above comment would be greatly appreciated.

@Yard1 (Collaborator) left a comment

Thanks, looks good. Let's merge once CI passes.

@jeejeelee (Contributor, Author)

@Yard1 Thank you for your great work.

Yard1 merged commit 8af890a into vllm-project:main Mar 26, 2024
32 checks passed
xjpang pushed a commit to xjpang/vllm that referenced this pull request Mar 31, 2024
Co-authored-by: Antoni Baum <antoni.baum@protonmail.com>
Temirulan pushed a commit to Temirulan/vllm-whisper that referenced this pull request Sep 6, 2024
Co-authored-by: Antoni Baum <antoni.baum@protonmail.com>