
Support chatglm fine grained dybatch v1. #6798

Merged

Conversation

xiaoxiaohehe001
Contributor

PR types

New features

PR changes

Models

Description

Support chatglm fine grained dybatch v1.

codecov bot commented Aug 22, 2023

Codecov Report

Merging #6798 (2ffa396) into develop (1a69081) will decrease coverage by 0.24%.
The diff coverage is 0.00%.

@@             Coverage Diff             @@
##           develop    #6798      +/-   ##
===========================================
- Coverage    60.30%   60.06%   -0.24%     
===========================================
  Files          544      546       +2     
  Lines        80364    80680     +316     
===========================================
  Hits         48460    48460              
- Misses       31904    32220     +316     
Files Changed                                            | Coverage Δ
paddlenlp/experimental/transformers/__init__.py          | 0.00% <0.00%> (ø)
...enlp/experimental/transformers/chatglm/__init__.py    | 0.00% <0.00%> (ø)
...enlp/experimental/transformers/chatglm/modeling.py    | 0.00% <0.00%> (ø)
...erimental/transformers/fused_transformer_layers.py    | 0.00% <0.00%> (ø)
...enlp/experimental/transformers/generation_utils.py    | 0.00% <0.00%> (ø)

... and 1 file with indirect coverage changes

@@ -181,7 +209,7 @@ def update_model_kwargs_for_generation(cache, just_decoder, next_tokens, eos_tok
model_kwargs["seq_len_decoder"],
model_kwargs["seq_len_decoder"] + 1,
)
return model_kwargs
return model_kwargs, next_tokens
Contributor

Is it necessary to return next_tokens here?

Contributor

The set_multi_stops handling here should be doable inside the sample function, so next_tokens would not need to be returned from this function.

Try to keep the inputs and outputs consistent with PaddleNLP's existing functions.

Contributor Author

Done~
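
A minimal sketch of the shape suggested above — handling the stop logic in the sampling step so update_model_kwargs_for_generation keeps its original single return value (the function bodies, the stop_flags key, and the sample signature are assumptions, not the merged code):

import paddle

def update_model_kwargs_for_generation(cache, just_decoder, next_tokens, eos_token_id, model_kwargs):
    # ... update tgt_ids / tgt_pos / seq_len_decoder as before ...
    # Keep the single return value so the signature stays consistent with
    # PaddleNLP's existing generation utilities.
    return model_kwargs

def sample(logits, model_kwargs, eos_token_id):
    # Hypothetical sampling step: choose the next token first ...
    next_tokens = paddle.argmax(logits, axis=-1, keepdim=True)
    # ... then mark sequences that produced the EOS token here (the role
    # set_multi_stops plays in the discussion above), so next_tokens never
    # has to be returned from update_model_kwargs_for_generation.
    stop_flags = model_kwargs.get("stop_flags", paddle.zeros_like(next_tokens, dtype="bool"))
    model_kwargs["stop_flags"] = paddle.logical_or(stop_flags, next_tokens == eos_token_id)
    return next_tokens, model_kwargs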

self.tgt_generation_mask[i, 0, 0, :length] = paddle.ones(shape=[1, length], dtype="float16")

inputs["attention_mask"] = self.attention_mask
inputs["tgt_generation_mask"] = self.tgt_generation_mask


The main diff here seems to be the handling of tgt_pos. Is ChatGLM the only model with this 2D position difference? Consider encapsulating this part as well and distinguishing it via a parameter such as is_2d_pos; branching on chatglm directly in the code is not very extensible.
@wj-Mcat @carryyu please also take a look at how to encapsulate this better.

Contributor

We discussed it; let's keep it like this for now. Once the ChatGLM tokenizer is adjusted later, this branch code can be removed.
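
As a rough illustration of the is_2d_pos-style encapsulation suggested above (the helper name, the flag, and the exact layout of ChatGLM's 2D positions are assumptions):

import paddle

def build_tgt_pos(seq_lens, is_2d_pos=False):
    # Target position of the next decoding step; most models only need the
    # 1D position, i.e. the current sequence length of each batch item.
    pos = paddle.to_tensor(seq_lens, dtype="int64").reshape([-1, 1])
    if not is_2d_pos:
        return pos
    # ChatGLM-style models carry an extra block position per token; here it
    # is simply set to 1 for the first generated token (illustrative only).
    block_pos = paddle.ones_like(pos)
    return paddle.concat([pos, block_pos], axis=-1).reshape([-1, 2, 1])

The caller would then pass is_2d_pos based on the model config rather than checking for chatglm by name at every call site.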


config.tensor_parallel_degree = tensor_parallel_degree
config.tensor_parallel_rank = tensor_parallel_rank
model = LlamaForCausalLMInferenceModel.from_pretrained(


Check whether this can go through the AutoModelForCausalLM approach, dispatching internally based on the config.

Contributor

Dispatching through AutoModelForCausalLM is not possible yet, so the initialization can only be hardcoded for now.

However, which model it is can be determined from config.architectures above.
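
A sketch of the config.architectures-based dispatch described above (the registry, the ChatGLM class name, and the import location are assumptions; only LlamaForCausalLMInferenceModel appears in the diff):

from paddlenlp.experimental.transformers import (  # assumed import path
    ChatGLMForCausalLMInferenceModel,
    LlamaForCausalLMInferenceModel,
)
from paddlenlp.transformers import AutoConfig

ARCH_TO_INFERENCE_MODEL = {
    "LlamaForCausalLM": LlamaForCausalLMInferenceModel,
    "ChatGLMForCausalLM": ChatGLMForCausalLMInferenceModel,
}

def create_inference_model(model_name_or_path, **from_pretrained_kwargs):
    config = AutoConfig.from_pretrained(model_name_or_path)
    # config.architectures is a list such as ["LlamaForCausalLM"]; use it to
    # pick the matching inference class until AutoModelForCausalLM can
    # dispatch to these classes directly.
    model_cls = ARCH_TO_INFERENCE_MODEL[config.architectures[0]]
    return model_cls.from_pretrained(model_name_or_path, config=config, **from_pretrained_kwargs)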

llm/predictor.py Outdated
"你好",
"你好啊,请问你叫什么名字",
"你好啊,你在干什么",
# "My name is?"


This commented-out line can be removed.

Contributor Author

Done

@@ -158,19 +163,42 @@ def update_model_kwargs_for_generation(cache, just_decoder, next_tokens, eos_tok
if cache is None:
# encoder's generation
model_kwargs["tgt_ids"] = paddle.where(just_decoder, model_kwargs["tgt_ids"], next_tokens)
model_kwargs["tgt_pos"] = paddle.where(just_decoder, model_kwargs["tgt_pos"], model_kwargs["tgt_pos"] + 1)
# import pdb;pdb.set_trace()
Contributor

The pdb code here should be removed.

Contributor Author

Done

@CLAassistant

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you all sign our Contributor License Agreement before we can accept your contribution.
1 out of 2 committers have signed the CLA.

✅ xiaoxiaohehe001
❌ zhengzekang


zhengzekang does not appear to be a GitHub user. You need a GitHub account to be able to sign the CLA. If you already have a GitHub account, please add the email address used for this commit to your account.
You have signed the CLA already but the status is still pending? Let us recheck it.


@@ -12,5 +12,6 @@
# See the License for the specific language governing permissions and
# limitations under the License.

from .chatglm import *
Contributor

This import should be placed below from .fused_transformer_layers import *.
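
For reference, the suggested ordering in paddlenlp/experimental/transformers/__init__.py would look like this (assuming the chatglm module depends on symbols exported by fused_transformer_layers):

from .fused_transformer_layers import *
from .chatglm import *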

@@ -134,7 +139,7 @@ def generate(
return ret

@staticmethod
def update_model_kwargs_for_generation(cache, just_decoder, next_tokens, eos_token_id, model_kwargs):
def update_model_kwargs_for_generation(cache, just_decoder, next_tokens, eos_token_id, config, model_kwargs):
Contributor

After discussing with @xiaoxiaohehe001, to give the model enough control we decided to turn this into an instance method (dropping @staticmethod), so that derived models can access the corresponding configuration via self.config and can also override the method.
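
A small sketch of the instance-method form described here (class names and the override body are illustrative, not the merged code):

class GenerationInferenceModel:
    def __init__(self, config):
        self.config = config

    # No longer a @staticmethod: the base implementation can read
    # model-agnostic settings from self.config, and derived models can
    # override the whole method.
    def update_model_kwargs_for_generation(self, cache, just_decoder, next_tokens, eos_token_id, model_kwargs):
        return model_kwargs

class ChatGLMInferenceModel(GenerationInferenceModel):
    def update_model_kwargs_for_generation(self, cache, just_decoder, next_tokens, eos_token_id, model_kwargs):
        # Hypothetical override: consult self.config for ChatGLM-specific
        # settings (e.g. its 2D positions) before delegating to the base class.
        return super().update_model_kwargs_for_generation(
            cache, just_decoder, next_tokens, eos_token_id, model_kwargs
        )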

Contributor

@wj-Mcat left a comment

LGTM

@sijunhe sijunhe merged commit 7903bcc into PaddlePaddle:develop Aug 28, 2023
4 checks passed
6 participants