
[Bug]: Changing src_length for fine-tuning raises an error #9232

Open · littlesmallrookie opened this issue Oct 9, 2024 · 3 comments
Labels: bug (Something isn't working)

@littlesmallrookie

Software environment

- paddlepaddle:   
- paddlepaddle-gpu:  3.0.0b1
- paddlenlp: 3.0.0b1.post20241009

Duplicate issues

  • I have searched the existing issues

Error description

When fine-tuning Qwen/Qwen2-0.5B, setting src_length=10240 in lora_argument.json and then running training raises an error.
Error output:
```
[2024-10-09 11:59:07,127] [   DEBUG] -   Number of trainable parameters = 3,784,704 (per device)
W1009 11:59:08.997602 31629 multiply_fwd_func.cc:75] got different data type, run type promotion automatically, this may cause data type been changed.
Traceback (most recent call last):
  File "/home/aistudio/work/PaddleNLP/llm/run_finetune.py", line 689, in <module>
    main()
  File "/home/aistudio/work/PaddleNLP/llm/run_finetune.py", line 564, in main
    train_result = trainer.train(resume_from_checkpoint=checkpoint)
  File "/opt/conda/envs/python35-paddle120-env/lib/python3.10/site-packages/paddlenlp-3.0.0b1.post20241009-py3.10.egg/paddlenlp/trainer/trainer.py", line 799, in train
    return self._inner_training_loop(
  File "/opt/conda/envs/python35-paddle120-env/lib/python3.10/site-packages/paddlenlp-3.0.0b1.post20241009-py3.10.egg/paddlenlp/trainer/trainer.py", line 993, in _inner_training_loop
    tr_loss_step = self.training_step(model, inputs)
  File "/opt/conda/envs/python35-paddle120-env/lib/python3.10/site-packages/paddlenlp-3.0.0b1.post20241009-py3.10.egg/paddlenlp/trainer/trainer.py", line 2122, in training_step
    loss = self.compute_loss(model, inputs)
  File "/opt/conda/envs/python35-paddle120-env/lib/python3.10/site-packages/paddlenlp-3.0.0b1.post20241009-py3.10.egg/paddlenlp/trainer/trainer.py", line 2067, in compute_loss
    outputs = model(**inputs)
  File "/opt/conda/envs/python35-paddle120-env/lib/python3.10/site-packages/paddle/nn/layer/layers.py", line 1426, in __call__
    return self.forward(*inputs, **kwargs)
  File "/opt/conda/envs/python35-paddle120-env/lib/python3.10/site-packages/paddlenlp-3.0.0b1.post20241009-py3.10.egg/paddlenlp/transformers/qwen2/modeling.py", line 1365, in forward
    loss = self.criterion(logits, labels)
  File "/opt/conda/envs/python35-paddle120-env/lib/python3.10/site-packages/paddle/nn/layer/layers.py", line 1426, in __call__
    return self.forward(*inputs, **kwargs)
  File "/opt/conda/envs/python35-paddle120-env/lib/python3.10/site-packages/paddlenlp-3.0.0b1.post20241009-py3.10.egg/paddlenlp/transformers/qwen2/modeling.py", line 1142, in forward
    loss = paddle.mean(masked_lm_loss)
  File "/opt/conda/envs/python35-paddle120-env/lib/python3.10/site-packages/paddle/tensor/stat.py", line 90, in mean
    return _C_ops.mean(x, axis, keepdim)
ValueError: (InvalidArgument) Tensor need be reduced must not empty.
  [Hint: Expected x.numel() > 0, but received x.numel():0 <= 0:0.] (at ../paddle/phi/kernels/funcs/reduce_function.h:1055)
```

Steps to reproduce & code

[A screenshot of the reproduction steps was attached here.]

littlesmallrookie added the bug (Something isn't working) label on Oct 9, 2024
@ZHUI (Collaborator) commented Oct 9, 2024

It looks like there are no tokens left to compute the loss on, which causes the error: the gather for masked_lm_loss comes back empty.
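
A minimal, self-contained sketch of the failure mode @ZHUI describes (this is not the PaddleNLP code itself; the shape [8] and the usual -100 ignore index are illustrative assumptions): when every label is masked out, the selected loss tensor is empty and paddle.mean raises exactly the error seen in the traceback.

```python
import paddle

# Hypothetical per-token losses; in the real model these come from the LM head.
masked_lm_loss = paddle.rand([8])

# If every label is the ignore index (-100), no position contributes to the loss.
labels = paddle.full([8], -100, dtype="int64")

# Keeping only positions with a real label yields an empty tensor here.
active_loss = paddle.masked_select(masked_lm_loss, labels != -100)
print(active_loss.shape)  # [0]

# Reducing an empty tensor raises the same error as in the traceback:
# ValueError: (InvalidArgument) Tensor need be reduced must not empty.
loss = paddle.mean(active_loss)
```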

@littlesmallrookie (Author) commented

> It looks like there are no tokens left to compute the loss on, which causes the error: the gather for masked_lm_loss comes back empty.

How can this be fixed?

@DrownFish19 (Collaborator) commented

> It looks like there are no tokens left to compute the loss on, which causes the error: the gather for masked_lm_loss comes back empty.

Check src_length and max_length: max_length should be greater than src_length, since max_length = src_length + output_length.
So lowering src_length or raising max_length should resolve the problem.
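
Following that advice, a hedged sketch of the relevant lora_argument.json fields (the field names match the PaddleNLP llm example configs; the values are illustrative): with src_length=10240, max_length must be raised above 10240 so that max_length - src_length tokens remain for the response. If max_length stays at a smaller default such as 2048, the response tokens are presumably truncated away entirely, every label becomes the ignore index, and the loss tensor ends up empty as shown above.

```json
{
    "model_name_or_path": "Qwen/Qwen2-0.5B",
    "src_length": 10240,
    "max_length": 11264
}
```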
