[Question]: How to change src_length for LLM fine-tuning #9233

Closed
littlesmallrookie opened this issue Oct 9, 2024 · 3 comments

@littlesmallrookie

littlesmallrookie commented Oct 9, 2024

Please ask your question

After setting src_length=10240 in lora_argument.json, training fails with the following error:

[2024-10-09 11:59:07,127] [   DEBUG] -   Number of trainable parameters = 3,784,704 (per device)
W1009 11:59:08.997602 31629 multiply_fwd_func.cc:75] got different data type, run type promotion automatically, this may cause data type been changed.
Traceback (most recent call last):
  File "/home/aistudio/work/PaddleNLP/llm/run_finetune.py", line 689, in <module>
    main()
  File "/home/aistudio/work/PaddleNLP/llm/run_finetune.py", line 564, in main
    train_result = trainer.train(resume_from_checkpoint=checkpoint)
  File "/opt/conda/envs/python35-paddle120-env/lib/python3.10/site-packages/paddlenlp-3.0.0b1.post20241009-py3.10.egg/paddlenlp/trainer/trainer.py", line 799, in train
    return self._inner_training_loop(
  File "/opt/conda/envs/python35-paddle120-env/lib/python3.10/site-packages/paddlenlp-3.0.0b1.post20241009-py3.10.egg/paddlenlp/trainer/trainer.py", line 993, in _inner_training_loop
    tr_loss_step = self.training_step(model, inputs)
  File "/opt/conda/envs/python35-paddle120-env/lib/python3.10/site-packages/paddlenlp-3.0.0b1.post20241009-py3.10.egg/paddlenlp/trainer/trainer.py", line 2122, in training_step
    loss = self.compute_loss(model, inputs)
  File "/opt/conda/envs/python35-paddle120-env/lib/python3.10/site-packages/paddlenlp-3.0.0b1.post20241009-py3.10.egg/paddlenlp/trainer/trainer.py", line 2067, in compute_loss
    outputs = model(**inputs)
  File "/opt/conda/envs/python35-paddle120-env/lib/python3.10/site-packages/paddle/nn/layer/layers.py", line 1426, in __call__
    return self.forward(*inputs, **kwargs)
  File "/opt/conda/envs/python35-paddle120-env/lib/python3.10/site-packages/paddlenlp-3.0.0b1.post20241009-py3.10.egg/paddlenlp/transformers/qwen2/modeling.py", line 1365, in forward
    loss = self.criterion(logits, labels)
  File "/opt/conda/envs/python35-paddle120-env/lib/python3.10/site-packages/paddle/nn/layer/layers.py", line 1426, in __call__
    return self.forward(*inputs, **kwargs)
  File "/opt/conda/envs/python35-paddle120-env/lib/python3.10/site-packages/paddlenlp-3.0.0b1.post20241009-py3.10.egg/paddlenlp/transformers/qwen2/modeling.py", line 1142, in forward
    loss = paddle.mean(masked_lm_loss)
  File "/opt/conda/envs/python35-paddle120-env/lib/python3.10/site-packages/paddle/tensor/stat.py", line 90, in mean
    return _C_ops.mean(x, axis, keepdim)
ValueError: (InvalidArgument) Tensor need be reduced must not empty.
  [Hint: Expected x.numel() > 0, but received x.numel():0 <= 0:0.] (at ../paddle/phi/kernels/funcs/reduce_function.h:1055)
@littlesmallrookie littlesmallrookie added the question Further information is requested label Oct 9, 2024
@ZHUI
Collaborator

ZHUI commented Oct 9, 2024

It looks like no tokens require a loss computation, which causes the error. The masked_lm_loss produced by the gather is empty.
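One way to confirm this diagnosis before training is to check whether any example ends up with no label positions that contribute to the loss. The snippet below is only a minimal sketch in plain Python, not PaddleNLP code: the helper names are hypothetical, and it assumes that ignored label positions are marked with -100, which should be verified against the data pipeline in use.

```python
# Assumption: positions excluded from the loss carry this label value.
IGNORE_INDEX = -100


def count_supervised_tokens(labels, ignore_index=IGNORE_INDEX):
    """Return how many label positions would contribute to the loss."""
    return sum(1 for token in labels if token != ignore_index)


def find_empty_examples(batch_labels, ignore_index=IGNORE_INDEX):
    """Return the indices of examples whose labels are all ignored."""
    return [
        index
        for index, labels in enumerate(batch_labels)
        if count_supervised_tokens(labels, ignore_index) == 0
    ]


# Hypothetical label rows: the second one has nothing left to supervise.
batch_labels = [
    [-100, -100, 42, 7],
    [-100, -100, -100, -100],
]
empty_examples = find_empty_examples(batch_labels)
print(empty_examples)
```

If every example in a batch shows up in that list, a mean over the selected loss values has nothing to reduce, which matches the `x.numel():0` hint in the traceback.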

@littlesmallrookie
Author

How can I fix this?

@DrownFish19
Collaborator

DrownFish19 commented Oct 15, 2024

This has been answered in #9232, so I am closing this issue. Feel free to reopen it if needed.
