[Question]: How to change src_length for LLM fine-tuning #9233

littlesmallrookie · 2024-10-09T06:53:23Z

Please ask your question

After setting src_length=10240 in lora_argument.json, training fails with the following error:

[2024-10-09 11:59:07,127] [   DEBUG] -   Number of trainable parameters = 3,784,704 (per device)
W1009 11:59:08.997602 31629 multiply_fwd_func.cc:75] got different data type, run type promotion automatically, this may cause data type been changed.
Traceback (most recent call last):
  File "/home/aistudio/work/PaddleNLP/llm/run_finetune.py", line 689, in <module>
    main()
  File "/home/aistudio/work/PaddleNLP/llm/run_finetune.py", line 564, in main
    train_result = trainer.train(resume_from_checkpoint=checkpoint)
  File "/opt/conda/envs/python35-paddle120-env/lib/python3.10/site-packages/paddlenlp-3.0.0b1.post20241009-py3.10.egg/paddlenlp/trainer/trainer.py", line 799, in train
    return self._inner_training_loop(
  File "/opt/conda/envs/python35-paddle120-env/lib/python3.10/site-packages/paddlenlp-3.0.0b1.post20241009-py3.10.egg/paddlenlp/trainer/trainer.py", line 993, in _inner_training_loop
    tr_loss_step = self.training_step(model, inputs)
  File "/opt/conda/envs/python35-paddle120-env/lib/python3.10/site-packages/paddlenlp-3.0.0b1.post20241009-py3.10.egg/paddlenlp/trainer/trainer.py", line 2122, in training_step
    loss = self.compute_loss(model, inputs)
  File "/opt/conda/envs/python35-paddle120-env/lib/python3.10/site-packages/paddlenlp-3.0.0b1.post20241009-py3.10.egg/paddlenlp/trainer/trainer.py", line 2067, in compute_loss
    outputs = model(**inputs)
  File "/opt/conda/envs/python35-paddle120-env/lib/python3.10/site-packages/paddle/nn/layer/layers.py", line 1426, in __call__
    return self.forward(*inputs, **kwargs)
  File "/opt/conda/envs/python35-paddle120-env/lib/python3.10/site-packages/paddlenlp-3.0.0b1.post20241009-py3.10.egg/paddlenlp/transformers/qwen2/modeling.py", line 1365, in forward
    loss = self.criterion(logits, labels)
  File "/opt/conda/envs/python35-paddle120-env/lib/python3.10/site-packages/paddle/nn/layer/layers.py", line 1426, in __call__
    return self.forward(*inputs, **kwargs)
  File "/opt/conda/envs/python35-paddle120-env/lib/python3.10/site-packages/paddlenlp-3.0.0b1.post20241009-py3.10.egg/paddlenlp/transformers/qwen2/modeling.py", line 1142, in forward
    loss = paddle.mean(masked_lm_loss)
  File "/opt/conda/envs/python35-paddle120-env/lib/python3.10/site-packages/paddle/tensor/stat.py", line 90, in mean
    return _C_ops.mean(x, axis, keepdim)
ValueError: (InvalidArgument) Tensor need be reduced must not empty.
  [Hint: Expected x.numel() > 0, but received x.numel():0 <= 0:0.] (at ../paddle/phi/kernels/funcs/reduce_function.h:1055)

ZHUI · 2024-10-09T07:04:07Z

It looks like there are no tokens that need a loss computed, which causes the error: the masked_lm_loss gathered here comes out empty.
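The failure mode described above can be illustrated with a minimal, framework-free sketch. It is not PaddleNLP's actual criterion code: the `IGNORE_INDEX` value of -100 and both helper names are assumptions made for illustration. It only shows that when every label in a batch is masked out, the list of per-token losses that survive the mask is empty, so taking a mean over it has nothing to reduce.

```python
# Hypothetical sketch: not the PaddleNLP implementation.
IGNORE_INDEX = -100  # assumed "do not compute loss" label value


def count_supervised_tokens(labels, ignore_index=IGNORE_INDEX):
    """Return how many label positions actually contribute to the loss."""
    return sum(1 for token in labels if token != ignore_index)


def masked_mean(per_token_loss, labels, ignore_index=IGNORE_INDEX):
    """Average the loss over supervised positions only.

    Raises ValueError when no position is supervised, which mirrors the
    situation where the gathered loss tensor has zero elements.
    """
    kept = [
        loss
        for loss, token in zip(per_token_loss, labels)
        if token != ignore_index
    ]
    if not kept:
        raise ValueError("no supervised tokens: cannot reduce an empty loss")
    return sum(kept) / len(kept)


if __name__ == "__main__":
    # One masked prompt position, two supervised target positions.
    print(masked_mean([0.5, 1.5, 2.0], [IGNORE_INDEX, 7, 9]))
    # Every position masked: nothing is left to average.
    print(count_supervised_tokens([IGNORE_INDEX, IGNORE_INDEX]))
```

Running a check like `count_supervised_tokens` over the tokenized training examples before training is one way to find samples that carry no target tokens at all.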

littlesmallrookie · 2024-10-09T07:44:02Z

How can this be fixed?

DrownFish19 · 2024-10-15T09:38:01Z

Answered in #9232, so I am closing this issue. Feel free to reopen it if needed.

littlesmallrookie added the "question" label Oct 9, 2024

paddle-bot bot assigned DesmonDay Oct 9, 2024

DrownFish19 closed this as completed Oct 15, 2024
