```
loss 3.28 = 3.28 + 0.0 avg prob of [Rishi Sunak] 0.0498
loss nan = nan + nan avg prob of [Rishi Sunak] nan
loss nan = nan + nan avg prob of [Rishi Sunak] nan
loss nan = nan + nan avg prob of [Rishi Sunak] nan
```
The gradient of the delta weight becomes NaN after the first backward pass.
It may be caused by the ALiBi position encoding in the current implementation of the Baichuan-13B model. The ALiBi position encoding does not accept an attention mask, so it is incompatible with left-padding. We are trying to fix it by re-implementing the Baichuan-13B model.
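For illustration, here is a minimal NumPy sketch (not the actual Baichuan code; the function names and the toy all-zero scores are made up) of why an attention path that ignores the padding mask misbehaves with left-padding: the padded key positions still receive attention weight unless they are pushed to a large negative value before the softmax.

```python
import numpy as np

def alibi_bias(n, slope):
    # ALiBi adds a linear penalty proportional to key-query distance:
    # bias[i, j] = slope * (j - i), added to the scores before softmax.
    i = np.arange(n)[:, None]
    j = np.arange(n)[None, :]
    return slope * (j - i)

def attention_weights(n, slope, pad_mask=None):
    # Toy scores of all zeros, so only the bias and masks matter.
    scores = alibi_bias(n, slope)
    causal = np.triu(np.ones((n, n), dtype=bool), k=1)
    scores = np.where(causal, -1e9, scores)  # causal mask
    if pad_mask is not None:
        # pad_mask[j] == True marks a padding key position.
        scores = np.where(pad_mask[None, :], -1e9, scores)
    # Numerically stable softmax over the key dimension.
    scores = scores - scores.max(axis=-1, keepdims=True)
    w = np.exp(scores)
    return w / w.sum(axis=-1, keepdims=True)

# Left-padded sequence of length 4: key positions 0 and 1 are <pad>.
pad = np.array([True, True, False, False])

w_no_mask = attention_weights(4, slope=0.5)           # mask ignored
w_masked = attention_weights(4, slope=0.5, pad_mask=pad)

# Without the padding mask, the first real token (row 2) spends
# almost half of its attention on padding; with it, none.
print(w_no_mask[2, :2].sum())  # large, ~0.49
print(w_masked[2, :2].sum())   # ~0.0
```

Using `-1e9` rather than `-inf` also avoids the softmax producing NaN on rows where every key is masked out, which is one common source of NaN losses.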
We caught a runtime error when running the script.
I suspect it may be related to the ALiBi attention masks of Baichuan-13B.