- paddlepaddle-gpu: 0.0.0.post120
- paddlenlp: 2.8.0
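I am running LoRA fine-tuning of a large model with paddlenlp and found that the job runs on some machines but fails on others. The failing machines report Illegal instruction (core dumped), and the system shows an error in libphi.so. Comparing the two groups of machines, the only difference I found is the CPU, so I suspect that some Paddle operators do not support older CPUs. The failing machines all appear to use Intel(R) Xeon(R) CPU E5-2680 v4 @ 2.40GHz; the working machines all use Intel(R) Xeon(R) Silver 4110 CPU @ 2.10GHz or newer.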
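One way to test the CPU hypothesis is to compare the instruction-set flags each machine advertises. The E5-2680 v4 is a Broadwell-generation part without AVX-512, while the Silver 4110 supports it; whether that is what libphi.so trips over is not confirmed here. The following is a minimal sketch that reads the Linux /proc/cpuinfo file. The helper names and the list of flags to check are illustrative assumptions, not part of Paddle.

```python
from pathlib import Path

# Flags to look for. This list is an illustrative assumption, not something
# Paddle documents as its requirement.
FLAGS_OF_INTEREST = ("sse4_2", "avx", "avx2", "fma", "avx512f")


def parse_cpu_flags(cpuinfo_text):
    """Return the set of flags from the first 'flags' line of /proc/cpuinfo text."""
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "flags":
            return set(value.split())
    return set()


def missing_flags(cpuinfo_text, wanted=FLAGS_OF_INTEREST):
    """List the wanted flags that the CPU does not advertise, in the given order."""
    flags = parse_cpu_flags(cpuinfo_text)
    return [name for name in wanted if name not in flags]


if __name__ == "__main__":
    cpuinfo = Path("/proc/cpuinfo")  # Linux only
    if cpuinfo.exists():
        absent = missing_flags(cpuinfo.read_text())
        print("missing flags:", ", ".join(absent) if absent else "none")
    else:
        print("/proc/cpuinfo not found; this sketch only works on Linux")
```

Running this on both a failing and a working machine and comparing the output would show whether the two hosts differ in any of the listed instruction sets.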
Go to the PaddleNLP-develop/PaddleNLP-develop/llm directory
Run the command python3 -m paddle.distributed.launch --gpus "0,1,2,3" finetune_generation.py ./chatglm2/lora_argument.json
On sugon-gpu-4, the job fails with Illegal instruction (core dumped)
On sugon-gpu-6, the job runs normally
sugon-gpu-4 uses an Intel(R) Xeon(R) CPU E5-2680 v4 @ 2.40GHz, while sugon-gpu-6 uses an Intel(R) Xeon(R) Silver 4110 CPU @ 2.10GHz or newer. Both machines have four V100 GPUs, and the CUDA version is 12.5.
lugimzzz