AttributeError: 'DeepSpeedCPUAdam' object has no attribute 'ds_opt_adam' #70
Comments
Could you please provide more of the log? I think there should be another error before this one.
Hi, I also get the same error. The log is as follows:
(lmflow) xuyan@black-rack-0:~/LLM/LMFlow$ CUDA_VISIBLE_DEVICES=0 ./scripts/run_finetune.sh "--num_gpus=1 --master_port 10001"
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
It's better to use the same CUDA version as PyTorch, like this: conda install cuda -c nvidia/label/cuda-11.7.0
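To check whether the two versions actually agree before reinstalling anything, something like the following sketch may help. It compares the release reported by `nvcc --version` with `torch.version.cuda`. The helper names (`parse_nvcc_release`, `same_major_minor`, `report`) are made up for illustration, and the sketch assumes `nvcc` is on the PATH and PyTorch is installed in the active environment.

```python
import re
import subprocess
from typing import Optional


def parse_nvcc_release(nvcc_output: str) -> Optional[str]:
    """Return the 'major.minor' release found in `nvcc --version` output, or None."""
    match = re.search(r"release (\d+\.\d+)", nvcc_output)
    return match.group(1) if match else None


def same_major_minor(a: Optional[str], b: Optional[str]) -> bool:
    """True only when both versions are known and agree on major.minor."""
    if not a or not b:
        return False
    return a.split(".")[:2] == b.split(".")[:2]


def report() -> None:
    """Print both versions; assumes nvcc is on PATH and torch is importable."""
    import torch  # imported here so the helpers above work without PyTorch

    nvcc_text = subprocess.run(
        ["nvcc", "--version"], capture_output=True, text=True, check=True
    ).stdout
    toolkit = parse_nvcc_release(nvcc_text)
    built_with = torch.version.cuda  # None for CPU-only PyTorch builds
    print(f"nvcc toolkit: {toolkit}, PyTorch built with CUDA: {built_with}")
    if not same_major_minor(toolkit, built_with):
        print("Versions differ; aligning them may be worth trying.")
```

Calling `report()` inside the `lmflow` environment prints both versions. If they differ, installing the matching toolkit with conda as suggested above is one thing to try.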
(lmflow) u20@u20:~/LMFlow/service$ nvcc --version
Does CUDA 11.6 not work?
I find CUDA version issues are always hard to debug... It works fine on my machine with CUDA 11.7 installed through conda.
Yes, you are right. Thank you very much for your help!
According to the log, it is indeed due to the CUDA version problem. It seems |
Yes, I have the same error, and I installed CUDA from -c nvidia/label/cuda-11.7.0.
I am using But still getting the error |
This solution works for me! Thank you very much for the help! <3 |
This issue has been marked as stale because it has not had recent activity. If you think this still needs to be addressed please feel free to reopen this issue. Thanks! |
RuntimeError: Error building extension 'cpu_adam'
Exception ignored in: <function DeepSpeedCPUAdam.__del__ at 0x7f683231b670>
Traceback (most recent call last):
File "/home/u20/miniconda3/envs/lmflow/lib/python3.9/site-packages/deepspeed/ops/adam/cpu_adam.py", line 110, in __del__
AttributeError: 'DeepSpeedCPUAdam' object has no attribute 'ds_opt_adam'
[2023-04-03 12:50:15,113] [INFO] [launch.py:318:sigkill_handler] Killing subprocess 21626
[2023-04-03 12:50:15,113] [ERROR] [launch.py:324:sigkill_handler] ['/home/u20/miniconda3/envs/lmflow/bin/python', '-u', 'examples/finetune.py', '--local_rank=0', '--model_name_or_path', 'gpt2', '--dataset_path', '/home/u20/LMFlow/data/alpaca/train', '--output_dir', '/home/u20/LMFlow/output_models/finetune', '--overwrite_output_dir', '--num_train_epochs', '0.01', '--learning_rate', '2e-5', '--block_size', '512', '--per_device_train_batch_size', '1', '--deepspeed', 'configs/ds_config_zero3.json', '--bf16', '--run_name', 'finetune', '--validation_split_percentage', '0', '--logging_steps', '20', '--do_train', '--ddp_timeout', '72000', '--save_steps', '5000', '--dataloader_num_workers', '1'] exits with return code = 1
I get this error when I run ./scripts/run_finetune.sh.
I have a GPU and CUDA installed, so why does it raise a CPU-related error?
./scripts/run_finetune_with_lora.sh raises the same error.
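On why an AttributeError shows up at all: the log above suggests it is a follow-on error. The `cpu_adam` extension fails to build first, and the "Exception ignored in" message then comes from the destructor running on an optimizer that was never fully constructed. The sketch below reproduces that pattern with a hypothetical stand-in class. `FakeCPUAdam` and `first_and_secondary_error` are illustrative names, and this is not DeepSpeed's actual code.

```python
class FakeCPUAdam:
    """Hypothetical stand-in for an optimizer backed by a native extension."""

    def __init__(self, build_ok: bool):
        if not build_ok:
            # Simulates the native extension failing to build.
            raise RuntimeError("Error building extension 'cpu_adam'")
        self.ds_opt_adam = object()  # only set when the build succeeds

    def __del__(self):
        # Python still calls __del__ on a half-constructed object, so this
        # attribute access fails if __init__ raised before setting it.
        self.ds_opt_adam


def first_and_secondary_error():
    """Return (message of the real error, message of the follow-on error)."""
    obj = FakeCPUAdam.__new__(FakeCPUAdam)
    first = secondary = None
    try:
        obj.__init__(build_ok=False)
    except RuntimeError as exc:
        first = str(exc)
    try:
        obj.__del__()  # called explicitly so the exception can be captured
    except AttributeError as exc:
        secondary = str(exc)
    obj.ds_opt_adam = None  # keeps the real destructor quiet at cleanup
    return first, secondary
```

If this reading is right, the AttributeError is only a symptom, and the error to fix is the earlier "Error building extension 'cpu_adam'".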