Hello, I was trying to run instruction tuning of CodeT5+ and encountered the following error. The full output is:
(cjy_ct5) nlpir@nlpir-SYS-4028GR-TR:~/cjy/CodeT5/CodeT5+$ sh instruct_finetune.sh
Using CUDA version:
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2022 NVIDIA Corporation
Built on Tue_May__3_18:49:52_PDT_2022
Cuda compilation tools, release 11.7, V11.7.64
Build cuda_11.7.r11.7/compiler.31294372_0
[2024-05-28 20:40:23,112] [INFO] [real_accelerator.py:203:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[WARNING] async_io requires the dev libaio .so object and headers but these were not found.
[WARNING] async_io: please install the libaio-dev package with apt
[WARNING] If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found.
[WARNING] Please specify the CUTLASS repo directory as environment variable $CUTLASS_PATH
[WARNING] NVIDIA Inference is only supported on Ampere and newer architectures
[WARNING] please install triton==1.0.0 if you want to use sparse attention
[2024-05-28 20:40:25,339] [WARNING] [runner.py:202:fetch_hostfile] Unable to find hostfile, will proceed with training with local resources only.
[2024-05-28 20:40:25,339] [INFO] [runner.py:568:main] cmd = /home/nlpir/miniconda3/envs/cjy_ct5/bin/python -u -m deepspeed.launcher.launch --world_info=eyJsb2NhbGhvc3QiOiBbNiwgN119 --master_addr=127.0.0.1 --master_port=29500 --enable_each_rank_log=None instruct_tune_codet5p.py --load baselines/codet5p-220m --save-dir saved_models/instructcodet5p-220m --instruct-data-path datasets/code_alpaca_20k.json --fp16 --deepspeed deepspeed_config.json
[2024-05-28 20:40:26,653] [INFO] [real_accelerator.py:203:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[WARNING] async_io requires the dev libaio .so object and headers but these were not found.
[WARNING] async_io: please install the libaio-dev package with apt
[WARNING] If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found.
[WARNING] Please specify the CUTLASS repo directory as environment variable $CUTLASS_PATH
[WARNING] NVIDIA Inference is only supported on Ampere and newer architectures
[WARNING] please install triton==1.0.0 if you want to use sparse attention
[2024-05-28 20:40:28,850] [INFO] [launch.py:146:main] WORLD INFO DICT: {'localhost': [6, 7]}
[2024-05-28 20:40:28,850] [INFO] [launch.py:152:main] nnodes=1, num_local_procs=2, node_rank=0
[2024-05-28 20:40:28,850] [INFO] [launch.py:163:main] global_rank_mapping=defaultdict(<class 'list'>, {'localhost': [0, 1]})
[2024-05-28 20:40:28,850] [INFO] [launch.py:164:main] dist_world_size=2
[2024-05-28 20:40:28,850] [INFO] [launch.py:168:main] Setting CUDA_VISIBLE_DEVICES=6,7
[2024-05-28 20:40:28,860] [INFO] [launch.py:256:main] process 2307758 spawned with command: ['/home/nlpir/miniconda3/envs/cjy_ct5/bin/python', '-u', 'instruct_tune_codet5p.py', '--local_rank=0', '--load', 'baselines/codet5p-220m', '--save-dir', 'saved_models/instructcodet5p-220m', '--instruct-data-path', 'datasets/code_alpaca_20k.json', '--fp16', '--deepspeed', 'deepspeed_config.json']
[2024-05-28 20:40:28,867] [INFO] [launch.py:256:main] process 2307759 spawned with command: ['/home/nlpir/miniconda3/envs/cjy_ct5/bin/python', '-u', 'instruct_tune_codet5p.py', '--local_rank=1', '--load', 'baselines/codet5p-220m', '--save-dir', 'saved_models/instructcodet5p-220m', '--instruct-data-path', 'datasets/code_alpaca_20k.json', '--fp16', '--deepspeed', 'deepspeed_config.json']
{'batch_size_per_replica': 1,
'cache_data': 'cache_data/instructions',
'data_num': -1,
'deepspeed': 'deepspeed_config.json',
'epochs': 3,
'fp16': True,
'grad_acc_steps': 16,
'instruct_data_path': 'datasets/code_alpaca_20k.json',
'load': 'baselines/codet5p-220m',
'local_rank': 1,
'log_freq': 10,
'lr': 2e-05,
'lr_warmup_steps': 30,
'max_len': 512,
'save_dir': 'saved_models/instructcodet5p-220m',
'save_freq': 500}
==> Loaded 20022 samples
{'batch_size_per_replica': 1,
'cache_data': 'cache_data/instructions',
'data_num': -1,
'deepspeed': 'deepspeed_config.json',
'epochs': 3,
'fp16': True,
'grad_acc_steps': 16,
'instruct_data_path': 'datasets/code_alpaca_20k.json',
'load': 'baselines/codet5p-220m',
'local_rank': 0,
'log_freq': 10,
'lr': 2e-05,
'lr_warmup_steps': 30,
'max_len': 512,
'save_dir': 'saved_models/instructcodet5p-220m',
'save_freq': 500}
==> Loaded 20022 samples
==> Loaded model from baselines/codet5p-220m, model size 222882048
Para before freezing: 222882048, trainable para: 223M
Traceback (most recent call last):
File "/home/nlpir/cjy/CodeT5/CodeT5+/instruct_tune_codet5p.py", line 210, in
main(args)
File "/home/nlpir/cjy/CodeT5/CodeT5+/instruct_tune_codet5p.py", line 177, in main
freeze_decoder_except_xattn_codegen(model)
File "/home/nlpir/cjy/CodeT5/CodeT5+/instruct_tune_codet5p.py", line 42, in freeze_decoder_except_xattn_codegen
num_decoder_layers = model.decoder.config.n_layer
File "/home/nlpir/miniconda3/envs/cjy_ct5/lib/python3.9/site-packages/transformers/configuration_utils.py", line 257, in getattribute
return super().getattribute(key)
AttributeError: 'T5Config' object has no attribute 'n_layer'
==> Loaded model from baselines/codet5p-220m, model size 222882048
Para before freezing: 222882048, trainable para: 223M
Traceback (most recent call last):
File "/home/nlpir/cjy/CodeT5/CodeT5+/instruct_tune_codet5p.py", line 210, in
main(args)
File "/home/nlpir/cjy/CodeT5/CodeT5+/instruct_tune_codet5p.py", line 177, in main
freeze_decoder_except_xattn_codegen(model)
File "/home/nlpir/cjy/CodeT5/CodeT5+/instruct_tune_codet5p.py", line 42, in freeze_decoder_except_xattn_codegen
num_decoder_layers = model.decoder.config.n_layer
File "/home/nlpir/miniconda3/envs/cjy_ct5/lib/python3.9/site-packages/transformers/configuration_utils.py", line 257, in getattribute
return super().getattribute(key)
AttributeError: 'T5Config' object has no attribute 'n_layer'
[2024-05-28 20:40:32,872] [INFO] [launch.py:319:sigkill_handler] Killing subprocess 2307758
[2024-05-28 20:40:32,873] [INFO] [launch.py:319:sigkill_handler] Killing subprocess 2307759
[2024-05-28 20:40:32,905] [ERROR] [launch.py:325:sigkill_handler] ['/home/nlpir/miniconda3/envs/cjy_ct5/bin/python', '-u', 'instruct_tune_codet5p.py', '--local_rank=1', '--load', 'baselines/codet5p-220m', '--save-dir', 'saved_models/instructcodet5p-220m', '--instruct-data-path', 'datasets/code_alpaca_20k.json', '--fp16', '--deepspeed', 'deepspeed_config.json'] exits with return code = 1
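
From the traceback, the failure happens in freeze_decoder_except_xattn_codegen when it reads model.decoder.config.n_layer, but the decoder config of the 220m checkpoint is a T5Config, which does not define n_layer. A minimal check with the standard Hugging Face T5Config (attribute names below come from transformers, not from the CodeT5+ scripts) shows the same mismatch:

from transformers import T5Config

config = T5Config()
print(config.num_decoder_layers)   # T5-style configs expose the decoder depth here
print(hasattr(config, "n_layer"))  # False: n_layer is a GPT/CodeGen-style attribute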
The content of my "instruct_finetune.sh" file is:
#!/bin/bash
export PATH=/usr/local/cuda-11.7/bin:$PATH
export LD_LIBRARY_PATH=/usr/local/cuda-11.7/lib64:$LD_LIBRARY_PATH
echo "Using CUDA version:"
nvcc --version
MODEL_PATH="baselines/codet5p-220m"
SAVE_DIR="saved_models/instructcodet5p-220m"
DATA_PATH="datasets/code_alpaca_20k.json"
deepspeed --include localhost:6,7 instruct_tune_codet5p.py \
  --load $MODEL_PATH --save-dir $SAVE_DIR --instruct-data-path $DATA_PATH \
  --fp16 --deepspeed deepspeed_config.json
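
In case it helps, line 42 of instruct_tune_codet5p.py is where it breaks. A possible adjustment, just a guess on my side, assuming T5Config's num_decoder_layers is the equivalent of n_layer here (I have not verified that the rest of the freezing logic works for the 220m T5-style decoder), would be something like:

# Untested guess: a helper that reads the decoder depth for either decoder style.
def get_num_decoder_layers(config):
    # CodeGen-style configs use n_layer; T5-style configs use num_decoder_layers.
    return getattr(config, "n_layer", None) or config.num_decoder_layers

# e.g. inside freeze_decoder_except_xattn_codegen:
# num_decoder_layers = get_num_decoder_layers(model.decoder.config)

That said, the function name suggests the freezing logic was written for the CodeGen-style decoders of the larger checkpoints, so the 220m model may need more than just this change.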
Could you please tell me what the problem is and how to solve it? Thank you!