Static graph: GradientMergeOptimizer conflicts with main_program.clone(for_test=True) #43571
Describe the Bug
[2022-06-16 11:27:32,284] [ INFO] - The training meta optimizer is/are ['GradientMergeOptimizer', 'AMPOptimizer']
W0616 11:27:33.298504 6074 gpu_context.cc:278] Please NOTE: device: 1, GPU Compute Capability: 7.0, Driver API Version: 10.2, Runtime API Version: 10.2
W0616 11:27:33.305351 6074 gpu_context.cc:306] device: 1, cuDNN Version: 7.6.
Traceback (most recent call last):
File "run_pretrain_static.py", line 677, in <module>
do_train(config)
File "run_pretrain_static.py", line 489, in do_train
test_program = main_program.clone(for_test=True)
File "/ssd2/zhonghui03/anaconda3/envs/py37/lib/python3.7/site-packages/paddle/fluid/framework.py", line 5419, in clone
self.desc)
RuntimeError: (NotFound) The origin sub block id is not found in pruned_progin_block_id_map
[Hint: Expected sub_idx != -1, but received sub_idx:-1 == -1:-1.] (at /paddle/paddle/fluid/framework/prune.cc:511)
INFO 2022-06-16 11:27:48,019 launch_utils.py:343] terminate all the procs
ERROR 2022-06-16 11:27:48,019 launch_utils.py:642] ABORT!!! Out of all 1 trainers, the trainer process with rank=[0] was aborted. Please check its log.
INFO 2022-06-16 11:27:52,023 launch_utils.py:343] terminate all the procs
INFO 2022-06-16 11:27:52,023 launch.py:402] Local processes completed.
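For reference, a minimal static-graph sketch of the same conflict (the trivial linear model, tensor shapes, and k_steps below are illustrative assumptions, not the actual run_pretrain_static.py code; it would be launched under python -m paddle.distributed.launch):

```python
import paddle
import paddle.distributed.fleet as fleet

paddle.enable_static()
fleet.init(is_collective=True)

main_program = paddle.static.Program()
startup_program = paddle.static.Program()
with paddle.static.program_guard(main_program, startup_program):
    x = paddle.static.data(name="x", shape=[None, 16], dtype="float32")
    y = paddle.static.data(name="y", shape=[None, 1], dtype="float32")
    pred = paddle.static.nn.fc(x, size=1)
    loss = paddle.mean(paddle.nn.functional.square_error_cost(pred, y))

    # Enable gradient merge (k_steps > 1), matching the auto-enabled
    # GradientMergeOptimizer reported in the log above.
    strategy = fleet.DistributedStrategy()
    strategy.gradient_merge = True
    strategy.gradient_merge_configs = {"k_steps": 2, "avg": True}

    opt = fleet.distributed_optimizer(
        paddle.optimizer.Adam(learning_rate=1e-4), strategy=strategy)
    opt.minimize(loss)

# The gradient-merge pass rewrites main_program with conditional sub-blocks;
# pruning them during clone(for_test=True) triggers the NotFound error above.
test_program = main_program.clone(for_test=True)
```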
Additional Supplementary Information
python -u -m paddle.distributed.launch \
--gpus "1" \
--log_dir "output/$task_name/log" \
run_pretrain_static.py \
--model_type "ernie" \
--model_name_or_path "ernie-1.0-base-zh" \
--input_dir "./data" \
--split 8,1,1 \
--output_dir "output/$task_name" \
--max_seq_len 128 \
--micro_batch_size 32 \
--global_batch_size 64 \
--sharding_degree 1 \
--dp_degree 1 \
--use_sharding false \
--use_amp true \
--use_recompute false \
--max_lr 0.0001 \
--min_lr 0.00001 \
--max_steps 2000 \
--save_steps 100000 \
--checkpoint_steps 5000 \
--decay_steps 1980 \
--weight_decay 0.01 \
--warmup_rate 0.01 \
--grad_clip 1.0 \
--num_workers 2 \
--logging_freq 20 \
--eval_freq 1000 \
--device "gpu"
Since global_batch_size (64) = micro_batch_size (32) * 2, the script automatically enables gradient accumulation (gradient merge), which leads to the error above.
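Roughly, the accumulation step count falls out of the batch-size arguments as below (variable names are illustrative, not the script's exact code):

```python
micro_batch_size = 32   # --micro_batch_size
global_batch_size = 64  # --global_batch_size
dp_degree = 1           # --dp_degree
sharding_degree = 1     # --sharding_degree

# Number of micro-batches merged into one optimizer step.
accumulate_steps = global_batch_size // (micro_batch_size * dp_degree * sharding_degree)
assert accumulate_steps == 2  # > 1, so GradientMergeOptimizer is enabled automatically
```

One possible workaround: Paddle's Program.clone documentation recommends cloning with for_test=True before Optimizer.minimize(...) appends backward/optimization ops, so taking the test clone before the gradient-merge pass rewrites the program may sidestep the pruning failure.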