[Bug]: llama model raises "Tensor need be reduced must not empty [Hint: Expected x.numel() > 0, but received x.numel():0 <= 0:0.]" when loss=0 #8299

Closed · 3 comments
dynamicheart opened this issue Apr 22, 2024
Labels: bug Something isn't working

@dynamicheart (Contributor):

Software environment

- paddlepaddle-gpu: 
commit: 4ffb7da786cef844deb3cf8ad7f95d56000bd010
cuda: 12.0
cudnn: 8.9.1
- paddlenlp: 
commit: 74bb39b51bef45f32aee310efdb8994042c00bb3

Duplicate check

  • I have searched the existing issues

Error description

[2024-03-05 08:06:28,678] [    INFO] - loss: 4.23760509, learning_rate: 2.999e-05, global_step: 2310, interval_runtime: 1.1534, interval_samples_per_second: 6.935981184579392, interval_steps_per_second: 0.866997648072424, epoch: 0.0229
[2024-03-05 08:06:29,834] [    INFO] - loss: 4.39690018, learning_rate: 2.999e-05, global_step: 2311, interval_runtime: 1.1555, interval_samples_per_second: 6.923501595186914, interval_steps_per_second: 0.8654376993983642, epoch: 0.0229
LAUNCH INFO 2024-03-05 08:06:34,816 Pod failed
LAUNCH ERROR 2024-03-05 08:06:34,817 Container failed !!!
Container rank 6 status failed cmd ['/usr/bin/python', '-u', 'run_pretrain.py', '--model_type', 'llama', '--model_name_or_path', 'facebook/llama-13b', '--tokenizer_name_or_path', 'facebook/llama-13b', '--input_dir', './data', '--output_dir', 'output/llama_hybrid', '--split', '949,50,1', '--max_seq_length', '2048', '--per_device_train_batch_size', '1', '--per_device_eval_batch_size', '1', '--use_flash_attention', '1', '--use_fused_rope', '1', '--fuse_attention_ffn', '1', '--fuse_attention_qkv', '1', '--use_fused_rms_norm', '1', '--num_hidden_layers', '40', '--bf16', '--fp16_opt_level', 'O2', '--scale_loss', '1024', '--learning_rate', '0.00003', '--min_learning_rate', '0.000005', '--lr_scheduler_type', 'cosine', '--max_steps', '100000', '--save_steps', '100000', '--weight_decay', '0.01', '--warmup_ratio', '0.01', '--max_grad_norm', '1.0', '--logging_steps', '1', '--dataloader_num_workers', '1', '--sharding', 'stage2', '--eval_steps', '1000', '--report_to', 'visualdl', '--disable_tqdm', 'true', '--continue_training', '0', '--recompute', '0', '--do_train', '--device', 'gpu'] code 1 log output/llama_hybrid_log/workerlog.6 
env {'NV_LIBCUBLAS_VERSION': '12.0.1.189-1', 'NVIDIA_VISIBLE_DEVICES': 'all', 'COLORTERM': 'truecolor', 'NV_NVML_DEV_VERSION': '12.0.76-1', 'NV_CUDNN_PACKAGE_NAME': 'libcudnn8', 'GREP_COLOR': '1;31', 'TERM_PROGRAM_VERSION': '1.83.1', 'NV_LIBNCCL_DEV_PACKAGE': 'libnccl-dev=2.17.1-1+cuda12.0', 'NV_LIBNCCL_DEV_PACKAGE_VERSION': '2.17.1-1', 'HOSTNAME': 'szzj-isa-ai-peking-poc13.szzj.baidu.com', 'LANGUAGE': 'en_US.UTF-8', 'NVIDIA_REQUIRE_CUDA': 'cuda>=12.0 brand=tesla,driver>=470,driver<471 brand=unknown,driver>=470,driver<471 brand=nvidia,driver>=470,driver<471 brand=nvidiartx,driver>=470,driver<471 brand=geforce,driver>=470,driver<471 brand=geforcertx,driver>=470,driver<471 brand=quadro,driver>=470,driver<471 brand=quadrortx,driver>=470,driver<471 brand=titan,driver>=470,driver<471 brand=titanrtx,driver>=470,driver<471', 'NV_LIBCUBLAS_DEV_PACKAGE': 'libcublas-dev-12-0=12.0.1.189-1', 'NV_NVTX_VERSION': '12.0.76-1', 'NV_CUDA_CUDART_DEV_VERSION': '12.0.107-1', 'NV_LIBCUSPARSE_VERSION': '12.0.0.76-1', 'NV_LIBNPP_VERSION': '12.0.0.30-1', 'NCCL_VERSION': '2.17.1-1', 'PWD': '/host/PaddleNLP-XPU/llm/llama', 'NV_CUDNN_PACKAGE': 'libcudnn8=8.8.0.121-1+cuda12.0', 'NVIDIA_DRIVER_CAPABILITIES': 'compute,utility', 'WITH_AVX': 'ON', 'NV_NVPROF_DEV_PACKAGE': 'cuda-nvprof-12-0=12.0.90-1', 'NV_LIBNPP_PACKAGE': 'libnpp-12-0=12.0.0.30-1', 'NV_LIBNCCL_DEV_PACKAGE_NAME': 'libnccl-dev', 'GREP_OPTIONS': '--color=auto', 'VSCODE_GIT_ASKPASS_NODE': '/root/.vscode-server/bin/1.8.401.83.1.02/node', 'NV_LIBCUBLAS_DEV_VERSION': '12.0.1.189-1', 'NVIDIA_PRODUCT_NAME': 'CUDA', 'NV_LIBCUBLAS_DEV_PACKAGE_NAME': 'libcublas-dev-12-0', 'NV_CUDA_CUDART_VERSION': '12.0.107-1', 'HOME': '/root', 'LANG': 'en_US.UTF-8', 'NVIDIA_CUDA_END_OF_LIFE': '1', 'CUDA_VERSION': '12.0.0', 'NV_LIBCUBLAS_PACKAGE': 'libcublas-12-0=12.0.1.189-1', 'NV_CUDA_NSIGHT_COMPUTE_DEV_PACKAGE': 'cuda-nsight-compute-12-0=12.0.0-1', 'ICODING_VERSION': '1.8.401.83.1.02', 'GIT_ASKPASS': '/root/.vscode-server/bin/1.8.401.83.1.02/extensions/git/dist/askpass.sh', 'CLICOLOR': '1', 'NV_LIBNPP_DEV_PACKAGE': 'libnpp-dev-12-0=12.0.0.30-1', 'GOROOT': '/usr/local/go', 'NV_LIBCUBLAS_PACKAGE_NAME': 'libcublas-12-0', 'NV_LIBNPP_DEV_VERSION': '12.0.0.30-1', 'VSCODE_GIT_ASKPASS_EXTRA_ARGS': '', 'WITH_GPU': 'ON', 'TERM': 'xterm-256color', 'NV_LIBCUSPARSE_DEV_VERSION': '12.0.0.76-1', 'LIBRARY_PATH': '/usr/local/cuda/lib64/stubs', 'NV_CUDNN_VERSION': '8.8.0.121', 'VSCODE_GIT_IPC_HANDLE': '/tmp/vscode-git-a504850b12.sock', 'SHLVL': '2', 'NV_CUDA_LIB_VERSION': '12.0.0-1', 'NVARCH': 'x86_64', 'CUDNN_VERSION': '8.9.1', 'NV_CUDNN_PACKAGE_DEV': 'libcudnn8-dev=8.8.0.121-1+cuda12.0', 'NV_CUDA_COMPAT_PACKAGE': 'cuda-compat-12-0', 'NV_LIBNCCL_PACKAGE': 'libnccl2=2.17.1-1+cuda12.0', 'LD_LIBRARY_PATH': '', 'NV_CUDA_NSIGHT_COMPUTE_VERSION': '12.0.0-1', 'NV_NVPROF_VERSION': '12.0.90-1', 'LC_ALL': 'en_US.UTF-8', 'VSCODE_GIT_ASKPASS_MAIN': '/root/.vscode-server/bin/1.8.401.83.1.02/extensions/git/dist/askpass-main.js', 'BROWSER': '/root/.vscode-server/bin/1.8.401.83.1.02/bin/helpers/browser.sh', 'PATH': '/root/.BCloud/bin:/root/.vscode-server/bin/1.8.401.83.1.02/bin/remote-cli:/root/.BCloud/bin:/root/.vscode-server/bin/1.8.401.83.1.02/bin:/root/.vscode-server/bin:/home/cmake-3.18.0-Linux-x86_64/bin:/usr/local/gcc-12.1/bin:/usr/local/nvidia/bin:/usr/local/cuda/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/local/go/bin:/root/gopath/bin', 'NV_LIBNCCL_PACKAGE_NAME': 'libnccl2', 'NV_LIBNCCL_PACKAGE_VERSION': '2.17.1-1', 'DEBIAN_FRONTEND': 'noninteractive', 'OLDPWD': 
'/host/PaddleNLP-XPU', 'GOPATH': '/root/gopath', 'TERM_PROGRAM': 'vscode', 'VSCODE_IPC_HOOK_CLI': '/tmp/vscode-ipc-1f8e8da3-5315-4fd5-b7be-285e4dc98f23.sock', '_': '/usr/bin/python', 'CUSTOM_DEVICE_ROOT': '', 'OMP_NUM_THREADS': '1', 'POD_NAME': 'egfwmz', 'PADDLE_MASTER': '10.93.234.25:45151', 'PADDLE_GLOBAL_SIZE': '8', 'PADDLE_LOCAL_SIZE': '8', 'PADDLE_GLOBAL_RANK': '6', 'PADDLE_LOCAL_RANK': '6', 'PADDLE_NNODES': '1', 'PADDLE_CURRENT_ENDPOINT': '10.93.234.25:45158', 'PADDLE_TRAINER_ID': '6', 'PADDLE_TRAINERS_NUM': '8', 'PADDLE_RANK_IN_NODE': '6', 'PADDLE_TRAINER_ENDPOINTS': '10.93.234.25:45152,10.93.234.25:45153,10.93.234.25:45154,10.93.234.25:45155,10.93.234.25:45156,10.93.234.25:45157,10.93.234.25:45158,10.93.234.25:45159', 'FLAGS_selected_gpus': '6', 'PADDLE_LOG_DIR': '/host/PaddleNLP-XPU/llm/llama/output/llama_hybrid_log'}
LAUNCH INFO 2024-03-05 08:06:34,817 ------------------------- ERROR LOG DETAIL -------------------------
[2024-03-05 07:21:54,674] [    INFO] - ***** Running training *****
[2024-03-05 07:21:54,674] [    INFO] -   Num examples = 806,405
[2024-03-05 07:21:54,674] [    INFO] -   Num Epochs = 1
[2024-03-05 07:21:54,674] [    INFO] -   Instantaneous batch size per device = 1
[2024-03-05 07:21:54,674] [    INFO] -   Total train batch size (w. parallel, distributed & accumulation) = 8
[2024-03-05 07:21:54,674] [    INFO] -   Gradient Accumulation steps = 1
[2024-03-05 07:21:54,674] [    INFO] -   Total optimization steps = 100,000
[2024-03-05 07:21:54,674] [    INFO] -   Total num train samples = 800,000
[2024-03-05 07:21:54,676] [    INFO] -   Number of trainable parameters = 13,015,864,320 (per device)
I0305 07:21:56.126010 76258 custom_operator.cc:1296] register pir custom op :fused_rms_norm
I0305 07:21:56.126060 76258 custom_operator.cc:1296] register pir custom op :fused_rms_norm_grad
I0305 07:21:56.126178 76258 custom_operator.cc:1296] register pir custom op :fused_ln
I0305 07:21:56.126186 76258 custom_operator.cc:1296] register pir custom op :fused_ln_grad
Traceback (most recent call last):
  File "/host/PaddleNLP-XPU/llm/llama/run_pretrain.py", line 567, in <module>
    main()
  File "/host/PaddleNLP-XPU/llm/llama/run_pretrain.py", line 548, in main
    train_result = trainer.train(resume_from_checkpoint=checkpoint)
  File "/host/PaddleNLP-XPU/paddlenlp/trainer/trainer.py", line 890, in train
    dp_master_grad = (
  File "/host/PaddleNLP-XPU/paddlenlp/trainer/trainer.py", line 1900, in training_step
  File "/host/PaddleNLP-XPU/paddlenlp/trainer/trainer.py", line 1853, in compute_loss
    labels = (inputs.pop("start_positions"), inputs.pop("end_positions"))
  File "/usr/local/lib/python3.10/dist-packages/paddle/nn/layer/layers.py", line 1429, in __call__
    return self.forward(*inputs, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/paddle/distributed/fleet/meta_parallel/sharding/group_sharded_stage2.py", line 190, in forward
    fw = self._layer(*inputs, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/paddle/nn/layer/layers.py", line 1429, in __call__
    return self.forward(*inputs, **kwargs)
  File "/host/PaddleNLP-XPU/paddlenlp/transformers/llama/modeling.py", line 1611, in forward
    loss = self.criterion(logits, labels)
  File "/usr/local/lib/python3.10/dist-packages/paddle/nn/layer/layers.py", line 1429, in __call__
    return self.forward(*inputs, **kwargs)
  File "/host/PaddleNLP-XPU/paddlenlp/transformers/llama/modeling.py", line 1427, in forward
    loss = paddle.mean(masked_lm_loss)
  File "/usr/local/lib/python3.10/dist-packages/paddle/tensor/stat.py", line 90, in mean
    return _C_ops.mean(x, axis, keepdim)
ValueError: (InvalidArgument) Tensor need be reduced must not empty.
  [Hint: Expected x.numel() > 0, but received x.numel():0 <= 0:0.] (at ../paddle/phi/kernels/funcs/reduce_function.h:1052)

LAUNCH INFO 2024-03-05 08:06:40,653 Exit code -15

Steps to reproduce & code

The error comes from these two lines.

Because masked_lm_loss.numel() == 0, applying paddle.mean to it raises the error above. The loss being 0 is most likely because the softmax produced a one-hot tensor: the value at the target label's position is 1 and every other position is 0.

import numpy as np

def stable_softmax(x):
    z = x - np.max(x, axis=-1, keepdims=True)
    print("z", z)
    numerator = np.exp(z)
    print("numerator", numerator)
    denominator = np.sum(numerator, axis=-1, keepdims=True)
    print("denominator", denominator)
    softmax = numerator / denominator
    print("softmax", softmax)
    return softmax

x = [-2710.10620117, -2914.37866211, -5045.04443359, -4361.91601562, -459.57000732, 8843.65820312, -1871.62756348, 5447.12451172, -10947.22949219]
stable_softmax(x)

# z [-11553.76440429 -11758.03686523 -13888.70263671 -13205.57421874 -9303.22821044 0  -10715.2857666 -3396.5336914  -19790.88769531]
# numerator [0. 0. 0. 0. 0. 1. 0. 0. 0.]
# denominator [1.]
# softmax [0. 0. 0. 0. 0. 1. 0. 0. 0.]
# array([0., 0., 0., 0., 0., 1., 0., 0., 0.])

When the exponent passed to exp is small enough (below roughly -1000), the result underflows to 0.
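
A minimal Paddle sketch of the resulting failure (the exact filter used in modeling.py is not reproduced here; a `masked_lm_loss > 0`-style filter is assumed purely for illustration):

import paddle

# Minimal sketch of the failure path (dygraph mode assumed).
masked_lm_loss = paddle.zeros([4], dtype="float32")  # every token loss is exactly 0

filtered = masked_lm_loss[masked_lm_loss > 0]        # empty tensor
print(filtered.shape)                                # [0]

# Reducing the empty tensor reproduces the reported error:
# ValueError: (InvalidArgument) Tensor need be reduced must not empty.
# loss = paddle.mean(filtered)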


@w5688414 (Contributor) commented Apr 22, 2024:

Thanks for the feedback. I looked into it, and the problem was introduced by this PR:

93e78c2#diff-99e104eff4c095428aa1cd5d186107ae22737297e8ec3b5c12cd138e69a79cb5

Please see whether the implementation below resolves your problem:

masked_lm_loss = masked_lm_loss[masked_lm_labels != self.ignore_index]
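
A rough usage sketch of how that line could sit inside the criterion's forward (assumed names, not the exact modeling.py code), with a hypothetical extra guard for the corner case where every label equals ignore_index:

import paddle

def criterion_forward(masked_lm_loss, masked_lm_labels, ignore_index=-100):
    # Keep positions whose label is not the ignore index, so a per-token loss
    # that happens to be exactly 0 is no longer dropped from the reduction.
    masked_lm_loss = masked_lm_loss[masked_lm_labels != ignore_index].astype("float32")

    # Hypothetical guard: if every position is ignored (e.g. an all-padding
    # batch), avoid reducing an empty tensor.
    if masked_lm_loss.shape[0] == 0:
        return paddle.zeros([1], dtype="float32")
    return paddle.mean(masked_lm_loss)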

@dynamicheart (Contributor, Author) commented:

@w5688414 OK. It looks like this should guarantee that masked_lm_loss is not an empty tensor, as long as the dataset is processed correctly. I'll try it later, though note that the issue does not reproduce reliably.

@cqulilujia (Contributor) commented May 17, 2024:

I hit this pitfall again while running llama pretraining with pipeline_parallel=2 and sharding stage1. I traced it to the current loss function returning loss=float(0), which triggers the assert in paddle/distributed/fleet/meta_parallel/pipeline_parallel.py; the log is attached below.

After applying the fix from #8459, the type check in PP is bypassed, but the program then hangs at step=81 and cannot make further progress. My guess is that constructing a new tensor breaks the gradient graph, so some of the communication logic under the PP configuration no longer executes correctly.
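
One untested idea (an assumption, not a verified fix): derive the zero from the existing loss tensor instead of constructing a fresh one, so the returned value is still a Paddle tensor and stays attached to the computation graph:

import paddle

def reduce_masked_loss(masked_lm_loss, masked_lm_labels, ignore_index=-100):
    # Sketch only: mask by label instead of boolean indexing, then divide the
    # summed loss by max(valid_count, 1). When nothing is valid the result is 0
    # but is still computed from masked_lm_loss, so the graph is not cut.
    valid = (masked_lm_labels != ignore_index).astype(masked_lm_loss.dtype)
    count = valid.sum()
    return (masked_lm_loss * valid).sum() / paddle.maximum(count, paddle.ones_like(count))

The failing log: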

[2024-05-15 16:26:28,733] [ INFO] - loss: 7.44834805, learning_rate: 2.4e-06, global_step: 79, current_memory_allocated: 42.891517996788025, current_memory_reserved: 0.0, max_memory_allocated: 82.25603437423706, max_memory_reserved: 0.0, interval_runtime: 29.755, interval_samples_per_second: 4.3018, interval_tokens_per_second_per_device: 2202.5182, interval_steps_per_second: 0.0336, progress_or_epoch: 0.0008
[2024-05-15 16:26:58,668] [ INFO] - loss: 7.31905365, learning_rate: 2.43e-06, global_step: 80, current_memory_allocated: 42.891517996788025, current_memory_reserved: 0.0, max_memory_allocated: 82.25603437423706, max_memory_reserved: 0.0, interval_runtime: 29.935, interval_samples_per_second: 4.2759, interval_tokens_per_second_per_device: 2189.279, interval_steps_per_second: 0.0334, progress_or_epoch: 0.0008
LAUNCH INFO 2024-05-15 16:27:24,714 Pod failed
LAUNCH ERROR 2024-05-15 16:27:24,715 Container failed !!!
Container rank 4 status failed cmd ['/root/miniconda3/envs/paddle/bin/python', '-u', 'run_pretrain.py', '--model_name_or_path', 'meta-llama/Llama-2-13b', '--tokenizer_name_or_path', 'meta-llama/Llama-2-13b', '--input_dir', './data', '--output_dir', 'output/llama2-13b-4k/20240515154555', '--split', '949,50,1', '--max_seq_length', '4096', '--per_device_train_batch_size', '1', '--per_device_eval_batch_size', '1', '--use_flash_attention', '1', '--use_fused_rope', '1', '--fuse_attention_ffn', '1', '--fuse_attention_qkv', '1', '--use_fused_rms_norm', '1', '--num_hidden_layers', '40', '--bf16', '--fp16_opt_level', 'O2', '--scale_loss', '1024', '--learning_rate', '0.00003', '--min_learning_rate', '0.000005', '--lr_scheduler_type', 'cosine', '--max_steps', '100000', '--save_steps', '100000', '--weight_decay', '0.01', '--warmup_ratio', '0.01', '--max_grad_norm', '1.0', '--logging_steps', '1', '--sequence_parallel', '0', '--dataloader_num_workers', '4', '--pipeline_parallel_degree', '2', '--pipeline_parallel_config', 'disable_partial_send_recv', '--tensor_parallel_degree', '1', '--tensor_parallel_config', 'enable_mp_async_allreduce,enable_mp_skip_c_identity', '--gradient_accumulation_steps', '32', '--sharding', 'stage1', '--eval_steps', '1000', '--report_to', 'visualdl', '--disable_tqdm', 'true', '--continue_training', '0', '--recompute', '0', '--do_train', '--seed', '1026', '--device', 'xpu'] code 1 log output/llama2-13b-4k/20240515154555_log/workerlog.4
env {'PYTHONPATH': '../../:', 'LSCOLORS': 'Gxfxcxdxbxegedabagacad', 'LESS': '-R', 'CONDA_EXE': '/root/miniconda3/bin/conda', '_CE_M': '', 'XPU_CDNN_CLUSTER_PARALLEL_STREAM_NUMBER': '2', 'HOSTNAME': 'localhost.localdomain', 'PWD': '/workspace/PaddleNLP/llm/llama', 'LOGNAME': 'root', 'CONDA_PREFIX': '/root/miniconda3/envs/paddle', 'XPU_PADDLE_L3_SIZE1': '1024', 'XPU_PADDLE_L3_SIZE0': '1024', 'XBLAS_FC_HBM_VERSION': '40', 'FLAGS_use_stride_kernel': '0', 'HOME': '/root', 'LS_COLORS': 'rs=0:di=01;34:ln=01;36:mh=00:pi=40;33:so=01;35:do=01;35:bd=40;33;01:cd=40;33;01:or=40;31;01:mi=00:su=37;41:sg=30;43:ca=30;41:tw=30;42:ow=34;42:st=37;44:ex=01;32:.tar=01;31:.tgz=01;31:.arc=01;31:.arj=01;31:.taz=01;31:.lha=01;31:.lz4=01;31:.lzh=01;31:.lzma=01;31:.tlz=01;31:.txz=01;31:.tzo=01;31:.t7z=01;31:.zip=01;31:.z=01;31:.dz=01;31:.gz=01;31:.lrz=01;31:.lz=01;31:.lzo=01;31:.xz=01;31:.zst=01;31:.tzst=01;31:.bz2=01;31:.bz=01;31:.tbz=01;31:.tbz2=01;31:.tz=01;31:.deb=01;31:.rpm=01;31:.jar=01;31:.war=01;31:.ear=01;31:.sar=01;31:.rar=01;31:.alz=01;31:.ace=01;31:.zoo=01;31:.cpio=01;31:.7z=01;31:.rz=01;31:.cab=01;31:.wim=01;31:.swm=01;31:.dwm=01;31:.esd=01;31:.jpg=01;35:.jpeg=01;35:.mjpg=01;35:.mjpeg=01;35:.gif=01;35:.bmp=01;35:.pbm=01;35:.pgm=01;35:.ppm=01;35:.tga=01;35:.xbm=01;35:.xpm=01;35:.tif=01;35:.tiff=01;35:.png=01;35:.svg=01;35:.svgz=01;35:.mng=01;35:.pcx=01;35:.mov=01;35:.mpg=01;35:.mpeg=01;35:.m2v=01;35:.mkv=01;35:.webm=01;35:.ogm=01;35:.mp4=01;35:.m4v=01;35:.mp4v=01;35:.vob=01;35:.qt=01;35:.nuv=01;35:.wmv=01;35:.asf=01;35:.rm=01;35:.rmvb=01;35:.flc=01;35:.avi=01;35:.fli=01;35:.flv=01;35:.gl=01;35:.dl=01;35:.xcf=01;35:.xwd=01;35:.yuv=01;35:.cgm=01;35:.emf=01;35:.ogv=01;35:.ogx=01;35:.aac=00;36:.au=00;36:.flac=00;36:.m4a=00;36:.mid=00;36:.midi=00;36:.mka=00;36:.mp3=00;36:.mpc=00;36:.ogg=00;36:.ra=00;36:.wav=00;36:.oga=00;36:.opus=00;36:.spx=00;36:*.xspf=00;36:', 'CONDA_PROMPT_MODIFIER': '(paddle) ', 'TERM': 'xterm', 'XPU_CDNN_CLUSTER_PARALLEL': '1', 'ZSH': '/root/.oh-my-zsh', 'CE_CONDA': '', 'XPUAPI_DEFAULT_SIZE0': '1502653248', 'XPUAPI_DEFAULT_SIZE1': '380265324', 'CONDA_SHLVL': '2', 'SHLVL': '2', 'PAGER': 'less', 'CUDA_DEVICE_MAX_CONNECTIONS': '8', 'CONDA_PYTHON_EXE': '/root/miniconda3/bin/python', 'LD_LIBRARY_PATH': '/workspace/so-bkcl/:/workspace/so-runtime/:/workspace/so-fast_paddle/:', 'CONDA_DEFAULT_ENV': 'paddle', 'XPU_FORCE_USERMODE_LAUNCH': '1', 'PATH': '/root/miniconda3/envs/paddle/bin:/root/miniconda3/condabin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin', 'CONDA_PREFIX_1': '/root/miniconda3', 'OLDPWD': '/workspace/PaddleNLP', '': '/root/miniconda3/envs/paddle/bin/python', 'LC_CTYPE': 'C.UTF-8', 'CUSTOM_DEVICE_ROOT': '', 'OMP_NUM_THREADS': '1', 'POD_NAME': 'cztpec', 'PADDLE_MASTER': '127.0.0.1:36569', 'PADDLE_GLOBAL_SIZE': '8', 'PADDLE_LOCAL_SIZE': '8', 'PADDLE_GLOBAL_RANK': '4', 'PADDLE_LOCAL_RANK': '4', 'PADDLE_NNODES': '1', 'PADDLE_CURRENT_ENDPOINT': '127.0.0.1:36574', 'PADDLE_TRAINER_ID': '4', 'PADDLE_TRAINERS_NUM': '8', 'PADDLE_RANK_IN_NODE': '4', 'PADDLE_TRAINER_ENDPOINTS': '127.0.0.1:36570,127.0.0.1:36571,127.0.0.1:36572,127.0.0.1:36573,127.0.0.1:36574,127.0.0.1:36575,127.0.0.1:36576,127.0.0.1:36577', 'FLAGS_selected_xpus': '4', 'PADDLE_LOG_DIR': '/workspace/PaddleNLP/llm/llama/output/llama2-13b-4k/20240515154555_log'}
LAUNCH INFO 2024-05-15 16:27:24,715 ------------------------- ERROR LOG DETAIL -------------------------
dygraph_optimizer/dygraph_sharding_optimizer.py:101: UserWarning: nccl reduce_avg requires paddle compiled with cuda and nccl>=2.10.0, please check compilation setups.
warnings.warn(
[2024-05-15 15:46:56,542] [ WARNING] hybrid_parallel_optimizer.py:292 - While using ClipGradByGlobalNorm in TensorParallel, PipelineParallel or Sharding, the grad clip of original optimizer will be changed.
[2024-05-15 15:46:56,542] [ INFO] - [timelog] checkpoint loading time: 0.00s (2024-05-15 15:46:56)
[2024-05-15 15:46:56,543] [ INFO] - ***** Running training *****
[2024-05-15 15:46:56,543] [ INFO] - Num examples = 12,816,085
[2024-05-15 15:46:56,543] [ INFO] - Num Epochs = 1
[2024-05-15 15:46:56,543] [ INFO] - Instantaneous batch size per device = 1
[2024-05-15 15:46:56,543] [ INFO] - Total train batch size (w. parallel, distributed & accumulation) = 128
[2024-05-15 15:46:56,543] [ INFO] - Gradient Accumulation steps = 32
[2024-05-15 15:46:56,543] [ INFO] - Total optimization steps = 100,000
[2024-05-15 15:46:56,543] [ INFO] - Total num train samples = 12,800,000
[2024-05-15 15:46:56,545] [ DEBUG] - Number of trainable parameters = 6,507,934,720 (per device)
[2024-05-15 15:46:56,563] [ DEBUG] - Number of trainable parameters = 13,015,863,296 (all devices, roughly)
/root/miniconda3/envs/paddle/lib/python3.9/site-packages/paddle/amp/auto_cast.py:502: UserWarning: XPUPlace only support float16 amp.
warnings.warn('XPUPlace only support float16 amp.')
Traceback (most recent call last):
  File "/workspace/PaddleNLP/llm/llama/run_pretrain.py", line 630, in <module>
    main()
  File "/workspace/PaddleNLP/llm/llama/run_pretrain.py", line 608, in main
    train_result = trainer.train(resume_from_checkpoint=checkpoint)
  File "/workspace/PaddleNLP/paddlenlp/trainer/trainer.py", line 770, in train
    return self._inner_training_loop(
  File "/workspace/PaddleNLP/paddlenlp/trainer/trainer.py", line 964, in _inner_training_loop
    tr_loss_step = self.training_step(model, inputs)
  File "/workspace/PaddleNLP/paddlenlp/trainer/trainer.py", line 2044, in training_step
    return self.training_pipeline_step(model, inputs)
  File "/workspace/PaddleNLP/paddlenlp/trainer/trainer.py", line 2113, in training_pipeline_step
    loss = model.forward_backward_pipeline(inputs, self.scaler if self.do_grad_scaling else None)
  File "/root/miniconda3/envs/paddle/lib/python3.9/site-packages/paddle/distributed/fleet/meta_parallel/pipeline_parallel.py", line 536, in forward_backward_pipeline
    output_tensor = self._forward_step(input_tensor, micro_dataset)
  File "/root/miniconda3/envs/paddle/lib/python3.9/site-packages/paddle/distributed/fleet/meta_parallel/pipeline_parallel.py", line 789, in _forward_step
    assert isinstance(
AssertionError: Currently, loss_fn should obtain Paddle.Tensor dtype
LAUNCH INFO 2024-05-15 16:27:29,316 Exit code -15
