Llama 3 - RuntimeError: shape '[-1, 0]' is invalid for input of size 41041920 #32170

@jacob-morrison

Description

System Info

transformers version 4.43.1; other package versions are listed here: https://github.com/allenai/open-instruct/blob/main/requirements.txt

Who can help?

@ArthurZucker

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction

Running the following command with open-instruct:

unset CUDA_LAUNCH_BLOCKING && accelerate launch \
  --mixed_precision bf16 \
  --num_machines 2 \
  --num_processes 16 \
  --machine_rank $BEAKER_REPLICA_RANK \
  --main_process_ip $BEAKER_LEADER_REPLICA_HOSTNAME \
  --main_process_port 29400 \
  --use_deepspeed \
  --deepspeed_config_file configs/ds_configs/stage3_no_offloading_accelerate.conf \
  --deepspeed_multinode_launcher standard \
  open_instruct/finetune.py \
  --model_name_or_path meta-llama/Meta-Llama-3.1-8B \
  --tokenizer_name meta-llama/Meta-Llama-3.1-8B \
  --use_slow_tokenizer \
  --dataset_name allenai/tulu-v2-sft-mixture \
  --use_flash_attn \
  --max_seq_length 4096 \
  --preprocessing_num_workers 16 \
  --per_device_train_batch_size 1 \
  --gradient_accumulation_steps 8 \
  --learning_rate 5e-6 \
  --lr_scheduler_type linear \
  --warmup_ratio 0.03 \
  --weight_decay 0. \
  --num_train_epochs 2 \
  --output_dir /output/ \
  --with_tracking \
  --report_to tensorboard \
  --logging_steps 1 \
  --reduce_loss sum

After updating to transformers 4.43.1 to support Llama 3.1 finetuning, we encounter this error on the first step of training:

2024-07-23T21:19:48.544516135Z /opt/miniconda3/lib/python3.10/site-packages/transformers/data/data_collator.py:656: UserWarning: Creating a tensor from a list of numpy.ndarrays is extremely slow. Please consider converting the list to a single numpy.ndarray with numpy.array() before converting to a tensor. (Triggered internally at ../torch/csrc/utils/tensor_new.cpp:274.)
2024-07-23T21:19:48.544518524Z batch["labels"] = torch.tensor(batch["labels"], dtype=torch.int64)
2024-07-23T21:19:49.155378393Z [rank2]: Traceback (most recent call last):
2024-07-23T21:19:49.155406373Z [rank2]: File "/stage/open_instruct/finetune.py", line 683, in <module>
2024-07-23T21:19:49.155409168Z [rank2]: main()
2024-07-23T21:19:49.155410556Z [rank2]: File "/stage/open_instruct/finetune.py", line 602, in main
2024-07-23T21:19:49.155412476Z [rank2]: outputs = model(**batch, use_cache=False)
2024-07-23T21:19:49.155413980Z [rank2]: File "/opt/miniconda3/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
2024-07-23T21:19:49.155415839Z [rank2]: return self._call_impl(*args, **kwargs)
2024-07-23T21:19:49.155417058Z [rank2]: File "/opt/miniconda3/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
2024-07-23T21:19:49.155418501Z [rank2]: return forward_call(*args, **kwargs)
2024-07-23T21:19:49.155419655Z [rank2]: File "/opt/miniconda3/lib/python3.10/site-packages/deepspeed/utils/nvtx.py", line 15, in wrapped_fn
2024-07-23T21:19:49.155421076Z [rank2]: ret_val = func(*args, **kwargs)
2024-07-23T21:19:49.155422228Z [rank2]: File "/opt/miniconda3/lib/python3.10/site-packages/deepspeed/runtime/engine.py", line 1846, in forward
2024-07-23T21:19:49.155423640Z [rank2]: loss = self.module(*inputs, **kwargs)
2024-07-23T21:19:49.155424827Z [rank2]: File "/opt/miniconda3/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
2024-07-23T21:19:49.155440561Z [rank2]: return self._call_impl(*args, **kwargs)
2024-07-23T21:19:49.155441869Z [rank2]: File "/opt/miniconda3/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1582, in _call_impl
2024-07-23T21:19:49.155443280Z [rank2]: result = forward_call(*args, **kwargs)
2024-07-23T21:19:49.155444498Z [rank2]: File "/opt/miniconda3/lib/python3.10/site-packages/transformers/models/llama/modeling_llama.py", line 1168, in forward
2024-07-23T21:19:49.155446074Z [rank2]: shift_logits = shift_logits.view(-1, self.config.vocab_size)
2024-07-23T21:19:49.155447329Z [rank2]: RuntimeError: shape '[-1, 0]' is invalid for input of size 41041920
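
For what it's worth, the failing call is shift_logits.view(-1, self.config.vocab_size), and the shape '[-1, 0]' means config.vocab_size is 0 by the time the loss is computed. The input size is suggestive: 41,041,920 = 320 x 128,256, and 128,256 is Llama 3.1's vocabulary size, so the logits tensor itself looks sane. A minimal sketch of the failure, assuming only that vocab_size has been zeroed somewhere:

import torch

# Minimal sketch, assuming config.vocab_size is somehow 0 at forward time.
# 41_041_920 == 320 * 128_256, i.e. the logits are consistent with Llama 3.1's
# 128,256-token vocabulary; only the view target is wrong.
shift_logits = torch.randn(320, 128_256)  # 41,041,920 elements, as in the traceback
vocab_size = 0                            # what self.config.vocab_size appears to hold
shift_logits.view(-1, vocab_size)         # RuntimeError: shape '[-1, 0]' is invalid for input of size 41041920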

Any idea what's going on? We're not sure whether other packages need to be updated, whether this is a known issue, or something else.
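
In case it helps narrow things down, here's a diagnostic sketch we can run right after loading the model (assumption: model here is the bare transformers model, before the DeepSpeed engine wraps it). Under ZeRO-3 a partitioned embedding weight reports a local shape of 0, so we gather it before reading the row count; if config.vocab_size prints 0 while the gathered embedding has 128,256 rows, that would explain the crash:

import deepspeed

# Diagnostic sketch (model is assumed to be the unwrapped transformers model).
# Under ZeRO-3 the embedding weight is partitioned across ranks and its local
# shape is 0, so gather it before reading the true row count.
embeddings = model.get_input_embeddings()
with deepspeed.zero.GatheredParameters(embeddings.weight, modifier_rank=None):
    print("gathered embedding rows:", embeddings.weight.shape[0])
print("config.vocab_size:", model.config.vocab_size)  # 0 here would explain the error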

Expected behavior

Llama 3.1 finetuning to run successfully
