Out of memory error while training GPT2-large on 8x32GB Nvidia Volta #3616

@timsoraro

Description

🐛 Bug

I'm getting an out-of-memory error while training gpt2-large with batch_size=1, using the examples/run_language_modeling.py script. The dataset is a custom one with varied-length examples; the maximum block_size is 1024.

This is the command I'm using:

python -m torch.distributed.launch --nproc_per_node 8 run_language_modeling.py --output_dir=./output_attention_mask_padding/ --model_type=gpt2 --model_name_or_path=gpt2-large --do_train --train_data_file=./data/training.txt --line_by_line --per_gpu_train_batch_size 1 --num_train_epochs 3 --fp16

I tried changing args.gradient_accumulation_steps, but without success.

Here's the traceback:

Traceback (most recent call last):
  File "run_language_modeling.py", line 988, in <module>
    main()
  File "run_language_modeling.py", line 938, in main
    global_step, tr_loss = train(args, train_dataset, model, tokenizer)
  File "run_language_modeling.py", line 506, in train
    outputs = model(inputs, masked_lm_labels=labels, attention_mask=attention_mask) if args.mlm else model(inputs, labels=labels, attention_mask=attention_mask)
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 547, in __call__
    result = self.forward(*input, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/parallel/distributed.py", line 442, in forward
    output = self.module(*inputs[0], **kwargs[0])
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 547, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/deepspeed/.local/lib/python3.6/site-packages/transformers/modeling_gpt2.py", line 612, in forward
    loss = loss_fct(shift_logits.view(-1, shift_logits.size(-1)), shift_labels.view(-1))
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 547, in __call__
    result = self.forward(*input, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/loss.py", line 916, in forward
    ignore_index=self.ignore_index, reduction=self.reduction)
  File "/usr/local/lib/python3.6/dist-packages/apex/amp/wrap.py", line 27, in wrapper
    kwargs)
  File "/usr/local/lib/python3.6/dist-packages/apex/amp/utils.py", line 78, in casted_args
    new_args.append(cast_fn(x))
  File "/usr/local/lib/python3.6/dist-packages/apex/amp/utils.py", line 71, in maybe_float
    return x.float()
RuntimeError: CUDA out of memory. Tried to allocate 190.00 MiB (GPU 2; 31.72 GiB total capacity; 28.71 GiB already allocated; 135.88 MiB free; 1.66 GiB cached)
Traceback (most recent call last):
  File "run_language_modeling.py", line 988, in <module>
    main()
  File "run_language_modeling.py", line 938, in main
    global_step, tr_loss = train(args, train_dataset, model, tokenizer)
  File "run_language_modeling.py", line 523, in train
    scaled_loss.backward()
  File "/usr/local/lib/python3.6/dist-packages/torch/tensor.py", line 118, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph)
  File "/usr/local/lib/python3.6/dist-packages/torch/autograd/__init__.py", line 93, in backward
    allow_unreachable=True)  # allow_unreachable flag
RuntimeError: CUDA out of memory. Tried to allocate 194.00 MiB (GPU 4; 31.72 GiB total capacity; 29.42 GiB already allocated; 155.88 MiB free; 951.73 MiB cached)

Environment info

  • transformers version: 2.6.0
  • Platform: Linux
  • Using distributed or parallel set-up in script?: Yes
