Description
System Info
Recent transformers versions (here 4.48.1) can no longer train encoder-decoder models built with EncoderDecoderModel via the Trainer.
Related issue and pull request: #34575
System info:
- transformers version: 4.48.1
- Platform: Linux-6.8.0-36-generic-x86_64-with-glibc2.39
- Python version: 3.12.8
- Huggingface_hub version: 0.24.6
- Safetensors version: 0.4.5
- Accelerate version: 1.2.1
- Accelerate config: not found
- PyTorch version (GPU?): 2.4.1+cu121 (True)
- Tensorflow version (GPU?): not installed (NA)
- Flax version (CPU?/GPU?/TPU?): not installed (NA)
- Jax version: not installed
- JaxLib version: not installed
- Using distributed or parallel set-up in script?: No
- Using GPU in script?: Yes
- GPU type: Tesla V100-PCIE-32GB
```
Traceback (most recent call last):
File "/home/hilsenbek/workspace/thesis/syntax_transformer/training/train_cross_attention.py", line 110, in <module>
trainer.train()
File "/home/hilsenbek/.conda/envs/harness/lib/python3.12/site-packages/transformers/trainer.py", line 2171, in train
return inner_training_loop(
^^^^^^^^^^^^^^^^^^^^
File "/home/hilsenbek/.conda/envs/harness/lib/python3.12/site-packages/transformers/trainer.py", line 2531, in _inner_training_loop
tr_loss_step = self.training_step(model, inputs, num_items_in_batch)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/hilsenbek/.conda/envs/harness/lib/python3.12/site-packages/transformers/trainer.py", line 3675, in training_step
loss = self.compute_loss(model, inputs, num_items_in_batch=num_items_in_batch)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/hilsenbek/.conda/envs/harness/lib/python3.12/site-packages/transformers/trainer.py", line 3731, in compute_loss
outputs = model(**inputs)
^^^^^^^^^^^^^^^
File "/home/hilsenbek/.conda/envs/harness/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/hilsenbek/.conda/envs/harness/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1562, in _call_impl
return forward_call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/hilsenbek/.conda/envs/harness/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 433, in _fn
return fn(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^
File "/home/hilsenbek/.conda/envs/harness/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/hilsenbek/.conda/envs/harness/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1562, in _call_impl
return forward_call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/hilsenbek/.conda/envs/harness/lib/python3.12/site-packages/accelerate/utils/operations.py", line 823, in forward
return model_forward(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/hilsenbek/.conda/envs/harness/lib/python3.12/site-packages/accelerate/utils/operations.py", line 811, in __call__
return convert_to_fp32(self.model_forward(*args, **kwargs))
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/hilsenbek/.conda/envs/harness/lib/python3.12/site-packages/torch/amp/autocast_mode.py", line 43, in decorate_autocast
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "/home/hilsenbek/.conda/envs/harness/lib/python3.12/site-packages/transformers/models/encoder_decoder/modeling_encoder_decoder.py", line 603, in forward
encoder_outputs = self.encoder(
^^^^^^^^^^^^^
File "/home/hilsenbek/.conda/envs/harness/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/hilsenbek/.conda/envs/harness/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1562, in _call_impl
return forward_call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
TypeError: RobertaModel.forward() got an unexpected keyword argument 'num_items_in_batch'
```
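From the traceback, it looks like the Trainer now merges num_items_in_batch into the model inputs, EncoderDecoderModel.forward() passes the unrecognized keyword on to its encoder, and RobertaModel.forward() rejects it. That is only my reading of the trace above; a possible workaround sketch is included under "Expected behavior" below.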
Who can help?
Information
- The official example scripts
- My own modified scripts
Tasks
- An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
- My own task or dataset (give details below)
Reproduction
Follow the blog https://huggingface.co/blog/encoder-decoder and train the resulting encoder-decoder model with the Trainer; a minimal sketch of such a setup is included below.
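The checkpoint names, dataset, and training arguments here are placeholders rather than the exact train_cross_attention.py script from the traceback:

```python
# Hypothetical minimal setup that reproduces the failure mode described above.
from transformers import AutoTokenizer, EncoderDecoderModel, Trainer, TrainingArguments

tokenizer = AutoTokenizer.from_pretrained("roberta-base")
model = EncoderDecoderModel.from_encoder_decoder_pretrained("roberta-base", "roberta-base")
model.config.decoder_start_token_id = tokenizer.cls_token_id
model.config.pad_token_id = tokenizer.pad_token_id

training_args = TrainingArguments(output_dir="out", per_device_train_batch_size=8, fp16=True)
trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset,  # placeholder: any tokenized seq2seq dataset with labels
)
trainer.train()  # raises the TypeError above on 4.48.1; trains fine on older versions
```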
Expected behavior
Training should work as it did in older transformers versions.
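Until the fix discussed in #34575 lands, one possible interim workaround is to keep num_items_in_batch from ever reaching the model. This is only a sketch based on how the Trainer injects loss kwargs into the model inputs, not an official fix; the class name EncoderDecoderTrainer is mine, and model_accepts_loss_kwargs is an internal Trainer attribute:

```python
from transformers import Trainer

class EncoderDecoderTrainer(Trainer):
    def compute_loss(self, model, inputs, return_outputs=False, num_items_in_batch=None):
        # Call the stock implementation without num_items_in_batch so the Trainer
        # does not merge it into the model inputs; EncoderDecoderModel would otherwise
        # forward the unknown kwarg to its encoder (RobertaModel), causing the TypeError.
        return super().compute_loss(model, inputs, return_outputs=return_outputs)

# Alternatively (untested), disabling loss-kwarg forwarding on an existing Trainer
# instance may have the same effect:
# trainer.model_accepts_loss_kwargs = False
```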