Hi, I'm having some trouble training a Light/DynamicConv language model. At first, a parameter was missing and I would get an exception while trying to build the model, but applying the patch from #536 fixed that.
Now the model gets built and the layers and parameter count get printed to the screen, but the very first training iteration immediately fails with this:
| model lightconv_lm, criterion LabelSmoothedCrossEntropyCriterion
| num. model params: 19034112 (num. trained: 19034112)
| training on 1 GPUs
| max tokens per GPU = 4000 and max sentences per GPU = None
| no existing checkpoint found ./model/03_lightconv/checkpoints/checkpoint_last.pt
| loading train data for epoch 0
| loaded 1716 examples from: ./model/03_lightconv/preprocessed/train
| epoch 001: 0%| | 0/6384 [00:00<?, ?it/s]
Traceback (most recent call last):
File "/home/lavid/anaconda3/envs/samuel/bin/fairseq-train", line 11, in <module>
load_entry_point('fairseq', 'console_scripts', 'fairseq-train')()
File "/media/lavid/Data/Samuel/fairseq/fairseq_cli/train.py", line 333, in cli_main
main(args)
File "/media/lavid/Data/Samuel/fairseq/fairseq_cli/train.py", line 86, in main
train(args, trainer, task, epoch_itr)
File "/media/lavid/Data/Samuel/fairseq/fairseq_cli/train.py", line 127, in train
log_output = trainer.train_step(samples)
File "/media/lavid/Data/Samuel/fairseq/fairseq/trainer.py", line 306, in train_step
ignore_grad
File "/media/lavid/Data/Samuel/fairseq/fairseq/tasks/fairseq_task.py", line 251, in train_step
loss, sample_size, logging_output = criterion(model, sample)
File "/home/lavid/anaconda3/envs/samuel/lib/python3.6/site-packages/torch/nn/modules/module.py", line 541, in __call__
result = self.forward(*input, **kwargs)
File "/media/lavid/Data/Samuel/fairseq/fairseq/criterions/label_smoothed_cross_entropy.py", line 56, in forward
net_output = model(**sample['net_input'])
File "/home/lavid/anaconda3/envs/samuel/lib/python3.6/site-packages/torch/nn/modules/module.py", line 541, in __call__
result = self.forward(*input, **kwargs)
File "/media/lavid/Data/Samuel/fairseq/fairseq/models/fairseq_model.py", line 372, in forward
return self.decoder(src_tokens, **kwargs)
File "/home/lavid/anaconda3/envs/samuel/lib/python3.6/site-packages/torch/nn/modules/module.py", line 541, in __call__
result = self.forward(*input, **kwargs)
TypeError: forward() got an unexpected keyword argument 'src_lengths'
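If I'm reading the traceback right, FairseqLanguageModel.forward (fairseq/models/fairseq_model.py, line 372) passes everything in sample['net_input'] on to the decoder via **kwargs, and for the language_modeling task net_input includes src_lengths, which the LightConv decoder's forward() has no parameter for. A minimal standalone repro of that mismatch (the signatures below are my guess from the traceback, not copied from the fairseq source):

import torch
import torch.nn as nn

class Decoder(nn.Module):
    # Guessed shape of the LightConv decoder's signature: there is no
    # **kwargs catch-all, so any unexpected keyword raises a TypeError.
    def forward(self, prev_output_tokens, encoder_out=None, incremental_state=None):
        return prev_output_tokens

class LanguageModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.decoder = Decoder()

    def forward(self, src_tokens, **kwargs):
        # Equivalent of fairseq_model.py line 372 from the traceback:
        # the rest of the net_input dict is forwarded to the decoder blindly.
        return self.decoder(src_tokens, **kwargs)

model = LanguageModel()
tokens = torch.ones(2, 8, dtype=torch.long)
model(tokens, src_lengths=torch.tensor([8, 8]))
# -> TypeError: forward() got an unexpected keyword argument 'src_lengths'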
The parameters passed to fairseq-train were the following:
train() {
fairseq-train $OUTPUT/preprocessed \
--task language_modeling \
--save-dir $OUTPUT/checkpoints \
--arch lightconv_lm \
--clip-norm 0 \
--optimizer adam \
--lr 0.0005 \
--max-tokens 4000 \
--max-target-positions 1024 \
--min-lr '1e-09' \
--weight-decay 0.0001 \
--criterion label_smoothed_cross_entropy \
--label-smoothing 0.1 \
--lr-scheduler inverse_sqrt \
--ddp-backend=no_c10d \
--max-update 50000 \
--warmup-updates 4000 \
--warmup-init-lr '1e-07' \
--adam-betas '(0.9, 0.98)' \
--input-dropout 0.3 \
--attention-dropout 0.1 \
--weight-dropout 0.1 \
--decoder-glu 1
}
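For what it's worth, adding a catch-all keyword argument to the decoder's forward() (the **unused convention that other fairseq decoders use) should swallow the stray src_lengths, though I don't know if that's the intended fix. Sketch of the change, presumably in fairseq/models/lightconv.py:

# Hypothetical patch: **unused absorbs extra keywords such as src_lengths
# instead of letting them hit the fixed positional signature.
def forward(self, prev_output_tokens, encoder_out=None,
            incremental_state=None, **unused):
    ...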
I'm currently on commit 9398a28, but the same error happens on the 0.8.0 release.