Can't train LightConv language model #1439

Closed
@samueldemoura

Description

Hi, I'm having some trouble training a Light/DynamicConv language model. At first, a parameter was missing and I would get an exception while trying to build the model, but applying the patch from #536 fixed that.

Now the model gets built and the layers and parameter count get printed to the screen, but the very first training iteration immediately fails with this:

| model lightconv_lm, criterion LabelSmoothedCrossEntropyCriterion
| num. model params: 19034112 (num. trained: 19034112)
| training on 1 GPUs
| max tokens per GPU = 4000 and max sentences per GPU = None
| no existing checkpoint found ./model/03_lightconv/checkpoints/checkpoint_last.pt
| loading train data for epoch 0
| loaded 1716 examples from: ./model/03_lightconv/preprocessed/train
| epoch 001:   0%|                                                                    | 0/6384 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "/home/lavid/anaconda3/envs/samuel/bin/fairseq-train", line 11, in <module>
    load_entry_point('fairseq', 'console_scripts', 'fairseq-train')()
  File "/media/lavid/Data/Samuel/fairseq/fairseq_cli/train.py", line 333, in cli_main
    main(args)
  File "/media/lavid/Data/Samuel/fairseq/fairseq_cli/train.py", line 86, in main
    train(args, trainer, task, epoch_itr)
  File "/media/lavid/Data/Samuel/fairseq/fairseq_cli/train.py", line 127, in train
    log_output = trainer.train_step(samples)
  File "/media/lavid/Data/Samuel/fairseq/fairseq/trainer.py", line 306, in train_step
    ignore_grad
  File "/media/lavid/Data/Samuel/fairseq/fairseq/tasks/fairseq_task.py", line 251, in train_step
    loss, sample_size, logging_output = criterion(model, sample)
  File "/home/lavid/anaconda3/envs/samuel/lib/python3.6/site-packages/torch/nn/modules/module.py", line 541, in __call__
    result = self.forward(*input, **kwargs)
  File "/media/lavid/Data/Samuel/fairseq/fairseq/criterions/label_smoothed_cross_entropy.py", line 56, in forward
    net_output = model(**sample['net_input'])
  File "/home/lavid/anaconda3/envs/samuel/lib/python3.6/site-packages/torch/nn/modules/module.py", line 541, in __call__
    result = self.forward(*input, **kwargs)
  File "/media/lavid/Data/Samuel/fairseq/fairseq/models/fairseq_model.py", line 372, in forward
    return self.decoder(src_tokens, **kwargs)
  File "/home/lavid/anaconda3/envs/samuel/lib/python3.6/site-packages/torch/nn/modules/module.py", line 541, in __call__
    result = self.forward(*input, **kwargs)
TypeError: forward() got an unexpected keyword argument 'src_lengths'
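
From the traceback, fairseq_model.py line 372 forwards everything in sample['net_input'] to the decoder via self.decoder(src_tokens, **kwargs), so my guess is that the LightConv decoder's forward() just doesn't declare a src_lengths keyword and has no catch-all for extra keyword arguments. A minimal sketch of the mechanism, with illustrative names rather than the actual fairseq signatures:

    def decoder_forward(prev_output_tokens, incremental_state=None):
        # stands in for the LightConv decoder's forward(): no 'src_lengths'
        # parameter and no **kwargs catch-all
        return prev_output_tokens

    def patched_decoder_forward(prev_output_tokens, incremental_state=None, **unused):
        # same signature plus a catch-all for unused keyword arguments,
        # which is what TransformerDecoder appears to do
        return prev_output_tokens

    def model_forward(decoder, src_tokens, **kwargs):
        # mirrors FairseqLanguageModel.forward (fairseq_model.py line 372):
        # return self.decoder(src_tokens, **kwargs); for the
        # language_modeling task, kwargs contains src_lengths
        return decoder(src_tokens, **kwargs)

    model_forward(patched_decoder_forward, [1, 2, 3], src_lengths=[3])  # works
    model_forward(decoder_forward, [1, 2, 3], src_lengths=[3])
    # TypeError: decoder_forward() got an unexpected keyword argument 'src_lengths'

If that's the problem, adding a catch-all like **unused to the decoder's forward() would probably work around it locally, but I'd rather not patch my copy if there's a proper fix.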

The parameters passed to fairseq-train were the following:

train() {
    fairseq-train $OUTPUT/preprocessed \
        --task language_modeling \
        --save-dir $OUTPUT/checkpoints \
        --arch lightconv_lm \
        --clip-norm 0 \
        --optimizer adam \
        --lr 0.0005 \
        --max-tokens 4000 \
        --max-target-positions 1024 \
        --min-lr '1e-09' \
        --weight-decay 0.0001 \
        --criterion label_smoothed_cross_entropy \
        --label-smoothing 0.1 \
        --lr-scheduler inverse_sqrt \
        --ddp-backend=no_c10d \
        --max-update 50000 \
        --warmup-updates 4000 \
        --warmup-init-lr '1e-07' \
        --adam-betas '(0.9, 0.98)' \
        --input-dropout 0.3 \
        --attention-dropout 0.1 \
        --weight-dropout 0.1 \
        --decoder-glu 1
}

I'm currently on commit 9398a28, but the same error happens on the 0.8.0 release.
