Flowformer for Language Modeling

We follow the official code base of [fairseq] and implement Flowformer on top of that repo.

Since fairseq is quite a large code base, we only provide the changed module and our experimental configuration. You can incorporate flow_attention.py into fairseq for reproduction.
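
For reference, the core idea behind flow_attention.py is a linear-complexity attention that measures the "flow" between sources (keys) and sinks (queries), and then applies competition among sources and allocation among sinks, as described in the Flowformer paper. The snippet below is only a simplified, non-causal, single-head sketch of that mechanism with our own illustrative names; the module actually used for language modeling is the causal variant (based on cumulative sums) integrated into fairseq's multi-head attention, so do not treat this as the provided implementation.

import torch

def flow_attention(q, k, v, eps=1e-6):
    """Simplified, non-causal sketch of flow attention (single head).

    q, k, v: (batch, length, dim). The causal variant used for language
    modeling replaces the global sums below with cumulative sums.
    """
    # Non-negative feature map so that flow "capacities" stay positive.
    q, k = torch.sigmoid(q), torch.sigmoid(k)

    # Incoming flow of each sink (query) and outgoing flow of each source (key).
    incoming = torch.einsum("bld,bd->bl", q, k.sum(dim=1)) + eps   # (B, L)
    outgoing = torch.einsum("bld,bd->bl", k, q.sum(dim=1)) + eps   # (B, L)

    # Conservation: re-measure each side's flow after normalizing the other side.
    conserved_sink = torch.einsum("bld,bd->bl", q, (k / outgoing.unsqueeze(-1)).sum(dim=1))
    conserved_source = torch.einsum("bld,bd->bl", k, (q / incoming.unsqueeze(-1)).sum(dim=1))

    # Competition among sources (softmax) and allocation among sinks (sigmoid gate).
    source_weight = torch.softmax(conserved_source, dim=-1) * k.shape[1]   # (B, L)
    sink_gate = torch.sigmoid(conserved_sink)                              # (B, L)

    # Linear-complexity aggregation: build K^T (w * V) once, then multiply by Q,
    # normalized by the incoming flow of each sink.
    kv = torch.einsum("bld,ble->bde", k, v * source_weight.unsqueeze(-1))
    out = torch.einsum("bld,bde->ble", q / incoming.unsqueeze(-1), kv)
    return out * sink_gate.unsqueeze(-1)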



Figure 1. Results on Wikitext-103.

Get Started

  1. Set up the environment and download the dataset following the tutorial of [Language Modeling].
  2. Replace ./fairseq/modules/multihead_attention.py with our provided flow_attention.py (a shape-level sketch of the drop-in interface is given after the scripts below).
  3. Train and evaluate the model with the following scripts. You can get the pretrained model from [here].
fairseq-train --task language_modeling \
  data-bin/wikitext-103 \
  --save-dir checkpoints/flowformer \
  --arch transformer_lm --share-decoder-input-output-embed \
  --dropout 0.1 \
  --optimizer adam --adam-betas '(0.9, 0.98)' --weight-decay 0.01 --clip-norm 0.0 \
  --lr 0.001 --lr-scheduler inverse_sqrt --warmup-updates 6000 --warmup-init-lr 1e-07 \
  --tokens-per-sample 512 --sample-break-mode none \
  --max-tokens 2048 --update-freq 16 \
  --max-update 150000

fairseq-eval-lm data-bin/wikitext-103 \
    --path checkpoints/flowformer/checkpoint_best.pt \
    --batch-size 2 \
    --tokens-per-sample 512 \
    --context-window 400
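
As a rough illustration of the replacement in step 2, a drop-in module has to keep the interface that fairseq's Transformer decoder expects from MultiheadAttention: (time, batch, embed_dim) inputs and an (attn, attn_weights) return value. The sketch below only shows the shapes involved and reuses the flow_attention function above; the exact argument list differs across fairseq versions, and the provided flow_attention.py additionally handles causal masking, padding masks, and incremental decoding, so treat this as an outline rather than the actual module.

import torch
from torch import nn

class FlowMultiheadAttention(nn.Module):
    """Shape-level illustration of a drop-in attention module for fairseq."""

    def __init__(self, embed_dim, num_heads, dropout=0.0, **unused):
        super().__init__()
        assert embed_dim % num_heads == 0
        self.embed_dim, self.num_heads = embed_dim, num_heads
        self.head_dim = embed_dim // num_heads
        self.q_proj = nn.Linear(embed_dim, embed_dim)
        self.k_proj = nn.Linear(embed_dim, embed_dim)
        self.v_proj = nn.Linear(embed_dim, embed_dim)
        self.out_proj = nn.Linear(embed_dim, embed_dim)
        self.dropout = nn.Dropout(dropout)

    def _split_heads(self, x, proj):
        # fairseq passes (time, batch, embed_dim) tensors
        length, bsz, _ = x.shape
        return proj(x).view(length, bsz * self.num_heads, self.head_dim).transpose(0, 1)

    def forward(self, query, key, value, key_padding_mask=None, attn_mask=None, **unused):
        q = self._split_heads(query, self.q_proj)
        k = self._split_heads(key, self.k_proj)
        v = self._split_heads(value, self.v_proj)
        out = flow_attention(q, k, v)  # linear-time attention from the sketch above
        length, bsz = query.shape[0], query.shape[1]
        out = out.transpose(0, 1).reshape(length, bsz, self.embed_dim)
        # fairseq expects an (attn, attn_weights) tuple; flow attention has no explicit weight matrix
        return self.out_proj(self.dropout(out)), None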

Acknowledgement

Our code base is built upon the official code of fairseq:

https://github.com/facebookresearch/fairseq