
Twist Decoding: Diverse Generators Guide Each Other


Introduction

Language generation models often differ in their vocabularies, tokenization, and generation order (e.g., left-to-right vs. right-to-left), so they cannot simply be ensembled. Twist decoding combines such models at inference time, regardless of these differences, without any additional training or fine-tuning.

Installation

We forked the fairseq library and incorporated distance terms into its beam search implementation. These distance terms can be added to any beam search implementation, but here we provide the codebase we used for our paper. To run experiments, follow the fairseq installation instructions and run the following in this repository:

cd fairseq
pip install --editable .
python setup.py build_ext --inplace
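
At a high level, the added distance terms bias each beam candidate toward the other model's current hypothesis. Below is a minimal sketch of the scoring idea, not the actual fairseq code; the function name and the token-overlap similarity are stand-ins for the implementation in twist/:

def twisted_score(log_prob, candidate_tokens, guide_tokens, lmd):
    # Sketch only: score a beam candidate under one model while
    # encouraging agreement with the guide model's hypothesis.
    # Hypothetical similarity: tokens shared with the guide hypothesis.
    # The paper's actual distance term differs; this only conveys the shape.
    agreement = len(set(candidate_tokens) & set(guide_tokens))
    return log_prob + lmd * agreement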

Download Our Models and Data

Any fairseq sequence-to-sequence model should work, but here we provide all the models we used in our experiments. See our paper for the training details.

Models

- DE-EN: Generic [1], Medicine, Law, Koran, Subtitles
- ZH-EN: L2R, R2L
- EN-DE: L2R, R2L
- SciTLDR: Abstract [2], AIC [2]

[1]: The WMT19 top-performing model, downloaded from the fairseq repository.
[2]: Downloaded from the official repository of the SciTLDR dataset (Cachola et al., 2020).

Datasets

- DE-EN: Medicine [3], Law [3], Koran [3], Subtitles [3]
- WMT20: ZH-EN [4], EN-DE [4]

[3]: Downloaded from the official repository of Hu et al. (2019).
[4]: Downloaded from the official repository of the bidimensional leaderboards (Kasai et al., 2022).

Decode Domain and Generic Models

Here are some example commands. Run Twist decoding with f=Domain and g=Generic in the medical domain; the two models are separated by a colon in the options (f:g). With --max-updates 3, the final output is written to test.twist_update-2.out; run Moses detokenization on it afterwards.

cd fairseq/
python twist/generate_twist.py --model-dirs  <PATH>/trans-base_medicine-de-en/:<PATH>/wmt19.de-en.joined-dict/ --model-names model.pt:model.pt --out-file mt/domains/medicine/output/test.twist --r2l 0:0 --src-lang de --tgt-lang en --in-file mt/domains/medicine/src/emea-test.tok.de --batch-size 20 --max-updates 3 --lmd-g 0.3 --lmd-f 0.1
perl <PATH>/mosesdecoder/scripts/tokenizer/detokenizer.perl -l en < mt/domains/medicine/output/test.twist_update-2.out > mt/domains/medicine/output/test.twist_update-2.txt
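
Conceptually, --max-updates controls how many rounds the two models spend guiding each other, and --lmd-f / --lmd-g set how strongly each model is pulled toward the other's hypothesis. A rough sketch of the loop follows (hypothetical function names, not the script's API; see twist/generate_twist.py for the actual procedure):

def twist_decode(f, g, src, lmd_f, lmd_g, max_updates):
    # Update 0: each model decodes on its own (no guidance yet).
    y_f, y_g = beam_decode(f, src), beam_decode(g, src)
    for update in range(1, max_updates):
        # Each model re-decodes with a distance term pulling it
        # toward the other model's latest hypothesis.
        y_f = beam_decode(f, src, guide=y_g, lmd=lmd_f)
        y_g = beam_decode(g, src, guide=y_f, lmd=lmd_g)
    return y_f  # final hypothesis of f, e.g. test.twist_update-2.out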

Run Twist decoding with f=Generic and g=Domain in the legal domain.

python twist/generate_twist.py --model-dirs  <PATH>/wmt19.de-en.joined-dict/:<PATH>/trans-base_law-de-en/ --model-names model.pt:model.pt --out-file mt/domains/law/output/test.twist --r2l 0:0 --src-lang de --tgt-lang en --in-file mt/domains/law/src/acquis-test.tok.de --batch-size 20 --max-updates 3 --lmd-g 3.0 --lmd-f 0.1

Run the reranking baseline.

python twist/generate_rerank.py --model-dirs  <PATH>/trans-base_medicine-de-en/:<PATH>/wmt19.de-en.joined-dict/ --model-names model.pt:model.pt --out-file mt/domains/medicine/output/test.rerank.out --r2l 0:0 --src-lang de --tgt-lang en --in-file mt/domains/medicine/src/emea-test.tok.de --batch-size 20
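
The reranking baseline gives the models a weaker form of interaction: each produces an n-best list independently, and the pooled candidates are rescored. A sketch of one common recipe for such a baseline (hypothetical helpers; see twist/generate_rerank.py for the actual procedure):

def rerank(f, g, src, beam=5):
    # Pool the n-best lists from both models, then pick the candidate
    # with the best combined score under f and g.
    candidates = nbest(f, src, beam) + nbest(g, src, beam)
    return max(candidates, key=lambda y: score(f, src, y) + score(g, src, y))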

Decode Left-to-Right and Right-to-Left Models

The command is similar, but we pass the --r2l option to mark which of the two models generates right-to-left (here, 1:0 means f is right-to-left and g is left-to-right).

python twist/generate_twist.py  --model-dirs <PATH>/trans-large-r2l_wmt20-zh-en/:<PATH>/trans-large-l2r_wmt20-zh-en/ --model-names model.pt:model.pt --out-file mt/wmt/zh-en/output/test.twist --r2l 1:0 --src-lang zh --tgt-lang en --in-file mt/wmt/zh-en/src/newstest2020.zh-en.src.tok.zh --max-updates 3 --lmd-g 3.0 --lmd-f 0.1 --batch-size 20
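
Since an R2L model emits target tokens in reverse order, its hypotheses have to be flipped before they can guide (or be guided by) an L2R model. A minimal sketch of that normalization, assuming this is what the --r2l flag controls (hypothetical helper, not the script's internals):

def normalize(tokens, is_r2l):
    # Flip a right-to-left hypothesis so both models compare
    # hypotheses in the same (left-to-right) orientation.
    return tokens[::-1] if is_r2l else tokens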

Paper Summarization

Here are some example commands. Run Twist decoding with f=AIC (abstract, introduction, and conclusion) and g=Abstract (abstract-only).

python twist/generate_twist_tldr.py --checkpoint-dirs <PATH>/scitldr_catts-xsum.tldr-aic/:<PATH>/scitldr_bart.tldr-ao/ --data-dirs summ/scitldr/SciTLDR-AIC/ctrl:summ/scitldr/SciTLDR-A/ctrl --checkpoint-files scitldr_catts-xsum.tldr-aic.pt:scitldr_bart.tldr-ao.pt --max-updates 3 --batch-size 1 --split test --beam 5 --lmd-g 3.0 --lmd-f 0.3 --out-file summ/scitldr/output/test.twist

Run the reranking baseline.

python twist/generate_rerank_tldr.py --checkpoint-dirs <PATH>/scitldr_catts-xsum.tldr-aic/:<PATH>/scitldr_bart.tldr-ao/ --data-dirs summ/scitldr/SciTLDR-AIC/ctrl:summ/scitldr/SciTLDR-A/ctrl --checkpoint-files scitldr_catts-xsum.tldr-aic.pt:scitldr_bart.tldr-ao.pt --batch-size 1 --split test --beam 5 --out-file summ/scitldr/output/test.rerank.txt

Evaluate Results

Lastly, we provide evaluation tools: COMET for machine translation and ROUGE for summarization. Use the sacrebleu library to measure BLEU. For example,

cd eval/COMET/
bash run.sh  ../../fairseq/mt/domains/medicine/src/emea-test.de ../../fairseq/mt/domains/medicine/output/test.twist_update-2.txt ../../fairseq/mt/domains/medicine/tgt/emea-test.en.jsonl ../../fairseq/mt/domains/medicine/output/test.twist_update-2.comet
cd fairseq/
sacrebleu mt/domains/medicine/tgt/emea-test.en -i mt/domains/medicine/output/test.twist_update-2.txt -m bleu -b -w 4 -l de-en
cd eval/ROUGE/
bash run.sh  ../../fairseq/summ/scitldr/output/test.twist_update-2.txt  ../../fairseq/summ/scitldr/output/test.twist_update-2.txt    ../../fairseq/summ/scitldr/tgt/test_refs.jsonl   ../../fairseq/summ/scitldr/output/test.twist_update-2.rougeL rougeL
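
If you prefer scripting the evaluation, sacrebleu also exposes a Python API; a small example using the files from the commands above:

import sacrebleu

# One hypothesis and one reference per line, in the same order.
hyps = open("mt/domains/medicine/output/test.twist_update-2.txt").read().splitlines()
refs = open("mt/domains/medicine/tgt/emea-test.en").read().splitlines()
bleu = sacrebleu.corpus_bleu(hyps, [refs])  # second argument: list of reference streams
print(f"BLEU = {bleu.score:.4f}")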

Citation

@misc{kasai2022twist,
  author    = {Jungo Kasai and
               Keisuke Sakaguchi and
               Ronan Le Bras and
               Hao Peng and
               Ximing Lu and
               Dragomir Radev and
               Yejin Choi and
               Noah A. Smith},
  title     = {Twist Decoding: Diverse Generators Guide Each Other},
  year      = {2022},
  url       = {https://arxiv.org/abs/2205.09273},
}
