This repo contains the source code for keyphrase generation with a Transformer architecture, based on the paper "Keyphrase Generation with Cross-Document Attention". Our implementation is built on the source code of OpenNMT-kpg-release and OpenNMT-py.
CDKGen is a Transformer-based keyphrase generator. It extends the Transformer with cross-document attention networks that incorporate related documents as references, so that keyphrases are generated with the guidance of topic information. On top of this Transformer + cross-document attention architecture, we also adopt a copy mechanism, which selects appropriate words from the source document to handle out-of-vocabulary words in keyphrases. The structure of CDKGen is illustrated in the figure below.
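As a rough illustration of the idea (not the code from this repo), the sketch below combines self-attention over the encoded source document with attention over encoded reference documents and merges the two representations; the gating used for the merge and the dimensions are assumptions made only for this example.

# Illustrative sketch only; the repo's actual cross-document attention is implemented in the OpenNMT-based model code.
import torch
import torch.nn as nn

class CrossDocumentAttention(nn.Module):
    def __init__(self, d_model=512, heads=8):
        super().__init__()
        self.self_attn = nn.MultiheadAttention(d_model, heads)   # attention over the source document
        self.cross_attn = nn.MultiheadAttention(d_model, heads)  # attention over reference documents
        self.gate = nn.Linear(2 * d_model, d_model)              # merges the two representations

    def forward(self, doc, ref):
        # doc: (src_len, batch, d_model) encoded source document
        # ref: (ref_len, batch, d_model) encoded reference documents
        h_doc, _ = self.self_attn(doc, doc, doc)
        h_ref, _ = self.cross_attn(doc, ref, ref)
        g = torch.sigmoid(self.gate(torch.cat([h_doc, h_ref], dim=-1)))
        return g * h_doc + (1 - g) * h_ref

# Example: a 50-token document and an 80-token reference, batch size 2.
# out = CrossDocumentAttention()(torch.randn(50, 2, 512), torch.randn(80, 2, 512))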
If you use or extend our work, please cite the following paper:
@article{Sinovation2020CDKGen,
  title={Keyphrase Generation with Cross-Document Attention},
  author={Shizhe Diao and Yan Song and Tong Zhang},
  journal={ArXiv},
  year={2020},
  volume={abs/2004.09800}
}
Python 3.6
PyTorch 1.1
torchtext 0.4 (important)
sentence-transformers 0.2.4
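For example, with pip (a sketch; the exact builds and CUDA variants depend on your machine):

pip install torch==1.1.0 torchtext==0.4.0 sentence-transformers==0.2.4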
All datasets used in this paper are provided by Rui Meng and can be downloaded here.
python preprocess.py -config config/preprocess/config-preprocess-keyphrase-kp20k.yml
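The preprocessing settings live in config/preprocess/config-preprocess-keyphrase-kp20k.yml. For orientation, an OpenNMT-py-style preprocessing config typically contains entries like the ones below; the field values and paths are illustrative assumptions, not the repo's actual file.

# Illustrative only; see config/preprocess/config-preprocess-keyphrase-kp20k.yml for the real settings.
train_src: data/keyphrase/kp20k/train_src.txt   # hypothetical path
train_tgt: data/keyphrase/kp20k/train_tgt.txt   # hypothetical path
valid_src: data/keyphrase/kp20k/valid_src.txt   # hypothetical path
valid_tgt: data/keyphrase/kp20k/valid_tgt.txt   # hypothetical path
save_data: data/keyphrase/kp20k/kp20k           # prefix for the serialized training shards
share_vocab: true                                # shared source/target vocabulary
dynamic_dict: true                               # required by the copy mechanism
src_seq_length: 512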
python train.py -config config/train/config-transformer-keyphrase-memory.yml
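The actual hyperparameters are in config/train/config-transformer-keyphrase-memory.yml; a generic OpenNMT-py Transformer training config looks roughly like the following, where every value is an assumption for illustration.

# Illustrative only; the repo's config file is authoritative.
data: data/keyphrase/kp20k/kp20k      # prefix produced by preprocess.py (hypothetical path)
save_model: models/kp20k/cdkgen       # checkpoint prefix (hypothetical path)
encoder_type: transformer
decoder_type: transformer
layers: 6
heads: 8
word_vec_size: 512
rnn_size: 512
transformer_ff: 2048
position_encoding: true
copy_attn: true                       # copy mechanism over source tokens
optim: adam
decay_method: noam
warmup_steps: 8000
learning_rate: 2
batch_size: 4096
batch_type: tokens
train_steps: 100000
gpu_ranks: [0]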
python kp_gen_eval.py -tasks pred eval report -config config/test/config-test-keyphrase-one2seq.yml -data_dir data/keyphrase/meng17/ -ckpt_dir ./models/kp20k/ -output_dir output/cdkgen/ -testsets duc inspec semeval krapivin nus kp20k -gpu 0 --verbose --beam_size 10 --batch_size 32 --max_length 40 --onepass --beam_terminate topbeam --eval_topbeam