Skip to content

Code for PENS: A Dataset and Generic Framework for Personalized News Headline Generation

Notifications You must be signed in to change notification settings

LLluoling/PENS-Personalized-News-Headline-Generation

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

24 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

PENS - ACL2021

{PENS}: A Dataset and Generic Framework for Personalized News Headline Generation

This is a Pytorch implementation of PENS.

I. Guidance

0. Enviroment

  • Install pytorch version >= '1.4.0'
  • Install the pensmodule package under ''PENS-Personalized-News-Headline-Generation'' using code pip install -e .

1. Data Prepare

  • Download the PENS dataset here and put the dataset under data/.
  • (optional) Download glove.840B.300d.txt under data/ if you choose to use pretrained glove word embeddings.

2. Running Code

  • cd pensmodule
  • Follow the order: Preprocess --> UserEncoder --> Generator and run the pipeline**.ipynb notebook to preprocess, train the user encoder and the train generator, individually.

More infor please refer to the homepage of the introduction of PENS dataset.

II. Training Tips

Here we take NRMS as user encoder, the followings are some experiment detailes that are not illustrated in the paper.

0. TIPS

  • In this paper, we used mento carlo search for RL training, which is very slow in training and sometimes hard to converge. Thus we provide ac training in this provided code.
  • If you pretrain the generator for a couple of epoches, you should set a very small learning rate during RL training.
  • Large improvements can be made compared with the baselines that we provided, the importance always lies in the design of reward functions.

1. Training Reward

image info

2. Test performance on different training steps

image info image info image info

3. Cases

epoch generated headline
Case 1
1000 top stockton news arrests 2 impaired drivers
5000 top stockton news arrests 2 impaired drivers who had unrestrained children in their cars
Case 2
1000 trump says tens of thousands of people couldn t get in 2020 rally
5000 trump says tens of thousands of people outside his 2020 campaign rally at orlando

Noted:

  • With the training process goes, the generated sentences are more fluent and contains more rich information.
  • Rouge scores is not the best evaluation scores, but a compromising choice. Of course the best evaluation is to check out the real clicks of users to see if they are more interested. Thus sometimes a more fluent and human-like generated sentence gets lower rouge scores.

About

Code for PENS: A Dataset and Generic Framework for Personalized News Headline Generation

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published