This repository contains the code needed to run the experiments presented in the paper Effective Estimation of Deep Generative Language Models [1].
To start experimenting, clone the repository to your local device and install the following dependencies (a combined sketch of these steps follows the list):
- python >= 3.6
- pip install -r requirements.txt
- hyperspherical_vae: the code was tested with this fork; get the latest version from here.
- torch_two_sample
- pyter
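A minimal sketch of the setup steps above (the repository URL and folder name are placeholders, not taken from this README; the three extra packages are installed from their own repositories linked above):

git clone <repository-url>          # placeholder URL for this repository
cd <repository-folder>              # placeholder folder name
pip install -r requirements.txt     # installs the Python requirements
# install hyperspherical_vae (the fork above), torch_two_sample and pyter from their respective repositories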
- Download and pre-process the Penn Treebank data; see the data folder for instructions.
- Train an RNNLM:
./main.py --model deterministic --mode train --pre_def 1 --ptb_type mik
- Train a default SenVAE:
./main.py --model bowman --mode train --pre_def 1 --ptb_type mik
- Train a SenVAE with a target rate of 5 nats, using MDR:
./main.py --model bowman --mode train --pre_def 1 --ptb_type mik --lagrangian 1 --min_rate 5 --save_suffix mdr
- Train a SenVAE with a MoG prior and a target rate of 5 nats, using MDR:
./main.py --model flowbowman --prior mog --mode train --pre_def 1 --ptb_type mik --lagrangian 1 --min_rate 5 --save_suffix mog
- Evaluate the models:
./main.py --model deterministic --mode test --pre_def 1 --ptb_type mik
./main.py --model bowman --mode test --pre_def 1 --ptb_type mik
./main.py --model bowman --mode test --pre_def 1 --ptb_type mik --lagrangian 1 --min_rate 5 --save_suffix mdr
./main.py --model flowbowman --prior mog --mode test --pre_def 1 --ptb_type mik --lagrangian 1 --min_rate 5 --save_suffix mog
- Print some samples:
./main.py --model deterministic --mode qualitative --pre_def 1 --ptb_type mik
./main.py --model bowman --mode qualitative --pre_def 1 --ptb_type mik
./main.py --model bowman --mode qualitative --pre_def 1 --ptb_type mik --lagrangian 1 --min_rate 5 --save_suffix mdr
./main.py --model flowbowman --prior mog --mode qualitative --pre_def 1 --ptb_type mik --lagrangian 1 --min_rate 5 --save_suffix mog
We used a fork of the vae-lagging-encoder repo to run the experiments on the Yahoo and Yelp data. Please use the submodule to recreate these experiments; it has been modified to support training with MDR and the MoG prior.
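As a minimal sketch, assuming the fork is registered as a git submodule of this repository (its exact path is not spelled out here), it can be fetched with:

git submodule update --init --recursive   # pulls the modified vae-lagging-encoder fork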
- main.py: the main script that handles all command line arguments.
- dataset: expected location of data. Contains code for preprocessing and batching PTB.
- model: all components for the various models tested in the paper.
- scripts: various scripts for training, testing, Bayesian Optimization, etc.
- util: utility functions for storage, evaluation and more.
There are many command-line settings available to tweak the experimental setup. Please see the settings file for a complete overview. Here we highlight the most important settings (a sketch that combines several of them follows the list):
--script: [generative|bayesopt|grid] chooses which script to run. generative is used for training/testing a single model; bayesopt and grid run Bayesian Optimization and grid search, respectively. Please see the scripts for more information about their usage.
--mode: [train|test|novelty|qualitative] selects the mode in which to run the generative script.
--save_suffix: a suffix that gives your model a distinguishing name.
--seed: set a random seed.
--model: [deterministic|bowman|flowbowman] the model to use. deterministic refers to the RNNLM, bowman to the SenVAE, and flowbowman to the SenVAE with expressive latent structure.
--lagrangian: set to 1 to use the MDR objective.
--min_rate: specify a minimum rate, in nats.
--flow: [diag|iaf|vpiaf|planar] the type of flow to use with the flowbowman model.
--prior: [weak|mog|vamp] the type of prior to use with the flowbowman model.
--data_folder: path to your pre-processed data.
--out_folder: path to store experiments.
--ptb_type: [|mik|dyer] chooses between simple (mik) and expressive (dyer) unking of PTB. Paths to the out and data folders are set automatically.
--pre_def: set to 1 to use encoder-decoder hyperparameters that match the ones in the paper.
--local_rank: which GPU to use. Set to -1 to run on CPU.
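As an illustrative sketch (this particular configuration is not one from the paper), several of these flags can be combined in a single call, for example training a SenVAE with an IAF flow and a VampPrior on the expressively unked PTB, with MDR at 5 nats, a fixed seed, and CPU execution:

./main.py --script generative --model flowbowman --flow iaf --prior vamp --mode train --pre_def 1 --ptb_type dyer --lagrangian 1 --min_rate 5 --seed 42 --save_suffix iaf_vamp --local_rank -1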
If you use this code in your project, please cite:
[1] Pelsmaeker, T., and Aziz, W. (2019). Effective Estimation of Deep Generative Language Models. arXiv preprint arXiv:1904.08194.
BibTeX format:
@article{effective2019pelsmaeker,
  title={Effective Estimation of Deep Generative Language Models},
  author={Pelsmaeker, Tom and Aziz, Wilker},
  journal={arXiv preprint arXiv:1904.08194},
  year={2019}
}
This code is released under the MIT license.