Code for ACL 2021 paper "Controllable Open-ended Question Generation with A New Question Type Ontology".
Our Yahoo dataset is based on the Yahoo Answers L6 dataset. After obtaining a license for the L6 dataset, please email Shuyang (caoshuy@umich.edu) with proof of the license attached to obtain the Yahoo dataset.
Preprocessed binarized Reddit data can be downloaded from here.
For data preprocessing, please refer to the README in data_preprocess.
Our experiments are based on PyTorch 1.7.0 and Fairseq at commit 0db28cd. Newer versions of Fairseq might also work.
Please download the generation models from here and put them under $MODEL/generation_models. The binarized dataset should be under $DATA/binarized_data.
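The expected directory layout can be prepared up front. The following is a minimal sketch, not part of the original repository: `WORK_ROOT` and its default location are hypothetical, and only the `$MODEL/generation_models`, `$DATA/binarized_data`, and `$DATA/output` paths come from this README.

```shell
# Hypothetical workspace root (an assumption); adjust to your own setup.
WORK_ROOT="${WORK_ROOT:-$PWD/qgen_workspace}"

# MODEL and DATA are the variables referenced by the commands in this README.
# Existing values are kept; otherwise they default to the workspace root.
export MODEL="${MODEL:-$WORK_ROOT/models}"
export DATA="${DATA:-$WORK_ROOT/data}"

# Create the directories that the instructions refer to.
mkdir -p "$MODEL/generation_models" "$DATA/binarized_data" "$DATA/output"

echo "MODEL=$MODEL"
echo "DATA=$DATA"
```

After running this, the downloaded generation models and the binarized dataset can be placed in the corresponding directories.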
To convert the fairseq generation output to text, use convert_output.py:
python convert_output.py --generate-dir <result_dir>
cd gen_scripts
./jointgen.sh $DATA/output/jointgen
cd gen_scripts
./explgen.sh $DATA/output/explgen
cd gen_scripts
./tplgen_question_generation.sh $DATA/output/tplgen_question
cd gen_scripts
./explgen_9types.sh $DATA/output/explgen_9types
cd gen_scripts
./tplgen_question_generation_9types.sh $DATA/output/tplgen_question_9types
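The generation commands above all follow the same pattern, so they could be wrapped in a loop that also converts each result to text. This is a sketch under two assumptions that this README does not confirm: the directory passed to each script is the `<result_dir>` that convert_output.py expects, and convert_output.py sits one level above gen_scripts. `run_all_generation` and `DRY_RUN` are hypothetical names introduced for illustration.

```shell
# Sketch: run each generation script, then convert its output to text.
# Assumptions (not confirmed by this README): the directory passed to each
# script is the <result_dir> for convert_output.py, and convert_output.py
# lives one level above gen_scripts. Run from the repository root.
run_all_generation() {
    # Script names and output names are taken from the commands above.
    for pair in \
        "jointgen.sh:jointgen" \
        "explgen.sh:explgen" \
        "tplgen_question_generation.sh:tplgen_question" \
        "explgen_9types.sh:explgen_9types" \
        "tplgen_question_generation_9types.sh:tplgen_question_9types"
    do
        script="${pair%%:*}"                 # part before the colon
        out_dir="$DATA/output/${pair##*:}"   # part after the colon
        if [ "${DRY_RUN:-0}" = "1" ]; then
            # DRY_RUN is a hypothetical switch: print the commands only.
            echo "(cd gen_scripts && ./$script $out_dir)"
            echo "python convert_output.py --generate-dir $out_dir"
        else
            (cd gen_scripts && "./$script" "$out_dir") || return 1
            python convert_output.py --generate-dir "$out_dir" || return 1
        fi
    done
}
```

Calling `DRY_RUN=1 run_all_generation` prints the commands without executing them, which is a cheap way to check the paths before starting a long run.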
Please set BART_PATH to the path of the bart.large model, which can be downloaded here.
export BART_PATH=<path_to_bart_large_dir>/model.pt
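Since each training script takes `$BART_PATH` as its first argument, it may help to verify the variable before launching a job. The helper below is a minimal sketch; `check_bart_path` is a hypothetical function and is not part of the repository.

```shell
# Sketch: fail early if BART_PATH is unset or does not point to a file.
# check_bart_path is a hypothetical helper, not provided by the repository.
check_bart_path() {
    if [ -z "${BART_PATH:-}" ]; then
        echo "BART_PATH is not set" >&2
        return 1
    fi
    if [ ! -f "$BART_PATH" ]; then
        echo "BART_PATH does not point to a file: $BART_PATH" >&2
        return 1
    fi
    echo "Using BART checkpoint: $BART_PATH"
}
```

For example, `check_bart_path && cd train_scripts` only enters the training directory when the checkpoint file is present.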
cd train_scripts
CUDA_VISIBLE_DEVICES=0,1 ./jointgen.sh $BART_PATH $MODEL/jointgen
cd train_scripts
CUDA_VISIBLE_DEVICES=0,1 ./explgen.sh $BART_PATH $MODEL/explgen
cd train_scripts
CUDA_VISIBLE_DEVICES=0,1 ./tplgen_template_generation.sh $BART_PATH $MODEL/tplgen_template_generation
cd train_scripts
CUDA_VISIBLE_DEVICES=0,1 ./tplgen_question_generation.sh $BART_PATH $MODEL/tplgen_question_generation