Code for the paper "Learning to Ask Questions in Open-domain Conversational Systems with Typed Decoders" (ACL 2018).
Cite this paper:
@inproceedings{wang2018learning,
title={Learning to Ask Questions in Open-domain Conversational Systems with Typed Decoders},
author={Wang, Yansen and Liu, Chenyi and Huang, Minlie and Nie, Liqiang},
booktitle={ACL},
year={2018}
}
You can download our data here. Press Ctrl+F and search for "Learning to Ask Questions in Open-domain Conversational Systems with Typed Decoders" to find the link to our dataset.
IMPORTANT NOTE: Our code is not compatible with newer versions of TensorFlow, so please use tensorflow-1.0.0 to run it.
Command: python main.py {--[option1]=[value1] --[option2]=[value2] ... }
Options (=[default_value]):
--is_train=True             Set to True for training and False for inference.
--symbols=20000             Size of vocabulary.
--embed_units=100           Size of word embedding.
--units=512                 Size of each model layer.
--layers=4                  Number of layers in the model.
--batch_size=50             Batch size to use during training.
--data_dir=./data           Data directory.
--train_dir=./train         Training directory.
--per_checkpoint=1000       How many steps to run per checkpoint.
--check_version=0           The checkpoint version to continue training from or to use for inference. Set to 0 if you don't want to continue from an existing checkpoint.
--log_parameters=True       Set to True to show the parameters.
--inference_path=""         File to read inference input from; leave empty for screen input.
--PMI_path=./PMI            PMI directory.
--keywords_per_sentence=20  How many keywords will be included. This flag does not need to be set in STD.
--question_data=True        (Deprecated, please set to True) An unused option in the final version.
The files train.sh and infer.sh contain example commands for training and inference. You can run them with the sh command.
Command: python main.py {--[option1]=[value1] --[option2]=[value2] ... }
Options (=[default_value]):
--is_train=True             Set to True for training and False for inference.
--symbols=20000             Size of vocabulary.
--embed_units=100           Size of word embedding.
--units=512                 Size of each model layer.
--layers=4                  Number of layers in the model.
--batch_size=50             Batch size to use during training. Please set it to 1 during inference, or the PMI mechanism won't work properly.
--data_dir=./data           Data directory.
--train_dir=./train         Training directory.
--per_checkpoint=1000       How many steps to run per checkpoint.
--check_version=0           The checkpoint version to continue training from or to use for inference.
--log_parameters=True       Set to True to show the parameters.
--inference_path=""         File to read inference input from; leave empty for screen input.
--PMI_path=./PMI            PMI directory.
--keywords_per_sentence=20  How many keywords will be included.
--question_data=True        (Deprecated, please set to True) An unused option in the final version.
The files train.sh and infer.sh contain example commands for training and inference. You can run them with the sh command.
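As an illustration of how the options combine, the sketch below assembles an inference command from the flags listed above. It is not the content of infer.sh; the helper name `build_command` is hypothetical, and the option values are simply the documented defaults plus the batch size of 1 recommended for inference. Add --check_version for the checkpoint you want to load.

```python
def build_command(options):
    """Build an argument list for main.py from a dict of option names to values.

    Hypothetical helper for illustration; it only formats flags as --name=value.
    """
    args = ["python", "main.py"]
    for name, value in options.items():
        args.append("--%s=%s" % (name, value))
    return args


# Values taken from the option list above; batch_size=1 as advised for inference.
inference_options = {
    "is_train": False,
    "batch_size": 1,
    "data_dir": "./data",
    "train_dir": "./train",
    "PMI_path": "./PMI",
}
command = build_command(inference_options)
print(" ".join(command))
```

The resulting list can be passed to `subprocess.call` or joined into a single line for a shell script.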
We're sorry that, due to our regulations, we can't share the pretrained word vectors. You can make your own "vector.txt" in this format:
[word1] 1.0 -2.0 5.0
[word2] 3.14 2.72 -1.41
[word3] 0.86 -1.71 -0.04
... ...
and set --embed_units=[vector dimension]. In this example, you should set --embed_units=3.
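A file in this format can be written with a few lines of Python 3. This is a minimal sketch, not part of the released code: the function name `write_vector_file` and the toy values are hypothetical, and it assumes one word per line followed by space-separated floats, as in the example above.

```python
import os
import tempfile


def write_vector_file(path, vectors):
    """Write {word: [floats]} to `path`, one word per line; return the dimension.

    Hypothetical helper for illustration; it is not part of the released code.
    """
    dims = {len(values) for values in vectors.values()}
    if len(dims) != 1:
        raise ValueError("all vectors must share one dimension, got %s" % sorted(dims))
    with open(path, "w", encoding="utf-8") as f:
        for word, values in vectors.items():
            f.write(word + " " + " ".join("%.6f" % v for v in values) + "\n")
    return dims.pop()


# Toy 3-dimensional vectors, matching the example above.
sample = {
    "word1": [1.0, -2.0, 5.0],
    "word2": [3.14, 2.72, -1.41],
    "word3": [0.86, -1.71, -0.04],
}
demo_path = os.path.join(tempfile.mkdtemp(), "vector.txt")
dim = write_vector_file(demo_path, sample)
print("use --embed_units=%d" % dim)
```

The returned dimension is the value to pass as --embed_units.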
Here's part of our word vectors:
冉津 -0.007428 -0.018109 0.017502 0.127934 0.090787 -0.008699 -0.181448 -0.117719 -0.130669 0.007109 -0.048784 -0.083871 -0.041926 -0.016476 0.026685 -0.094259 -0.097639 0.049795 0.077781 -0.027308 -0.000205 0.117830 -0.033821 -0.088984 0.150127 -0.065157 0.018675 -0.105137 0.001134 -0.026754 0.026742 -0.127951 -0.006684 -0.080394 0.003453 -0.031691 -0.013896 0.051936 0.034658 0.079686 0.026027 0.130313 0.011976 -0.154662 -0.065610 0.079444 -0.036182 -0.042820 0.040647 -0.009277 -0.094344 0.352311 -0.100773 -0.167505 -0.071562 0.182705 0.087977 -0.077308 0.121469 -0.076466 0.045806 0.029080 -0.120310 0.112574 0.027545 0.130245 0.060847 -0.087550 -0.072264 -0.061106 0.045996 -0.048654 0.036791 -0.324380 -0.129975 -0.151802 0.055080 0.108745 0.072554 0.063584 -0.183879 -0.088556 -0.189840 -0.028041 -0.130920 -0.110319 -0.043854 -0.124681 0.027615 -0.096786 0.024738 -0.112449 -0.041501 -0.016814 -0.026927 0.213262 0.127977 -0.085883 -0.056919 0.074451
冉徽 0.014493 -0.009604 -0.056103 0.137076 0.136810 0.003288 -0.162282 -0.142987 -0.111230 -0.007172 -0.036456 -0.059875 -0.034977 -0.000799 0.010098 -0.087427 -0.089052 0.052306 0.095106 -0.078993 -0.038151 0.072410 -0.069268 -0.057892 0.117272 -0.029470 0.013380 -0.051824 -0.039586 -0.041293 0.059040 -0.148370 -0.015987 -0.074139 0.048661 -0.056333 0.022390 0.077231 -0.010541 0.071275 0.015923 0.151031 0.013858 -0.166912 -0.053901 0.057671 -0.070033 -0.044730 0.011594 0.016944 -0.148096 0.327251 -0.109722 -0.195073 -0.074526 0.209270 0.096594 -0.008418 0.120976 -0.057380 0.039540 0.050772 -0.150347 0.127315 -0.023129 0.164845 0.086893 -0.053719 -0.042148 -0.030370 0.064161 -0.070620 0.031359 -0.297059 -0.092481 -0.101616 0.105090 0.139352 0.058642 0.080823 -0.226540 -0.081144 -0.161620 -0.055791 -0.109781 -0.082259 -0.023754 -0.115139 0.023207 -0.117227 0.025099 -0.098476 -0.039537 0.056101 0.011074 0.201935 0.127134 -0.081476 -0.025416 0.024106
冉红平 0.001560 -0.005889 0.025941 0.063590 0.079942 0.007259 -0.176020 -0.105751 -0.107272 0.005988 -0.078503 -0.030769 -0.029349 -0.039878 0.007160 -0.075574 -0.121881 0.030458 0.070573 -0.030429 -0.009549 0.063056 -0.024280 -0.122451 0.073607 -0.017913 0.002592 -0.099109 0.039369 -0.054562 0.044947 -0.135777 -0.023722 -0.065398 0.039630 -0.058899 -0.034931 0.051255 0.051398 0.016336 -0.003559 0.133971 0.088922 -0.220131 0.006107 0.022170 -0.056472 -0.061360 0.019423 0.018444 -0.161037 0.362732 -0.118108 -0.157995 -0.071416 0.118341 0.083489 -0.036985 0.103561 -0.086170 0.029961 0.045517 -0.165905 0.122532 0.004158 0.116590 0.024232 -0.052038 -0.053199 -0.038042 0.089462 -0.087992 0.033044 -0.303940 -0.160299 -0.150656 0.062613 0.156578 0.015454 0.124571 -0.198247 -0.060708 -0.183756 -0.080255 -0.135093 -0.155833 0.029361 -0.091097 0.032860 -0.103119 0.099081 -0.114630 -0.045474 0.023771 -0.044274 0.185795 0.088140 -0.072055 -0.031876 0.074563
......
If you do not have pretrained vectors, just leave a blank vector.txt file for the program to load. The program will automatically initialize the word vectors for all words that do not appear in this file.
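The fallback described above can be pictured with the following sketch. It is an assumption about the general idea, not the program's actual loading code: the function name `load_embeddings`, the uniform range of [-0.1, 0.1], and the toy vocabulary are all hypothetical.

```python
import random


def load_embeddings(lines, vocab, embed_units, seed=0):
    """Return one vector per vocab word: pretrained if listed, random otherwise.

    Illustrative only; the initialization range is an assumption.
    """
    rng = random.Random(seed)
    pretrained = {}
    for line in lines:
        parts = line.split()
        if len(parts) != embed_units + 1:
            continue  # skip blank lines and lines with the wrong dimension
        pretrained[parts[0]] = [float(x) for x in parts[1:]]
    table = []
    for word in vocab:
        if word in pretrained:
            table.append(pretrained[word])
        else:
            # Word missing from vector.txt: fall back to a small random vector.
            table.append([rng.uniform(-0.1, 0.1) for _ in range(embed_units)])
    return table


vocab = ["word1", "word2", "unseen"]
lines = ["word1 1.0 -2.0 5.0", "word2 3.14 2.72 -1.41"]
table = load_embeddings(lines, vocab, embed_units=3)

# A blank vector.txt yields no lines, so every word is randomly initialized.
empty_table = load_embeddings([], vocab, embed_units=3)
```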