- The data is partially preprocessed using data_preprocessing.ipynb.
- Further preprocessing and training are done in preprocess_and_train_colab.ipynb.
Run preprocess_and_train_colab.ipynb on a virtual machine. By default it trains for 100,000 steps; decrease that and set the remaining parameters according to paper1.pdf. Some additional preprocessing with StanfordNLP is needed to improve the model's performance (read paper2.pdf, which contains more details on the preprocessing); a sketch of that step is shown below.
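A minimal sketch of what the StanfordNLP preprocessing could look like, assuming tokenization and lemmatization are the relevant steps (the actual processors to run are described in paper2.pdf; the input sentence and language choice here are placeholders):

```python
# Sketch only: tokenize + lemmatize with StanfordNLP.
# The exact processors needed are described in paper2.pdf.
import stanfordnlp

# One-time download of the English models (placeholder language choice).
stanfordnlp.download('en')

# Build a pipeline with only the processors needed for preprocessing.
nlp = stanfordnlp.Pipeline(processors='tokenize,pos,lemma')

doc = nlp("The models were trained on preprocessed sentences.")
for sentence in doc.sentences:
    # Emit one "token lemma" pair per word, e.g. for building a cleaned corpus.
    for word in sentence.words:
        print(word.text, word.lemma)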
Link to help with training: http://opennmt.net/OpenNMT-py/options/train.html#Model-%20Encoder-Decoder
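For reference, a typical OpenNMT-py training invocation looks like the sketch below. The data path, model name, and hyperparameter values are placeholders, not this project's actual settings; replace them with the parameters from paper1.pdf (`-train_steps` defaults to 100000, which is why it should be lowered):

```bash
# Hypothetical example: demo paths and hyperparameters are placeholders.
python train.py -data data/demo -save_model demo-model \
    -train_steps 20000 \
    -layers 2 -rnn_size 500 -word_vec_size 500
```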