TensorFlow implementation of R-NET: Machine Reading Comprehension with Self-Matching Networks (https://www.microsoft.com/en-us/research/wp-content/uploads/2017/05/r-net.pdf).
Training on the full SQuAD dataset is currently a work in progress.
The dataset used for this task is the Stanford Question Answering Dataset (SQuAD, https://rajpurkar.github.io/SQuAD-explorer/). Pretrained GloVe embeddings are used for both words (https://nlp.stanford.edu/projects/glove/) and characters (https://github.com/minimaxir/char-embeddings/blob/master/glove.840B.300d-char.txt).
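For reference, GloVe text files store one token per line, followed by its vector components separated by spaces. Below is a minimal sketch of reading such a file into a dictionary, assuming the character embedding file follows the same layout. `parse_glove_lines` and `load_glove` are hypothetical helper names for illustration and are not taken from this repo's process.py.

```python
def parse_glove_lines(lines, dim):
    """Parse GloVe-style text lines into a {token: vector} dict.

    Each line is expected to hold a token followed by `dim` floats.
    Splitting from the right keeps tokens that contain spaces intact.
    """
    vectors = {}
    for line in lines:
        parts = line.rstrip("\n").rsplit(" ", dim)
        if len(parts) != dim + 1:
            continue  # skip empty or malformed lines
        vectors[parts[0]] = [float(x) for x in parts[1:]]
    return vectors


def load_glove(path, dim=300):
    """Read a GloVe text file from `path` (hypothetical helper)."""
    with open(path, encoding="utf-8") as f:
        return parse_glove_lines(f, dim)


# Tiny in-memory example with 3-dimensional vectors.
sample = ["the 0.1 0.2 0.3\n", ". . 0.4 0.5 0.6\n", "bad 1.0\n"]
embeddings = parse_glove_lines(sample, dim=3)
```

In the example, the malformed third line is skipped, and the token `. .` (which contains a space) is kept whole because the split is done from the right.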
Requirements:
- NumPy
- tqdm
- TensorFlow == 1.2
After cloning this repo, run the following commands in bash once to process the dataset (SQuAD).
$ pip install -r requirements.txt
$ bash setup.sh
$ python process.py --process True
You can change the hyperparameters in the params.py file. To train the model, run the following command.
$ python model.py
Run TensorBoard for visualisation.
$ tensorboard --logdir=r-net:train/
As a sanity check, I trained the network on 3,000 independently and randomly sampled question-answer pairs. On my GTX 1080, it took about four and a half hours for the model to get the gist of what's going on with the data. With the full dataset (90,000+ pairs), convergence is expected to take longer.
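A sanity-check subset like the one described above could be drawn roughly as follows. This is only a sketch: `sample_pairs` is a hypothetical helper, and drawing without replacement with a fixed seed is an assumption, not necessarily how the subset was actually produced.

```python
import random


def sample_pairs(pairs, k=3000, seed=0):
    """Return up to k pairs drawn uniformly at random without replacement.

    A fixed seed makes the subset reproducible across runs.
    """
    rng = random.Random(seed)
    return rng.sample(pairs, min(k, len(pairs)))


# Toy stand-in for (question, answer) pairs.
toy_pairs = [("q%d" % i, "a%d" % i) for i in range(100)]
subset = sample_pairs(toy_pairs, k=10)
```

Keeping the seed fixed means the same subset is used every time, which makes training runs on the small sample easier to compare.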
Some form of normalization might help speed up convergence (though the authors of the original paper didn't mention any normalization).
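As one illustration of what such a method could look like, here is a minimal sketch of layer normalization applied to a single activation vector. Choosing layer normalization is an assumption made for this example. It is not part of the original paper or of this implementation, and the learnable gain and bias terms are left out for brevity.

```python
import math


def layer_norm(x, eps=1e-6):
    """Normalize a vector to zero mean and (approximately) unit variance.

    This is plain layer normalization without learnable gain/bias,
    shown only as one candidate normalization method (an assumption).
    """
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]


normalized = layer_norm([1.0, 2.0, 3.0, 4.0])
```

The small `eps` term avoids division by zero when all activations are equal.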