This is the code repo for the UIST 2019 paper Type, Then Correct: Intelligent Text Correction Techniques for Mobile Text Entry Using Neural Networks. It includes the network structure, the training/testing/deploy files, and the data-processing files.
To get the training data for the network, we used the CoNLL correction task data from 2013 and 2014. See DataProcess/CoNLL for the related processing code.
We also used the Yelp+Amazon review data, which you can find here (related GitHub project: https://github.com/nhviet1009/Character-level-cnn-pytorch). Because these datasets are mostly clean text without errors, we perturbed them by injecting errors. The details can be found under DataProcess/PerturbNormalDataset
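As a rough illustration of what injecting errors into clean text can look like, here is a minimal sketch. It is not the code under DataProcess/PerturbNormalDataset: the function names, the four edit operations, and the `error_rate` parameter are all assumptions made for this example.

```python
import random

ALPHABET = "abcdefghijklmnopqrstuvwxyz"


def perturb_word(word, rng):
    """Apply one random character-level edit (hypothetical error model)."""
    if len(word) < 2:
        return word  # too short to edit safely
    i = rng.randrange(len(word) - 1)
    op = rng.choice(["swap", "delete", "insert", "substitute"])
    if op == "swap":      # transpose two adjacent characters
        return word[:i] + word[i + 1] + word[i] + word[i + 2:]
    if op == "delete":    # drop one character
        return word[:i] + word[i + 1:]
    if op == "insert":    # add a random letter
        return word[:i] + rng.choice(ALPHABET) + word[i:]
    # substitute: replace one character with a random letter
    return word[:i] + rng.choice(ALPHABET) + word[i + 1:]


def perturb_sentence(sentence, error_rate=0.2, seed=None):
    """Return a (noisy, clean) pair; each word is perturbed with probability error_rate."""
    rng = random.Random(seed)
    words = sentence.split()
    noisy = [perturb_word(w, rng) if rng.random() < error_rate else w for w in words]
    return " ".join(noisy), sentence


if __name__ == "__main__":
    noisy, clean = perturb_sentence("the food was great and the service was fast", seed=1)
    print(noisy)
    print(clean)
```

Each clean review sentence then yields a noisy/clean pair. The actual error types and rates used for the paper are defined in the repository code and may differ from this sketch.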
The training data format is shown in the example file under DataProcess. Each training example consists of the text with errors plus the correction, and the expected output. For details of the output format, please refer to our paper.
python Train.py --train small_train_data5_amazon --batch_size 128 --test small_train_data5_amazon --elr 1e-4 --dlr 5e-4 --epochs 20 -teacher 0.5 --test_freq 1 --dropout 0.2 --clip 10 --hidden_size 300 --load_en best_en_5out_amazon.pth --only_lowercase 1
Here, small_train_data5_amazon is our training data file, elr/dlr are the encoder/decoder learning rates, and teacher is the teacher rate.
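For readers unfamiliar with the teacher rate, the sketch below assumes it is the usual teacher-forcing probability: at each decoding step, the ground-truth token is fed to the decoder with that probability, and the decoder's own prediction is fed otherwise. This is an illustration only, not the logic in Train.py. `step_fn` is a hypothetical stand-in for the decoder network, and deciding per step (rather than per sequence) is an assumption.

```python
import random


def decode_with_teacher_forcing(step_fn, target_tokens, teacher_rate, rng,
                                start_token="<sos>"):
    """Run a toy decoder for len(target_tokens) steps.

    step_fn(prev_token) -> predicted token (hypothetical stand-in for the decoder).
    """
    predictions = []
    prev = start_token
    for gold in target_tokens:
        pred = step_fn(prev)
        predictions.append(pred)
        # With probability teacher_rate, feed the ground-truth token next;
        # otherwise feed back the decoder's own prediction.
        use_teacher = rng.random() < teacher_rate
        prev = gold if use_teacher else pred
    return predictions


if __name__ == "__main__":
    rng = random.Random(0)
    toy_decoder = lambda prev: prev.upper()  # toy "decoder": echoes its input in upper case
    print(decode_with_teacher_forcing(toy_decoder, ["a", "b", "c"], 0.5, rng))
```

Under this reading, a rate of 0.5 as in the command above would mix both regimes roughly equally during training.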
You can find our trained model and the training data here
We also provide a script for deploying this correction algorithm on a server. Clients send requests to it and receive responses over HTTP.
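To make the request/response idea concrete, here is a minimal sketch of an HTTP correction service built only on the Python standard library. It is not the deployment script shipped in this repo. The JSON fields (`text`, `correction`, `result`), the handler class, and the `correct_text` placeholder are all hypothetical, and the placeholder returns its input unchanged rather than calling the trained model.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def correct_text(text, correction):
    """Hypothetical placeholder for the trained model; returns text unchanged."""
    return text


class CorrectionHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        try:
            payload = json.loads(self.rfile.read(length) or b"{}")
            result = correct_text(payload["text"], payload["correction"])
            status, body = 200, {"result": result}
        except (ValueError, KeyError, TypeError):
            status, body = 400, {"error": "expected JSON with 'text' and 'correction'"}
        data = json.dumps(body).encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(data)))
        self.end_headers()
        self.wfile.write(data)

    def log_message(self, fmt, *args):
        pass  # silence per-request logging


def make_server(host="127.0.0.1", port=0):
    """Create the server; port 0 lets the OS pick a free port."""
    return HTTPServer((host, port), CorrectionHandler)


if __name__ == "__main__":
    server = make_server(port=8000)
    print("listening on port", server.server_address[1])
    server.serve_forever()
```

A client would POST a JSON body such as `{"text": "helo world", "correction": "hello"}` and read the corrected text from the `result` field of the response. In a real deployment, `correct_text` would load the trained weights once at startup and run inference per request.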
Please install numpy, nltk, symspellpy, and Beautiful Soup 4 for data processing. The neural network code uses PyTorch version 0.4.1.
If you use this code in your paper, please cite it as:
@inproceedings{Zhang:2019:TCI:3332165.3347924,
author = {Zhang, Mingrui Ray and Wen, He and Wobbrock, Jacob O.},
title = {Type, Then Correct: Intelligent Text Correction Techniques for Mobile Text Entry Using Neural Networks},
booktitle = {Proceedings of the 32nd Annual ACM Symposium on User Interface Software and Technology},
series = {UIST '19},
year = {2019},
isbn = {978-1-4503-6816-2},
location = {New Orleans, LA, USA},
pages = {843--855},
numpages = {13},
url = {http://doi.acm.org/10.1145/3332165.3347924},
doi = {10.1145/3332165.3347924},
acmid = {3347924},
publisher = {ACM},
address = {New York, NY, USA},
keywords = {gestures, natural language processing, text editing, touch},
}