NextWord

NextWord is a Shiny app that predicts the next word a user will type. It uses NLP to build the prediction model file and a backoff algorithm to search for the next word. See the app in action: https://noeltemena.shinyapps.io/ShinyWord/
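
To illustrate the backoff idea, here is a minimal base R sketch. It is not the code from app.R (which uses sqldf); the toy tables, their column names (`prefix`, `nextword`, `freq`) and the function name `predict_next` are assumptions made for this example. The lookup tries the longest available context first and falls back to shorter contexts until it finds a match.

```r
# Toy ngram tables (hypothetical data): prefix, candidate next word, count.
trigrams <- data.frame(
  prefix   = c("thanks for", "thanks for", "for the"),
  nextword = c("the", "your", "follow"),
  freq     = c(12, 7, 9),
  stringsAsFactors = FALSE
)
bigrams <- data.frame(
  prefix   = c("for", "the", "the"),
  nextword = c("the", "follow", "first"),
  freq     = c(40, 15, 22),
  stringsAsFactors = FALSE
)
unigrams <- data.frame(
  prefix   = c("", ""),
  nextword = c("the", "to"),
  freq     = c(100, 80),
  stringsAsFactors = FALSE
)

# Tables must be ordered from the highest-order model to the lowest.
tables <- list(trigrams, bigrams, unigrams)

predict_next <- function(text, tables) {
  words <- strsplit(tolower(trimws(text)), "\\s+")[[1]]
  words <- words[nzchar(words)]
  for (i in seq_along(tables)) {
    k <- length(tables) - i            # prefix length used by this table
    if (length(words) < k) next        # not enough context, back off
    prefix <- paste(tail(words, k), collapse = " ")
    hits <- tables[[i]][tables[[i]]$prefix == prefix, ]
    if (nrow(hits) > 0) {
      # Return the most frequent continuation for this prefix.
      return(hits$nextword[which.max(hits$freq)])
    }
  }
  NA_character_
}

predict_next("thanks for", tables)   # matched in the trigram table
predict_next("hello the", tables)    # backs off to the bigram table
predict_next("zzz", tables)          # backs off to the unigram table
```

This sketch returns only the single most frequent continuation and does not discount lower-order matches; a scoring scheme could be layered on top of the same lookup order.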

This repository contains the following files:

NextWord_documentation.html: an HTML file, with no R code, that describes the steps for cleaning the data files efficiently and building an ngram model without running out of memory. It also covers which functions, packages, and algorithms to use for fast word search. Click to read: https://cdn.rawgit.com/ntemena720/NextWord/4a1d0960/NextWord_documentation.html
ngram.html: exploratory data analysis of the data source used for the next word app. This document was written before I performed NLP on the dataset. Click to read: https://rpubs.com/noeltemena/ngram
1_Cleandata.R: source file for cleaning the source files, a collection of blogs, tweets, and news. I purposely avoided the dplyr pipe in the Cleanfile function to prevent memory crashes while processing millions of data rows.
2_Tokenize.R: source file for building the ngram model files.
3_mergetables.R: source file for merging ngram model files. The merge function sums the frequencies of words that appear in both of the two word data frames being merged.
4_optimize_table.R: source file for combining the ngram files of the same order from the 3 data sources (blogs, news, and tweets).
5_searchword.R: a sandbox file for testing which next-word query function is faster, dplyr filter or sqldf.
app.R: source file for the Shiny app, which uses sqldf to search for the next "predicted" word.
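
The tokenize and merge steps described above can be sketched in base R as follows. This is an illustration under assumptions, not the code from 2_Tokenize.R, 3_mergetables.R or 4_optimize_table.R: the function names `count_ngrams` and `combine_counts`, the column names `ngram` and `freq`, and the sample sentences are all hypothetical. Each source is counted separately, then the tables are stacked and the frequencies of ngrams that occur in more than one source are summed.

```r
# Count ngrams of size n in a character vector of text lines.
count_ngrams <- function(lines, n) {
  grams <- unlist(lapply(strsplit(tolower(lines), "\\s+"), function(w) {
    w <- w[nzchar(w)]
    if (length(w) < n) return(character(0))
    vapply(seq_len(length(w) - n + 1),
           function(i) paste(w[i:(i + n - 1)], collapse = " "),
           character(1))
  }))
  if (length(grams) == 0) {
    return(data.frame(ngram = character(0), freq = integer(0),
                      stringsAsFactors = FALSE))
  }
  tab <- table(grams)
  data.frame(ngram = names(tab), freq = as.integer(tab),
             stringsAsFactors = FALSE)
}

# Combine any number of ngram tables, summing counts of shared ngrams.
combine_counts <- function(...) {
  all_rows <- do.call(rbind, list(...))
  out <- aggregate(freq ~ ngram, data = all_rows, FUN = sum)
  out <- out[order(-out$freq, out$ngram), ]
  rownames(out) <- NULL
  out
}

# Hypothetical sample text standing in for the three data sources.
blogs  <- c("thanks for the follow", "thanks for the help")
news   <- c("for the first time")
tweets <- c("thanks for the follow")

combined <- combine_counts(
  count_ngrams(blogs, 2),
  count_ngrams(news, 2),
  count_ngrams(tweets, 2)
)
print(combined)
```

Counting each source on its own and merging afterwards keeps the intermediate tables small, which is in the spirit of the memory concerns mentioned above.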
