A distributed Chinese n-gram language model implementation for training and testing on large corpora, using Hadoop MapReduce.
Updated Dec 28, 2018 - Java
Analyzes text and groups it into n-grams. Reports the top x most frequent n-grams. Supports MySQL databases.
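As a rough illustration of the idea (not the repository's actual code), the sketch below counts n-grams over whitespace-separated tokens and returns the x most frequent ones. The class name `TopNGrams` and its methods are hypothetical, and the whitespace tokenization and alphabetical tie-breaking are assumptions.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not taken from the listed repository.
public class TopNGrams {

    /** Counts every run of n consecutive whitespace-separated tokens in the text. */
    public static Map<String, Integer> count(String text, int n) {
        Map<String, Integer> counts = new HashMap<>();
        String trimmed = text.trim();
        if (trimmed.isEmpty() || n <= 0) {
            return counts;
        }
        String[] tokens = trimmed.split("\\s+");
        for (int i = 0; i + n <= tokens.length; i++) {
            String gram = String.join(" ", Arrays.copyOfRange(tokens, i, i + n));
            counts.merge(gram, 1, Integer::sum);
        }
        return counts;
    }

    /** Returns up to x entries, most frequent first; ties are ordered alphabetically. */
    public static List<Map.Entry<String, Integer>> top(Map<String, Integer> counts, int x) {
        List<Map.Entry<String, Integer>> entries = new ArrayList<>(counts.entrySet());
        entries.sort(Map.Entry.<String, Integer>comparingByValue().reversed()
                .thenComparing(Map.Entry.<String, Integer>comparingByKey()));
        return entries.subList(0, Math.min(Math.max(x, 0), entries.size()));
    }
}
```

For example, `TopNGrams.top(TopNGrams.count(text, 2), 10)` would list the ten most frequent bigrams of `text`. Persisting the counts to MySQL is left out of the sketch.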
Assignment 2 of the course 'Distributed Systems Programming' by Meni Adler. In the assignment we build an application that calculates the probability of any word following a given pair of words, for every pair of words in the Google n-gram corpus.
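A minimal single-machine sketch of the quantity described, assuming a plain maximum-likelihood estimate P(w3 | w1 w2) = count(w1 w2 w3) / count(w1 w2 *). The assignment itself may use a smoothed estimator and runs distributed; the class `TrigramProbability` and its methods are hypothetical names used only for illustration.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration; the estimator (unsmoothed MLE) is an assumption.
public class TrigramProbability {
    private final Map<String, Long> trigramCounts = new HashMap<>();
    private final Map<String, Long> contextCounts = new HashMap<>();

    /** Records that the trigram (w1, w2, w3) was seen the given number of times. */
    public void add(String w1, String w2, String w3, long occurrences) {
        trigramCounts.merge(w1 + " " + w2 + " " + w3, occurrences, Long::sum);
        // The context count is the total over all words that followed (w1, w2).
        contextCounts.merge(w1 + " " + w2, occurrences, Long::sum);
    }

    /** Estimates P(w3 | w1, w2); returns 0.0 when the pair (w1, w2) was never seen. */
    public double probability(String w1, String w2, String w3) {
        long context = contextCounts.getOrDefault(w1 + " " + w2, 0L);
        if (context == 0) {
            return 0.0;
        }
        long trigram = trigramCounts.getOrDefault(w1 + " " + w2 + " " + w3, 0L);
        return (double) trigram / context;
    }
}
```

In a MapReduce setting the two maps would typically be replaced by jobs that aggregate trigram counts and context counts by key, but that layout is not specified in the description above.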
"Advanced Data Structures and Algorithms" Course project