This project demonstrates computing and printing word counts using Hadoop/Dataproc. The objective is to extend the Python mapper and reducer functions so that they count not only individual words (1-grams) but also bigrams (2-grams) and trigrams (3-grams).
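An n-gram is a contiguous sequence of n elements extracted from a given sequence of text or speech. For example, the phrase "the cat sat" contains the bigrams "the cat" and "cat sat" and the single trigram "the cat sat".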
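
To make the definition concrete, here is a minimal sketch of how the mapper and reducer logic could be extended to cover 1-grams through 3-grams. It is not the project's actual code: the function names (`ngrams`, `map_line`, `reduce_pairs`) are hypothetical, and lowercasing plus whitespace tokenization is an assumption. It also runs in-process for illustration. In a Hadoop Streaming-style job, the mapper would instead read lines from standard input and print tab-separated key/value pairs.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return every contiguous run of n tokens, joined by a single space."""
    # When there are fewer than n tokens, the range is empty and so is the result.
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def map_line(line, max_n=3):
    """Mapper step (hypothetical): yield an (ngram, 1) pair for n = 1..max_n."""
    # Assumption: tokens are lowercased and split on whitespace only.
    tokens = line.lower().split()
    for n in range(1, max_n + 1):
        for gram in ngrams(tokens, n):
            yield gram, 1


def reduce_pairs(pairs):
    """Reducer step (hypothetical): sum the counts emitted for each n-gram."""
    counts = Counter()
    for gram, count in pairs:
        counts[gram] += count
    return counts


# Count 1-grams, 2-grams, and 3-grams for a single line of input.
counts = reduce_pairs(map_line("the cat sat on the mat"))
```

Because every n-gram is emitted as a plain string key, the reducer needs no special handling for bigrams or trigrams: it sums counts per key exactly as it would for single words.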