chandanasy/NgramAnalytics

N-Gram Analysis Project

Project Description

This project demonstrates a word count job on Hadoop/Dataproc. The objective is to extend the Python mapper and reducer functions so that they count not only individual words (1-grams) but also bigrams (2-grams) and trigrams (3-grams).
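The mapper and reducer scripts themselves are not reproduced in this README, so the following is only a minimal sketch of how such a pair could look. It assumes the usual Hadoop Streaming convention of tab-separated key/value lines, with the reducer receiving its input sorted by key. The names `map_lines`, `reduce_lines`, and `MAX_N` are hypothetical, and lowercasing the input and keeping n-grams within a single line are illustrative choices rather than the project's confirmed behavior.

```python
from itertools import groupby

MAX_N = 3  # hypothetical setting: emit 1-grams, 2-grams and 3-grams


def map_lines(lines, max_n=MAX_N):
    """Yield 'ngram<TAB>1' for every n-gram of size 1..max_n in each line."""
    for line in lines:
        # Assumption: case-insensitive counting, whitespace tokenization.
        words = line.lower().split()
        for n in range(1, max_n + 1):
            # N-grams never span two input lines in this sketch.
            for i in range(len(words) - n + 1):
                yield f"{' '.join(words[i:i + n])}\t1"


def reduce_lines(lines):
    """Sum the counts for each key; the input must already be sorted by key."""
    pairs = (line.rstrip("\n").split("\t", 1) for line in lines)
    for key, group in groupby(pairs, key=lambda kv: kv[0]):
        yield f"{key}\t{sum(int(count) for _, count in group)}"


# Local simulation of map -> sort (shuffle) -> reduce on a tiny input.
mapped = sorted(map_lines(["to be or not to be"]))
counts = dict(line.split("\t") for line in reduce_lines(mapped))
```

In a real streaming job each function would read from `sys.stdin` and print its results in a separate script, and Hadoop would perform the sort between the two stages. Here `sorted` stands in for that shuffle step so the pair can be tried locally.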

What are N-Grams?

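An n-gram is a contiguous sequence of n elements extracted from a given sequence of text or speech. For example, the text "the quick brown fox" contains the bigrams "the quick", "quick brown", and "brown fox".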
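The definition can be made concrete with a short sliding-window sketch. The helper name `ngrams` is hypothetical, and splitting on whitespace is a simplifying assumption about tokenization.

```python
def ngrams(tokens, n):
    """Return every contiguous run of n tokens as a tuple."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


tokens = "the quick brown fox".split()
unigrams = ngrams(tokens, 1)  # four 1-grams, one per word
bigrams = ngrams(tokens, 2)   # three 2-grams
trigrams = ngrams(tokens, 3)  # two 3-grams
```

A sequence of k tokens yields k - n + 1 n-grams, and none at all when n exceeds k.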
