DM-Map-Reduce-Program

Overview: This program reads tweets from a CSV file (the local CSV file contains over 7,000 Trump tweets) and calculates how often each word is used across all the tweets with a custom map-reduce algorithm. The words can be counted in two ways: sequentially or in parallel (the parallel method spreads the work across multiple cores). The user chooses which method to use.
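
As a rough illustration of the sequential path, the map step can emit a `(word, 1)` pair for every word in a tweet and the reduce step can sum those pairs per word. This is a minimal sketch, not the repository's actual code: the function names `map_words`, `reduce_counts`, and `count_sequential` and the tokenizing regular expression are assumptions made for the example.

```python
import re
from collections import Counter


def map_words(tweet):
    """Map step: emit a (word, 1) pair for every word in one tweet."""
    # Assumed tokenizer: lowercase runs of letters and apostrophes.
    return [(word, 1) for word in re.findall(r"[a-z']+", tweet.lower())]


def reduce_counts(pairs):
    """Reduce step: sum the counts for each distinct word."""
    totals = Counter()
    for word, count in pairs:
        totals[word] += count
    return totals


def count_sequential(tweets):
    """Run the map step over every tweet in turn, then reduce once."""
    pairs = []
    for tweet in tweets:
        pairs.extend(map_words(tweet))
    return reduce_counts(pairs)
```

Calling `count_sequential` on a list of tweet strings returns a `Counter`, so `most_common(10)` would list the ten most frequent words.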

Language used: Python, for both the sequential and the distributed solution

Distributed Solution: I used the multiprocessing module for the parallel solution, mapping the sublists across worker processes with the Pool class. I used pool.imap_unordered instead of pool.map because it yields each result as soon as a worker finishes it, rather than waiting to return all results in input order. Since partial word counts can be merged in any order, this made it slightly faster than pool.map.
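
The approach described above might look roughly like the following. This is a sketch under assumptions, not the repository's code: the helper names `split_into_chunks`, `count_chunk`, and `count_parallel`, the whitespace tokenizer, and the default of four workers are all hypothetical.

```python
from collections import Counter
from multiprocessing import Pool


def split_into_chunks(items, n_chunks):
    """Split items into at most n_chunks sublists of near-equal size."""
    size = max(1, -(-len(items) // n_chunks))  # ceiling division
    return [items[i:i + size] for i in range(0, len(items), size)]


def count_chunk(tweets):
    """Worker task: count the words in one sublist of tweets."""
    counts = Counter()
    for tweet in tweets:
        # Assumed tokenizer: lowercase, split on whitespace.
        counts.update(tweet.lower().split())
    return counts


def count_parallel(tweets, n_workers=4):
    """Count words across worker processes and merge the partial counts."""
    chunks = split_into_chunks(tweets, n_workers)
    totals = Counter()
    with Pool(processes=n_workers) as pool:
        # imap_unordered yields partial counts in completion order;
        # adding counters is order-independent, so that is safe here.
        for partial in pool.imap_unordered(count_chunk, chunks):
            totals.update(partial)
    return totals
```

On platforms that start worker processes with the spawn method (Windows, and macOS by default), the call to `count_parallel` should sit under an `if __name__ == "__main__":` guard so that workers can import the module safely.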

User Flow: User Selects Counting Method

User Flow: User Selects Sequential Method

User Flow: User Selects Parallel Method


AlexanderVhd/DM-Map-Reduce-Program
