Hadoop-MapReduce-Anagram-Solver

This program uses the Hadoop MapReduce framework to find the anagrams among the words of an input file.

Author: Nikolas Petrou, MSc in Data Science

But what is an anagram?

An anagram is a word or phrase formed by rearranging the letters of a different word or phrase, using every original letter exactly once.

For example:

Refills→fillers
Relayed→layered
Rentals→antlers
Rebuild→builder

Data

This task finds the anagrams among the words of the following file: https://raw.githubusercontent.com/pmichaud/rpbench/master/files/unixdict.txt

You can download this UNIX dictionary file and upload it to your own HDFS filesystem with the following commands:

wget https://raw.githubusercontent.com/pmichaud/rpbench/master/files/unixdict.txt
hadoop fs -copyFromLocal unixdict.txt filename_of_input_file

Implementation

Examples of desired output:

2 hasn't,shan't
2 cascara,caracas
2 ramada,armada

The main idea of the solution is to give the same key to all words that are rearrangements of one another. The key that each word emits during the mapping phase is therefore a Text object holding the word's characters sorted alphabetically. For example, the words declaim and decimal both use the key acdeilm.
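
The key derivation can be illustrated with a minimal plain-Java sketch. It does not use any Hadoop classes, and the class name AnagramKey and method name sortedKey are hypothetical names chosen for this illustration; they are not taken from Anagram.java. In a mapper, the resulting string would presumably be wrapped in a Text object before being emitted.

```java
import java.util.Arrays;

public class AnagramKey {
    // Returns the characters of the word in sorted order.
    // Two words are anagrams of each other exactly when they share this key.
    public static String sortedKey(String word) {
        char[] chars = word.toCharArray();
        Arrays.sort(chars);
        return new String(chars);
    }

    public static void main(String[] args) {
        System.out.println(sortedKey("declaim")); // prints acdeilm
        System.out.println(sortedKey("decimal")); // prints acdeilm
    }
}
```

Sorting the characters costs O(n log n) per word of length n, which is negligible for dictionary words, and it also handles non-letter characters such as the apostrophe in hasn't and shan't.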

The desired output of the program is in the part-r-00000 file, and the code is in the Anagram.java file. The code is commented in detail to explain the whole implementation.
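
To show how lines of the form "count, then comma-separated words" can arise, here is a plain-Java simulation of the grouping that the shuffle and reduce phases perform. It is only a sketch under stated assumptions and not the code in Anagram.java: it uses no Hadoop classes, the names AnagramGroups and groupAnagrams are hypothetical, the single-space separator after the count is a guess, and skipping groups with only one word is an assumption. The order of words within a line in the real output may also differ.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class AnagramGroups {
    // Same idea as the mapping-phase key: the word's characters in sorted order.
    public static String sortedKey(String word) {
        char[] chars = word.toCharArray();
        Arrays.sort(chars);
        return new String(chars);
    }

    // Groups the words by key and formats each group as "<count> <word1>,<word2>,...".
    // Groups containing a single word are skipped (an assumption for this sketch).
    public static List<String> groupAnagrams(List<String> words) {
        // LinkedHashMap keeps the groups in order of first appearance.
        Map<String, List<String>> groups = new LinkedHashMap<>();
        for (String word : words) {
            groups.computeIfAbsent(sortedKey(word), k -> new ArrayList<>()).add(word);
        }
        List<String> lines = new ArrayList<>();
        for (List<String> group : groups.values()) {
            if (group.size() > 1) {
                lines.add(group.size() + " " + String.join(",", group));
            }
        }
        return lines;
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("armada", "caracas", "cascara", "hello", "ramada");
        // Prints:
        // 2 armada,ramada
        // 2 caracas,cascara
        groupAnagrams(words).forEach(System.out::println);
    }
}
```

In the actual job, Hadoop performs the grouping by key between the map and reduce phases, so a reducer only needs to count and join the words it receives for each key.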

Helpful Material-Links

If you are not familiar with the Hadoop MapReduce framework, the following resources explain some of its basic concepts, as well as some of the ideas behind this task:

Fundamentals of MapReduce with MapReduce Example

Creating Custom Hadoop Writable Data Type

MSc in Data Science Programme
