📺 YouTube Analysis 📽️
For problem statement 1
We analyze the data to identify the top 5 categories with the most uploaded videos. The dataset was gathered using the YouTube API and is stored in the Hadoop Distributed File System (HDFS). A MapReduce job is applied to process the dataset and identify these categories.
Open terminal 🖥️
hadoop fs -mkdir /youtube
hadoop fs -put /home/mangal/Desktop/youtubedata.txt /youtube
Open the bashrc file with gedit ~/.bashrc and add the line below (skip this step if the path is already set):
export HADOOP_CLASSPATH=$JAVA_HOME/lib/tools.jar
Then run source ~/.bashrc
cd Desktop
Now compile youtube1.java (assuming the file is on the Desktop)
hadoop com.sun.tools.javac.Main youtube1.java
To package all class files into a jar
jar cf wc.jar youtube1*.class
To execute
hadoop jar wc.jar youtube1 /youtube/youtubedata.txt /top5rating
hadoop fs -ls /     # check that the output folder exists
hadoop fs -cat /top5rating/part-r-00000
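The part file printed by the command above usually holds one tab-separated key/value pair per line, ordered by key rather than by count. As a rough sketch, assuming each line has the form category<TAB>count (the real layout depends on what youtube1.java emits, which is not shown here), the top 5 could be picked out in plain Java as follows. The class name TopCategories and the sample lines are hypothetical.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of the project: ranks "category<TAB>count" lines.
class TopCategories {

    // Parses each line and returns the n entries with the highest counts.
    static List<Map.Entry<String, Integer>> topN(List<String> lines, int n) {
        List<Map.Entry<String, Integer>> entries = new ArrayList<>();
        for (String line : lines) {
            String[] parts = line.split("\t");
            if (parts.length != 2) {
                continue; // skip lines that do not match the assumed layout
            }
            try {
                entries.add(new SimpleEntry<>(parts[0], Integer.parseInt(parts[1].trim())));
            } catch (NumberFormatException e) {
                // skip lines whose count is not a number
            }
        }
        // Sort by count, highest first
        entries.sort(Map.Entry.<String, Integer>comparingByValue().reversed());
        return new ArrayList<>(entries.subList(0, Math.min(n, entries.size())));
    }

    public static void main(String[] args) {
        // Made-up sample data in the assumed output format
        List<String> sample = Arrays.asList("Comedy\t3", "Music\t7", "Sports\t5");
        for (Map.Entry<String, Integer> e : topN(sample, 2)) {
            System.out.println(e.getKey() + "\t" + e.getValue());
        }
    }
}
```

Calling topN(lines, 5) on the lines of part-r-00000 would give the five largest categories under that assumed format.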
For problem statement 2
Open terminal
hadoop fs -mkdir /youtube
hadoop fs -put /home/mangal/Desktop/youtubedata.txt /youtube
Open the bashrc file with gedit ~/.bashrc and add the line below (skip this step if the path is already set):
export HADOOP_CLASSPATH=$JAVA_HOME/lib/tools.jar
Then run source ~/.bashrc
cd Desktop
Now compile youtube2.java (assuming the file is on the Desktop)
hadoop com.sun.tools.javac.Main youtube2.java
To package all class files into a jar
jar cf wc.jar youtube2*.class
To execute 💽
hadoop jar wc.jar youtube2 /youtube/youtubedata.txt /videorating
hadoop fs -ls /     # check that the output folder exists
hadoop fs -cat /videorating/part-r-00000
Hadoop
Hadoop is a distributed computing framework written in Java, developed and maintained by the Apache Software Foundation. Hadoop consists of HDFS and MapReduce and is generally deployed on a group of machines called a cluster. GFS and MapReduce were originally built to power Google Search, and Hadoop's HDFS and MapReduce are modeled on them. HDFS stands for Hadoop Distributed File System and is used to store data across the multiple machines of a cluster. MapReduce is a way to parallelize data processing tasks.
MapReduce Algorithm
The MapReduce algorithm consists of a Map() procedure that performs filtering and sorting of the input data and a Reduce() procedure that performs a summary/aggregate operation per key over the grouped (key, value) pairs.
This project implements the Hadoop MapReduce algorithm on the YouTube data and displays the result on a web server.
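To make the two phases concrete, here is a minimal single-machine sketch in plain Java that does not use the Hadoop API. It assumes tab-separated records with the category in the fourth column (index 3). That column position, the class name MiniMapReduce, and the sample records are assumptions for illustration, not taken from the project's code. The map step emits a (category, 1) pair per record, the pairs are grouped by key, and the reduce step sums each group.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Single-machine illustration of the map -> group -> reduce flow; not Hadoop code.
class MiniMapReduce {

    static final int CATEGORY_COLUMN = 3; // assumed position of the category field

    // Map step: drop malformed records and emit (category, 1) for the rest.
    static List<Map.Entry<String, Integer>> map(String record) {
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        String[] fields = record.split("\t");
        if (fields.length > CATEGORY_COLUMN) {
            out.add(new SimpleEntry<>(fields[CATEGORY_COLUMN], 1));
        }
        return out;
    }

    // Reduce step: aggregate all values that share one key.
    static int reduce(String key, List<Integer> values) {
        int sum = 0;
        for (int v : values) {
            sum += v;
        }
        return sum;
    }

    static Map<String, Integer> run(List<String> records) {
        // Group the emitted pairs by key; the TreeMap keeps the keys sorted
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (String record : records) {
            for (Map.Entry<String, Integer> pair : map(record)) {
                grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>()).add(pair.getValue());
            }
        }
        // Apply reduce once per key
        Map<String, Integer> result = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> e : grouped.entrySet()) {
            result.put(e.getKey(), reduce(e.getKey(), e.getValue()));
        }
        return result;
    }

    public static void main(String[] args) {
        // Made-up records; only the fourth field matters here
        List<String> records = Arrays.asList(
            "id1\tuserA\t100\tMusic",
            "id2\tuserB\t200\tComedy",
            "id3\tuserC\t300\tMusic",
            "broken record");
        System.out.println(run(records)); // prints {Comedy=1, Music=2}
    }
}
```

In a real Hadoop job the grouping happens inside the framework between the two phases, and the map and reduce calls run in parallel across the cluster. The sketch keeps everything in one loop only to show the data flow.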