Watch Data Gathering is a project that receives real-time data from users and ingests it into Spark through Kafka. We use the Spark Streaming API to process and store the data, and Spark for later batch processing; the results are then visualized by a Spring Boot application (which is still in progress).
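To illustrate the ingestion side, here is a minimal sketch of a producer publishing one event to the watch-data-event topic (created during the setup below). The class name and the event fields are assumptions for illustration only; the actual producer lives in the WatchProducer module.

```java
import java.util.Properties;

import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class WatchEventSenderSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        // Broker address as seen from inside the hadoop-master container.
        props.put("bootstrap.servers", "localhost:9092");
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // A watch event serialized as JSON; these field names are
            // illustrative, not the project's actual schema.
            String event = "{\"userId\":\"u1\",\"show\":\"some-show\",\"episode\":3}";
            producer.send(new ProducerRecord<>("watch-data-event", event));
        }
    }
}
```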
Here is the architecture of the project:
The project runs with these tools:
- JDK - 1.8
- Maven - 3.3.9
- ZooKeeper - 3.4.8
- Kafka - 2.11-1.0.2
- HBase - 1.4.8
- Spark - 2.7.1, pre-built for Hadoop 2.7.2
It runs on the liliasfaxi/spark-hadoop Docker image, which already contains Hadoop, Spark, and HBase.
The repository is a multi-module Maven project that contains three modules:
- Watch Producer
- Watch Processor
- Watch Batch Processor
JDK 1.8 and Maven are required to build this project. Docker is also needed to pull the image and run the containers.
To create the Docker containers, use the following commands:
# Downloading the image
docker pull liliasfaxi/spark-hadoop
# Creating the network
docker network create --driver=bridge hadoop
# Running the containers
docker run -itd --net=hadoop -p 50070:50070 -p 8088:8088 -p 7077:7077 -p 16010:16010 --name hadoop-master --hostname hadoop-master liliasfaxi/spark-hadoop:hv-2.7.2
docker run -itd -p 8040:8042 --net=hadoop --name hadoop-slave1 --hostname hadoop-slave1 liliasfaxi/spark-hadoop:hv-2.7.2
docker run -itd -p 8041:8042 --net=hadoop --name hadoop-slave2 --hostname hadoop-slave2 liliasfaxi/spark-hadoop:hv-2.7.2
To start the containers again later on, use these commands:
docker start hadoop-master
docker start hadoop-slave1
docker start hadoop-slave2
Now we need to set up the container by starting Hadoop, Kafka, ZooKeeper, and HBase.
First, run this command to open a bash shell in the master container:
docker exec -it hadoop-master bash
Now run these commands one by one, making sure each one succeeds:
# Start Hadoop (HDFS and YARN)
./start-hadoop.sh
# Start Kafka and ZooKeeper
./start-kafka-zookeeper.sh
# Copy the HBase client jars into Spark's classpath
cp -r $HBASE_HOME/lib/* $SPARK_HOME/jars
# Create the Kafka topic used by the producer
kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic watch-data-event
# Start HBase
start-hbase.sh
To verify that each process is running, run this command:
jps
It should give the following result:
Now we need to set up HBase by creating the required tables.
Open the HBase shell:
hbase shell
and then create the tables:
create 'view','watchDetails','watchedEpisode','userDetails'
create 'total_traffic','data'
create 'window_traffic','data'
create 'batch','data'
- view: stores the details of every view
- total_traffic: stores the number of watchers per show
- window_traffic: stores the number of watchers per show over a windowed duration
- batch: stores the number of views per show, computed by the batch job
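For reference, here is a minimal sketch of how a count could be written into one of these tables through the HBase client API. The row key, the count qualifier, and the class name are assumptions; the actual schema used by the processors may differ.

```java
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class TotalTrafficWriterSketch {
    public static void main(String[] args) throws Exception {
        // Reads hbase-site.xml from the classpath for the connection settings.
        try (Connection connection = ConnectionFactory.createConnection(HBaseConfiguration.create());
             Table table = connection.getTable(TableName.valueOf("total_traffic"))) {
            // Row key: the show id; column family "data", qualifier "count".
            Put put = new Put(Bytes.toBytes("some-show-id"));
            put.addColumn(Bytes.toBytes("data"), Bytes.toBytes("count"), Bytes.toBytes(42L));
            table.put(put);
        }
    }
}
```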
Now we need to build the project. Use the following command to build all modules:
mvn package
Then execute the following commands to copy the jar files into the container:
docker cp .\WatchProcessor\target\watch-spark-processor-1.0.0.jar hadoop-master:/root/sparkStreaming.jar
docker cp .\WatchProducer\target\watch-kafka-producer-1.0.0.jar hadoop-master:/root/kafka.jar
docker cp .\WatchBatchProcessor\target\watch-spark-batch-processor-1.0.0.jar hadoop-master:/root/sparkBatch.jar
Now, to run the project, open three bash sessions in the hadoop-master container with the following command:
docker exec -it hadoop-master bash
and run one of the following commands in each session (the Kafka producer, the streaming processor, and the batch processor, respectively):
java -jar kafka.jar
spark-submit --master yarn --deploy-mode client --class "com.watch.app.spark.processor.WatchDataProcessor" sparkStreaming.jar
spark-submit --master yarn --deploy-mode client --class "com.watch.app.spark.processor.WatchBatchProcessor" sparkBatch.jar
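To give an idea of what the streaming job does, here is a minimal sketch of a Spark Streaming application consuming the watch-data-event topic, assuming String-serialized events and the spark-streaming-kafka-0-10 integration; the actual logic lives in the WatchDataProcessor class of the WatchProcessor module.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.common.serialization.StringDeserializer;
import org.apache.spark.SparkConf;
import org.apache.spark.streaming.Durations;
import org.apache.spark.streaming.api.java.JavaInputDStream;
import org.apache.spark.streaming.api.java.JavaStreamingContext;
import org.apache.spark.streaming.kafka010.ConsumerStrategies;
import org.apache.spark.streaming.kafka010.KafkaUtils;
import org.apache.spark.streaming.kafka010.LocationStrategies;

import scala.Tuple2;

public class WatchStreamSketch {
    public static void main(String[] args) throws InterruptedException {
        SparkConf conf = new SparkConf().setAppName("WatchDataProcessorSketch");
        JavaStreamingContext jssc = new JavaStreamingContext(conf, Durations.seconds(5));

        Map<String, Object> kafkaParams = new HashMap<>();
        kafkaParams.put("bootstrap.servers", "localhost:9092");
        kafkaParams.put("key.deserializer", StringDeserializer.class);
        kafkaParams.put("value.deserializer", StringDeserializer.class);
        kafkaParams.put("group.id", "watch-processor");

        // Subscribe to the topic created during the container setup.
        JavaInputDStream<ConsumerRecord<String, String>> stream =
                KafkaUtils.createDirectStream(
                        jssc,
                        LocationStrategies.PreferConsistent(),
                        ConsumerStrategies.<String, String>Subscribe(
                                Arrays.asList("watch-data-event"), kafkaParams));

        // Count events per raw message value in each 5-second micro-batch;
        // the real job parses the event and writes the counts to HBase instead.
        stream.mapToPair(record -> new Tuple2<>(record.value(), 1L))
              .reduceByKey(Long::sum)
              .print();

        jssc.start();
        jssc.awaitTermination();
    }
}
```

The real processor additionally maintains windowed counts and persists its results to the HBase tables created above.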