A real-time analytics dashboard that visualizes the trending hashtags and @mentions at a given location, using Twitter's real-time streaming API as the data source.
- Install Kafka. You can follow this guide: https://towardsdatascience.com/running-zookeeper-kafka-on-windows-10-14fc70dcc771
- Download and install Apache Spark.
- Create the Kafka topic (Zookeeper and Kafka must be running for this to succeed).
  - You can follow this guide: https://dzone.com/articles/running-apache-kafka-on-windows-os
  - Or run the following command: `kafka-topics.bat --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic twitter`
- Update the conf file with your secret key and access tokens.
- Install the Python dependencies: `pip install -r requirements.txt`
- Install the Node.js dependencies: `npm install`
- Start Zookeeper. Open cmd and run: `zkserver`
- Start Kafka. Go to the Kafka installation directory, `..\kafka_2.11-2.3.1\bin\windows`, open cmd there, and run: `kafka-server-start.bat C:\ProgramData\Java\kafka_2.11-2.3.1\config\server.properties`
- Run the Python script that fetches tweets: `python fetch_tweets.py`
- Run the Python script that analyzes tweets: `python analyze_tweets.py`
- Start the npm server: `npm start`
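
Before running the Python scripts, it can help to confirm that Zookeeper and Kafka are actually accepting connections. The following is a minimal sketch and not part of the project: `port_open` and `check_services` are hypothetical helper names, port 2181 is taken from the topic-creation command, and port 9092 is Kafka's default listener port, which is an assumption about this setup.

```python
import socket


def port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and name-resolution failures.
        return False


def check_services(services: dict) -> dict:
    """Map each service name to whether its (host, port) accepts connections."""
    return {name: port_open(host, port) for name, (host, port) in services.items()}


if __name__ == "__main__":
    # 2181 comes from the kafka-topics command in the setup steps;
    # 9092 is Kafka's default port and is an assumption here.
    services = {"Zookeeper": ("localhost", 2181), "Kafka": ("localhost", 9092)}
    for name, ok in check_services(services).items():
        print(f"{name}: {'reachable' if ok else 'NOT reachable'}")
```

If either service is reported as not reachable, start it first; otherwise `fetch_tweets.py` and `analyze_tweets.py` will have no broker to connect to.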
Area | Technology |
---|---|
Front-End | HTML5, Bootstrap, CSS3, Socket.IO, highcharts.js |
Back-End | Express, Node.js |
Cluster Computing Framework | Apache Spark (Python) |
Message Broker | Apache Kafka |
- Data is extracted from Twitter's streaming API and put into a Kafka topic.
- Spark listens to this topic: it reads the data, analyzes it with Spark Streaming, and puts the top 10 trending hashtags and @mentions into another Kafka topic.
- Spark Streaming creates a DStream from the data it reads from Kafka and analyzes it with operations such as map, filter, updateStateByKey, countByValue, and foreachRDD; the top 10 hashtags and mentions are then obtained from the RDD using Spark SQL.
- Node.js picks up this data from the Kafka topic on the server side and emits it to the socket.
- The socket pushes the data to the user's dashboard, which is rendered in real time using highcharts.js.
- The dashboard is refreshed every 60 seconds.
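
The counting step in the pipeline above can be illustrated without a cluster. The following is a plain-Python sketch of the logic only, not the project's Spark job: the function names are hypothetical, the regular expressions and the lowercase normalization are assumptions, and a `Counter` carried between batches stands in for the running state that `updateStateByKey` maintains.

```python
import re
from collections import Counter

# Assumed patterns: a '#' or '@' followed by word characters.
HASHTAG_RE = re.compile(r"#\w+")
MENTION_RE = re.compile(r"@\w+")


def extract_tags(text: str, pattern: re.Pattern) -> list:
    """Return all matches of pattern in one tweet's text, lowercased."""
    return [m.lower() for m in pattern.findall(text)]


def update_counts(state: Counter, batch: list, pattern: re.Pattern) -> Counter:
    """Fold one micro-batch of tweet texts into the running counts.

    This mirrors the role of updateStateByKey: the previous totals are
    carried forward and the counts from the new batch are added to them.
    """
    new_state = Counter(state)
    for text in batch:
        new_state.update(extract_tags(text, pattern))
    return new_state


def top_n(state: Counter, n: int = 10) -> list:
    """Return the n most frequent (tag, count) pairs, ties broken by tag."""
    return sorted(state.items(), key=lambda kv: (-kv[1], kv[0]))[:n]


if __name__ == "__main__":
    # Two made-up micro-batches of tweet texts.
    batches = [
        ["Loving #Spark and #Kafka with @apache", "More #spark please"],
        ["#kafka streams rock, thanks @apache and @twitter"],
    ]
    hashtags, mentions = Counter(), Counter()
    for batch in batches:
        hashtags = update_counts(hashtags, batch, HASHTAG_RE)
        mentions = update_counts(mentions, batch, MENTION_RE)
    print(top_n(hashtags))
    print(top_n(mentions))
```

In the real pipeline, the resulting top 10 lists would be serialized and written to the second Kafka topic for Node.js to consume.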