Processing live data streams using Spark’s streaming APIs and Kafka, and performing a basic sentiment analysis of real-time tweets.
- Run
sudo pip install -r requirements.txt
- Download and extract the latest binary from https://kafka.apache.org/downloads.html.
- Start the ZooKeeper service
bin/zookeeper-server-start.sh config/zookeeper.properties
- Start the Kafka service
bin/kafka-server-start.sh config/server.properties
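- Create a topic named twitterstream in Kafka (Kafka 3.0 and later dropped the --zookeeper option; there, pass --bootstrap-server localhost:9092 instead)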
bin/kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic twitterstream
- Start downloading tweets from the Twitter Streaming API and push them to the twitterstream topic in Kafka
python twitter_to_kafka.py
- Run the stream analysis program
$SPARK_HOME/bin/spark-submit --packages org.apache.spark:spark-streaming-kafka_2.10:1.5.1 twitterStream.py
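
The last two steps can be sketched in plain Python to show the data flow. This is a minimal illustration, not the code of twitter_to_kafka.py or twitterStream.py. The function names, the tiny word lists, and the word-counting approach to sentiment are all assumptions made for the example. The `send` argument is any callable taking a topic and a bytes payload; with the kafka-python package, `KafkaProducer(bootstrap_servers="localhost:9092").send` would fit that shape.

```python
from collections import Counter

TOPIC = "twitterstream"  # topic name created in the steps above

# Hypothetical word lists for illustration; a real run would load larger lexicons.
POSITIVE_WORDS = {"good", "great", "love", "happy"}
NEGATIVE_WORDS = {"bad", "awful", "hate", "sad"}


def publish_tweets(tweets, send):
    """Producer side: push the text of each tweet to the Kafka topic.

    `send` is any callable taking (topic, payload_bytes).
    Returns the number of messages sent.
    """
    sent = 0
    for tweet in tweets:
        text = tweet.get("text")
        if text:  # skip stream events that carry no text
            send(TOPIC, text.encode("utf-8"))
            sent += 1
    return sent


def classify_word(word):
    """Label one word as 'positive', 'negative', or None if it is neither."""
    word = word.lower().strip(".,!?#@\"'")
    if word in POSITIVE_WORDS:
        return "positive"
    if word in NEGATIVE_WORDS:
        return "negative"
    return None


def count_sentiment(messages):
    """Consumer side: count positive and negative words in one batch of messages."""
    counts = Counter(positive=0, negative=0)
    for message in messages:
        for word in message.split():
            label = classify_word(word)
            if label is not None:
                counts[label] += 1
    return dict(counts)


if __name__ == "__main__":
    queue = []  # in-memory stand-in for the Kafka topic
    tweets = [
        {"text": "I love this, great day!"},
        {"text": "Awful traffic, so sad"},
        {"delete": {}},  # an event without text, which is skipped
    ]
    publish_tweets(tweets, lambda topic, payload: queue.append(payload))
    batch = [payload.decode("utf-8") for payload in queue]
    print(count_sentiment(batch))
```

In a Spark Streaming job, the per-word labelling would typically run inside the stream's transformations and the counts would be aggregated per batch interval; the sketch keeps everything in memory so it runs without Kafka or Spark.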