This project counts, in real time, how many tweets containing the #GoTS7 hashtag each user posts.
It prints each username together with its tweet count.
- Authentication was handled with the Tweepy module for Python.
- A StreamListener named KafkaPushListener was created for Twitter streaming. It produces the data that the Kafka consumer reads.
- The produced data was filtered to tweets that include the Game of Thrones hashtag.
- A SparkContext was created to connect to the Spark cluster.
- A Kafka consumer that reads data from the 'twitter' topic was created.
- The number of tweets that include the #GoTS7 hashtag was calculated per user, and the usernames and counts were printed in real time.

To run the project:

- Create a Twitter API account and obtain the keys for twitter_config.py.
- Start Apache Kafka
./kafka/kafka_2.11-0.11.0.0/bin/kafka-server-start.sh ./kafka/kafka_2.11-0.11.0.0/config/server.properties
- Run kafka_push_listener.py with Python 3.
PYSPARK_PYTHON=python3 bin/spark-submit kafka_push_listener.py
- Run kafka_twitter_spark_streaming.py with Python 3.
PYSPARK_PYTHON=python3 bin/spark-submit --packages org.apache.spark:spark-streaming-kafka-0-8_2.11:2.2.0 kafka_twitter_spark_streaming.py
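
The hashtag filter and the per-user count can be illustrated without Spark or Kafka. The following is a minimal sketch in plain Python, not the code from kafka_twitter_spark_streaming.py. It assumes each Kafka message is a tweet serialized as JSON in the Twitter API v1.1 layout (`user.screen_name`, `entities.hashtags[].text`). The names `has_hashtag`, `count_batch`, and `update_totals` are hypothetical helpers introduced only for this example.

```python
import json
from collections import Counter

HASHTAG = "gots7"  # assumption: hashtags are matched case-insensitively


def has_hashtag(tweet, hashtag=HASHTAG):
    """Return True if the tweet's hashtag entities include `hashtag`."""
    entities = tweet.get("entities") or {}
    tags = entities.get("hashtags") or []
    return any((t.get("text") or "").lower() == hashtag for t in tags)


def count_batch(raw_messages, hashtag=HASHTAG):
    """Count matching tweets per screen name in one batch of JSON strings."""
    counts = Counter()
    for raw in raw_messages:
        try:
            tweet = json.loads(raw)
        except ValueError:
            continue  # skip messages that are not valid JSON
        if not isinstance(tweet, dict):
            continue  # skip JSON values that are not tweet objects
        user = (tweet.get("user") or {}).get("screen_name")
        if user and has_hashtag(tweet, hashtag):
            counts[user] += 1
    return counts


def update_totals(totals, batch_counts):
    """Fold one batch's counts into the running per-user totals."""
    totals.update(batch_counts)  # Counter.update adds counts
    return totals


if __name__ == "__main__":
    # Two hypothetical micro-batches standing in for the 'twitter' topic.
    batches = [
        [json.dumps({"user": {"screen_name": "user_a"},
                     "entities": {"hashtags": [{"text": "GoTS7"}]}})],
        [json.dumps({"user": {"screen_name": "user_a"},
                     "entities": {"hashtags": [{"text": "GoTS7"}]}}),
         json.dumps({"user": {"screen_name": "user_b"},
                     "entities": {"hashtags": [{"text": "other"}]}})],
    ]
    totals = Counter()
    for batch in batches:
        update_totals(totals, count_batch(batch))
        for user, count in totals.most_common():
            print(user, count)
```

In a Spark Streaming job, the per-batch count and the running total would typically map onto DStream operations such as `reduceByKey` and `updateStateByKey`. Whether this project uses those exact operations is not stated here.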