Execution: We have implemented the post-processing batch analysis method, in which the streamed results are read from a file and plotted. A minimal sketch of this read-and-plot step appears below.
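For orientation, here is a minimal sketch of that step. The file name and the one-`tag,count`-record-per-line format are assumptions for illustration; the actual format written by spark_app.py may differ.

```python
# Minimal batch read-and-plot sketch. Assumptions: the Spark job wrote
# "tag,count" records, one per line, to output_graph.txt; the real file
# format may differ.
import matplotlib.pyplot as plt

tags, counts = [], []
with open("output_graph.txt") as f:
    for line in f:
        line = line.strip()
        if not line:
            continue
        tag, count = line.rsplit(",", 1)  # split on the last comma only
        tags.append(tag)
        counts.append(int(count))

plt.bar(range(len(tags)), counts)
plt.xticks(range(len(tags)), tags, rotation=45, ha="right")
plt.ylabel("count")
plt.title("Batch snapshot of streamed counts")
plt.tight_layout()
plt.show()
```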
Steps to run part A
Files to run: twitter_app.py, spark_app.py, visual.py

1. Run:
   docker run -it -v $PWD:/app --name twitter -w /app python bash
2. Inside the docker container, run:
   pip install tweepy
3. Run:
   python twitter_app.py
4. Run another docker container and link it to the twitter container using:
   docker run -it -v $PWD:/app --link twitter:twitter eecsyorku/eecs4415
5. Run:
   spark-submit spark_app.py
6. On your localhost, run:
   pip install matplotlib
7. Run:
   python visual.py

Make sure the files in steps 3 and 5 are still running when you run the file in step 7, so that the data updates dynamically. A sketch of the kind of Spark Streaming consumer run in step 5 follows below.
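The --link twitter:twitter flag in step 4 makes the first container reachable from the Spark container under the hostname `twitter`. Below is a rough sketch of such a consumer; the port 9009, the newline-delimited tweet format, and the hashtag-counting transformation are all assumptions for illustration, and the actual spark_app.py may differ.

```python
# Sketch of a Spark Streaming consumer. Assumptions: twitter_app.py serves
# newline-delimited tweet text on port 9009, and hashtag counting stands in
# for whatever spark_app.py actually computes.
from pyspark import SparkContext
from pyspark.streaming import StreamingContext

sc = SparkContext(appName="TwitterStreamSketch")
sc.setLogLevel("ERROR")
ssc = StreamingContext(sc, 2)  # 2-second micro-batches

# "twitter" resolves because of --link twitter:twitter in step 4.
lines = ssc.socketTextStream("twitter", 9009)

# Split tweets into words, keep hashtags, and count them per batch.
counts = (lines.flatMap(lambda line: line.split(" "))
               .filter(lambda w: w.startswith("#"))
               .map(lambda w: (w.lower(), 1))
               .reduceByKey(lambda a, b: a + b))
counts.pprint()

ssc.start()
ssc.awaitTermination()
```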
Steps to run part B
Files to run: b_twitter_stream.py, b_spark_stream.py, sentiment_visual.py

1. Run:
   docker run -it -v $PWD:/app --name twitter -w /app python bash
2. Inside the docker container, run:
   pip install tweepy
3. Run:
   python b_twitter_stream.py
4. Run another docker container and link it to the twitter container using:
   docker run -it -v $PWD:/app --link twitter:twitter eecsyorku/eecs4415
5. Run:
   spark-submit b_spark_stream.py
6. On your localhost, if you don't have matplotlib already installed, run:
   pip install matplotlib
7. Run:
   python sentiment_visual.py

Make sure the files in steps 3 and 5 are still running when you run the file in step 7, so that the data updates dynamically. A sketch of how step 5's job might score sentiment follows below.
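As a hedged illustration of what the sentiment step might look like (the actual b_spark_stream.py may use a different scoring method entirely), here is one minimal way to label each tweet in a micro-batch using small hand-picked wordlists:

```python
# Illustrative wordlist-based sentiment scoring over a DStream. The tiny
# wordlists, port, and labeling scheme are assumptions for the sketch; the
# real b_spark_stream.py may score sentiment differently.
from pyspark import SparkContext
from pyspark.streaming import StreamingContext

POSITIVE = {"good", "great", "happy", "love", "awesome"}
NEGATIVE = {"bad", "sad", "hate", "awful", "terrible"}

def score(tweet):
    """Positive-word count minus negative-word count for one tweet."""
    words = tweet.lower().split()
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)

sc = SparkContext(appName="SentimentSketch")
sc.setLogLevel("ERROR")
ssc = StreamingContext(sc, 2)

tweets = ssc.socketTextStream("twitter", 9009)  # same hypothetical port as above

# Label each tweet positive/negative/neutral and count the labels per batch.
labels = (tweets.map(score)
                .map(lambda s: ("positive" if s > 0
                                else "negative" if s < 0
                                else "neutral", 1))
                .reduceByKey(lambda a, b: a + b))
labels.pprint()

ssc.start()
ssc.awaitTermination()
```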
Output files added:
- output_graph.txt (real-time data)
- part_A_output_data.txt (data filtered through Spark streaming, with timestamps)
- sentiment_output_graph.txt (real-time data)
- sentiment_output_graph.txt (data filtered through Spark streaming, with timestamps)
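Because visual.py and sentiment_visual.py are meant to update while the stream is still running, the plot has to be refreshed as the output file grows. A minimal sketch of that refresh loop, under the same hypothetical `tag,count`-per-line format as the batch sketch above (the real scripts may work differently):

```python
# Sketch of a dynamically refreshing plot: re-read the output file every
# few seconds and redraw. The file format is an assumption.
import matplotlib.pyplot as plt

def read_counts(path):
    pairs = []
    with open(path) as f:
        for line in f:
            line = line.strip()
            if line:
                tag, count = line.rsplit(",", 1)
                pairs.append((tag, int(count)))
    return pairs

plt.ion()                      # interactive mode: drawing does not block
fig, ax = plt.subplots()
while True:
    try:
        pairs = read_counts("output_graph.txt")
    except FileNotFoundError:  # the Spark job may not have written yet
        pairs = []
    ax.clear()
    if pairs:
        tags, counts = zip(*pairs)
        ax.bar(range(len(tags)), counts)
        ax.set_xticks(range(len(tags)))
        ax.set_xticklabels(tags, rotation=45, ha="right")
    fig.canvas.draw_idle()
    plt.pause(5)               # redraw roughly every 5 seconds
```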

