A PoC implementation of the λ-Architecture for collecting and analysing tweets
- Apache Spark
- Apache Cassandra
- Apache Kafka
- Twitter4j
- SBT
- Scala
We recommend using Homebrew to install the required tools.
To install Homebrew, paste the following into your terminal:
/usr/bin/ruby -e "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/master/install)"
- Install Java 8
brew tap caskroom/versions
brew cask install java8
- Install Kafka
brew install kafka
- Install Cassandra
brew install cassandra
- Install SBT
brew install sbt
- Install Scala
brew install scala
You need API keys to access the Twitter stream. To get them, create an application at https://apps.twitter.com
Once you have created the application, put the following keys in src/main/resources/application.conf:
- Consumer Key
- Consumer Secret
- Access Token
- Access Token Secret
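A missing or blank key usually only shows up as an authentication error once the stream fetcher is already running, so it can help to check the four values at startup. The following is a minimal sketch, not the project's actual code: the TwitterCredentials class, the fromMap helper, and the key names (consumerKey, consumerSecret, accessToken, accessTokenSecret) are all hypothetical, and the settings are passed in as a plain Map rather than read from application.conf.

```scala
// Hypothetical holder for the four Twitter API credentials.
final case class TwitterCredentials(
    consumerKey: String,
    consumerSecret: String,
    accessToken: String,
    accessTokenSecret: String
)

object TwitterCredentials {
  // Assumed key names; the real names used in application.conf may differ.
  val RequiredKeys: Seq[String] =
    Seq("consumerKey", "consumerSecret", "accessToken", "accessTokenSecret")

  // Returns Right(credentials) when every key is present and non-blank,
  // otherwise Left with the names of the missing or blank keys.
  def fromMap(settings: Map[String, String]): Either[Seq[String], TwitterCredentials] = {
    val missing = RequiredKeys.filter(key => settings.get(key).forall(_.trim.isEmpty))
    if (missing.nonEmpty) Left(missing)
    else
      Right(
        TwitterCredentials(
          settings("consumerKey"),
          settings("consumerSecret"),
          settings("accessToken"),
          settings("accessTokenSecret")
        )
      )
  }
}
```

Calling TwitterCredentials.fromMap with an incomplete map returns a Left listing the keys that still need to be filled in, which can be logged before the application exits.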
Before running the application, start the ZooKeeper, Kafka, and Cassandra services:
brew services start zookeeper
brew services start kafka
brew services start cassandra
Now we can compile and run the application:
sbt clean compile
sbt run
For the application to work correctly, start the modules in the following order (the number in brackets is the entry to select when sbt run lists the available main classes):
- Start [3] de.htwsaar.producer.TwitterStreamFetcher (fetches tweets from the Twitter Streaming API)
- Start [2] de.htwsaar.consumer.CassandraKafkaConsumer (writes the tweets to the master dataset)
- Start [1] de.htwsaar.batch.BatchProcessor (starts the batch processor)
- Start [5] de.htwsaar.speed.SparkStreamingKafkaConsumer (starts the real-time processor)
- Start [4] de.htwsaar.serving.HttpServer (starts the HTTP server as the serving layer)
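The modules listed above correspond to the batch, speed, and serving layers of a λ-Architecture. In that pattern, the serving layer typically answers a query by combining a precomputed batch view with a real-time view that covers data the batch run has not processed yet. The sketch below illustrates that idea with in-memory maps. It is an illustration only, not the code in de.htwsaar.serving.HttpServer: the LambdaQuerySketch object, its function names, and the sample counts are hypothetical, and it assumes the two views do not count the same tweets twice.

```scala
object LambdaQuerySketch {
  // Hashtag -> number of occurrences.
  type HashtagCounts = Map[String, Long]

  // Combine the batch view with the real-time (speed) view by summing counts per hashtag.
  // Assumption: the speed view only covers tweets not yet included in the batch view.
  def merge(batchView: HashtagCounts, speedView: HashtagCounts): HashtagCounts =
    speedView.foldLeft(batchView) { case (acc, (tag, count)) =>
      acc.updated(tag, acc.getOrElse(tag, 0L) + count)
    }

  // Return the `limit` most frequent hashtags; ties are broken alphabetically
  // so that the result order is stable.
  def topHashtags(
      batchView: HashtagCounts,
      speedView: HashtagCounts,
      limit: Int
  ): Seq[(String, Long)] =
    merge(batchView, speedView).toSeq
      .sortBy { case (tag, count) => (-count, tag) }
      .take(limit)

  def main(args: Array[String]): Unit = {
    // Made-up sample data for illustration.
    val batch = Map("#scala" -> 10L, "#spark" -> 7L)
    val speed = Map("#spark" -> 5L, "#kafka" -> 2L)
    // Prints "#spark 12" followed by "#scala 10".
    topHashtags(batch, speed, 2).foreach { case (tag, count) => println(s"$tag $count") }
  }
}
```

Keeping the merge as a pure function makes the query logic easy to test without a running Cassandra or Spark instance.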
You can access the results by opening the following URL: http://localhost:9000/get/hashtags?timestamp=0&limit=20
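The same request can be sent from Scala, for example in a quick smoke test. This is a minimal sketch using only the standard library. The HashtagQuerySketch object and its helpers are hypothetical, the response is printed as raw text because its format is not specified here, and the reading of timestamp as a lower time bound and limit as the maximum number of results is an assumption based on the parameter names.

```scala
import scala.io.Source
import scala.util.{Failure, Success, Try}

object HashtagQuerySketch {
  // Build the query URL. Host and port default to the values used in this README.
  def hashtagsUrl(
      timestamp: Long,
      limit: Int,
      host: String = "localhost",
      port: Int = 9000
  ): String =
    s"http://$host:$port/get/hashtags?timestamp=$timestamp&limit=$limit"

  // Fetch the response body as a string; the source is always closed afterwards.
  def fetch(url: String): Try[String] = Try {
    val source = Source.fromURL(url, "UTF-8")
    try source.mkString
    finally source.close()
  }

  def main(args: Array[String]): Unit =
    fetch(hashtagsUrl(timestamp = 0L, limit = 20)) match {
      case Success(body)  => println(body)
      case Failure(error) => println(s"Request failed (is the HTTP server running?): $error")
    }
}
```

The request only succeeds once the HttpServer module has been started, so a failure here usually points to the start order described above.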