Twitter Heat is a clustered application that uses Akka, Akka Cluster, Akka Stream, Akka HTTP, and Twitter4J to collect Twitter messages of interest.
The main components of Twitter Heat are "Collector", "Tweet Stream Worker" and "API". The interactions among these components are illustrated below.
Twitter Heat uses SBT as its build tool and for publishing Docker images. It is configured so that Collector, Stream Worker, Tweet Stream and Stats Manager are deployed in one Docker container, while the API (Akka HTTP) is deployed in another.
Running the following command in a terminal from the project directory
sbt clean docker:publishLocal
creates 2 docker images:
- twitter-heat-collector
- twitter-heat-api
To quickly start a cluster of two collectors and one API server, run the command
docker-compose up
You can see from the standard output that 3 nodes are brought up with collector1 as the leader.
A set of keys and access tokens is required to invoke Twitter's APIs. You can acquire them at https://dev.twitter.com/. Twitter Heat Collector reads the keys and tokens from the following environment variables:
- TWITTER_CONSUMER_KEY
- TWITTER_CONSUMER_SECRET
- TWITTER_ACCESS_TOKEN
- TWITTER_ACCESS_TOKEN_SECRET
For Twitter Heat, enter your own Twitter keys and tokens in docker-compose.yml.
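A missing or empty credential is easy to overlook, so a small pre-flight check can help. The sketch below rests on an assumption that the project's own configuration may not share: that docker-compose.yml forwards the four variables from your shell (for example through Compose variable substitution such as `${TWITTER_CONSUMER_KEY}`) rather than hardcoding them. The function name `missing_twitter_vars` is hypothetical.

```shell
# Hypothetical helper: print the name of each required Twitter credential
# variable that is not exported, or is empty, in the current shell.
# Assumes docker-compose.yml forwards these variables from the shell.
missing_twitter_vars() {
  for name in TWITTER_CONSUMER_KEY TWITTER_CONSUMER_SECRET \
              TWITTER_ACCESS_TOKEN TWITTER_ACCESS_TOKEN_SECRET; do
    # printenv prints a value only for exported variables,
    # which are the ones a child process such as docker-compose can see.
    if [ -z "$(printenv "$name")" ]; then
      echo "$name"
    fi
  done
}
```

For example, `[ -z "$(missing_twitter_vars)" ] && docker-compose up` would start the cluster only when all four variables are set.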
After starting the Twitter Heat cluster with docker-compose, you can access the API server on port 8080 using any HTTP client.
curl -X POST localhost:8080/query/taiwan
"taiwan" is the keyword that you are interested in. Twitter Heat creates a tweet stream that gathers messages containing the keyword.
curl -X DELETE localhost:8080/query/taiwan
Stop the tweet stream with the specified keyword. If no tweet stream matches the keyword, a 404 HTTP status code is returned.
$ curl -I -X DELETE localhost:8080/query/galaxy
HTTP/1.1 404 Not Found
Server: akka-http/10.0.5
Date: Sat, 29 Apr 2017 08:07:56 GMT
Content-Length: 0
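In a script it is often more convenient to branch on the status code than to read the headers. The following is a minimal sketch, not part of Twitter Heat: the function names `describe_delete_status` and `stop_query` are hypothetical, and treating any 2xx code as success is an assumption, since only the 404 case is documented here.

```shell
# Hypothetical helper: map the status code of DELETE /query/<keyword>
# to a short message. Only 404 is documented; 2xx-as-success is assumed.
describe_delete_status() {
  case "$1" in
    2??) echo "stopped" ;;
    404) echo "not found" ;;
    *)   echo "unexpected status $1" ;;
  esac
}

# Hypothetical wrapper: stop the tweet stream for keyword $1 and report the result.
stop_query() {
  # -s hides progress output, -o /dev/null discards the body,
  # -w '%{http_code}' prints only the numeric HTTP status code.
  status=$(curl -s -o /dev/null -w '%{http_code}' -X DELETE "localhost:8080/query/$1")
  describe_delete_status "$status"
}
```

With the cluster running, `stop_query galaxy` would print "not found" if no stream matches that keyword. If the server is unreachable, curl reports status 000 and the helper prints "unexpected status 000".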
curl -X GET localhost:8080/query
List the keywords that currently have a tweet stream, as a JSON array.
$ curl -X GET localhost:8080/query
["taiwan","japan"]
curl -X GET localhost:8080/stats
Return a JSON object that maps each keyword to its count.
$ curl -X GET localhost:8080/stats
{"japan":12094,"taiwan":584}
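For a quick look at which keyword is busiest, the flat JSON object can be reshaped into sorted lines with standard tools. This is a rough sketch under stated assumptions: the helper name `stats_to_table` is hypothetical, the numbers are taken to be per-keyword counts, and the text-based parsing only works for simple keywords without spaces, quotes, commas, or colons. A real JSON parser would be more robust.

```shell
# Hypothetical helper: read the flat JSON object from /stats on stdin and
# print "keyword count" lines, largest count first.
# Only suitable for simple keywords without spaces, quotes, commas, or colons.
stats_to_table() {
  # Strip braces and quotes, put each pair on its own line,
  # separate key and value with a space, then sort by the number.
  tr -d '{}"' | tr ',' '\n' | tr ':' ' ' | sort -k2,2nr
}

# Usage against a running API server:
#   curl -s localhost:8080/stats | stats_to_table
```

Applied to the response shown above, this lists japan (12094) before taiwan (584).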
Twitter imposes rate limits on its public APIs. Twitter Heat logs in to Twitter on every query request, so if you start queries too often, you will see Twitter's rate-limiting warnings in the log.
Of course, one Twitter login per query request is a flawed implementation. A single login session could be shared by multiple query requests, and hence by multiple tweet streams.