This project is a Scala application which uses Alpakka Cassandra 2.0, Akka Streams and Twitter4S (Scala Twitter Client) to pull new Tweets from Twitter for a given hashtag (or set of hashtags) using Twitter API v1.1 and write them into a local Cassandra database.
NOTE: The project saves only original tweets (retweets are skipped) and currently stores only the truncated text of each tweet (<=140 chars).
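The save rule in the note above can be sketched in plain Scala. This is a minimal illustration, not the project's actual code: `IncomingTweet`, `Row`, and `toRow` are hypothetical names, and `IncomingTweet` is a stand-in rather than the Twitter4S tweet entity.

```scala
object TweetFilterSketch {
  // Hypothetical stand-in for a streamed tweet (not the Twitter4S entity).
  // retweetOf holds the id of the original tweet when this one is a retweet.
  final case class IncomingTweet(id: Long, text: String, retweetOf: Option[Long])

  // Shaped like a row of testkeyspace.testtable(id bigint, excerpt text).
  final case class Row(id: Long, excerpt: String)

  // Original tweets become rows; retweets are dropped.
  def toRow(tweet: IncomingTweet): Option[Row] =
    if (tweet.retweetOf.isEmpty) Some(Row(tweet.id, tweet.text)) else None

  def main(args: Array[String]): Unit = {
    val incoming = Seq(
      IncomingTweet(1L, "Trying out Cassandra #myHashtag", None),
      IncomingTweet(2L, "RT Trying out Cassandra #myHashtag", Some(1L))
    )
    // Only the first tweet survives the filter.
    incoming.flatMap(t => toRow(t)).foreach(println)
  }
}
```

Returning an `Option` keeps the retweet check and the row mapping in one place, so a stream stage could drop retweets simply by flattening the result.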
- Scala 2.12+
- JDK 8
- sbt (this project uses 1.4.9)
- Docker (and required RAM for running a Cassandra container)
- Set up and run a local Cassandra instance using Docker
- Configure Twitter API keys
- Set up hashtags and run the project using sbt
- Observe results in Cassandra using cqlsh
1.1 - Make sure you have Docker installed on your machine. Run the following command to start a local Cassandra container with port 9042 exposed:
docker run -p 9042:9042 --rm --name my-cassandra -d cassandra
1.2 - Make sure your container is running (you may need to give the container a few minutes to boot up):
docker ps -a
The output should show that the container is up and that local port 9042 is bound to port 9042 in the container (the default port for Cassandra).
1.3 - Afterwards, run cqlsh on the container in interactive terminal mode to set up the keyspace and table:
docker exec -it my-cassandra cqlsh
CREATE KEYSPACE testkeyspace WITH replication = {'class': 'SimpleStrategy', 'replication_factor': '1'} AND durable_writes = true;
CREATE table testkeyspace.testtable(id bigint PRIMARY KEY, excerpt text);
INSERT INTO testkeyspace.testtable(id, excerpt)
VALUES (37, 'appletest');
exit
2.1 - From the root folder of this repository, locate the application.conf.example file at /src/main/resources/application.conf.example. Copy this file into the same directory under the name application.conf:
cp src/main/resources/application.conf.example src/main/resources/application.conf
2.2 - Go to the Twitter developer dashboard website, register an application, and insert its four Twitter API keys into this portion of application.conf:
twitter {
consumer {
key = "consumer-key-here"
secret = "consumer-secret-here"
}
access {
key = "access-key-here"
secret = "access-token-here"
}
}
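Before starting the stream, it can help to confirm that all four values are present. The following is a rough sketch, assuming the Typesafe Config library (a dependency of Akka) is on the classpath; `TwitterConfigCheck` and `missing` are hypothetical names and are not part of this project.

```scala
import com.typesafe.config.ConfigFactory

object TwitterConfigCheck {
  // The four paths defined by the twitter block shown above.
  val requiredPaths: Seq[String] = Seq(
    "twitter.consumer.key",
    "twitter.consumer.secret",
    "twitter.access.key",
    "twitter.access.secret"
  )

  // Returns the required paths that are absent or blank in the given HOCON text.
  def missing(hocon: String): Seq[String] = {
    val config = ConfigFactory.parseString(hocon)
    requiredPaths.filterNot { path =>
      config.hasPath(path) && config.getString(path).trim.nonEmpty
    }
  }

  def main(args: Array[String]): Unit = {
    // Hypothetical, incomplete configuration used only for illustration.
    val sample = """twitter { consumer { key = "abc", secret = "def" } }"""
    missing(sample).foreach(path => println(s"missing: $path"))
  }
}
```

This only checks that the values exist; it does not verify that Twitter accepts them.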
3.1 - Navigate to /src/main/scala/com/alptwitter/AlpakkaTwitter.scala and change the line val trackedWords = Seq("#myHashtag") to list the hashtags you want to track new tweets for:
vim /workspace/example-cassandra-alpakka-twitter/src/main/scala/com/alptwitter/AlpakkaTwitter.scala
To track more than one hashtag, add more strings to the Seq, separated by commas.
sbt run
As new tweets containing any of the hashtags listed in the trackedWords variable are posted, the console prints a message saying whether each tweet was a retweet or a unique tweet.
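As an illustration of a multi-hashtag list and the per-tweet console message described above, here is a small sketch. The hashtags are examples only, and `describe` is a hypothetical helper; the exact wording the project prints may differ.

```scala
object TrackedWordsSketch {
  // Several hashtags, separated by commas (example values).
  val trackedWords: Seq[String] = Seq("#scala", "#cassandra", "#akka")

  // Hypothetical console message for each matching tweet.
  def describe(isRetweet: Boolean): String =
    if (isRetweet) "Retweet, not saved" else "Unique tweet, saved to Cassandra"

  def main(args: Array[String]): Unit = {
    println(s"Tracking: ${trackedWords.mkString(", ")}")
    println(describe(isRetweet = false))
    println(describe(isRetweet = true))
  }
}
```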
4.1 - As new tweets (not retweets) containing your hashtags are found, they are saved to Cassandra as a (tweet id, tweet text) entry in testkeyspace.testtable. To check that the tweets are being saved, run cqlsh on the Cassandra container and query the table:
docker exec -it my-cassandra cqlsh
SELECT * FROM testkeyspace.testtable;