Building a near-real-time discovery platform with AWS Lambda, Amazon Kinesis Firehose, and Elasticsearch
This is the code repository for the code sample used in the AWS Big Data Blog post Building a Near-Real-Time Discovery Platform with AWS.

Prerequisites:
- Amazon Web Services account
- Elasticsearch cluster
- AWS Command Line Interface (CLI)
- Node.js installed
- Twitter application with consumer key (API Key), consumer secret (API Secret), access token, and access token secret
The AWS Lambda function (lambda-s3-twitter-to-es-python/lambda_function.py) is triggered whenever a new file is created on S3.
The function does the following:
- Reads the file content
- Parses the content to JSON format (Elasticsearch stores documents in JSON format)
- Analyzes Twitter data (lambda-s3-twitter-to-es-python/tweet_utils.py):
a. Extracts Twitter mentions (@username) from the tweet text.
b. Extracts sentiment based on emoticons. If there's no emoticon in the text, the function uses textblob sentiment analysis.
- Loads the data into Elasticsearch (twitter_to_es.py) using the elasticsearch-py library
You can download and unzip the deployment package (packages/s3_twitter_to_es.zip), which already includes the elasticsearch and textblob modules, or create a deployment package yourself.
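If you build the package yourself, the two dependencies can be installed directly into the function folder so they end up inside the zip. A minimal sketch using pip (the path is illustrative):

# Install the Lambda function's Python dependencies next to its code
# so that the zip created below contains them (path is illustrative).
cd path/to/s3-twitter-to-es-python
pip install elasticsearch textblob -t .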
Please replace <<PARAMETER>> with your values.
Modify s3-twitter-to-es-python/config.py by assigning es_host and es_port for your Elasticsearch cluster.
Zip your deployment folder:
cd path/to/s3-twitter-to-es-python
zip -r -9 ../s3-twitter-to-es-python.zip .
Create an IAM role for the Lambda function and name it <<LAMBDA_EXEC_ROLE>>.
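A minimal sketch of creating the role with the AWS CLI, assuming a trust policy file that lets Lambda assume the role (the file name lambda_trust.json is illustrative) and the AWS-managed AWSLambdaExecute policy, which grants S3 object access and CloudWatch Logs access:

# lambda_trust.json (illustrative name) should contain:
# {"Version":"2012-10-17","Statement":[{"Effect":"Allow",
#   "Principal":{"Service":"lambda.amazonaws.com"},"Action":"sts:AssumeRole"}]}
aws iam create-role \
  --role-name <<LAMBDA_EXEC_ROLE>> \
  --assume-role-policy-document file://lambda_trust.json

# Attach a managed policy that covers S3 access and CloudWatch Logs.
aws iam attach-role-policy \
  --role-name <<LAMBDA_EXEC_ROLE>> \
  --policy-arn arn:aws:iam::aws:policy/AWSLambdaExecute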
Create the AWS Lambda function:
aws lambda create-function \
--region <<REGION>> \
--function-name s3-twitter-to-es-python \
--zip-file fileb://path/to/s3_twitter_to_es.zip \
--role arn:aws:iam::<<ACCOUNT_ID>>:role/<<LAMBDA_EXEC_ROLE>> \
--handler lambda_function.handler \
--runtime python2.7 \
--timeout 120
Add S3 as the event source for the Lambda function, using your <<S3_BUCKET>> and <<S3_PREFIX>>.
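One way to do this from the CLI (a sketch; the statement id is illustrative) is to allow S3 to invoke the function and then add a bucket notification for object-created events under the prefix:

# Allow the bucket to invoke the Lambda function.
aws lambda add-permission \
  --function-name s3-twitter-to-es-python \
  --statement-id s3-invoke-lambda \
  --action lambda:InvokeFunction \
  --principal s3.amazonaws.com \
  --source-arn arn:aws:s3:::<<S3_BUCKET>>

# Notify the function whenever an object is created under the prefix.
aws s3api put-bucket-notification-configuration \
  --bucket <<S3_BUCKET>> \
  --notification-configuration '{
    "LambdaFunctionConfigurations": [{
      "LambdaFunctionArn": "arn:aws:lambda:<<REGION>>:<<ACCOUNT_ID>>:function:s3-twitter-to-es-python",
      "Events": ["s3:ObjectCreated:*"],
      "Filter": {"Key": {"FilterRules": [{"Name": "prefix", "Value": "<<S3_PREFIX>>"}]}}
    }]
  }'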
Create a Firehose IAM role named “firehose_delivery_role” based on the following policy:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "",
      "Effect": "Allow",
      "Action": [
        "s3:AbortMultipartUpload",
        "s3:GetBucketLocation",
        "s3:GetObject",
        "s3:ListBucket",
        "s3:ListBucketMultipartUploads",
        "s3:PutObject"
      ],
      "Resource": [
        "arn:aws:s3:::<<S3_BUCKET>>",
        "arn:aws:s3:::<<S3_BUCKET>>/*"
      ]
    }
  ]
}
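A sketch of creating this role from the CLI, assuming the policy above is saved as firehose_policy.json and a trust policy that lets Firehose assume the role is saved as firehose_trust.json (both file names are illustrative):

# firehose_trust.json (illustrative name) should contain:
# {"Version":"2012-10-17","Statement":[{"Effect":"Allow",
#   "Principal":{"Service":"firehose.amazonaws.com"},"Action":"sts:AssumeRole"}]}
aws iam create-role \
  --role-name firehose_delivery_role \
  --assume-role-policy-document file://firehose_trust.json

# Attach the S3 delivery policy shown above as an inline policy.
aws iam put-role-policy \
  --role-name firehose_delivery_role \
  --policy-name firehose_delivery_policy \
  --policy-document file://firehose_policy.json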
Set up the Node.js application:
cd firehose-twitter-streaming-nodejs
npm install
Modify the configuration in config.js:
- firehose
  - DeliveryStreamName – name your stream. The app will create the delivery stream if it does not exist.
  - BucketARN – use the <<S3_BUCKET>> that you entered as the event source for the Lambda function.
  - RoleARN – use the Firehose role you created earlier (“firehose_delivery_role”).
  - Prefix – use the <<S3_PREFIX>> that you entered as the event source for the Lambda function.
- twitter – enter your Twitter application keys.
- region – your Firehose region (e.g. us-east-1, us-west-2, eu-west-1).
Run the application:
node twitter_stream_producer_app
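To verify that data is flowing end to end, you can check the delivery stream status and list the indices in your Elasticsearch cluster (a sketch; the <<DELIVERY_STREAM_NAME>>, <<ES_HOST>>, and <<ES_PORT>> placeholders are illustrative and should match your config.js and config.py values):

# Check that the delivery stream exists and is ACTIVE.
aws firehose describe-delivery-stream \
  --delivery-stream-name <<DELIVERY_STREAM_NAME>>

# Confirm that tweet documents are accumulating in Elasticsearch.
curl 'http://<<ES_HOST>>:<<ES_PORT>>/_cat/indices?v'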
Please see Building a Near-Real-Time Discovery Platform with AWS for more details on using Elasticsearch and Kibana as the discovery platform.