Recovers Kinesis Firehose source record backups from S3, replaying them into the stream.
The Python scripts require Python 3 and boto3; `pipenv install` or `pip install boto3` will do.
The `package.sh` shell script creates a `lambdas.zip` that can be uploaded to AWS Lambda. The function handlers are:
- `queue_firehose_s3_backups.main`
- `recover_firehose_s3_backup.main`
The Lambda functions will need an IAM role that has permissions to (see the policy sketch after this list):

- Read from S3
- Put records to Kinesis Firehose
- Send, receive, and delete messages from SQS
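As a rough illustration only (the role name is hypothetical and the ARNs reuse the placeholder values from the test event below; substitute your own resources), such a policy could be attached with boto3:

```python
import json

import boto3

iam = boto3.client("iam")

# Placeholder ARNs -- these mirror the example event below, not real resources.
policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": ["s3:GetObject", "s3:ListBucket"],
            "Resource": ["arn:aws:s3:::my-s3-bucket", "arn:aws:s3:::my-s3-bucket/*"],
        },
        {
            "Effect": "Allow",
            "Action": ["firehose:PutRecord", "firehose:PutRecordBatch"],
            "Resource": "arn:aws:firehose:us-east-1:123456789:deliverystream/my-kinesis-stream",
        },
        {
            "Effect": "Allow",
            "Action": [
                "sqs:SendMessage",
                "sqs:ReceiveMessage",
                "sqs:DeleteMessage",
                "sqs:GetQueueAttributes",
            ],
            "Resource": "arn:aws:sqs:us-east-1:123456789:my_queue.fifo",
        },
    ],
}

iam.put_role_policy(
    RoleName="firehose-recovery-lambda",  # hypothetical role name
    PolicyName="firehose-recovery",
    PolicyDocument=json.dumps(policy),
)
```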
Files are recovered in two steps, with two Lambda functions (both sketched below):

- `queue_firehose_s3_backups.py` lists all the files in the S3 source record backup and queues each file to SQS.
- `recover_firehose_s3_backup.py` is triggered by the SQS queue. It parses objects from each file and puts records to the Kinesis Firehose stream.
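A minimal sketch of the two handlers follows. The real scripts may structure this differently; the SQS message fields and the `parse_firehose_records` helper are our assumptions, not taken from the repo.

```python
import json

import boto3

s3 = boto3.client("s3")
sqs = boto3.client("sqs")
firehose = boto3.client("firehose")


def queue_main(event, context):
    """Step 1 (queue_firehose_s3_backups.main): list every backup object
    under the prefix and queue one SQS message per file."""
    paginator = s3.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=event["bucket"],
                                   Prefix=event.get("prefix", "")):
        for obj in page.get("Contents", []):
            sqs.send_message(
                QueueUrl=event["queue_url"],
                MessageBody=json.dumps({
                    "bucket": event["bucket"],
                    "key": obj["Key"],
                    "kinesis_stream": event["kinesis_stream"],
                }),
                # Required because the example queue is FIFO; a FIFO queue also
                # needs MessageDeduplicationId unless content-based
                # deduplication is enabled.
                MessageGroupId="recovery",
            )


def recover_main(event, context):
    """Step 2 (recover_firehose_s3_backup.main): for each queued file,
    fetch it from S3 and replay its records into Firehose."""
    for record in event["Records"]:  # standard Lambda-SQS event shape
        msg = json.loads(record["body"])
        body = s3.get_object(Bucket=msg["bucket"], Key=msg["key"])["Body"].read()
        for obj in parse_firehose_records(body):  # see the parsing sketch below
            firehose.put_record(
                DeliveryStreamName=msg["kinesis_stream"],
                Record={"Data": json.dumps(obj).encode()},
            )
```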
With the Lambda functions deployed, they can be tested with an event like the following:
```json
{
  "bucket": "my-s3-bucket",
  "prefix": "is/optional/",
  "queue_url": "https://sqs.us-east-1.amazonaws.com/123456789/my_queue.fifo",
  "kinesis_stream": "my-kinesis-stream"
}
```
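For example, assuming the queue function is deployed under the name `queue_firehose_s3_backups` (a hypothetical deployed name), the test event can be sent with boto3:

```python
import json

import boto3

lambda_client = boto3.client("lambda")

response = lambda_client.invoke(
    FunctionName="queue_firehose_s3_backups",  # assumed deployed function name
    Payload=json.dumps({
        "bucket": "my-s3-bucket",
        "prefix": "is/optional/",
        "queue_url": "https://sqs.us-east-1.amazonaws.com/123456789/my_queue.fifo",
        "kinesis_stream": "my-kinesis-stream",
    }).encode(),
)
print(response["StatusCode"])
```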
Each backup file is assumed to be formatted in the idiosyncratic Kinesis Firehose manner, with JSON objects concatenated together with no delimiter between them.
Thanks to Tom Chapin for the code to parse the Kinesis Firehose S3 record format: https://stackoverflow.com/questions/34468319/reading-the-data-written-to-s3-by-amazon-kinesis-firehose-stream
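A minimal sketch of that parsing approach, using `json.JSONDecoder.raw_decode` to peel concatenated JSON objects off the buffer one at a time (`parse_firehose_records` is our name for the helper referenced in the sketch above):

```python
import json


def parse_firehose_records(body: bytes):
    """Yield each JSON object from a Firehose backup file, where objects
    are concatenated back-to-back with no delimiter between them."""
    decoder = json.JSONDecoder()
    text = body.decode("utf-8")
    pos = 0
    while pos < len(text):
        # Skip any whitespace between objects.
        while pos < len(text) and text[pos].isspace():
            pos += 1
        if pos >= len(text):
            break
        # raw_decode returns the parsed object and the index where it ended.
        obj, pos = decoder.raw_decode(text, pos)
        yield obj
```

Note that if the delivery stream's S3 backup is configured with compression, the file would need to be decompressed (e.g. gunzipped) before parsing; this sketch assumes uncompressed JSON.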