A lightweight system to automatically scale Kinesis Data Streams up and down based on throughput.
- Step 1: Metrics flow from the Kinesis Data Stream(s) into CloudWatch Metrics (Bytes/Sec, Records/Sec)
- Step 2: Two alarms, Scale Up and Scale Down, evaluate those metrics and decide when to scale
- Step 3: When a scaling alarm triggers, it sends a message to the Scaling SNS Topic
- Step 4: The Scaling Lambda processes that SNS message and…
  - Scales the Kinesis Data Stream up or down using UpdateShardCount
    - Scale Up events double the number of shards in the stream
    - Scale Down events halve the number of shards in the stream
  - Updates the metric math on the Scale Up and Scale Down alarms to reflect the new shard count.
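
The doubling and halving rule in Step 4 can be sketched as follows. This is only an illustration, not the code in `scale.go`: the `Direction` type and `targetShardCount` function are hypothetical names, and both the round-down behavior on odd shard counts and the clamping to `min_shard_count` are assumptions.

```go
package main

import "fmt"

// Direction is a hypothetical type naming which alarm fired.
type Direction int

const (
	ScaleUp Direction = iota
	ScaleDown
)

// targetShardCount returns the shard count to request for a scaling event:
// double on scale up, halve on scale down, never below minShards.
// Rounding down when halving an odd count is an assumption of this sketch.
func targetShardCount(current int, dir Direction, minShards int) int {
	if minShards < 1 {
		minShards = 1 // a stream always has at least one shard
	}
	target := current
	switch dir {
	case ScaleUp:
		target = current * 2
	case ScaleDown:
		target = current / 2 // integer division rounds down
	}
	if target < minShards {
		target = minShards
	}
	return target
}

func main() {
	fmt.Println(targetShardCount(4, ScaleUp, 1))   // 8
	fmt.Println(targetShardCount(8, ScaleDown, 1)) // 4
	fmt.Println(targetShardCount(5, ScaleDown, 5)) // 5 (clamped to the minimum)
}
```

The value returned here is what would be passed to UpdateShardCount as the target shard count, and the same number is what the alarms' metric math would need to reflect afterwards.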
- Designed for simplicity and a minimal service footprint.
- Proven. This system has been battle tested, scaling thousands of production streams without issue.
- Suitable for scaling massive amounts of streams. Each additional stream requires only 2 CloudWatch alarms.
- Operations friendly. Everything is viewable/editable/debuggable in the console, so there is no need to drop into the CLI to see what's going on.
- Takes into account both ingress metrics, Records Per Second and Bytes Per Second, when deciding to scale a stream up or down.
- Can optionally take into account egress needs via Max Iterator Age, so streams that are N minutes behind (configurable) do not scale down and lose much-needed Lambda processing power (Lambdas per Shard) because their shard count was reduced due to a drop in incoming traffic.
- Already designed out of the box to work within the limit of 10 UpdateShardCount calls per rolling 24 hours.
- Emits a custom CloudWatch error metric if scaling fails; you can alarm on this metric for added peace of mind.
- Can optionally adjust reserved concurrency for your Lambda consumers as it scales their streams up and down.
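
To make the ingress and egress checks above concrete, here is a minimal sketch of the arithmetic. In the real system this is done by CloudWatch metric math inside the alarms, not in Go. The function names `utilization` and `decide` are hypothetical, and the comparison operators are assumptions. The per-shard write limits (1 MiB/s and 1,000 records/s) are the standard Kinesis Data Streams limits.

```go
package main

import (
	"fmt"
	"math"
)

const (
	bytesPerShardPerSec   = 1024 * 1024 // Kinesis write limit: 1 MiB/s per shard
	recordsPerShardPerSec = 1000        // Kinesis write limit: 1,000 records/s per shard
)

// utilization returns the larger of the byte-based and record-based
// ingress utilization, as a fraction of the stream's total write capacity.
func utilization(bytesPerSec, recordsPerSec float64, shards int) float64 {
	if shards < 1 {
		shards = 1 // guard against division by zero
	}
	byBytes := bytesPerSec / (float64(shards) * bytesPerShardPerSec)
	byRecords := recordsPerSec / (float64(shards) * recordsPerShardPerSec)
	return math.Max(byBytes, byRecords)
}

// decide picks a scaling action. Scale-down is blocked while consumers are
// further behind than maxIterAgeMins (the optional egress check).
func decide(util, iterAgeMins, upThreshold, downThreshold, maxIterAgeMins float64) string {
	switch {
	case util >= upThreshold:
		return "scale-up"
	case util < downThreshold && iterAgeMins <= maxIterAgeMins:
		return "scale-down"
	default:
		return "none"
	}
}

func main() {
	// 4 shards, 3 MiB/s in, 500 records/s: bytes dominate at 0.75.
	u := utilization(3*1024*1024, 500, 4)
	fmt.Println(u, decide(u, 0, 0.75, 0.25, 30))

	// Low traffic, but consumers are 45 minutes behind: do not scale down.
	u = utilization(512*1024, 400, 4)
	fmt.Println(u, decide(u, 45, 0.75, 0.25, 30))
}
```

The thresholds `0.75`, `0.25`, and `30` used in `main` correspond to the defaults of `kinesis_scale_up_threshold`, `kinesis_scale_down_threshold`, and `kinesis_scale_down_min_iter_age_mins`.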
Changes to the Kinesis stream shard count, as well as to the scale up and scale down CloudWatch alarms, are ignored by Terraform. This allows the Lambda to fully manage the shard count of the Kinesis stream, which is also referenced in the alarms.
Only the values supplied when the module is first applied are used to configure the scaling alarms and the Kinesis stream.
To set new values for Kinesis scaling (e.g. `min_shard_count`, `kinesis_scale_up_threshold`, `kinesis_scale_down_datapoints_required`, etc.), the scale up and scale down alarms must be tainted or deleted:

- `module.kinesis_scaling.aws_cloudwatch_metric_alarm.kinesis_scale_up`
- `module.kinesis_scaling.aws_cloudwatch_metric_alarm.kinesis_scale_down`

If `shard_count` needs to be updated manually through Terraform, the above alarms and also the Kinesis stream itself must be tainted or deleted:

- `module.kinesis_scaling.aws_kinesis_stream.autoscaling_kinesis_stream`
Name | Description | Type | Default | Required |
---|---|---|---|---|
enable_slack_notification | Enable Slack notification | bool | false | no |
encryption_type | Encryption type | string | KMS | no |
kinesis_cooldown_mins | Cooldown period in minutes | number | 10 | no |
kinesis_scale_down_datapoints_required | Number of datapoints required in the evaluationPeriod to trigger the alarm to scale down | number | 285 | no |
kinesis_scale_down_evaluation_period | Period after which the data for the alarm will be evaluated to scale down | number | 300 | no |
kinesis_scale_down_min_iter_age_mins | Compared with the stream's max iterator age. If the stream's max iterator age is above this value, the stream will not scale down | number | 30 | no |
kinesis_scale_down_threshold | Scale down threshold | number | 0.25 | no |
kinesis_scale_up_datapoints_required | Number of datapoints required in the evaluationPeriod to trigger the alarm to scale up | number | 25 | no |
kinesis_scale_up_evaluation_period | Period after which the data for the alarm will be evaluated to scale up | number | 25 | no |
kinesis_scale_up_threshold | Scale up threshold | number | 0.75 | no |
kinesis_scaling_period_mins | Scaling period in minutes | number | 5 | no |
kms_key_id | KMS key | string | n/a | yes |
min_shard_count | Minimum number of shards (must be greater than zero) | number | 5 | yes |
shard_count | Number of shards | number | 1 | no |
slack_web_hook_url | Slack web hook URL | string | n/a | yes |
stream_name | Stream name | string | n/a | yes |
stream_retention_period | Stream retention period | number | 24 | no |
tags | Map of tags that should be applied to all resources | map(string) | n/a | yes |
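
The `*_datapoints_required` and `*_evaluation_period` inputs describe an "M out of N" alarm condition: the alarm fires when at least M of the last N datapoints breach the threshold. The sketch below shows that rule in isolation. `breachesToAlarm` is a hypothetical helper, and the handling of short or missing data is a simplification of what CloudWatch actually does.

```go
package main

import "fmt"

// breachesToAlarm reports whether at least `required` of the most recent
// `window` datapoints are breaching. Each element of `breaching` is one
// evaluated datapoint, oldest first.
func breachesToAlarm(breaching []bool, window, required int) bool {
	if window <= 0 || required <= 0 || window > len(breaching) {
		return false // simplification: not enough data means no alarm
	}
	count := 0
	for _, b := range breaching[len(breaching)-window:] {
		if b {
			count++
		}
	}
	return count >= required
}

func main() {
	// Scale-down defaults: 285 of the last 300 datapoints must breach.
	points := make([]bool, 300)
	for i := 0; i < 285; i++ {
		points[i] = true
	}
	fmt.Println(breachesToAlarm(points, 300, 285)) // true

	points[0] = false                              // now only 284 breach
	fmt.Println(breachesToAlarm(points, 300, 285)) // false
}
```

With the defaults, scaling up requires 25 of 25 datapoints to breach, while scaling down requires 285 of 300, so scale-down is deliberately the slower and more conservative of the two.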
Name | Description |
---|---|
kinesis_stream_arn | Output variable definitions |
To generate traffic on your streams, you can use Kinesis Data Generator.

Simply edit the `scale.go` file as needed and run `./build` to generate a `main` file suitable for Lambda deployment. Go 1.15.x is recommended.
(https://github.com/aws-samples/kinesis-auto-scaling/tree/main/terraform)