nat-zero

Scale-to-zero NAT instances for AWS. Stop paying for NAT when nothing is running.

nat-zero is a Terraform module that replaces always-on NAT with on-demand NAT instances. When a workload launches in a private subnet, a NAT instance starts automatically. When the last workload stops, the NAT shuts down and its Elastic IP is released. Idle cost: ~$0.80/month per AZ.

Built on fck-nat AMIs. Orchestrated by a single Go Lambda (~55 ms cold start, 29 MB memory). Integration-tested against real AWS infrastructure on every PR.

   AZ-A (active)               AZ-B (idle)
  ┌──────────────────┐       ┌──────────────────┐
  │ Workloads        │       │ No workloads     │
  │   ↓ route table  │       │ No NAT instance  │
  │ Private ENI      │       │ No EIP           │
  │   ↓              │       │                  │
  │ NAT Instance     │       │ Cost: ~$0.80/mo  │
  │   ↓              │       │ (EBS only)       │
  │ Public ENI + EIP │       │                  │
  │   ↓              │       └──────────────────┘
  │ Internet Gateway │
  └──────────────────┘
           ▲
  EventBridge → Lambda (reconciler, concurrency=1)

Why nat-zero?

| State | nat-zero | fck-nat | NAT Gateway |
|---|---|---|---|
| Idle (no workloads) | ~$0.80/mo | ~$7-8/mo | ~$36+/mo |
| Active (workloads running) | ~$7-8/mo | ~$7-8/mo | ~$36+/mo |

AWS NAT Gateway costs ~$36/month per AZ even when idle. fck-nat brings that to ~$7-8/month, but the instance and EIP run 24/7. nat-zero stops the instance and releases the Elastic IP when idle, avoiding both the instance-hour charge and the $3.60/month public IPv4 charge, so only the EBS volume (~$0.80/month) is billed.
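The figures above can be roughly reproduced from on-demand list prices. The sketch below is illustrative arithmetic, not the project's own calculation. The hourly rates (assumed us-east-1 list prices), the 720-hour month, the gp3 volume type, and the t4g.nano instance are all assumptions, and data-transfer and data-processing charges are excluded.

```go
package main

import "fmt"

// All prices are assumptions (us-east-1 on-demand list prices, USD).
const (
	hoursPerMonth    = 720.0  // assumed 30-day month
	natGatewayHourly = 0.045  // assumed NAT Gateway hourly charge
	publicIPv4Hourly = 0.005  // assumed public IPv4 address charge
	t4gNanoHourly    = 0.0042 // assumed t4g.nano on-demand rate
	gp3PerGBMonth    = 0.08   // assumed gp3 storage rate
	rootVolumeGB     = 10.0   // module default block_device_size
)

// natZeroIdle is the idle cost per AZ: only the stopped instance's EBS volume.
func natZeroIdle() float64 { return gp3PerGBMonth * rootVolumeGB }

// alwaysOnInstance is an instance plus public IPv4 address running 24/7,
// which is also what nat-zero costs while workloads are active.
func alwaysOnInstance() float64 {
	return (t4gNanoHourly+publicIPv4Hourly)*hoursPerMonth + gp3PerGBMonth*rootVolumeGB
}

// natGateway is the managed gateway's hourly charge plus its public IPv4 address.
func natGateway() float64 { return (natGatewayHourly + publicIPv4Hourly) * hoursPerMonth }

func main() {
	fmt.Printf("nat-zero idle:      $%.2f/mo\n", natZeroIdle())
	fmt.Printf("always-on instance: $%.2f/mo\n", alwaysOnInstance())
	fmt.Printf("NAT Gateway:        $%.2f/mo\n", natGateway())
}
```

Under these assumptions the idle saving per AZ is the instance-hour charge plus the public IPv4 charge; actual bills vary by Region, instance type, and traffic.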

Best for dev/staging environments, CI/CD runners, batch jobs, and side projects where workloads run intermittently.

How it works

An EventBridge rule captures EC2 instance state changes. A Lambda function (concurrency=1, single writer) runs a reconciliation loop on each event:

  1. Observe — query workloads, NAT instances, and EIPs in the AZ
  2. Decide — compare actual state to desired state
  3. Act — take at most one mutating action, then return

The event is only a trigger: the reconciler always computes the correct action from current state, not from the event payload. With reserved_concurrent_executions=1, invocations run one at a time, so two reconcilers never act on the same AZ concurrently.
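To illustrate "the event is a trigger", here is a minimal sketch of decoding an EventBridge EC2 state-change notification and keeping only the instance ID. The struct and the parseTrigger function are hypothetical names, not taken from the nat-zero source; the JSON field names follow the standard EC2 Instance State-change Notification event. The reported state is decoded but deliberately not used for decisions, since a reconciler of this kind would re-query current state.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// stateChangeEvent holds the few fields of the EventBridge event that a
// reconciler might read. Hypothetical type, for illustration only.
type stateChangeEvent struct {
	Source string `json:"source"`
	Detail struct {
		InstanceID string `json:"instance-id"`
		State      string `json:"state"` // decoded but not trusted for decisions
	} `json:"detail"`
}

// parseTrigger returns the instance ID that caused the event. The caller
// would then look up that instance's VPC and AZ and reconcile from live state.
func parseTrigger(raw []byte) (string, error) {
	var ev stateChangeEvent
	if err := json.Unmarshal(raw, &ev); err != nil {
		return "", fmt.Errorf("decode event: %w", err)
	}
	if ev.Source != "aws.ec2" || ev.Detail.InstanceID == "" {
		return "", errors.New("not an EC2 state-change event")
	}
	return ev.Detail.InstanceID, nil
}

func main() {
	raw := []byte(`{
	  "source": "aws.ec2",
	  "detail-type": "EC2 Instance State-change Notification",
	  "detail": {"instance-id": "i-0123456789abcdef0", "state": "running"}
	}`)
	id, err := parseTrigger(raw)
	fmt.Println(id, err)
}
```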

| Workloads? | NAT State | Action |
|---|---|---|
| Yes | None / terminated | Create NAT |
| Yes | Stopped | Start NAT |
| Yes | Stopping | Wait |
| Yes | Running, no EIP | Attach EIP |
| No | Running / pending | Stop NAT |
| No | Stopped, has EIP | Release EIP |
| Any | Multiple NATs | Terminate duplicates |
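The decision table can be expressed as a pure function from observed state to a single action. This is a minimal sketch, not the module's actual code: the type names, the decide function, checking duplicates first, and returning "none" for combinations the table does not list are all assumptions.

```go
package main

import "fmt"

// NATState is the observed state of the AZ's NAT instance (hypothetical type).
type NATState string

const (
	NATNone     NATState = "none" // no instance, or terminated
	NATPending  NATState = "pending"
	NATRunning  NATState = "running"
	NATStopping NATState = "stopping"
	NATStopped  NATState = "stopped"
)

// Observed is the result of the Observe step for one AZ (hypothetical type).
type Observed struct {
	Workloads int      // running workload instances in the AZ
	NATCount  int      // NAT instances found in the AZ
	NAT       NATState // state of the NAT instance, if exactly one exists
	HasEIP    bool     // whether an Elastic IP is currently allocated
}

// decide maps observed state to at most one action, following the table.
func decide(o Observed) string {
	if o.NATCount > 1 { // assumption: duplicates are resolved before anything else
		return "terminate-duplicates"
	}
	if o.Workloads > 0 {
		switch {
		case o.NAT == NATNone:
			return "create-nat"
		case o.NAT == NATStopped:
			return "start-nat"
		case o.NAT == NATStopping:
			return "wait"
		case o.NAT == NATRunning && !o.HasEIP:
			return "attach-eip"
		}
		return "none" // converged, or a state the table does not list
	}
	switch {
	case o.NAT == NATRunning || o.NAT == NATPending:
		return "stop-nat"
	case o.NAT == NATStopped && o.HasEIP:
		return "release-eip"
	}
	return "none"
}

func main() {
	fmt.Println(decide(Observed{Workloads: 2, NATCount: 1, NAT: NATStopped}))
	fmt.Println(decide(Observed{Workloads: 0, NATCount: 1, NAT: NATStopped, HasEIP: true}))
}
```

Because each invocation returns after one action, reaching the desired state can take several events: for example, stopping the NAT on one invocation and releasing the EIP on a later one.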

Each NAT uses two persistent ENIs (public + private) created by Terraform. They survive stop/start cycles, keeping route tables intact.

See Architecture for the full reconciliation model and event flow diagrams.

Quick start

module "nat_zero" {
  source = "github.com/MachineDotDev/nat-zero"

  name               = "my-nat"
  vpc_id             = module.vpc.vpc_id
  availability_zones = ["us-east-1a", "us-east-1b"]
  public_subnets     = module.vpc.public_subnets
  private_subnets    = module.vpc.private_subnets

  private_route_table_ids     = module.vpc.private_route_table_ids
  private_subnets_cidr_blocks = module.vpc.private_subnets_cidr_blocks

  tags = { Environment = "dev" }
}

See Examples for spot instances, custom AMIs, and building from source.

Performance

| Scenario | Time to connectivity |
|---|---|
| First workload (cold create) | ~10.7 s |
| Restart from stopped | ~8.5 s |
| NAT already running | Instant |

The Lambda is a compiled Go ARM64 binary. Cold start: 55 ms. Typical invocation: 400-600 ms. Peak memory: 29 MB. The startup delay is dominated by EC2 instance boot, not the Lambda.

See Performance for detailed timings and cost breakdowns.

Notes

  • EventBridge scope: Captures all EC2 state changes in the account and Region where the module is deployed; the Lambda filters by VPC ID.
  • Startup delay: First workload in an idle AZ waits ~10 seconds for internet. Design scripts to retry outbound connections.
  • Dual ENI: Persistent public + private ENIs survive stop/start cycles.
  • DLQ: Failed Lambda invocations go to an SQS dead letter queue.
  • Clean destroy: A cleanup action terminates NAT instances before terraform destroy removes ENIs.
  • Config versioning: Changing AMI or instance type auto-replaces NAT instances on next workload event.
  • EC2 events only: Currently nat-zero responds only to EC2 instance state changes. If you have a use case for other event sources (ECS tasks, Lambda, etc.), PRs are welcome.

Requirements

| Name | Version |
|---|---|
| terraform | >= 1.3 |
| aws | >= 5.0 |
| null | >= 3.0 |
| time | >= 0.9 |

Providers

| Name | Version |
|---|---|
| aws | >= 5.0 |
| null | >= 3.0 |
| time | >= 0.9 |

Modules

No modules.

Resources

| Name | Type |
|---|---|
| aws_cloudwatch_event_rule.ec2_state_change | resource |
| aws_cloudwatch_event_target.state_change_lambda_target | resource |
| aws_cloudwatch_log_group.nat_zero_logs | resource |
| aws_iam_instance_profile.nat_instance_profile | resource |
| aws_iam_role.lambda_iam_role | resource |
| aws_iam_role.nat_instance_role | resource |
| aws_iam_role_policy.lambda_iam_policy | resource |
| aws_iam_role_policy_attachment.ssm_policy_attachment | resource |
| aws_lambda_function.nat_zero | resource |
| aws_lambda_function_event_invoke_config.nat_zero_invoke_config | resource |
| aws_lambda_invocation.cleanup | resource |
| aws_lambda_permission.allow_ec2_state_change_eventbridge | resource |
| aws_launch_template.nat_launch_template | resource |
| aws_network_interface.nat_private_network_interface | resource |
| aws_network_interface.nat_public_network_interface | resource |
| aws_route.nat_route | resource |
| aws_security_group.nat_security_group | resource |
| null_resource.build_lambda | resource |
| null_resource.download_lambda | resource |
| time_sleep.eventbridge_propagation | resource |
| time_sleep.lambda_ready | resource |

Inputs

| Name | Description | Type | Default | Required |
|---|---|---|---|---|
| ami_id | Explicit AMI ID to use (overrides AMI lookup entirely) | string | null | no |
| availability_zones | List of availability zones to deploy NAT instances in | list(string) | n/a | yes |
| block_device_size | Size in GB of the root EBS volume | number | 10 | no |
| build_lambda_locally | Build the Lambda binary from Go source instead of downloading a pre-compiled release. Requires Go and zip installed locally. | bool | false | no |
| custom_ami_name_pattern | AMI name pattern when use_fck_nat_ami is false | string | null | no |
| custom_ami_owner | AMI owner account ID when use_fck_nat_ami is false | string | null | no |
| enable_logging | Create a CloudWatch log group for the Lambda function | bool | true | no |
| ignore_tag_key | Tag key used to mark instances the Lambda should ignore | string | "nat-zero:ignore" | no |
| ignore_tag_value | Tag value used to mark instances the Lambda should ignore | string | "true" | no |
| instance_type | Instance type for the NAT instance | string | "t4g.nano" | no |
| lambda_binary_url | URL to the pre-compiled Go Lambda zip. Updated automatically by CI. | string | "https://github.com/MachineDotDev/nat-zero/releases/download/nat-zero-lambda-latest/lambda.zip" | no |
| lambda_memory_size | Memory allocated to the Lambda function in MB (also scales CPU proportionally) | number | 128 | no |
| log_retention_days | CloudWatch log retention in days (only used when enable_logging is true) | number | 14 | no |
| market_type | Whether to use spot or on-demand instances | string | "on-demand" | no |
| name | Name prefix for all resources created by this module | string | n/a | yes |
| nat_tag_key | Tag key used to identify NAT instances | string | "nat-zero:managed" | no |
| nat_tag_value | Tag value used to identify NAT instances | string | "true" | no |
| private_route_table_ids | Route table IDs for the private subnets (one per AZ) | list(string) | n/a | yes |
| private_subnets | Private subnet IDs (one per AZ) for NAT instance private ENIs | list(string) | n/a | yes |
| private_subnets_cidr_blocks | CIDR blocks for the private subnets (one per AZ, used in security group rules) | list(string) | n/a | yes |
| public_subnets | Public subnet IDs (one per AZ) for NAT instance public ENIs | list(string) | n/a | yes |
| tags | Additional tags to apply to all resources | map(string) | {} | no |
| use_fck_nat_ami | Use the public fck-nat AMI. Set to false to use a custom AMI. | bool | true | no |
| vpc_id | The VPC ID where NAT instances will be deployed | string | n/a | yes |

Outputs

| Name | Description |
|---|---|
| eventbridge_rule_arn | ARN of the EventBridge rule capturing EC2 state changes |
| lambda_function_arn | ARN of the nat-zero Lambda function |
| lambda_function_name | Name of the nat-zero Lambda function |
| launch_template_ids | Launch template IDs for NAT instances (one per AZ) |
| nat_private_eni_ids | Private ENI IDs for NAT instances (one per AZ) |
| nat_public_eni_ids | Public ENI IDs for NAT instances (one per AZ) |
| nat_security_group_ids | Security group IDs for NAT instances (one per AZ) |

Contributing

Contributions welcome. Please open an issue or submit a pull request.

License

MIT