-
-
Notifications
You must be signed in to change notification settings - Fork 169
Separate Public/Private Subnet Configuration and Enhance NAT Gateway Placement #226
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Merged
Conversation
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Incorporate feature from PR #225 to expose NAT Gateway IDs in the subnet stats outputs. - Add `nat_gateway_id` field to `named_private_subnets_stats_map` (maps to the NAT Gateway that the private subnet routes to for egress) - Add `nat_gateway_id` field to `named_public_subnets_stats_map` (maps to the NAT Gateway in that public subnet, if any) - Create helper locals to correctly map subnets to NAT Gateways - Update output descriptions to reflect the new fourth field This makes the subnet stats more complete and enables downstream components to reference NAT Gateway IDs when needed (e.g., network firewall routing configurations). Implementation correctly handles our new NAT placement features: - Works with index-based NAT placement (`nat_gateway_public_subnet_indices`) - Works with name-based NAT placement (`nat_gateway_public_subnet_names`) - Private subnets correctly map to the NAT they route to (using existing `private_route_table_to_nat_map`) - Public subnets correctly identify which ones contain NAT Gateways 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>
Update deprecated `vpc = true` to `domain = "vpc"` in aws_eip resource. The `vpc` argument was deprecated in AWS Provider v5.0 and removed completely. The new syntax uses `domain = "vpc"` instead. This fixes test failures where Terraform validation fails with: "Error: Unsupported argument - An argument named 'vpc' is not expected here" Fixes the same issue that caused PR #225 tests to fail. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>
Update cloudposse/vpc/aws module from v2.0.0 to v3.0.0 across all example configurations. Changes: - examples/complete/main.tf - examples/existing-ips/main.tf - examples/multiple-subnets-per-az/main.tf - examples/nacls/main.tf - examples/redundant-nat-gateways/main.tf - examples/separate-public-private-subnets/main.tf v3.0.0 includes: - Updates to Terraform AWS provider v6 compatibility - Enhanced VPC endpoint support - Bug fixes and improvements This ensures all examples use the latest stable VPC module version with AWS Provider v6 support. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>
Enhanced PRD and README with comprehensive documentation of the NAT Gateway ID mapping feature and AWS Provider compatibility updates. Changes: **PRD Updates (docs/prd/separate-public-private-subnets-and-nat-placement.md):** - Updated executive summary to include 4th feature - Added Feature #4: NAT Gateway ID Exposure in Subnet Stats - Detailed problem, solution, and implementation - Example output structures for both private and public subnet stats - Benefits and use cases (network firewall routing) - Added Bug Fix #3: AWS Provider v5+ Compatibility - Documents EIP `vpc = true` → `domain = "vpc"` migration - Notes VPC module updates to v3.0.0 **README.yaml Updates:** - Added new section "NAT Gateway ID References in Outputs" - Detailed explanation of `named_private_subnets_stats_map` structure (4 fields) - Detailed explanation of `named_public_subnets_stats_map` structure (4 fields) - Real-world use case example: network firewall routing configuration - Shows how to extract NAT Gateway IDs from stats maps **README.md (Generated):** - Auto-generated from README.yaml using `atmos readme` - Includes all new documentation sections - Updated output descriptions to reflect 4-field structure The NAT Gateway ID mapping enables downstream components (network firewalls, routing policies) to reference NAT Gateways associated with subnets without manual mapping or additional data sources. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>
Standardize AWS Provider version constraints across the module and all examples to ensure compatibility with dependencies and features. Changes: **Main Module (versions.tf):** - Updated from `>= 3.71.0` to `>= 5.0` - Rationale: AWS Provider v5 introduced `domain = "vpc"` for EIP resources - Module is compatible with v5+ (main code uses neither `vpc` nor `domain` attributes for broad compatibility) - Documented AWS Provider v5+ compatibility in PRD **All Examples (examples/*/versions.tf):** - Updated from `>= 3.71.0` or `>= 4.0` to `>= 6.0` - Rationale: All examples use cloudposse/vpc/aws v3.0.0 which requires AWS Provider v6 - VPC module v3.0.0 released September 12, 2025 with AWS Provider v6 support - Examples affected: - examples/complete/versions.tf - examples/existing-ips/versions.tf (also uses `domain = "vpc"`) - examples/multiple-subnets-per-az/versions.tf - examples/nacls/versions.tf - examples/redundant-nat-gateways/versions.tf - examples/separate-public-private-subnets/versions.tf **Version Matrix:** - Module core: AWS Provider >= 5.0 (broadest compatibility) - Examples: AWS Provider >= 6.0 (required by VPC module dependency) This ensures all examples work correctly with their dependencies and prevents version conflict errors during terraform init. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>
Update test dependencies to their latest stable releases for improved security, performance, and compatibility. **Go Version:** - Updated from Go 1.24 to Go 1.25 - Updated toolchain from go1.24.0 to go1.25.0 **Major Dependency Updates:** **Testing Framework:** - terratest: v0.41.23 → v0.52.0 - testify: v1.8.2 → v1.11.1 **Kubernetes:** - k8s.io/apimachinery: v0.20.6 → v0.34.0 - k8s.io/api: v0.20.6 → v0.34.0 - k8s.io/client-go: v0.20.6 → v0.34.0 - k8s.io/klog/v2: v2.4.0 → v2.130.1 **AWS SDK v2 (new):** - Migrated to AWS SDK v2 packages - Added support for multiple AWS services (EC2, S3, IAM, RDS, etc.) **HashiCorp:** - go-getter: v1.7.1 → v1.8.3 + v2.2.3 - hcl/v2: v2.9.1 → v2.24.0 - terraform-json: v0.13.0 → v0.27.2 - go-version: v1.6.0 → v1.7.0 **Google Cloud:** - Updated all google.golang.org packages to latest versions - grpc: v1.56.3 → v1.76.0 - protobuf: v1.30.0 → v1.36.10 **golang.org/x packages:** - crypto: v0.1.0 → v0.43.0 - net: v0.9.0 → v0.46.0 - oauth2: v0.7.0 → v0.32.0 - sys: v0.7.0 → v0.37.0 - text: v0.9.0 → v0.30.0 - tools: v0.6.0 → v0.38.0 **Other Notable Updates:** - klauspost/compress: v1.15.11 → v1.18.1 - mattn/go-zglob: v0.0.2 → v0.6 - ulikunitz/xz: v0.5.10 → v0.5.15 - zclconf/go-cty: v1.9.1 → v1.17.0 All dependencies updated using `go get -u ./...` and cleaned with `go mod tidy`. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>
Remove unused local variable `subnet_az_count` that was causing tflint failure. This variable was a remnant from the refactoring where we separated public and private subnet counts. It was declared but never used anywhere in the code. The variable was calculating `max(local.public_subnet_az_count, local.private_subnet_az_count)` but this value is not needed since we now handle public and private subnets independently using: - `local.public_subnet_az_count` - for public subnet operations - `local.private_subnet_az_count` - for private subnet operations - `local.vpc_az_count` - for NAT gateways (one per AZ) Fixes tflint warning: main.tf:86:3: warning: local.subnet_az_count is declared but not used 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>
Member
Author
|
/terratest |
…ble to examples Add the nat_gateway_public_subnet_indices variable to both new example configurations to fix test failures where the test was trying to pass this variable but it wasn't declared in the example's variables.tf. **Test Failure Fixed**: TestExamplesSeparatePublicPrivateSubnetsWithIndices was failing with: "Error: Value for undeclared variable - A variable named 'nat_gateway_public_subnet_indices' was assigned on the command line, but the root module does not declare a variable of that name." **Changes**: 1. examples/separate-public-private-subnets/: - Added nat_gateway_public_subnet_indices variable with default [0] - Made nat_gateway_public_subnet_names nullable - Passed nat_gateway_public_subnet_indices to module in main.tf 2. examples/redundant-nat-gateways/: - Added nat_gateway_public_subnet_indices variable with default [0, 1] - Made nat_gateway_public_subnet_names nullable - Passed nat_gateway_public_subnet_indices to module in main.tf Both examples now support passing either nat_gateway_public_subnet_indices OR nat_gateway_public_subnet_names, enabling the test to override with index-based placement when needed. Fixes test at examples_separate_public_private_subnets_test.go:169 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>
Member
Author
|
/terratest |
Member
Author
|
/terratest |
Member
Author
|
/terratest |
osterman
approved these changes
Nov 2, 2025
Contributor
|
These changes were released in v3.0.0. |
aknysh
added a commit
that referenced
this pull request
Nov 3, 2025
**Problem:** When max_nats < number of AZs, NAT Gateways are only created in the first max_nats AZs, but the route table mapping formulas assumed NATs exist in all AZs, causing "Invalid index" errors. **Example scenario that failed:** - 2 AZs, max_nats=1 → Only 1 NAT created (in AZ0) - 2 private route tables (1 per AZ) - Mapping produced [0, 1] but only NAT[0] exists - Error: aws_nat_gateway.default[1] - invalid index **Root cause:** The private_route_table_to_nat_map and public_route_table_to_nat_map formulas calculated: az_index * nats_per_az + subnet_within_az This works when NATs exist in all AZs, but fails when max_nats limits NATs to fewer AZs. The formula could generate indices >= nat_count. **Fix:** Added modulo operation to clamp results to available NAT indices: (floor(i / subnets_per_az_count) * nats_per_az + (i % subnets_per_az_count) % nats_per_az) % nat_count Now route tables in AZs without NATs will wrap around to use NATs from other AZs (typically AZ0). **Why wasn't this caught by tests?** None of the module's examples use the max_nats feature. All examples either omit max_nats (defaulting to unlimited) or use max_nats >= number of AZs, so the bug was never triggered. **Testing:** This bug was discovered by the aws-vpc component test suite when using: - 2 AZs with max_nats: 1 - public_subnets_enabled: true - NAT Gateway enabled Fixes: #226
aknysh
added a commit
that referenced
this pull request
Nov 4, 2025
* Fix NAT routing when max_nats limits NATs to fewer AZs **Problem:** When max_nats < number of AZs, NAT Gateways are only created in the first max_nats AZs, but the route table mapping formulas assumed NATs exist in all AZs, causing "Invalid index" errors. **Example scenario that failed:** - 2 AZs, max_nats=1 → Only 1 NAT created (in AZ0) - 2 private route tables (1 per AZ) - Mapping produced [0, 1] but only NAT[0] exists - Error: aws_nat_gateway.default[1] - invalid index **Root cause:** The private_route_table_to_nat_map and public_route_table_to_nat_map formulas calculated: az_index * nats_per_az + subnet_within_az This works when NATs exist in all AZs, but fails when max_nats limits NATs to fewer AZs. The formula could generate indices >= nat_count. **Fix:** Added modulo operation to clamp results to available NAT indices: (floor(i / subnets_per_az_count) * nats_per_az + (i % subnets_per_az_count) % nats_per_az) % nat_count Now route tables in AZs without NATs will wrap around to use NATs from other AZs (typically AZ0). **Why wasn't this caught by tests?** None of the module's examples use the max_nats feature. All examples either omit max_nats (defaulting to unlimited) or use max_nats >= number of AZs, so the bug was never triggered. **Testing:** This bug was discovered by the aws-vpc component test suite when using: - 2 AZs with max_nats: 1 - public_subnets_enabled: true - NAT Gateway enabled Fixes: #226 * Add test coverage for max_nats feature **New Example: limited-nat-gateways** Tests the critical max_nats < num_azs scenario that was previously untested and contained the bug fixed in commit 3681299. **What it tests:** - 3 AZs with max_nats=1 (only 1 NAT Gateway created) - 3 AZs with max_nats=2 (only 2 NAT Gateways created) - Verifies that private subnets in AZs without NATs can still route to the internet through NATs in other AZs - Ensures route table mapping correctly wraps around when NATs are limited to fewer AZs **Why this is important:** This test would have caught the bug before it reached production. The bug caused "Invalid index" errors when max_nats < num_azs because route tables in AZs without NATs tried to reference non-existent NAT Gateway indices. **Use case documented:** The example includes comprehensive README explaining when to use max_nats for cost optimization: - Dev/test environments: 1 NAT for 3 AZs saves ~$65/month (67%) - Trade-off: Lower availability vs. cost savings - When to use and when to avoid this pattern **Test coverage:** - TestExamplesLimitedNatGateways: 3 AZs, 1 NAT - TestExamplesLimitedNatGatewaysTwoNats: 3 AZs, 2 NATs - TestExamplesLimitedNatGatewaysDisabled: enabled=false check This brings total test coverage to: - 7 examples (was 6) - 11 test functions (was 8) - ✅ max_nats feature now tested Closes test coverage gap identified during bug investigation. * Add comprehensive documentation for max_nats bug fix **Documentation Added:** 1. **Test Coverage Analysis** (docs/test-coverage-analysis.md) - Complete analysis of all 7 examples and 13 test functions - Feature coverage matrix showing what's tested vs untested - Identified critical gaps: NAT Instance, IPv6/NAT64, custom route tables - Recommendations for future test improvements - Test execution strategy and best practices 2. **PRD: Fix max_nats Routing Bug** (docs/prd/fix-max-nats-routing.md) - Executive summary of the critical bug - Detailed problem statement with error examples - Root cause analysis and why tests didn't catch it - Technical solution design with before/after code - Complete testing strategy with 3 new test functions - Cost analysis showing 67% savings with max_nats - Rollout plan, success metrics, and risk assessment - Lessons learned and process improvements **Key Findings:** - Bug existed because max_nats had ZERO test coverage - Module has good test coverage overall (13 tests, 7 examples) - Several features still lack tests: NAT Instance, IPv6, custom routes - New limited-nat-gateways example fills critical gap - Cost optimization: $65/month savings per environment with max_nats=1 **Impact:** This documentation ensures: - Future contributors understand test requirements - Test gaps are visible and prioritized - Cost optimization features are documented - Bug investigation process is recorded - Test coverage improvements have a roadmap * Enhance PRD with NAT placement diagrams and decision tree **Major Additions:** 1. **NAT Gateway Placement Behavior Section** - Explains the two configuration dimensions: subnet names and max_nats - Formula: Total NATs = min(num_azs, max_nats) × num_subnet_names - Clarifies that multiple subnet names = multiple NATs per AZ 2. **Visual ASCII Diagrams for 4 Placement Strategies** - Strategy 1: Standard (1 NAT per AZ) - $97.20/month - Strategy 2: Redundant (2 NATs per AZ) - $194.40/month - Strategy 3: Limited (1 NAT total with max_nats) - $32.40/month - Strategy 4: Hybrid (2 NATs in 1 AZ) - $64.80/month Each diagram shows: - Which AZs get NATs - Which subnet types get NATs - Routing arrows for cross-AZ traffic - Monthly cost calculations 3. **Configuration Calculation Table** - 6 common configuration scenarios - Shows AZs, subnet names, max_nats, total NATs, cost, and use case - Helps users quickly find the right configuration 4. **Decision Tree for Choosing NAT Configuration** - Production vs non-production paths - Maximum availability vs cost optimization - Special requirements (failover, single AZ, no NAT) - Recommendations with cost and availability trade-offs 5. **Routing Behavior Explanation** - Mathematical formula breakdown with examples - Shows how modulo operation prevents invalid indices - Demonstrates the bug fix in action 6. **Best Practices by Environment Type** - Production: NAT per AZ minimum, consider redundancy - Development: max_nats=1 for cost savings - Staging: Balanced approach with max_nats=2 - Cost optimization tips **Why This Matters:** The original PRD didn't clearly explain that: - nat_gateway_public_subnet_names can create MULTIPLE NATs per AZ - max_nats limits which AZs get NATs, not total NAT count - These two variables multiply together This caused confusion about cost implications. The new diagrams and decision tree make it crystal clear how to configure NATs for different scenarios and understand the cost/availability trade-offs. **Examples:** - 3 AZs + names=["lb","web"] + max_nats=3 = 6 NATs ($194/mo) - 3 AZs + names=["lb"] + max_nats=1 = 1 NAT ($32/mo) - Understanding this prevents costly misconfigurations * updates * Fix test variable override issue for max_nats Remove hardcoded `max_nats = 1` from fixtures.us-east-2.tfvars to allow tests to properly override the value via the Vars parameter. **Problem:** TestExamplesLimitedNatGatewaysTwoNats was failing because it tried to override max_nats=2 via Vars, but the fixtures file had max_nats=1 hardcoded. Depending on Terraform's variable precedence and how terratest constructs the command line, the -var-file value may take precedence over the -var flag. **Solution:** Remove max_nats from fixtures file. Both tests explicitly pass max_nats in their Vars map: - TestExamplesLimitedNatGateways passes max_nats: 1 - TestExamplesLimitedNatGatewaysTwoNats passes max_nats: 2 The example's variables.tf provides default = 1 as a fallback. Fixes the test failure where expected 2 NAT Gateways but got 1. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com> * updates --------- Co-authored-by: Claude <noreply@anthropic.com>
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
what
why
User Pain Points:
Business Impact:
Key Features:
Separate Public/Private Subnet Counts: New variables
public_subnets_per_az_count,public_subnets_per_az_names,private_subnets_per_az_count,private_subnets_per_az_namesallow independent control while falling back to original variables for backward compatibilityControlled NAT Placement by Index: Variable
nat_gateway_public_subnet_indices(default[0]) specifies which subnet position(s) in each AZ receive NAT Gateways, enabling cost optimizationNamed NAT Placement: Variable
nat_gateway_public_subnet_namesallows intuitive placement like["loadbalancer"]instead of remembering indicesBug Fixes: Corrected NAT Gateway global index calculation and route table mapping to ensure proper AZ distribution and same-AZ routing
Examples Included:
examples/separate-public-private-subnets/: Cost-optimized with 1 NAT per AZ (~$110/month)examples/redundant-nat-gateways/: High-availability with 2 NATs per AZ (~$140/month)Test Coverage:
references
docs/prd/separate-public-private-subnets-and-nat-placement.md