Commit dd3b728

fix: Preserve ECR images during cleanup and increase deployment timeout

- Update pre-destroy-cleanup.sh to NOT delete ECR repositories
- Add prominent warning message with manual ECR deletion commands
- Add Secrets Manager cleanup to pre-destroy script
- Update README with "Destroying Resources" section and ECR cleanup docs
- Increase post-deployment max_attempts from 30 to 40 for slower deployments

1 parent 03d31f8 commit dd3b728

3 files changed: 148 additions & 23 deletions

terraform/aws-ecs/README.md

Lines changed: 103 additions & 2 deletions
@@ -881,10 +881,110 @@ curl https://kc.us-east-1.YOUR.DOMAIN/health
 # ============================================================================
 # CLEANUP
 # ============================================================================
-# Destroy infrastructure (WARNING: Deletes everything)
+# See "Destroying Resources" section below for detailed instructions
+./scripts/pre-destroy-cleanup.sh # Run first to clean up blocking resources
+terraform destroy # Then destroy infrastructure
+```
+
+## Destroying Resources
+
+Before running `terraform destroy`, you must run the pre-destroy cleanup script to remove resources that may block deletion:
+
+```bash
+cd terraform/aws-ecs
+
+# Step 1: Run pre-destroy cleanup
+./scripts/pre-destroy-cleanup.sh
+
+# Step 2: Destroy infrastructure
 terraform destroy
 ```

+### Why Pre-Destroy Cleanup is Required
+
+Terraform destroy may fail due to:
+- **ECS Services**: Services must be scaled to 0 and deleted before clusters can be removed
+- **Service Discovery Namespaces**: Must delete services within namespaces before deleting namespaces
+- **ECS Cluster Capacity Providers**: Clusters with active capacity providers cannot be deleted
+- **Secrets Manager Secrets**: Deleted secrets are scheduled for deletion (7-30 days) and block recreation with the same name
+
+**Note:** ECR repositories are intentionally NOT deleted by the pre-destroy cleanup script. Container images are preserved to avoid expensive rebuilds when redeploying. See the "ECR Repository Cleanup (Optional)" section below for manual deletion commands.
+
+### Manual Cleanup Commands
+
+If `terraform destroy` fails, you may need to run these commands manually:
+
+```bash
+export AWS_REGION=us-east-1
+
+# ============================================================================
+# ECS Services Cleanup
+# ============================================================================
+# Scale down and delete ECS services
+aws ecs update-service --cluster mcp-gateway-ecs-cluster --service mcp-gateway-v2-registry --desired-count 0 --region $AWS_REGION
+aws ecs delete-service --cluster mcp-gateway-ecs-cluster --service mcp-gateway-v2-registry --force --region $AWS_REGION
+
+aws ecs update-service --cluster mcp-gateway-ecs-cluster --service mcp-gateway-v2-auth --desired-count 0 --region $AWS_REGION
+aws ecs delete-service --cluster mcp-gateway-ecs-cluster --service mcp-gateway-v2-auth --force --region $AWS_REGION
+
+aws ecs update-service --cluster keycloak --service keycloak --desired-count 0 --region $AWS_REGION
+aws ecs delete-service --cluster keycloak --service keycloak --force --region $AWS_REGION
+
+# Wait for tasks to stop (check with)
+aws ecs list-tasks --cluster mcp-gateway-ecs-cluster --region $AWS_REGION
+aws ecs list-tasks --cluster keycloak --region $AWS_REGION
+
+# ============================================================================
+# Service Discovery Cleanup
+# ============================================================================
+# List namespaces
+aws servicediscovery list-namespaces --region $AWS_REGION
+
+# Delete services in namespace first
+aws servicediscovery list-services --filters Name=NAMESPACE_ID,Values=ns-xxxxx --region $AWS_REGION
+aws servicediscovery delete-service --id srv-xxxxx --region $AWS_REGION
+
+# Then delete namespace
+aws servicediscovery delete-namespace --id ns-xxxxx --region $AWS_REGION
+
+# ============================================================================
+# Secrets Manager Cleanup
+# ============================================================================
+# Force delete secrets that are scheduled for deletion (required before recreating)
+aws secretsmanager delete-secret --secret-id "keycloak/database" --force-delete-without-recovery --region $AWS_REGION
+aws secretsmanager delete-secret --secret-id "mcp-gateway-keycloak-client-secret" --force-delete-without-recovery --region $AWS_REGION
+aws secretsmanager delete-secret --secret-id "mcp-gateway-keycloak-m2m-client-secret" --force-delete-without-recovery --region $AWS_REGION
+
+# ============================================================================
+# Targeted Terraform Destroy
+# ============================================================================
+# If full destroy fails, try targeted destroy of remaining resources
+terraform state list # List remaining resources
+
+terraform destroy \
+  -target=module.mcp_gateway.aws_service_discovery_private_dns_namespace.mcp \
+  -target=module.ecs_cluster.aws_ecs_cluster.this[0] \
+  -target=module.vpc.aws_vpc.this[0]
+```
+
+### ECR Repository Cleanup (Optional)
+
+ECR repositories are intentionally NOT deleted by the pre-destroy cleanup script to preserve container images and avoid expensive rebuilds when redeploying. If you want to completely remove all resources including ECR repositories, run these commands manually:
+
+```bash
+export AWS_REGION=us-east-1
+
+# Delete all ECR repositories (WARNING: This deletes all container images!)
+aws ecr delete-repository --repository-name keycloak --force --region $AWS_REGION
+aws ecr delete-repository --repository-name mcp-gateway-registry --force --region $AWS_REGION
+aws ecr delete-repository --repository-name mcp-gateway-auth-server --force --region $AWS_REGION
+aws ecr delete-repository --repository-name mcp-gateway-currenttime --force --region $AWS_REGION
+aws ecr delete-repository --repository-name mcp-gateway-mcpgw --force --region $AWS_REGION
+aws ecr delete-repository --repository-name mcp-gateway-realserverfaketools --force --region $AWS_REGION
+aws ecr delete-repository --repository-name mcp-gateway-flight-booking-agent --force --region $AWS_REGION
+aws ecr delete-repository --repository-name mcp-gateway-travel-assistant-agent --force --region $AWS_REGION
+```
+
 ### File Structure Reference

 ```
@@ -911,7 +1011,8 @@ terraform/aws-ecs/
 ├── user_mgmt.sh # Keycloak user management
 ├── service_mgmt.sh # Service management utilities
 ├── rotate-keycloak-web-client-secret.sh # Rotate OAuth2 secrets
-└── save-terraform-outputs.sh # Export terraform outputs as JSON
+├── save-terraform-outputs.sh # Export terraform outputs as JSON
+└── pre-destroy-cleanup.sh # Run before terraform destroy
 ```

 ### Environment Variables Reference
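
The "Wait for tasks to stop" step in the manual cleanup commands leaves the polling to the reader. One way to automate it is sketched below; this is an illustration, not something this commit adds. The function name `wait_for_tasks_stopped` and the `MAX_ATTEMPTS` and `WAIT_INTERVAL` variables are hypothetical. The sketch relies on `aws ecs list-tasks` listing only tasks whose desired status is RUNNING when no status filter is given, so a count of 0 is taken to mean the cluster has drained.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of this commit): poll a cluster until
# `aws ecs list-tasks` reports no tasks, or give up after MAX_ATTEMPTS polls.
wait_for_tasks_stopped() {
    local cluster="$1"
    local max_attempts="${MAX_ATTEMPTS:-30}"   # illustrative default
    local wait_interval="${WAIT_INTERVAL:-10}" # seconds between polls, illustrative
    local attempt count

    for attempt in $(seq 1 "$max_attempts"); do
        # length(taskArns) is a JMESPath expression evaluated by the AWS CLI
        count=$(aws ecs list-tasks --cluster "$cluster" \
            --region "${AWS_REGION:-us-east-1}" \
            --query 'length(taskArns)' --output text) || return 2
        if [ "$count" = "0" ]; then
            echo "No tasks left in cluster: $cluster"
            return 0
        fi
        echo "Attempt $attempt/$max_attempts: $count task(s) still listed in $cluster"
        sleep "$wait_interval"
    done
    return 1 # timed out while tasks were still listed
}
```

Usage would look like `wait_for_tasks_stopped mcp-gateway-ecs-cluster && wait_for_tasks_stopped keycloak`, run after the `delete-service` calls. A return code of 1 means the attempts ran out, and 2 means the AWS CLI call itself failed.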

terraform/aws-ecs/scripts/post-deployment-setup.sh

Lines changed: 2 additions & 2 deletions
@@ -344,7 +344,7 @@ _verify_ecs_services() {
     local keycloak_cluster="keycloak"
     local keycloak_service="keycloak"

-    local max_attempts=30
+    local max_attempts=40
     local wait_interval=20

     log_info "Checking ECS services are running..."
@@ -530,7 +530,7 @@ _restart_services() {

     log_info "Waiting for services to stabilize..."

-    local max_attempts=30
+    local max_attempts=40
     local wait_interval=10

     for attempt in $(seq 1 $max_attempts); do
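
To see what the change from 30 to 40 attempts means in wall-clock terms, the short sketch below multiplies attempts by interval. It assumes each loop iteration sleeps for `wait_interval` seconds exactly once, which the hunks above do not show, and it ignores the time spent in the AWS calls themselves. The helper name `max_wait_seconds` is hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of this commit): approximate upper bound on
# time spent sleeping, assuming one sleep of wait_interval per attempt.
max_wait_seconds() {
    local max_attempts="$1"
    local wait_interval="$2"
    echo $((max_attempts * wait_interval))
}

echo "_verify_ecs_services: $(max_wait_seconds 30 20)s -> $(max_wait_seconds 40 20)s" # 600s -> 800s
echo "_restart_services: $(max_wait_seconds 30 10)s -> $(max_wait_seconds 40 10)s"    # 300s -> 400s
```

Under that assumption, the verification loop can now wait roughly 13 minutes instead of 10, and the restart loop roughly 7 minutes instead of 5.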

terraform/aws-ecs/scripts/pre-destroy-cleanup.sh

Lines changed: 43 additions & 19 deletions
@@ -126,33 +126,57 @@ else
 fi


-# Step 4: Force delete ECR repositories
+# Step 4: ECR Repositories - PRESERVED (not deleted)
 echo ""
-echo "Step 4: Cleaning up ECR Repositories"
-echo "-------------------------------------"
-
-ECR_REPOS=(
-    "mcp-gateway-registry"
-    "mcp-gateway-auth-server"
-    "mcp-gateway-currenttime"
-    "mcp-gateway-mcpgw"
-    "mcp-gateway-realserverfaketools"
-    "mcp-gateway-flight-booking-agent"
-    "mcp-gateway-travel-assistant-agent"
-    "keycloak"
+echo "Step 4: ECR Repositories"
+echo "------------------------"
+echo ""
+log_warn "============================================================"
+log_warn "ECR REPOSITORIES ARE NOT DELETED BY THIS SCRIPT"
+log_warn "============================================================"
+log_warn ""
+log_warn "Container images are preserved to avoid expensive rebuilds."
+log_warn "Images can be reused after terraform apply without rebuilding."
+log_warn ""
+log_warn "If you want to delete ECR repositories manually, run:"
+log_warn ""
+log_warn " aws ecr delete-repository --repository-name keycloak --force --region $AWS_REGION"
+log_warn " aws ecr delete-repository --repository-name mcp-gateway-registry --force --region $AWS_REGION"
+log_warn " aws ecr delete-repository --repository-name mcp-gateway-auth-server --force --region $AWS_REGION"
+log_warn " aws ecr delete-repository --repository-name mcp-gateway-currenttime --force --region $AWS_REGION"
+log_warn " aws ecr delete-repository --repository-name mcp-gateway-mcpgw --force --region $AWS_REGION"
+log_warn " aws ecr delete-repository --repository-name mcp-gateway-realserverfaketools --force --region $AWS_REGION"
+log_warn " aws ecr delete-repository --repository-name mcp-gateway-flight-booking-agent --force --region $AWS_REGION"
+log_warn " aws ecr delete-repository --repository-name mcp-gateway-travel-assistant-agent --force --region $AWS_REGION"
+log_warn ""
+log_warn "============================================================"
+echo ""
+
+
+# Step 5: Force delete Secrets Manager secrets
+echo ""
+echo "Step 5: Cleaning up Secrets Manager Secrets"
+echo "--------------------------------------------"
+
+SECRETS=(
+    "keycloak/database"
+    "mcp-gateway-keycloak-client-secret"
+    "mcp-gateway-keycloak-m2m-client-secret"
 )

-for repo in "${ECR_REPOS[@]}"; do
-    if aws ecr describe-repositories --repository-names "$repo" --region "$AWS_REGION" &>/dev/null; then
-        log_info "Force deleting ECR repository: $repo"
-        aws ecr delete-repository --repository-name "$repo" --force --region "$AWS_REGION" 2>/dev/null || log_warn "Failed to delete $repo"
+for secret in "${SECRETS[@]}"; do
+    if aws secretsmanager describe-secret --secret-id "$secret" --region "$AWS_REGION" &>/dev/null; then
+        log_info "Force deleting secret: $secret"
+        aws secretsmanager delete-secret --secret-id "$secret" --force-delete-without-recovery --region "$AWS_REGION" 2>/dev/null || log_warn "Failed to delete $secret"
+    else
+        log_info "Secret not found (already deleted): $secret"
     fi
 done


-# Step 5: Clean up any orphaned load balancers
+# Step 6: Clean up any orphaned load balancers
 echo ""
-echo "Step 5: Checking for orphaned resources"
+echo "Step 6: Checking for orphaned resources"
 echo "----------------------------------------"

 # Check for target groups that might block ALB deletion
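
The new Step 4 spells out eight near-identical `log_warn` lines, and the README repeats the same repository list. As a possible follow-up, the hints could be generated from a single list of names, as sketched below. This is only an illustration, not what the commit does. The function name `print_ecr_delete_commands` is hypothetical, and plain `echo` stands in for the script's own `log_warn` so the example is self-contained.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of this commit): print, but never run,
# one manual deletion command per repository name passed as an argument.
AWS_REGION="${AWS_REGION:-us-east-1}"

print_ecr_delete_commands() {
    local repo
    for repo in "$@"; do
        echo " aws ecr delete-repository --repository-name $repo --force --region $AWS_REGION"
    done
}

# Repository names taken from the diff above.
print_ecr_delete_commands \
    keycloak \
    mcp-gateway-registry \
    mcp-gateway-auth-server \
    mcp-gateway-currenttime \
    mcp-gateway-mcpgw \
    mcp-gateway-realserverfaketools \
    mcp-gateway-flight-booking-agent \
    mcp-gateway-travel-assistant-agent
```

Because the function only prints, it keeps the commit's intent that ECR images are never deleted automatically.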
