Commit f1fd01b

fix(experimental-ec2-pattern): suspend more processes during deployment (#2752)
* fix(experimental-ec2-pattern): suspend more processes during deployment
* fix(experimental-ec2-pattern): suspend HealthCheck too
* Mention HealthCheck in changelog

Co-authored-by: Jorge Azevedo <jorge.azevedo@guardian.co.uk>
1 parent a986201 commit f1fd01b

3 files changed: +32 -5 lines changed

.changeset/clever-tigers-bet.md

Lines changed: 16 additions & 0 deletions
@@ -0,0 +1,16 @@
+---
+"@guardian/cdk": patch
+---
+
+The new deployment mechanism (`GuEc2AppExperimental`) now suspends some additional ASG processes:
+
+`AZRebalance`
+`InstanceRefresh`
+`ReplaceUnhealthy`
+`ScheduledActions`
+`HealthCheck`
+
+https://docs.aws.amazon.com/autoscaling/ec2/userguide/as-suspend-resume-processes.html#process-types
+
+This follows a recommendation from AWS and should make deployments (and rollbacks) more reliable:
+https://repost.aws/knowledge-center/auto-scaling-group-rolling-updates
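As a rough illustration of what this changeset implies, the sketch below lists the process types from the linked AWS documentation and works out which ones remain active while a deployment is in progress. The helper `activeDuringDeployment` is hypothetical and not part of `@guardian/cdk`; `AlarmNotification` is included in the suspended set because it was already suspended before this change.

```typescript
// Process types as listed in the AWS Auto Scaling documentation linked above.
const ALL_PROCESSES = [
  "Launch",
  "Terminate",
  "AddToLoadBalancer",
  "AlarmNotification",
  "AZRebalance",
  "HealthCheck",
  "InstanceRefresh",
  "ReplaceUnhealthy",
  "ScheduledActions",
] as const;

type ProcessName = (typeof ALL_PROCESSES)[number];

// Processes suspended during a deployment after this change:
// the previously suspended AlarmNotification plus the five new ones.
const SUSPENDED: readonly ProcessName[] = [
  "AlarmNotification",
  "AZRebalance",
  "HealthCheck",
  "InstanceRefresh",
  "ReplaceUnhealthy",
  "ScheduledActions",
];

// Hypothetical helper (illustration only): returns the processes that are
// NOT suspended, in the order they appear in ALL_PROCESSES.
function activeDuringDeployment(
  suspended: readonly ProcessName[],
): ProcessName[] {
  return ALL_PROCESSES.filter((p) => !suspended.includes(p));
}

console.log(activeDuringDeployment(SUSPENDED));
```

Under these assumptions only `Launch`, `Terminate` and `AddToLoadBalancer` stay active, which are the processes a rolling update itself relies on to replace instances.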

src/experimental/patterns/__snapshots__/ec2-app.test.ts.snap

Lines changed: 5 additions & 0 deletions
@@ -196,6 +196,11 @@ exports[`The GuEc2AppExperimental pattern matches the snapshot 1`] = `
 "PauseTime": "PT3M",
 "SuspendProcesses": [
   "AlarmNotification",
+  "AZRebalance",
+  "HealthCheck",
+  "InstanceRefresh",
+  "ReplaceUnhealthy",
+  "ScheduledActions",
 ],
 "WaitOnResourceSignals": true,
 },

src/experimental/patterns/ec2-app.ts

Lines changed: 11 additions & 5 deletions
@@ -285,12 +285,18 @@ export class GuRollingUpdatePolicyExperimental {
 minSuccessPercentage: 100,
 waitOnResourceSignals: true,
 /*
-  If a scale-in event fires during an `AutoScalingRollingUpdate` operation, the update could fail and rollback.
-  For this reason, we suspend the `AlarmNotification` process, else availability of a service cannot be guaranteed.
-  Consequently, services cannot scale-out during deployments.
-  If AWS ever supports suspending scale-out and scale-in independently, we should allow scale-out.
+  All of these processes can launch and/or terminate instances and if this happens during a deployment it might
+  cause the rolling update to fail: https://repost.aws/knowledge-center/auto-scaling-group-rolling-updates
+  The full list of processes can be found here: https://docs.aws.amazon.com/autoscaling/ec2/userguide/as-suspend-resume-processes.html#process-types
  */
-suspendProcesses: [ScalingProcess.ALARM_NOTIFICATION],
+suspendProcesses: [
+  ScalingProcess.ALARM_NOTIFICATION,
+  ScalingProcess.AZ_REBALANCE,
+  ScalingProcess.HEALTH_CHECK,
+  ScalingProcess.INSTANCE_REFRESH,
+  ScalingProcess.REPLACE_UNHEALTHY,
+  ScalingProcess.SCHEDULED_ACTIONS,
+],
 /*
   Note: this is increased via an Aspect which also takes healthcheck grace period into account.
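To show how the enum members in `ec2-app.ts` relate to the strings in the snapshot, here is a minimal, dependency-free sketch. The local `ScalingProcess` enum is only a stand-in for the one the pattern imports, limited to the six members used in this commit, and `RollingUpdateFragment` and `renderRollingUpdate` are hypothetical names that model just the three keys visible in the snapshot hunk.

```typescript
// Local stand-in for the six `ScalingProcess` members used in the diff.
// The string values are the names that appear in the snapshot.
enum ScalingProcess {
  ALARM_NOTIFICATION = "AlarmNotification",
  AZ_REBALANCE = "AZRebalance",
  HEALTH_CHECK = "HealthCheck",
  INSTANCE_REFRESH = "InstanceRefresh",
  REPLACE_UNHEALTHY = "ReplaceUnhealthy",
  SCHEDULED_ACTIONS = "ScheduledActions",
}

// Hypothetical shape (illustration only): the keys shown in the snapshot hunk.
interface RollingUpdateFragment {
  PauseTime: string;
  SuspendProcesses: string[];
  WaitOnResourceSignals: boolean;
}

// Hypothetical helper: builds the fragment the snapshot hunk shows.
// String enum members are plain strings at runtime, so copying the array
// yields the process names as they appear in the template.
function renderRollingUpdate(
  suspend: readonly ScalingProcess[],
): RollingUpdateFragment {
  return {
    PauseTime: "PT3M",
    SuspendProcesses: [...suspend],
    WaitOnResourceSignals: true,
  };
}

const fragment = renderRollingUpdate([
  ScalingProcess.ALARM_NOTIFICATION,
  ScalingProcess.AZ_REBALANCE,
  ScalingProcess.HEALTH_CHECK,
  ScalingProcess.INSTANCE_REFRESH,
  ScalingProcess.REPLACE_UNHEALTHY,
  ScalingProcess.SCHEDULED_ACTIONS,
]);

console.log(JSON.stringify(fragment, null, 2));
```

The order of `SuspendProcesses` in the output follows the order of the array in the source, which matches the order shown in the snapshot.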
