fix(toolkit): CLI tool fails on CloudFormation Throttling (aws#8711)
The CDK (particularly `cdk deploy`) might crash after being throttled
by CloudFormation, once the default of 6 configured retries has been
exhausted.

This changes the retry configuration of the CloudFormation client (and
only that one) to allow up to 10 retries with a back-off base of 1
second. This makes the maximum cumulative back-off about 17 minutes,
which I hope will be plenty even for the calls limited to 1 TPM (one
transaction per minute). This should allow heavily parallel deployments
on the same account and region to avoid getting killed by a throttle,
but it will reduce the responsiveness of the progress UI.
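
The "about 17 minutes" figure can be checked with a little arithmetic. The sketch below assumes that each retry waits at most `base * 2^retryCount` milliseconds (exponential back-off with random jitter, `retryCount` starting at 0); `maxTotalBackoffMs` is a hypothetical helper, not part of the CDK or the SDK.

```typescript
// Worst-case cumulative back-off, assuming the delay before retry number
// `retryCount` (starting at 0) is at most baseMs * 2^retryCount.
// The real delay is randomized, so the actual wait is usually shorter.
function maxTotalBackoffMs(maxRetries: number, baseMs: number): number {
  let total = 0;
  for (let retryCount = 0; retryCount < maxRetries; retryCount++) {
    total += baseMs * 2 ** retryCount;
  }
  return total;
}

// Default policy from this commit: 6 retries, 300 ms base.
const defaultWorstCaseMs = maxTotalBackoffMs(6, 300);
// CloudFormation policy from this commit: 10 retries, 1 000 ms base.
const cloudFormationWorstCaseMs = maxTotalBackoffMs(10, 1_000);

console.log(`default worst case: ${defaultWorstCaseMs} ms`);
console.log(`CloudFormation worst case: ${cloudFormationWorstCaseMs} ms`);
```

Under that assumption the CloudFormation policy sums to 1 023 000 ms, a little over 17 minutes, against 18 900 ms for the default policy.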

Additionally, this configures a custom logger for the SDK, which logs
SDK calls to the console when running in debug mode, giving users more
information for troubleshooting.
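
As a rough illustration of the logger idea: the SDK configuration accepts a `logger` object with a `log` method, and the commit forwards each message to the toolkit's `debug` helper. The sketch below is self-contained and uses hypothetical names (`makeSdkLogger`, `Sink`, `isDebug`); it makes the debug-mode check explicit, whereas in the toolkit that filtering is presumably done by `debug` itself.

```typescript
// Hypothetical stand-in for the toolkit's debug() output function.
type Sink = (line: string) => void;

// Builds an object shaped like the SDK's `logger` option: a `log` method
// that receives one or more messages per SDK call.
function makeSdkLogger(isDebug: () => boolean, sink: Sink): { log: (...messages: unknown[]) => void } {
  return {
    log: (...messages) => {
      // Drop everything unless debug mode is on (assumed behaviour).
      if (!isDebug()) { return; }
      messages.forEach(m => sink(String(m)));
    },
  };
}

// Usage: collect lines in an array instead of printing them.
const captured: string[] = [];
let debugEnabled = false;
const logger = makeSdkLogger(() => debugEnabled, line => { captured.push(line); });

logger.log('ignored while debug mode is off');
debugEnabled = true;
logger.log('first message', 'second message');
```

Passing the sink as a parameter keeps the sketch testable without the real SDK or console.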

Fixes aws#5637
RomainMuller authored Jun 24, 2020
1 parent d9c4f5e commit e512a40
Showing 1 changed file with 15 additions and 10 deletions.
25 changes: 15 additions & 10 deletions packages/aws-cdk/lib/api/aws-auth/sdk.ts
@@ -42,29 +42,34 @@ export class SDK implements ISDK {
   private readonly config: ConfigurationOptions;
 
   /**
-   * Default retry options for SDK clients
-   *
-   * Biggest bottleneck is CloudFormation, with a 1tps call rate. We want to be
-   * a little more tenacious than the defaults, and with a little more breathing
-   * room between calls (defaults are {retries=3, base=100}).
+   * Default retry options for SDK clients.
+   */
+  private readonly retryOptions = { maxRetries: 6, retryDelayOptions: { base: 300 } };
+
+  /**
+   * The more generous retry policy for CloudFormation, which has a 1 TPM limit on certain APIs,
+   * which are abundantly used for deployment tracking, ...
    *
-   * I've left this running in a tight loop for an hour and the throttle errors
-   * haven't escaped the retry mechanism.
+   * So we're allowing way more retries, but waiting a bit more.
    */
-  private readonly retryOptions = { maxRetries: 6, retryDelayOptions: { base: 300 }};
+  private readonly cloudFormationRetryOptions = { maxRetries: 10, retryDelayOptions: { base: 1_000 } };
 
   constructor(private readonly credentials: AWS.Credentials, region: string, httpOptions: ConfigurationOptions = {}) {
     this.config = {
       ...httpOptions,
       ...this.retryOptions,
       credentials,
       region,
+      logger: { log: (...messages) => messages.forEach(m => debug('%s', m)) },
     };
     this.currentRegion = region;
   }
 
   public cloudFormation(): AWS.CloudFormation {
-    return wrapServiceErrorHandling(new AWS.CloudFormation(this.config));
+    return wrapServiceErrorHandling(new AWS.CloudFormation({
+      ...this.config,
+      ...this.cloudFormationRetryOptions,
+    }));
   }
 
   public ec2(): AWS.EC2 {
@@ -212,4 +217,4 @@ function allChainedExceptionMessages(e: Error | undefined) {
     e = (e as any).originalError;
   }
   return ret.join(': ');
-}
+}
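
The per-client override in `cloudFormation()` relies on object spread: keys from a later spread replace keys from an earlier one. A minimal sketch of that merge, with a simplified `RetryConfig` type and a placeholder region (both are assumptions for illustration, not the SDK's real types):

```typescript
// Simplified stand-in for the SDK's ConfigurationOptions (assumption).
interface RetryConfig {
  maxRetries?: number;
  retryDelayOptions?: { base?: number };
  region?: string;
}

// Values taken from the commit.
const retryOptions: RetryConfig = { maxRetries: 6, retryDelayOptions: { base: 300 } };
const cloudFormationRetryOptions: RetryConfig = { maxRetries: 10, retryDelayOptions: { base: 1_000 } };

// Shared config, as built in the constructor ('us-east-1' is a placeholder).
const sharedConfig: RetryConfig = { ...retryOptions, region: 'us-east-1' };

// CloudFormation-only config: later spread wins for overlapping keys.
// Spread is shallow, so retryDelayOptions is replaced as a whole object.
const cloudFormationConfig: RetryConfig = { ...sharedConfig, ...cloudFormationRetryOptions };

console.log(sharedConfig.maxRetries, cloudFormationConfig.maxRetries);
```

Because the merge creates a new object, the shared configuration used by the other clients is left untouched.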
