
Account XXXXX has been throttled on ec2:CreateFleet because it exceeded its request rate limit #5010

@andrecastro

Description


After upgrading from v6.7.3 to v7.3.0, we started seeing throttling errors in the scale-up Lambda.

The Lambda logs the throttling error and skips the entire batch, which prevents new instances from being scaled up. As a result, scale-out does not occur and we now have multiple jobs stuck waiting for instances.

Things to consider:

  • No pool configured; runners are created from scratch.
  • Batch size of 10 for scale-up (the default configuration).

Logs:

{
    "level": "WARN",
    "message": "Create fleet request failed.",
    "timestamp": "2026-01-28T13:30:31.998Z",
    "service": "runners-scale-up",
    "sampling_rate": 0,
    "xray_trace_id": "1-697a0f71-dss346bbce0e8c5adbdc7c7e",
    "region": "us-west-2",
    "environment": "test-al2023-x86_64-rc",
    "module": "runners",
    "aws-request-id": "78dd4a4b-7986-5e1d-bcac-2bbf6837a751",
    "function-name": "test-al2023-x86_64-rc-scale-up",
    "runner": {
        "type": "Org",
        "owner": "test",
        "namePrefix": "way-"
    },
    "github": {
        "event": "workflow_job",
        "workflow_job_id": "61740745129"
    },
    "error": {
        "name": "RequestLimitExceeded",
        "location": "file:///var/task/index.js:121246",
        "message": "Request limit exceeded. Account XXXXXX has been throttled on ec2:CreateFleet because it exceeded its request rate limit.",
        "stack": "RequestLimitExceeded: Request limit exceeded. Account XXXXXX has been throttled on ec2:CreateFleet because it exceeded its request rate limit.\n    at throwDefaultError (file:///var/task/index.js:121246:20)\n    at file:///var/task/index.js:121255:5\n    at de_CommandError (file:///var/task/index.js:17370:10)\n    at process.processTicksAndRejections (node:internal/process/task_queues:105:5)\n    at async file:///var/task/index.js:117925:20\n    at async file:///var/task/index.js:110783:18\n    at async file:///var/task/index.js:117782:38\n    at async file:///var/task/index.js:106116:22\n    at async createInstances (file:///var/task/index.js:163220:17)\n    at async createRunner (file:///var/task/index.js:163112:19)",
        "$fault": "client",
        "$metadata": {
            "httpStatusCode": 503,
            "requestId": "15f67889-cb46-4dc2-8149-275e1419a7fa",
            "attempts": 3,
            "totalRetryDelay": 698
        },
        "Code": "RequestLimitExceeded"
    }
}
{
    "level": "ERROR",
    "message": "Error processing batch (size: 10): Rate exceeded, ignoring batch",
    "timestamp": "2026-01-29T15:56:22.134Z",
    "service": "runners-scale-up",
    "sampling_rate": 0,
    "xray_trace_id": "1-697b8320-5c9f226f9sdsdf6827760314a",
    "region": "us-west-2",
    "environment": "test-al2023-x86_64-rc",
    "aws-request-id": "f65b4e76-c0ce-50e8-851d-2f8345aee6f8",
    "function-name": "test-al2023-x86_64-rc-scale-up",
    "module": "lambda.ts",
    "error": {
        "name": "ThrottlingException",
        "location": "file:///var/task/index.js:67310",
        "message": "Rate exceeded",
        "stack": "ThrottlingException: Rate exceeded\n    at AwsJson1_1Protocol.handleError (file:///var/task/index.js:67310:27)\n    at process.processTicksAndRejections (node:internal/process/task_queues:103:5)\n    at async AwsJson1_1Protocol.deserializeResponse (file:///var/task/index.js:72235:13)\n    at async file:///var/task/index.js:72634:24\n    at async file:///var/task/index.js:70447:20\n    at async file:///var/task/index.js:74848:46\n    at async file:///var/task/index.js:68645:26\n    at async putParameter (file:///var/task/index.js:114479:5)\n    at async createJitConfig (file:///var/task/index.js:121728:9)\n    at async createStartRunnerConfig (file:///var/task/index.js:121653:9)",
        "$fault": "client",
        "$metadata": {
            "httpStatusCode": 400,
            "requestId": "5d07dc25-b704-4671-973e-4db07eb87dba",
            "attempts": 3,
            "totalRetryDelay": 450
        },
        "__type": "ThrottlingException"
    }
}

From the AWS recommendation:

RESOLUTION STEPS:

  • Implement exponential backoff and retry logic. Add retry logic with exponential backoff in your Lambda function when receiving throttling errors. This allows your application to automatically retry failed requests with increasing delays between attempts (see the sketch after this list).

  • Optimize your fleet creation strategy. Instead of creating multiple small fleets, consider:

    • Creating fewer, larger fleets with higher target capacity
    • Batching your scaling requests to reduce API call frequency
    • Using EC2 Fleet type "instant" for immediate provisioning needs, as it has unlimited quota for Spot capacity pools
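
A minimal sketch of what per-request backoff with full jitter could look like in the scale-up Lambda. The helper names (retryWithBackoff, isThrottlingError) are hypothetical, not from the module's code; the error names are the two seen in the logs above:

async function retryWithBackoff<T>(
  fn: () => Promise<T>,
  maxAttempts = 5,
  baseDelayMs = 200,
): Promise<T> {
  for (let attempt = 1; ; attempt++) {
    try {
      return await fn();
    } catch (err) {
      if (attempt >= maxAttempts || !isThrottlingError(err)) throw err;
      // Full jitter: wait a random duration up to an exponentially growing cap.
      const capMs = baseDelayMs * 2 ** (attempt - 1);
      await new Promise((resolve) => setTimeout(resolve, Math.random() * capMs));
    }
  }
}

// Matches the two throttling error names from the logs above.
function isThrottlingError(err: unknown): boolean {
  const name = err instanceof Error ? err.name : undefined;
  return name === "RequestLimitExceeded" || name === "ThrottlingException";
}

A call site could then be wrapped as retryWithBackoff(() => ec2.send(new CreateFleetCommand(params))), so a throttled request delays and retries instead of failing the whole batch.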

I'm wondering whether this new version, which handles batch configuration, should implement an exponential backoff policy to retry on this kind of failure.
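
For what it's worth, the AWS SDK for JavaScript v3 already retries with exponential backoff internally ("attempts": 3 and totalRetryDelay in the logs above are the default budget being exhausted), so a complement to batch-level backoff would be raising the retry budget on the affected clients. maxAttempts and retryMode are standard SDK v3 client options; the values below are illustrative assumptions, not the module's actual configuration:

import { EC2Client } from "@aws-sdk/client-ec2";
import { SSMClient } from "@aws-sdk/client-ssm";

// "adaptive" adds client-side rate limiting on top of the default
// exponential backoff ("standard" mode); maxAttempts defaults to 3.
const ec2 = new EC2Client({ maxAttempts: 8, retryMode: "adaptive" });
const ssm = new SSMClient({ maxAttempts: 8, retryMode: "adaptive" });

Note that both ec2:CreateFleet (first log) and the SSM putParameter call inside createJitConfig (second log) were throttled, so both clients would need the higher budget.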
