
Account XXXXX has been throttled on ec2:CreateFleet because it exceeded its request rate limit #5010

@andrecastro

Description


After upgrading from v6.7.3 to v7.3.0, we started seeing throttling errors in the scale-up Lambda.

The Lambda logs the throttling error and skips the entire batch, which prevents new instances from being scaled up. As a result, scale-out does not occur and we now have multiple jobs stuck waiting for instances.

Things to consider:

  • No pool configured; runners are created from scratch.
  • Batch size of 10 for scale-up (the default configuration).

Logs:

{
    "level": "WARN",
    "message": "Create fleet request failed.",
    "timestamp": "2026-01-28T13:30:31.998Z",
    "service": "runners-scale-up",
    "sampling_rate": 0,
    "xray_trace_id": "1-697a0f71-dss346bbce0e8c5adbdc7c7e",
    "region": "us-west-2",
    "environment": "test-al2023-x86_64-rc",
    "module": "runners",
    "aws-request-id": "78dd4a4b-7986-5e1d-bcac-2bbf6837a751",
    "function-name": "test-al2023-x86_64-rc-scale-up",
    "runner": {
        "type": "Org",
        "owner": "test",
        "namePrefix": "way-"
    },
    "github": {
        "event": "workflow_job",
        "workflow_job_id": "61740745129"
    },
    "error": {
        "name": "RequestLimitExceeded",
        "location": "file:///var/task/index.js:121246",
        "message": "Request limit exceeded. Account XXXXXX has been throttled on ec2:CreateFleet because it exceeded its request rate limit.",
        "stack": "RequestLimitExceeded: Request limit exceeded. Account XXXXXX has been throttled on ec2:CreateFleet because it exceeded its request rate limit.\n    at throwDefaultError (file:///var/task/index.js:121246:20)\n    at file:///var/task/index.js:121255:5\n    at de_CommandError (file:///var/task/index.js:17370:10)\n    at process.processTicksAndRejections (node:internal/process/task_queues:105:5)\n    at async file:///var/task/index.js:117925:20\n    at async file:///var/task/index.js:110783:18\n    at async file:///var/task/index.js:117782:38\n    at async file:///var/task/index.js:106116:22\n    at async createInstances (file:///var/task/index.js:163220:17)\n    at async createRunner (file:///var/task/index.js:163112:19)",
        "$fault": "client",
        "$metadata": {
            "httpStatusCode": 503,
            "requestId": "15f67889-cb46-4dc2-8149-275e1419a7fa",
            "attempts": 3,
            "totalRetryDelay": 698
        },
        "Code": "RequestLimitExceeded"
    }
}
{
    "level": "ERROR",
    "message": "Error processing batch (size: 10): Rate exceeded, ignoring batch",
    "timestamp": "2026-01-29T15:56:22.134Z",
    "service": "runners-scale-up",
    "sampling_rate": 0,
    "xray_trace_id": "1-697b8320-5c9f226f9sdsdf6827760314a",
    "region": "us-west-2",
    "environment": "test-al2023-x86_64-rc",
    "aws-request-id": "f65b4e76-c0ce-50e8-851d-2f8345aee6f8",
    "function-name": "test-al2023-x86_64-rc-scale-up",
    "module": "lambda.ts",
    "error": {
        "name": "ThrottlingException",
        "location": "file:///var/task/index.js:67310",
        "message": "Rate exceeded",
        "stack": "ThrottlingException: Rate exceeded\n    at AwsJson1_1Protocol.handleError (file:///var/task/index.js:67310:27)\n    at process.processTicksAndRejections (node:internal/process/task_queues:103:5)\n    at async AwsJson1_1Protocol.deserializeResponse (file:///var/task/index.js:72235:13)\n    at async file:///var/task/index.js:72634:24\n    at async file:///var/task/index.js:70447:20\n    at async file:///var/task/index.js:74848:46\n    at async file:///var/task/index.js:68645:26\n    at async putParameter (file:///var/task/index.js:114479:5)\n    at async createJitConfig (file:///var/task/index.js:121728:9)\n    at async createStartRunnerConfig (file:///var/task/index.js:121653:9)",
        "$fault": "client",
        "$metadata": {
            "httpStatusCode": 400,
            "requestId": "5d07dc25-b704-4671-973e-4db07eb87dba",
            "attempts": 3,
            "totalRetryDelay": 450
        },
        "__type": "ThrottlingException"
    }
}

From the AWS recommendation:

RESOLUTION STEPS:

  • Implement exponential backoff and retry logic. Add retry logic with exponential backoff in your Lambda function when receiving throttling errors. This allows your application to automatically retry failed requests with increasing delays between attempts (see the sketch after this list).

  • Optimize your fleet creation strategy. Instead of creating multiple small fleets, consider:

    • Creating fewer, larger fleets with higher target capacity
    • Batching your scaling requests to reduce API call frequency
    • Using EC2 Fleet type "instant" for immediate provisioning needs, as it has unlimited quota for Spot capacity pools
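
A minimal sketch of what per-request backoff with full jitter could look like in the scale-up Lambda. The helper names (retryWithBackoff, isThrottlingError) are hypothetical, not from the module's code; the error names are the two seen in the logs above:

async function retryWithBackoff<T>(
  fn: () => Promise<T>,
  maxAttempts = 5,
  baseDelayMs = 200,
): Promise<T> {
  for (let attempt = 1; ; attempt++) {
    try {
      return await fn();
    } catch (err) {
      if (attempt >= maxAttempts || !isThrottlingError(err)) throw err;
      // Full jitter: wait a random duration up to an exponentially growing cap.
      const capMs = baseDelayMs * 2 ** (attempt - 1);
      await new Promise((resolve) => setTimeout(resolve, Math.random() * capMs));
    }
  }
}

// Matches the two throttling error names from the logs above.
function isThrottlingError(err: unknown): boolean {
  const name = err instanceof Error ? err.name : undefined;
  return name === "RequestLimitExceeded" || name === "ThrottlingException";
}

A call site could then be wrapped as retryWithBackoff(() => ec2.send(new CreateFleetCommand(params))), so a throttled request delays and retries instead of failing the whole batch.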

I'm wondering whether this new version, which handles batch configuration, should implement an exponential backoff policy to retry on this kind of failure.
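
For what it's worth, the AWS SDK for JavaScript v3 already retries with exponential backoff internally ("attempts": 3 and totalRetryDelay in the logs above are the default budget being exhausted), so a complement to batch-level backoff would be raising the retry budget on the affected clients. maxAttempts and retryMode are standard SDK v3 client options; the values below are illustrative assumptions, not the module's actual configuration:

import { EC2Client } from "@aws-sdk/client-ec2";
import { SSMClient } from "@aws-sdk/client-ssm";

// "adaptive" adds client-side rate limiting on top of the default
// exponential backoff ("standard" mode); maxAttempts defaults to 3.
const ec2 = new EC2Client({ maxAttempts: 8, retryMode: "adaptive" });
const ssm = new SSMClient({ maxAttempts: 8, retryMode: "adaptive" });

Note that both ec2:CreateFleet (first log) and the SSM putParameter call inside createJitConfig (second log) were throttled, so both clients would need the higher budget.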
