Description
Motivations
- Unpredictable issues can cause hundreds to thousands of Lambda invokes, which can be very costly
- Router - Lambda failure to connect back to pods (or other error) results in tight invoke loop #154
- Demonstrates a case of hundreds of thousands of invokes
- Extension - Error handling improvements and tests #182
- Added information to the Extension response indicating whether the Router needs to back off
Acceptance Criteria
- Need to put the requests to start instances into a queue
- Need a configurable limit on the number of Lambdas running concurrently
- Even with a queue and a concurrency limit, a tight Lambda invoke loop can still occur if the Lambda is invoked successfully but cannot connect back to the Router
- Need to back off before retrying in that case so we don't cause huge AWS bills
- Could consider throwing an exception out of the Lambda, which would then cause the AWS SDK to perform the backoff
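
A rough sketch of how the criteria above could fit together, not the Router's actual implementation: start requests go into a queue, a fixed number of workers drains it (the concurrency limit), and a failed invoke is retried only after a capped exponential backoff with jitter. All names here (`run_start_requests`, `backoff_delay`, `invoke`, the default limits) are hypothetical, and `invoke` is assumed to return `True` only when the Lambda connected back to the Router.

```python
import asyncio
import random
from typing import Awaitable, Callable, Dict, Iterable, Optional


def backoff_delay(attempt: int, base: float = 0.5, cap: float = 30.0) -> float:
    """Upper bound in seconds for the wait after failed attempt `attempt` (0-based)."""
    return min(cap, base * (2 ** attempt))


async def run_start_requests(
    requests: Iterable[str],
    invoke: Callable[[str], Awaitable[bool]],  # hypothetical: True = connected back
    max_concurrent: int = 2,                   # assumed configurable limit
    max_attempts: int = 5,                     # assumed retry budget per request
    sleep: Callable[[float], Awaitable[None]] = asyncio.sleep,
) -> Dict[str, Optional[int]]:
    """Drain queued start requests with bounded concurrency and backoff.

    Returns the number of attempts used per request, or None if it gave up.
    """
    queue: asyncio.Queue = asyncio.Queue()
    for request_id in requests:
        queue.put_nowait(request_id)
    results: Dict[str, Optional[int]] = {}

    async def worker() -> None:
        while True:
            try:
                request_id = queue.get_nowait()
            except asyncio.QueueEmpty:
                return  # nothing left to start
            for attempt in range(max_attempts):
                if await invoke(request_id):
                    results[request_id] = attempt + 1
                    break
                if attempt + 1 < max_attempts:
                    # Full jitter: wait a random time up to the capped bound
                    # instead of re-invoking immediately in a tight loop.
                    await sleep(random.uniform(0, backoff_delay(attempt)))
            else:
                results[request_id] = None  # retry budget exhausted

    # One worker per allowed concurrent invoke enforces the limit.
    await asyncio.gather(*(worker() for _ in range(max_concurrent)))
    return results
```

With the defaults assumed here, a request that never connects back is invoked at most five times, with waits bounded by 0.5, 1, 2, and 4 seconds between attempts, rather than looping without limit.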