Extension - Error handling improvements and tests #182

Closed
@huntharo

Description

Motivations

  • Exiting the Lambda exec env can be very costly for applications that have a high cold start time
  • If the exec env is still functioning, it should be preserved
  • If the extension reports failure information to the router, the router can better determine which of these cases applies:
    • All invokes are failing across all Lambdas
      • Start exponential backoff
      • Consider marking the router unhealthy and/or shutting it down, since the issue could be with the router
    • All invokes of a single Lambda are failing: that Lambda might have an issue (OOM or out of file descriptors) and should exit
    • Random invokes are failing: log and continue
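
The three cases above could be told apart on the router side roughly as follows. This is only a sketch: `InvokeStats`, `RouterAction`, and `classify` are hypothetical names that do not come from the codebase, and it assumes the router keeps per-Lambda attempt and failure counters over some recent window.

```rust
use std::collections::BTreeMap;

/// Hypothetical per-Lambda counters kept by the router over a recent window.
#[derive(Debug, Clone, Copy)]
pub struct InvokeStats {
    pub attempts: u32,
    pub failures: u32,
}

/// Hypothetical next step for the router, one variant per case in the list.
#[derive(Debug, PartialEq, Eq)]
pub enum RouterAction {
    /// No failures were observed.
    Healthy,
    /// Every invoke on every active Lambda failed: back off, suspect the router.
    BackoffAll,
    /// Every invoke on these Lambdas failed while other Lambdas succeeded.
    ExitLambdas(Vec<String>),
    /// Failures are scattered: log them and continue.
    LogAndContinue,
}

pub fn classify(stats: &BTreeMap<String, InvokeStats>) -> RouterAction {
    // Ignore Lambdas that received no invokes in the window.
    let active: Vec<(&String, &InvokeStats)> =
        stats.iter().filter(|(_, s)| s.attempts > 0).collect();

    if active.iter().all(|(_, s)| s.failures == 0) {
        return RouterAction::Healthy;
    }

    // Lambdas for which every single attempt failed.
    let fully_failing: Vec<String> = active
        .iter()
        .filter(|(_, s)| s.failures == s.attempts)
        .map(|(id, _)| (*id).clone())
        .collect();

    if fully_failing.len() == active.len() {
        RouterAction::BackoffAll
    } else if !fully_failing.is_empty() {
        RouterAction::ExitLambdas(fully_failing)
    } else {
        RouterAction::LogAndContinue
    }
}
```

With only one active Lambda, this sketch cannot distinguish a broken Lambda from a broken router and falls into `BackoffAll`; a real implementation would likely need extra signals to separate those cases.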

Acceptance Criteria

  • Audit all panic! usage. Most of these likely need to be removed (we never want to exit while other good requests are in flight)
  • Audit all unwrap usage. These likely need to be converted to ?, with the error passed up the call chain
  • Add info to the WaiterResponse: request_count, invoke_duration, disposition (self_deadline, self_last_active, app_connection_closed, router_connection_closed, router_unreachable, router_goaway)
  • If the app is unreachable, the extension should gracefully exit so that the exec env is destroyed (this is the only way to "recover" the app: there is no option other than creating a new exec env)
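
A rough sketch of how the criteria above might look in Rust. The field names and disposition values are taken from the list; everything else is an assumption made for illustration: the field types, the `ExtensionError` type, the `parse_port` helper, and the choice to treat `app_connection_closed` as the "app is unreachable" signal in `should_exit`.

```rust
use std::fmt;
use std::time::Duration;

/// Why a waiter finished; variants mirror the disposition values listed above.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Disposition {
    SelfDeadline,
    SelfLastActive,
    AppConnectionClosed,
    RouterConnectionClosed,
    RouterUnreachable,
    RouterGoaway,
}

/// Field types are assumptions; the issue only names the fields.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct WaiterResponse {
    pub request_count: u64,
    pub invoke_duration: Duration,
    pub disposition: Disposition,
}

impl WaiterResponse {
    /// Assumption: a closed app connection means the app is unreachable,
    /// so the extension should exit gracefully and let the exec env be destroyed.
    pub fn should_exit(&self) -> bool {
        self.disposition == Disposition::AppConnectionClosed
    }
}

/// Hypothetical error type passed up the chain in place of a panic.
#[derive(Debug, PartialEq, Eq)]
pub enum ExtensionError {
    InvalidPort(String),
}

impl fmt::Display for ExtensionError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            ExtensionError::InvalidPort(raw) => write!(f, "invalid port: {raw}"),
        }
    }
}

impl std::error::Error for ExtensionError {}

/// Before: `raw.parse::<u16>().unwrap()` would panic on bad input and
/// take down every request in flight. After: the error is returned with `?`.
pub fn parse_port(raw: &str) -> Result<u16, ExtensionError> {
    let port = raw
        .trim()
        .parse::<u16>()
        .map_err(|_| ExtensionError::InvalidPort(raw.to_string()))?;
    Ok(port)
}
```

A caller can then decide what to do with the error, for example logging it and continuing, instead of the process exiting while other good requests are in flight.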

Metadata

Labels

area-extension (Extension portion of the app), bug (Something isn't working), tests (Adding tests)

Projects

Status

Done
