Closed
Description
Some things to include:
- mention API Gateway's 30-second timeout
- use a small value for `max_concurrency` (especially before Improve inter-replica fairness for real-time APIs #1240 and Improve inter-process queue fairness #839 are resolved)
- use a reduced value for `target_concurrency` if overprovisioning is desired
- until Improve inter-replica fairness for real-time APIs #1240 and Improve inter-process queue fairness #839 are resolved, explain how the replica queue works (i.e. that it is per-process), and that you will see 503 errors before fully filling the "global" queue
- `min_replicas` should be at or slightly above what is expected to be required at steady-state, and then reduced after traffic has been directed to the API
- if spot, using multiple instance types improves the chances of getting a spot instance
- list the smallest instance type as the primary type
- specify max price explicitly
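
To illustrate the per-process queue point above, here is a minimal Python sketch of why 503 errors can appear before the "global" queue is full. It is a toy model, not the actual implementation: it assumes the replica-level `max_concurrency` is split evenly across processes and that each process rejects requests once its own share is used up. The function name `simulate_burst` and the way requests are assigned to processes are hypothetical.

```python
def simulate_burst(assignments, processes, max_concurrency):
    """Toy model of a burst of simultaneous requests hitting one replica.

    assignments: the process index that each incoming request lands on.
    Assumption: each process gets an equal share of max_concurrency and
    rejects (503) once its own share is exhausted.
    Returns (accepted, rejected).
    """
    per_process_limit = max_concurrency // processes
    in_flight = [0] * processes
    rejected = 0
    for p in assignments:
        if in_flight[p] < per_process_limit:
            in_flight[p] += 1
        else:
            rejected += 1  # this request would receive a 503
    return sum(in_flight), rejected


# 6 requests against a replica-wide limit of 8, but three of them land on
# process 0, which can only hold 8 // 4 = 2.
accepted, rejected = simulate_burst([0, 0, 0, 1, 2, 3], processes=4, max_concurrency=8)
print(f"accepted={accepted} rejected={rejected}")  # accepted=5 rejected=1
```

In this model a request is rejected even though only 5 of the 8 replica-wide slots are in use, which is the behavior the guide should call out.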
Also consider making this a general-purpose "running in production" guide, where the topics above are in a section. Some other things to include in the general guide are processes/threads, overprovisioning, build images ahead of time (?), ...
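
For the overprovisioning topic, a rough sizing sketch in Python may help readers pick a starting `min_replicas`. It assumes the autoscaler aims for roughly (in-flight requests) / `target_concurrency` replicas, which is an assumption and not something this issue confirms. In-flight requests are estimated with Little's law (arrival rate times average latency). The function name `estimate_min_replicas` and the traffic numbers are hypothetical.

```python
import math


def estimate_min_replicas(requests_per_second, avg_latency_seconds, target_concurrency):
    """Rough steady-state replica estimate.

    Assumption: desired replicas ~= in-flight requests / target_concurrency.
    In-flight requests are approximated with Little's law.
    """
    in_flight = requests_per_second * avg_latency_seconds
    return max(1, math.ceil(in_flight / target_concurrency))


# Hypothetical traffic: 40 requests/s at 0.25 s each is about 10 in flight.
print(estimate_min_replicas(40, 0.25, target_concurrency=4))  # 3
# Halving target_concurrency overprovisions: more replicas for the same load.
print(estimate_min_replicas(40, 0.25, target_concurrency=2))  # 5
```

Under these assumptions, lowering `target_concurrency` is what buys the extra headroom, and the result is only a starting point to be adjusted once real traffic is observed.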