Description
At the moment, the chatComplete API doesn't have any kind of retry-on-error logic.
Some errors are not actionable or retryable (e.g. bad credentials, exceeded quotas), but it would make sense to automatically retry others (e.g. rate limiting, connectivity issues).
To improve resilience, and to avoid forcing teams to implement this logic themselves on top of the inference APIs, we should provide a mechanism for automatic retries.
One option would be to mimic LangChain's maxRetries mechanism: retry every error that is not "fatal", with incremental back-off.
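A minimal sketch of what a maxRetries-style wrapper with incremental back-off could look like, assuming the call to protect can be expressed as a function returning a Promise. withRetry, RetryOptions and the isRetryable predicate are hypothetical names for illustration, not existing Kibana or LangChain APIs. The delay is multiplied by backoffFactor after each failed attempt, and any error the predicate considers fatal is rethrown immediately.

```typescript
// Hypothetical options shape; not an existing Kibana or LangChain API.
interface RetryOptions {
  maxRetries: number; // retries allowed after the first attempt
  initialDelayMs: number; // wait before the first retry
  backoffFactor: number; // multiplier applied to the delay after each retry
  isRetryable: (error: unknown) => boolean; // returns false for "fatal" errors
}

const sleep = (ms: number) =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

// Hypothetical helper: calls fn, retrying non-fatal failures with growing delays.
async function withRetry<T>(
  fn: () => Promise<T>,
  options: RetryOptions
): Promise<T> {
  const { maxRetries, initialDelayMs, backoffFactor, isRetryable } = options;
  let delayMs = initialDelayMs;
  let attempt = 0;
  while (true) {
    try {
      return await fn();
    } catch (error) {
      // Rethrow fatal errors directly, and stop once the retry budget is spent.
      if (attempt >= maxRetries || !isRetryable(error)) {
        throw error;
      }
      await sleep(delayMs);
      delayMs *= backoffFactor; // incremental (exponential) back-off
      attempt++;
    }
  }
}
```

With maxRetries: 3, a call is attempted at most four times; with initialDelayMs: 500 and backoffFactor: 2, the waits between attempts would be 500, 1000 and 2000 ms. All the difficult decisions are pushed into the isRetryable predicate, which keeps the wrapper itself generic.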
One of the challenges will be identifying which errors are safe to retry and which ones should simply be rethrown directly.
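One possible way to approach that classification, sketched under the assumption that connector errors expose an HTTP status and/or a string code. InferenceErrorLike and isRetryableError are hypothetical names, and the field names are assumptions about the error shape, not the actual connector error format. The sketch retries timeouts, rate limiting, transient 5xx responses and common network failures, and treats everything else (bad requests, bad credentials, cancellations, unknown shapes) as fatal. Note that status alone is not always enough: OpenAI, for example, reports an exhausted quota as HTTP 429 with the code insufficient_quota, so it shares a status with rate limiting while not being retryable.

```typescript
// Hypothetical error shape: real connector errors may expose these fields differently.
interface InferenceErrorLike {
  name?: string;
  status?: number; // HTTP status returned by the provider, if any
  code?: string; // Node.js network error code or provider-specific error code
}

// HTTP statuses that usually indicate a transient condition.
const RETRYABLE_STATUSES = new Set([408, 429, 500, 502, 503, 504]);

// Node.js system error codes for transient connectivity problems.
const RETRYABLE_NETWORK_CODES = new Set([
  "ECONNRESET",
  "ECONNREFUSED",
  "ETIMEDOUT",
  "EAI_AGAIN",
  "EPIPE",
]);

// Provider codes that can come with a retryable status but are not transient.
const FATAL_PROVIDER_CODES = new Set(["insufficient_quota"]);

// Hypothetical predicate: true only for errors that are worth retrying.
function isRetryableError(error: unknown): boolean {
  if (typeof error !== "object" || error === null) {
    return false; // unknown shape: rethrow rather than guess
  }
  const { name, status, code } = error as InferenceErrorLike;
  // Cancellations are deliberate and must never be retried.
  if (name === "AbortError") return false;
  if (code !== undefined && FATAL_PROVIDER_CODES.has(code)) return false;
  if (code !== undefined && RETRYABLE_NETWORK_CODES.has(code)) return true;
  // Remaining 4xx responses (bad request, bad credentials, forbidden...) are fatal.
  return status !== undefined && RETRYABLE_STATUSES.has(status);
}
```

Defaulting to "not retryable" for anything unrecognized errs on the side of surfacing errors quickly rather than repeatedly hitting a provider with requests that cannot succeed.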
Some pointers:
LangChain's "should I retry" logic: https://github.com/langchain-ai/langchainjs/blob/719e081213db8cd9bf8ebd9f3315f7cf1b182f0c/langchain-core/src/utils/async_caller.ts#L4
The inference API's connector error conversion logic: