
[inference] add retry mechanism to chatComplete #210859

@pgayvallet

Description

At the moment, the chatComplete API doesn't have any kind of retry-on-error logic.

While some errors are not actionable or retryable (e.g. bad credentials, exceeded quotas), it would make sense to automatically retry others (e.g. rate limiting, connectivity issues).

To improve resilience, and to avoid forcing teams to implement this logic themselves on top of the inference APIs, we should provide a built-in mechanism for automatic retries.

One option would be to mimic LangChain's maxRetries mechanism: retry every error that is not "fatal", with incremental back-off.
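A minimal sketch of such a mechanism, assuming a generic promise-returning call and exponential back-off (one common form of incremental back-off). The names retryWithBackoff and RetryOptions are hypothetical, not an existing Kibana or LangChain API; the injectable sleep only exists so tests do not have to wait.

```typescript
// Hypothetical options type; not an existing Kibana or LangChain API.
interface RetryOptions {
  // Retries allowed after the first attempt (total attempts = maxRetries + 1).
  maxRetries: number;
  // Delay before the first retry, doubled for each subsequent one.
  initialDelayMs?: number;
  // Return false for "fatal" errors, which are rethrown immediately.
  shouldRetry?: (error: unknown) => boolean;
  // Injectable so tests do not actually wait.
  sleep?: (ms: number) => Promise<void>;
}

const defaultSleep = (ms: number): Promise<void> =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

async function retryWithBackoff<T>(
  fn: () => Promise<T>,
  { maxRetries, initialDelayMs = 500, shouldRetry = () => true, sleep = defaultSleep }: RetryOptions
): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    try {
      return await fn();
    } catch (error) {
      // Stop on fatal errors or once the retry budget is used up.
      if (attempt >= maxRetries || !shouldRetry(error)) {
        throw error;
      }
      // Exponential back-off: initialDelayMs, then 2x, 4x, ...
      await sleep(initialDelayMs * 2 ** attempt);
    }
  }
}
```

Keeping the "is this fatal?" decision in a separate shouldRetry predicate keeps the retry loop trivial, so the harder problem of classifying errors can evolve independently. A production version would probably also add jitter to the delays.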

One of the main difficulties will be identifying which errors are safe to retry and which ones should simply be rethrown.

Some pointers:

LangChain's "should I retry" logic: https://github.com/langchain-ai/langchainjs/blob/719e081213db8cd9bf8ebd9f3315f7cf1b182f0c/langchain-core/src/utils/async_caller.ts#L4
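As a rough illustration of the classification problem, the following predicate treats aborted requests and a small denylist of 4xx statuses (bad request, auth, payment, etc.) as fatal, and retries everything else, including 429 and 5xx responses and errors with no status at all. It is only loosely inspired by the linked LangChain handler; the exact status list and the isRetryableError name are assumptions.

```typescript
// Hypothetical denylist: 4xx statuses assumed not to be fixable by retrying.
const NON_RETRYABLE_STATUSES = new Set<number>([400, 401, 402, 403, 404, 405, 406, 407, 409]);

// Reads a numeric status from `error.status` or `error.response.status`, if present.
function getStatus(error: unknown): number | undefined {
  if (typeof error !== "object" || error === null) return undefined;
  const e = error as { status?: unknown; response?: { status?: unknown } };
  const raw = e.status ?? e.response?.status;
  return typeof raw === "number" ? raw : undefined;
}

function isRetryableError(error: unknown): boolean {
  // Requests that were aborted or cancelled on purpose should not be retried.
  if (
    error instanceof Error &&
    (error.name === "AbortError" || error.message.startsWith("Cancel"))
  ) {
    return false;
  }
  const status = getStatus(error);
  // No status (e.g. a dropped connection): assume the failure is transient.
  if (status === undefined) return true;
  // Denylisted statuses are fatal; everything else (429, 5xx, ...) is retried.
  return !NON_RETRYABLE_STATUSES.has(status);
}
```

One caveat: depending on the provider, a 429 can signal either rate limiting (retryable) or an exhausted quota (not retryable), so a real implementation may need to inspect the error body as well as the status.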

The inference API's connector error conversion logic:

```ts
import { createInferenceInternalError, InferenceTaskInternalError } from '@kbn/inference-common';

// Matches statuses embedded in messages as "Status code: 429"
const connectorStatusCodeRegexp = /Status code: ([0-9]{3})/i;
// Matches statuses embedded in messages as "status [429]"
const inferenceStatusCodeRegexp = /status \[([0-9]{3})\]/i;

export const convertUpstreamError = (
  source: string | Error,
  { statusCode, messagePrefix }: { statusCode?: number; messagePrefix?: string } = {}
): InferenceTaskInternalError => {
  const message = typeof source === 'string' ? source : source.message;

  // 1. explicit status passed by the caller
  let status = statusCode;
  // 2. status attached to the error object itself
  if (!status && typeof source === 'object') {
    status = (source as any).status ?? (source as any).response?.status;
  }
  // 3. status parsed from the error message, connector format first
  if (!status) {
    const match = connectorStatusCodeRegexp.exec(message);
    if (match) {
      status = parseInt(match[1], 10);
    }
  }
  if (!status) {
    const match = inferenceStatusCodeRegexp.exec(message);
    if (match) {
      status = parseInt(match[1], 10);
    }
  }

  const messageWithPrefix = messagePrefix ? `${messagePrefix} ${message}` : message;
  return createInferenceInternalError(messageWithPrefix, { status });
};
```
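Since connector errors often carry the upstream status only inside their message, a retry decision would likely need the same extraction steps as convertUpstreamError. The sketch below reuses its two regexps in a standalone function (so it does not depend on @kbn/inference-common) and uses an allowlist: only 408, 429 and 5xx statuses are retried. extractUpstreamStatus, isRetryableUpstreamError and the allowlist itself are hypothetical.

```typescript
// Same patterns as in convertUpstreamError.
const connectorStatusCodeRegexp = /Status code: ([0-9]{3})/i;
const inferenceStatusCodeRegexp = /status \[([0-9]{3})\]/i;

// Hypothetical helper: follows convertUpstreamError's lookup order
// (error properties first, then the two message formats), minus the explicit statusCode option.
function extractUpstreamStatus(source: string | Error): number | undefined {
  if (typeof source === "object") {
    const err = source as Error & { status?: unknown; response?: { status?: unknown } };
    const direct = err.status ?? err.response?.status;
    if (typeof direct === "number") return direct;
  }
  const message = typeof source === "string" ? source : source.message;
  const match =
    connectorStatusCodeRegexp.exec(message) ?? inferenceStatusCodeRegexp.exec(message);
  return match ? parseInt(match[1], 10) : undefined;
}

// Hypothetical allowlist: only timeouts, rate limiting and server errors are retried.
function isRetryableUpstreamError(source: string | Error): boolean {
  const status = extractUpstreamStatus(source);
  // No status anywhere: assume a connectivity issue. This also retries status-less
  // errors that are actually fatal, which a real version may need to refine.
  if (status === undefined) return true;
  return status === 408 || status === 429 || status >= 500;
}
```

An allowlist is the more conservative choice: unexpected 4xx statuses are rethrown immediately instead of being retried.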

Labels: Team:AI Infra (AppEx AI Infrastructure Team)
