[inference] add retry mechanism to `chatComplete`

At the moment, the `chatComplete` API doesn't have any kind of retry-on-error logic. 

While some errors are not actionable or retryable (e.g. bad credentials, exceeded quotas), it could make sense to automatically retry others (e.g. rate limiting, connectivity issues).

To improve resilience and avoid forcing teams to implement that kind of logic themselves on top of the inference APIs, we should provide a mechanism that supports automatic retries.

One option could be to mimic langchain's `maxRetries` mechanism, by retrying everything that is not "fatal" with incremental back-off. 
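A minimal sketch of what such a wrapper could look like, to make the idea concrete. Everything here is hypothetical: `retryWithBackoff`, `RetryOptions`, `backoffDelayMs`, and the option names are illustrative and are not part of the inference plugin or of langchain. The sketch also uses exponential back-off (each delay multiplied by a constant factor), which is one possible reading of "incremental back-off".

```typescript
// Hypothetical options; `maxRetries` loosely mirrors the langchain option name.
interface RetryOptions {
  maxRetries: number; // number of retries after the initial attempt
  initialDelayMs: number; // delay before the first retry
  backoffMultiplier: number; // factor applied to the delay on each retry
  shouldRetry: (error: unknown) => boolean; // returns false for "fatal" errors
}

// Delay before retry number `attempt` (1-based), growing geometrically.
function backoffDelayMs(
  attempt: number,
  initialDelayMs: number,
  backoffMultiplier: number
): number {
  return initialDelayMs * Math.pow(backoffMultiplier, attempt - 1);
}

const sleep = (ms: number): Promise<void> =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

// Calls `fn` until it succeeds, the error is not retryable,
// or `maxRetries` retries have been used; then rethrows the last error.
async function retryWithBackoff<T>(
  fn: () => Promise<T>,
  options: RetryOptions
): Promise<T> {
  let attempt = 0;
  for (;;) {
    try {
      return await fn();
    } catch (error) {
      attempt++;
      if (attempt > options.maxRetries || !options.shouldRetry(error)) {
        throw error;
      }
      await sleep(
        backoffDelayMs(attempt, options.initialDelayMs, options.backoffMultiplier)
      );
    }
  }
}
```

A caller would wrap the underlying call in a closure and pass it as `fn`. Keeping `shouldRetry` as a parameter leaves the "is this error fatal?" decision separate from the retry loop itself.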

One complication would be identifying which errors are safe to retry and which ones we should rethrow directly.
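As a rough illustration of that classification, the sketch below decides from an HTTP-like status code. It assumes the error carries a numeric `status` property; `ErrorWithStatus`, `isRetryableStatus`, `isRetryableError`, and the exact status lists are hypothetical choices, not existing inference plugin behavior.

```typescript
// Hypothetical error shape: assumes a numeric HTTP-like status is attached.
interface ErrorWithStatus {
  status?: number;
}

// Statuses assumed to be fatal: bad request, bad credentials, forbidden, not found.
const NON_RETRYABLE_STATUSES = new Set<number>([400, 401, 403, 404]);

function isRetryableStatus(status: number | undefined): boolean {
  // No status at all is treated as a connectivity issue, hence retryable.
  if (status === undefined) {
    return true;
  }
  if (NON_RETRYABLE_STATUSES.has(status)) {
    return false;
  }
  // Request timeout, rate limiting, and server-side errors are retried.
  return status === 408 || status === 429 || status >= 500;
}

function isRetryableError(error: unknown): boolean {
  const status =
    typeof error === 'object' && error !== null
      ? (error as ErrorWithStatus).status
      : undefined;
  return isRetryableStatus(typeof status === 'number' ? status : undefined);
}
```

A status code alone may not be enough: some providers report both rate limiting and exceeded quotas as 429, so telling them apart would likely require inspecting the error message or a provider-specific error code as well.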

Some pointers:

Langchain's "should I retry" logic: https://github.com/langchain-ai/langchainjs/blob/719e081213db8cd9bf8ebd9f3315f7cf1b182f0c/langchain-core/src/utils/async_caller.ts#L4

The inference API's connector error conversion logic: https://github.com/elastic/kibana/blob/1c218f9846b98ba2e8ea67918c42d2399a014c11/x-pack/platform/plugins/shared/inference/server/chat_complete/utils/convert_upstream_error.ts#L8-L39 



	import { createInferenceInternalError, InferenceTaskInternalError } from '@kbn/inference-common';

	const connectorStatusCodeRegexp = /Status code: ([0-9]{3})/i;
	const inferenceStatusCodeRegexp = /status \[([0-9]{3})\]/i;

	export const convertUpstreamError = (
	  source: string | Error,
	  { statusCode, messagePrefix }: { statusCode?: number; messagePrefix?: string } = {}
	): InferenceTaskInternalError => {
	  const message = typeof source === 'string' ? source : source.message;

	  let status = statusCode;
	  if (!status && typeof source === 'object') {
	    status = (source as any).status ?? (source as any).response?.status;
	  }
	  if (!status) {
	    const match = connectorStatusCodeRegexp.exec(message);
	    if (match) {
	      status = parseInt(match[1], 10);
	    }
	  }
	  if (!status) {
	    const match = inferenceStatusCodeRegexp.exec(message);
	    if (match) {
	      status = parseInt(match[1], 10);
	    }
	  }

	  const messageWithPrefix = messagePrefix ? `${messagePrefix} ${message}` : message;

	  return createInferenceInternalError(messageWithPrefix, { status });
	};

[inference] add retry mechanism to `chatComplete` #210859
