
Adds custom inference service API docs #4852


Draft: wants to merge 16 commits into main
Conversation


@szabosteve szabosteve commented Jul 9, 2025

Overview

Related issue: https://github.com/elastic/developer-docs-team/issues/307

This PR adds documentation about the custom inference service.

@jonathan-buttner Could you please provide an example request that I can add to the docs?

/**
* Specifies the JSON parser that is used to parse the response from the custom service.
* Different task types require different json_parser parameters.
* For example:
Contributor Author

@szabosteve szabosteve Jul 9, 2025

@jonathan-buttner Do you think we should specify a JsonParser class for each task type, or is this list sufficient?

Contributor

Hmm I think it might be better if we give an example of the response structure for each task type and explain how to create the parser from that.

We should also say that the format is a less featured version of JSONPath: https://en.wikipedia.org/wiki/JSONPath

Here are some examples:

Text Embeddings

For a response that looks like:

            {
              "object": "list",
              "data": [
                  {
                      "object": "embedding",
                      "index": 0,
                      "embedding": [
                          0.014539449,
                          -0.015288644
                      ]
                  }
              ],
              "model": "text-embedding-ada-002-v2",
              "usage": {
                  "prompt_tokens": 8,
                  "total_tokens": 8
              }
            }

We'd need this definition:

        "response": {
            "json_parser": {
                "text_embeddings": "$.data[*].embedding[*]"
            }
        }
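To make the path semantics concrete, here is a minimal Python sketch of an evaluator for just this subset of JSONPath (dotted object keys plus the [*] array wildcard). The extract helper is ours, for illustration only; it is not how Elasticsearch implements the parser.

```python
import json
from typing import Any, List

def extract(doc: Any, path: str) -> List[Any]:
    """Evaluate a simplified JSONPath like "$.data[*].embedding[*]".

    Supports only "$", dotted object keys, and the "[*]" array
    wildcard -- the subset used by the json_parser examples.
    """
    segments = path.lstrip("$").lstrip(".").split(".")
    nodes = [doc]
    for segment in segments:
        wildcard = segment.endswith("[*]")
        key = segment[:-3] if wildcard else segment
        next_nodes = []
        for node in nodes:
            value = node[key]
            if wildcard:
                next_nodes.extend(value)  # fan out over every array element
            else:
                next_nodes.append(value)
        nodes = next_nodes
    return nodes

response = json.loads("""
{
  "object": "list",
  "data": [{"object": "embedding", "index": 0,
            "embedding": [0.014539449, -0.015288644]}],
  "model": "text-embedding-ada-002-v2",
  "usage": {"prompt_tokens": 8, "total_tokens": 8}
}
""")

print(extract(response, "$.data[*].embedding[*]"))
# [0.014539449, -0.015288644]
```

The same helper works for the rerank, completion, and sparse embedding paths below, since they use the same two constructs.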
Rerank

For a response that looks like:

{
  "results": [
    {
      "index": 3,
      "relevance_score": 0.999071,
      "document": "abc"
    },
    {
      "index": 4,
      "relevance_score": 0.7867867,
      "document": "123"
    },
    {
      "index": 0,
      "relevance_score": 0.32713068,
      "document": "super"
    }
  ]
}

We'd need this definition:

        "response": {
            "json_parser": {
                "reranked_index":"$.results[*].index",
                "relevance_score":"$.results[*].relevance_score",
                "document_text":"$.results[*].document"
            }
        }

reranked_index and document_text are optional.

Completion

For a response that looks like:

{
  "id": "chatcmpl-B9MBs8CjcvOU2jLn4n570S5qMJKcT",
  "object": "chat.completion",
  "created": 1741569952,
  "model": "gpt-4.1-2025-04-14",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Hello! How can I assist you today?",
        "refusal": null,
        "annotations": []
      },
      "logprobs": null,
      "finish_reason": "stop"
    }
  ]
}

We'd need this definition:

        "response": {
            "json_parser": {
                "completion_result":"$.choices[*].message.content"
            }
        }
Sparse embedding

For a response that looks like:

{
    "request_id": "75C50B5B-E79E-4930-****-F48DBB392231",
    "latency": 22,
    "usage": {
        "token_count": 11
    },
    "result": {
        "sparse_embeddings": [
            {
                "index": 0,
                "embedding": [
                    {
                        "token_id": 6,
                        "weight": 0.101
                    },
                    {
                        "token_id": 163040,
                        "weight": 0.28417
                    }
                ]
            }
        ]
    }
}

We'd need this definition:

    "response": {
      "json_parser": {
        "token_path": "$.result.sparse_embeddings[*].embedding[*].token_id",
        "weight_path": "$.result.sparse_embeddings[*].embedding[*].weight"
      }
    }

If the value that token_path resolves to (token_id in this example) is not a string (an integer in this example), it'll be converted to a string using Java's .toString() method. Not sure how we want to articulate that though 🤔
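As a small illustration of that conversion (ours, not the actual Elasticsearch code), the matched token IDs end up as string keys alongside their weights:

```python
# Values matched by token_path and weight_path in the example above.
# Zipping them mirrors how the pairs line up; str() plays the role of
# Java's .toString() for the non-string token IDs.
tokens = [6, 163040]
weights = [0.101, 0.28417]

sparse_embedding = {str(t): w for t, w in zip(tokens, weights)}
print(sparse_embedding)
# {'6': 0.101, '163040': 0.28417}
```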

}

export enum CustomServiceType {
custom
Contributor Author

@jonathan-buttner Should the ServiceType be custom whenever it's specified for this service type? Or can it be anything, for example custom-model?

Contributor

Yeah it must be custom, just like to use openai it must be openai.

/**
* Create a custom inference endpoint.
*
* You can create an inference endpoint to perform an inference task with a custom model that supports the HTTP format.
Contributor Author

@jonathan-buttner Please suggest an alternative description if you think this is not sufficient. I tried to come up with something that is meaningful to me based on my limited knowledge.

Contributor

Hmm maybe something like this:

The custom service gives more control over how to interact with external inference services that aren't explicitly supported through dedicated integrations. The custom service gives users the ability to define the headers, url, query parameters, request body, and secrets.

* The chunking configuration object.
* @ext_doc_id inference-chunking
*/
chunking_settings?: InferenceChunkingSettings
Contributor Author

Are chunking settings relevant for this service?

Contributor

Yep!

Contributor

github-actions bot commented Jul 9, 2025

Below you can find the validation changes against the target branch for the APIs.

No changes detected.

You can validate these APIs yourself by using the make validate target.

Contributor

jonathan-buttner commented Jul 9, 2025

WIP (I'll update this comment with a bunch of examples).

Here are some examples:

OpenAI Text Embedding
PUT _inference/text_embedding/test
{
    "service": "custom",
    "service_settings": {
        "secret_parameters": {
            "api_key": "<api key>"
        },
        "url": "https://api.openai.com/v1/embeddings",
        "headers": {
            "Authorization": "Bearer ${api_key}",
            "Content-Type": "application/json;charset=utf-8"
        },
        "request": "{\"input\": ${input}, \"model\": \"text-embedding-3-small\"}",
        "response": {
            "json_parser": {
                "text_embeddings": "$.data[*].embedding[*]"
            }
        }
    }
}
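For illustration, here is roughly what the rendered request body looks like once ${input} is substituted with the JSON-encoded input array (which is why the template must not be wrapped in quotes). The render helper is a hypothetical name of ours; the real substitution happens inside Elasticsearch.

```python
import json

# The "request" value from the endpoint definition above.
request_template = '{"input": ${input}, "model": "text-embedding-3-small"}'

def render(template: str, input_strings) -> str:
    # ${input} becomes a JSON array, so the rendered body is valid JSON.
    return template.replace("${input}", json.dumps(input_strings))

body = render(request_template, ["The quick brown fox"])
print(body)
# {"input": ["The quick brown fox"], "model": "text-embedding-3-small"}
```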
Cohere APIv2 Rerank
PUT _inference/rerank/test-rerank
{
    "service": "custom",
    "service_settings": {
        "secret_parameters": {
            "api_key": "<api key>"
        },
        "url": "https://api.cohere.com/v2/rerank",
        "headers": {
            "Authorization": "bearer ${api_key}",
            "Content-Type": "application/json"
        },
        "request": "{\"documents\": ${input}, \"query\": ${query}, \"model\": \"rerank-v3.5\"}",
        "response": {
            "json_parser": {
                "reranked_index":"$.results[*].index",
                "relevance_score":"$.results[*].relevance_score"
            }
        }
    }
}
Cohere APIv2 Text Embedding
PUT _inference/text_embedding/test-text-embedding
{
    "service": "custom",
    "service_settings": {
        "secret_parameters": {
            "api_key": "<api key>"
        },
        "url": "https://api.cohere.com/v2/embed",
        "headers": {
            "Authorization": "bearer ${api_key}",
            "Content-Type": "application/json"
        },
        "request": "{\"texts\": ${input}, \"model\": \"embed-v4.0\", \"input_type\": ${input_type}}",
        "response": {
            "json_parser": {
                "text_embeddings":"$.embeddings.float[*]"
            }
        },
        "input_type": {
            "translation": {
                "ingest": "search_document",
                "search": "search_query"
            },
            "default": "search_document"
        }
    }
}
Jina AI Rerank
PUT _inference/rerank/jina
{
  "service": "custom",
  "service_settings": {
    "secret_parameters": {
      "api_key": "<api key>"
    },    
    "url": "https://api.jina.ai/v1/rerank",
    "headers": {
      "Content-Type": "application/json",
      "Authorization": "Bearer ${api_key}"
    },
    "request": "{\"model\": \"jina-reranker-v2-base-multilingual\",\"query\": ${query},\"documents\":${input}}",
    "response": {
      "json_parser": {
        "relevance_score": "$.results[*].relevance_score",
        "reranked_index": "$.results[*].index"
      }
    }
  }
}
Hugging Face Text Embedding for model Qwen/Qwen3-Embedding-8B (others will be very similar)
PUT _inference/text_embedding/test-text-embedding
{
    "service": "custom",
    "service_settings": {
        "secret_parameters": {
            "api_key": "<api key>"
        },
        "url": "<dedicated inference endpoint on HF>/v1/embeddings",
        "headers": {
            "Authorization": "Bearer ${api_key}",
            "Content-Type": "application/json"
        },
        "request": "{\"input\": ${input}}",
        "response": {
            "json_parser": {
                "text_embeddings":"$.data[*].embedding[*]"
            }
        }
    }
}

TODO

  • VoyageAI
  • Hugging Face Rerank
  • Google VertexAI
  • Azure

Contributor

@jonathan-buttner jonathan-buttner left a comment

Great work! We'll want to add a blurb about how the custom service performs template replacement.

The template replacement functionality allows templates (portions of a string that start with ${ and end with }) to be replaced with the contents of a value that defines that key.

We look in secret_parameters and task_settings for keys to do template replacement.

We replace templates in the fields request, headers, url, and query_parameters.

If we fail to find the definition (key) for a template we emit an error.

So for example if we had the endpoint definition like this:

PUT _inference/text_embedding/test-text-embedding
{
    "service": "custom",
    "service_settings": {
        "secret_parameters": {
            "api_key": "<some api key>"
        },
        "url": "...endpoints.huggingface.cloud/v1/embeddings",
        "headers": {
            "Authorization": "Bearer ${api_key}",
            "Content-Type": "application/json"
        },
        "request": "{\"input\": ${input}}",
        "response": {
            "json_parser": {
                "text_embeddings":"$.data[*].embedding[*]"
            }
        }
    }
}

We'll look to replace ${api_key} from secret_parameters and task_settings. We should also make a note explicitly that the templates should not be surrounded by quotes (we add the quotes internally).

There are a few "special" templates:

  • ${input} this refers to the array of input strings that comes from the input field of the subsequent inference requests
  • ${input_type} this refers to the input type translation values (I explain this below)
  • ${query} this refers to the query field used specifically for rerank
  • ${top_n} this refers to the top_n field available when performing rerank requests
  • ${return_documents} this refers to the return_documents field available when performing rerank requests
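The replacement rules above can be sketched in Python. The names here (replace_templates and so on) are ours for illustration, not the real implementation's; the real service also handles the special templates and internal quoting.

```python
import re

# Templates are portions of a string that start with ${ and end with }.
TEMPLATE = re.compile(r"\$\{(\w+)\}")

def replace_templates(text: str, params: dict) -> str:
    """Replace every ${key} in text, erroring on unknown keys.

    params stands in for the merged secret_parameters and task_settings.
    """
    def substitute(match):
        key = match.group(1)
        if key not in params:
            raise ValueError(f"No definition found for template ${{{key}}}")
        return params[key]
    return TEMPLATE.sub(substitute, text)

params = {"api_key": "my-secret-key"}  # from secret_parameters
print(replace_templates("Bearer ${api_key}", params))
# Bearer my-secret-key

# A template with no matching key is an error.
try:
    replace_templates("Bearer ${missing}", params)
except ValueError as err:
    print(err)
```

The same substitution would be applied to the request, headers, url, and query_parameters fields.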

@@ -758,6 +758,136 @@ export class CohereTaskSettings {
truncate?: CohereTruncateType
}

export class CustomServiceSettings {
Contributor

We also support query parameters:

                "query_parameters": [
                    ["test_key", "test_value"]
                ]

It's a list of tuples (the inner array must contain exactly 2 items).

This would be invalid:

                "query_parameters": [
                    ["test_key", "test_value", "some_other_value"]
                ]

This is valid:

                "query_parameters": [
                    ["test_key", "test_value"],
                    ["test_key", "some_other_value"]
                ]

* For example:
* ```
* "request":{
* "content":"{\"input\":${input}}"
Contributor

We flattened this so it's "request": "{\"input\":${input}}" now. We removed content.

* }
* ```
*/
error_parser: UserDefinedValue
Contributor

We can remove this field, we simplified the error handling logic and removed this.


* The URL endpoint to use for the requests.
*/
url?: string
}
Contributor

We also parse an optional input_type field. Here's an example:

PUT _inference/text_embedding/test-text-embedding
{
    "service": "custom",
    "service_settings": {
        "secret_parameters": {
            "api_key": "<api key>"
        },
        "url": "https://api.cohere.com/v2/embed",
        "headers": {
            "Authorization": "bearer ${api_key}",
            "Content-Type": "application/json"
        },
        "request": "{\"texts\": ${input}, \"model\": \"embed-v4.0\", \"input_type\": ${input_type}}",
        "response": {
            "json_parser": {
                "text_embeddings":"$.embeddings.float[*]"
            }
        },
        "input_type": {
            "translation": {
                "ingest": "search_document",
                "search": "search_query"
            },
            "default": "search_document"
        }
    }
}

${input_type} this refers to the input type translation values

                "input_type": {
                    "translation": {
                        "ingest": "do_ingest",
                        "search": "do_search"
                    },
                    "default": "a_default"
                },

If the subsequent inference requests come from a search context, we'll use the search key here and replace the template with do_search. If they come from the ingest context, we'll use do_ingest. If it's a different context that we haven't specified, we'll fall back to the default provided. If no default is specified, we use an empty string.

The keys we allow in translation are:

  • classification
  • clustering
  • ingest
  • search

This is particularly useful for integrations like Cohere that allow an input type field in their API: https://docs.cohere.com/reference/embed#request.body.input_type
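The lookup described above can be sketched as follows. This is illustrative only; resolve_input_type and ALLOWED_TRANSLATION_KEYS are our names, not the real implementation's.

```python
# The four keys allowed inside the "translation" object.
ALLOWED_TRANSLATION_KEYS = {"classification", "clustering", "ingest", "search"}

def resolve_input_type(input_type_config: dict, context: str) -> str:
    translation = input_type_config.get("translation", {})
    unknown = set(translation) - ALLOWED_TRANSLATION_KEYS
    if unknown:
        raise ValueError(f"unsupported translation keys: {sorted(unknown)}")
    if context in translation:
        return translation[context]
    # Unspecified context: fall back to the default, or an empty
    # string when no default was provided.
    return input_type_config.get("default", "")

config = {
    "translation": {"ingest": "do_ingest", "search": "do_search"},
    "default": "a_default",
}
print(resolve_input_type(config, "search"))      # do_search
print(resolve_input_type(config, "clustering"))  # a_default
print(resolve_input_type({}, "search"))          # empty string
```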
