[ES|QL] COMPLETION command - Inference Operator implementation #127409

afoucret · 2025-04-25T17:20:13Z

Description

This PR implements the inference operator for the COMPLETION command.

Changes included:

Completion CSV tests: Added tests for completion inference on CSV datasets, and updated the inference test service to support completion use cases
New InferenceOperator: Introduced a shared operator to centralize and reuse logic across all inference operations (e.g., rerank, completion)
Throttled execution via BulkInferenceExecutor: Inference batches are now executed using the BulkInferenceExecutor, allowing for controlled concurrency and improved robustness.
RerankOperator refactor:
- Migrated to use the new InferenceOperator
- Fixed a flaky circuit breaker test
- Improved memory efficiency by switching request handling to use an iterator
- Better parallelism: big pages are now sliced into several parallel requests
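
To illustrate what "controlled concurrency" can mean in the list above, here is a minimal sketch that caps the number of outstanding requests with a semaphore. It is not the BulkInferenceExecutor from this PR: the class name `ThrottledExecutionSketch`, the method `executeAll`, and the use of a plain `Function` as the inference call are all assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Semaphore;
import java.util.function.Function;

// Hypothetical illustration only; not the BulkInferenceExecutor added by this PR.
final class ThrottledExecutionSketch {

    // Runs every request on a fixed pool of `workers` threads while never allowing
    // more than `maxOutstanding` requests to be submitted but not yet completed.
    static <I, O> List<O> executeAll(List<I> requests, Function<I, O> call, int workers, int maxOutstanding) {
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        Semaphore permits = new Semaphore(maxOutstanding);
        List<CompletableFuture<O>> futures = new ArrayList<>();
        try {
            for (I request : requests) {
                // Block the submitting thread while too many requests are outstanding.
                permits.acquireUninterruptibly();
                futures.add(CompletableFuture.supplyAsync(() -> {
                    try {
                        return call.apply(request);
                    } finally {
                        // Free the slot whether the call succeeded or failed.
                        permits.release();
                    }
                }, pool));
            }
            // Collect results in submission order, regardless of completion order.
            List<O> results = new ArrayList<>();
            for (CompletableFuture<O> future : futures) {
                results.add(future.join());
            }
            return results;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(executeAll(List.of(1, 2, 3, 4, 5), x -> x * 2, 2, 3));
    }
}
```

The semaphore bounds in-flight work independently of the pool size, which is one simple way to keep a small deployment from being flooded.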

Note: The completion command is currently available only in snapshot builds.

Related issue: elastic/elasticsearch#124405

* Specialize block parameters on AddInput (cherry picked from commit a5855c1)
* Call the specific add() methods for each block type (cherry picked from commit 5176663)
* Implement custom add in HashAggregationOperator (cherry picked from commit fb670bd)
* Migrated everything to the new add() calls
* Update docs/changelog/127582.yaml
* Spotless format
* Remove unused ClassName for IntVectorBlock
* Fixed tests
* Randomize groupIds block types to check most AddInput cases
* Minor fix and added some docs
* Renamed BlockHashWrapper

elasticsearchmachine · 2025-05-30T11:58:47Z

Pinging @elastic/es-analytical-engine (Team:Analytics)

afoucret · 2025-05-30T12:23:51Z

muted-tests.yml

@@ -300,9 +300,6 @@ tests:
 - class: org.elasticsearch.search.basic.SearchWithRandomDisconnectsIT
  method: testSearchWithRandomDisconnects
  issue: https://github.com/elastic/elasticsearch/issues/122707
- class: org.elasticsearch.xpack.esql.inference.RerankOperatorTests


ℹ️ Re-enable the flaky RerankOperator tests, which are now fixed.

afoucret · 2025-05-30T12:24:43Z

...ugin/esql/qa/server/src/main/java/org/elasticsearch/xpack/esql/qa/rest/EsqlSpecTestCase.java

@@ -254,7 +247,7 @@ protected boolean supportsInferenceTestService() {
    }

    protected boolean requiresInferenceEndpoint() {
-        return Stream.of(SEMANTIC_TEXT_FIELD_CAPS.capabilityName(), RERANK.capabilityName())
+        return Stream.of(SEMANTIC_TEXT_FIELD_CAPS.capabilityName(), RERANK.capabilityName(), COMPLETION.capabilityName())


ℹ️ Completion cannot be tested in multi_cluster because the inference test plugin is not available there.

afoucret · 2025-05-30T12:25:42Z

x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/analysis/Analyzer.java

@@ -617,7 +617,7 @@ private LogicalPlan resolveCompletion(Completion p, List<Attribute> childrenOutp
            Expression prompt = p.prompt();

            if (targetField instanceof UnresolvedAttribute ua) {
-                targetField = new ReferenceAttribute(ua.source(), ua.name(), TEXT);
+                targetField = new ReferenceAttribute(ua.source(), ua.name(), KEYWORD);


ℹ️ keyword is the recommended ES|QL type for non-analyzed text.

afoucret · 2025-05-30T12:27:04Z

x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/inference/RerankOperator.java

- * 2.0.
- */
-
-package org.elasticsearch.xpack.esql.inference;


ℹ️ Moved to the org.elasticsearch.xpack.esql.inference.rerank package

kderusso

Nice work! 👏 Changes LGTM but will defer to others to accept.

kderusso · 2025-05-30T17:09:14Z

.../src/main/java/org/elasticsearch/xpack/esql/inference/bulk/BulkInferenceExecutionConfig.java

+    public static final int DEFAULT_MAX_OUTSTANDING_REQUESTS = 50;
+
+    public static final BulkInferenceExecutionConfig DEFAULT = new BulkInferenceExecutionConfig(
+        DEFAULT_WORKERS,


Should these be configurable at some point? (Maybe not in scope of this PR)

This is something for later (anticipated through this config class)

kderusso · 2025-05-30T17:13:07Z

.../plugin/esql/src/main/java/org/elasticsearch/xpack/esql/inference/rerank/RerankOperator.java

+    private final int scoreChannel;
+
+    // Batch size used to group rows into a single inference request (currently fixed)
+    // TODO: make it configurable either in the command or as query pragmas


Suggested change

// TODO: make it configurable either in the command or as query pragmas

// TODO: make it configurable either in the command or as query params

See the QueryPragmas class to figure out why I chose this weird terminology 😆

carlosdelest

Overall this LGTM, although I have some questions:

I wonder about the need of using LocalCheckpointTracker as the basis for inference result processing. I understand the benefits in terms of comparing with the latest processed number and then buffering responses - but the seq_no underlying abstraction threw me off for a while.
What are the drivers for the decisions on the number of workers / max outstanding requests?
Do you think the threadpool should be a ML based one instead of using an ESQL worker?

It's a 2k LOC change 😓 . I've done my best but I'm sure I won't be covering everything I should.

afoucret · 2025-06-02T13:24:31Z

Hey @carlosdelest,

A few answers:

I wonder about the need of using LocalCheckpointTracker as the basis for inference result processing. I understand the benefits in terms of comparing with the latest processed number and then buffering responses - but the seq_no underlying abstraction threw me off for a while.

LocalCheckpointTracker lets us reorder inference responses that may be received out of order, and persist them. It is a tool provided by the ES framework, and I did not want to reinvent such a component with all the complexity it involves (thread safety, ...).
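
To make the reordering idea concrete, here is a simplified stand-in that buffers responses by sequence number and releases them only once every earlier response has arrived. It does not use LocalCheckpointTracker and does not claim to mirror its API; `OrderedResponseBufferSketch` and `onResponse` are hypothetical names chosen for this sketch.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Simplified stand-in for illustration; the PR relies on LocalCheckpointTracker instead.
final class OrderedResponseBufferSketch<T> {
    // Responses that arrived before all of their predecessors.
    private final Map<Long, T> pending = new HashMap<>();
    // Sequence number of the next response that may be emitted.
    private long nextSeqNo = 0;

    // Accepts a response tagged with its sequence number and returns every
    // response that can now be emitted in order (possibly none).
    synchronized List<T> onResponse(long seqNo, T response) {
        pending.put(seqNo, response);
        List<T> ready = new ArrayList<>();
        while (pending.containsKey(nextSeqNo)) {
            ready.add(pending.remove(nextSeqNo));
            nextSeqNo++;
        }
        return ready;
    }

    public static void main(String[] args) {
        OrderedResponseBufferSketch<String> buffer = new OrderedResponseBufferSketch<>();
        System.out.println(buffer.onResponse(1, "second")); // held back until 0 arrives
        System.out.println(buffer.onResponse(0, "first"));  // releases both, in order
    }
}
```

Even this toy version needs synchronization and careful bookkeeping, which hints at why reusing an existing, tested component was preferred.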

What are the drivers for the decisions on the number of workers / max outstanding requests?

It is a mix of several aspects, but the values were mostly chosen so that the operator works on a small allocation without errors.

Do you think the threadpool should be a ML based one instead of using an ESQL worker?

I do not think so. The inference tasks run on the inference threadpool (through the client call), but the operator coordination tasks and data handling should stay in the ES|QL threadpool.

ioanatia · 2025-06-02T13:49:50Z

x-pack/plugin/esql/qa/testFixtures/src/main/resources/completion.csv-spec

+required_capability: completion
+
+ROW prompt="Who is Victor Hugo?"
+| COMPLETION prompt WITH test_completion AS completion_output


what happens if prompt is a multi valued field? can COMPLETION handle it? can we get a test for this use case?

If the prompt is multi-valued, the PromptReader joins the different values using a \n.

So the multi-value input: ["Translate this movie description in French", movie_description]

Will be translated into the following prompt:

Translate this movie description in French: Long time ago....

I built this as a good alternative to concat in some cases.

I also added a CSV test case for it.
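
As a rough illustration of the joining behaviour described in this reply, the sketch below concatenates the values of a multi-valued prompt with a newline. It is not the PromptReader code from the PR; `PromptJoinSketch` and `joinPromptValues` are hypothetical names, and representing the values as a `List<String>` is an assumption.

```java
import java.util.List;

// Hypothetical illustration of newline-joining a multi-valued prompt.
final class PromptJoinSketch {

    // Joins all values of a multi-valued prompt into a single prompt string,
    // separating consecutive values with "\n".
    static String joinPromptValues(List<String> values) {
        return String.join("\n", values);
    }

    public static void main(String[] args) {
        String prompt = joinPromptValues(
            List.of("Translate this movie description in French", "Long time ago....")
        );
        System.out.println(prompt);
    }
}
```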

ioanatia · 2025-06-02T13:52:47Z

...ugin/esql/qa/testFixtures/src/main/java/org/elasticsearch/xpack/esql/CsvTestsDataLoader.java

@@ -461,6 +475,17 @@ public static boolean clusterHasRerankInferenceEndpoint(RestClient client) throw
        return true;
    }

+    private static void deleteInferenceEndpoint(RestClient client, String inferenceId) throws IOException {
+        try {
+            client.performRequest(new Request("DELETE", "_inference/" + inferenceId));


do we give the right path here? is it supposed to be DELETE _inference/rerank/test_reranker? do we need to pass the task type to the deleteInferenceEndpoint method?

In fact, both endpoints are valid and can be used interchangeably.

ioanatia · 2025-06-03T08:38:04Z

.../src/main/java/org/elasticsearch/xpack/esql/inference/bulk/BulkInferenceExecutionConfig.java

+
+package org.elasticsearch.xpack.esql.inference.bulk;
+
+public record BulkInferenceExecutionConfig(int workers, int maxOutstandingRequests) {


We don't plan to make these configurable in the near future - do we ever use anything else than the DEFAULT?
I am not a fan of adding these types of record classes that are only used to store defaults that are not configurable.
To me this just adds a cognitive load for anyone looking into rerank/completion.
I'd rather have the DEFAULT_WORKERS and DEFAULT_MAX_OUTSTANDING_REQUESTS in the base operator that uses them, instead of carrying around these record objects.

Maybe we anticipate these will be configurable, but until then this serves no purpose. I am a huge believer that we shouldn't add abstractions/constructions like these until they are actually used.

I’d prefer to keep the config centralized in one place rather than spreading it across multiple classes. It makes things easier to manage and reason about for me.

It also helps with testing flexibility. In BulkInferenceExecutorTests, I vary config values (e.g. the number of outstanding requests) to check that the executor works with a wide range of configs. This was really helpful to make sure the component works at different scales, and it would be more cumbersome if the config were just a constant in the class.

Maybe I’m anticipating a bit, but even if we don’t plan to make these settings configurable for end users, having the config passed as a parameter of the InferenceOperator could make our life easier if we later realize we need different configs for RERANK and COMPLETION.

So unless it is very important for you to change it, I would definitely prefer to keep this class here.
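
The argument above can be sketched as follows: a small config record with a shared default, passed into an operator so that tests (or different commands) can supply other values. The class names ending in `Sketch` are hypothetical. The constant values 10 and 50 come from the diff excerpts in this review, but the second argument of `DEFAULT` is an assumption because the excerpt is truncated after `DEFAULT_WORKERS`.

```java
// Sketch modeled on the record shown in the diff; not the actual class from the PR.
record BulkInferenceExecutionConfigSketch(int workers, int maxOutstandingRequests) {
    static final int DEFAULT_WORKERS = 10;
    static final int DEFAULT_MAX_OUTSTANDING_REQUESTS = 50;

    // Assumption: the diff excerpt is cut off after DEFAULT_WORKERS, so the
    // second argument here is a guess made for illustration.
    static final BulkInferenceExecutionConfigSketch DEFAULT = new BulkInferenceExecutionConfigSketch(
        DEFAULT_WORKERS,
        DEFAULT_MAX_OUTSTANDING_REQUESTS
    );
}

// Hypothetical operator that receives its config instead of hard-coding constants.
final class InferenceOperatorSketch {
    private final BulkInferenceExecutionConfigSketch config;

    InferenceOperatorSketch(BulkInferenceExecutionConfigSketch config) {
        this.config = config;
    }

    int maxOutstandingRequests() {
        return config.maxOutstandingRequests();
    }

    public static void main(String[] args) {
        // Production code could use the default, while a test varies the values.
        InferenceOperatorSketch prod = new InferenceOperatorSketch(BulkInferenceExecutionConfigSketch.DEFAULT);
        InferenceOperatorSketch test = new InferenceOperatorSketch(new BulkInferenceExecutionConfigSketch(1, 2));
        System.out.println(prod.maxOutstandingRequests() + " vs " + test.maxOutstandingRequests());
    }
}
```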

it's okay - we don't need to block merging this PR because of this one

ioanatia · 2025-06-03T08:51:28Z

.../src/main/java/org/elasticsearch/xpack/esql/inference/bulk/BulkInferenceExecutionConfig.java

+package org.elasticsearch.xpack.esql.inference.bulk;
+
+public record BulkInferenceExecutionConfig(int workers, int maxOutstandingRequests) {
+    public static final int DEFAULT_WORKERS = 10;


how did we arrive at this default value? I see that we are using the same threadPool for InferenceRunner in TransportEsqlQueryAction.

See my response to Carlos here

ioanatia · 2025-06-03T08:58:42Z

.../plugin/esql/src/main/java/org/elasticsearch/xpack/esql/inference/rerank/RerankOperator.java

+public class RerankOperator extends InferenceOperator {
+
+    // Default number of rows to include per inference request
+    private static final int DEFAULT_BATCH_SIZE = 20;


do we know if the rerank API has any limitations when sending large docs?
we might want to control the batch size not just by the number of rows, but on the total input size.
okay with me to follow up on this separately

IMHO, this is something to be discussed with ML folks.

If possible, I would prefer to keep a batch_size expressed as a number of rows and let the inference team handle large docs on their side. There is also the extract_snippet function that should be used with large docs.

I will log an issue to determine what our strategy should be for large documents.
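
For readers unfamiliar with row-count batching, here is a minimal sketch of slicing a list of rows into fixed-size batches, each of which would become one inference request. It is not the RerankOperator code; `RowBatchingSketch` and `sliceIntoBatches` are hypothetical names, and rows are modeled as plain list elements rather than pages or blocks.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of batching by row count only.
final class RowBatchingSketch {

    // Splits rows into consecutive batches of at most batchSize elements.
    // The last batch may be smaller than batchSize.
    static <T> List<List<T>> sliceIntoBatches(List<T> rows, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<List<T>> batches = new ArrayList<>();
        for (int start = 0; start < rows.size(); start += batchSize) {
            int end = Math.min(start + batchSize, rows.size());
            batches.add(new ArrayList<>(rows.subList(start, end)));
        }
        return batches;
    }

    public static void main(String[] args) {
        System.out.println(sliceIntoBatches(List.of("a", "b", "c", "d", "e"), 2));
    }
}
```

A policy based on total input size would need a different stopping condition per batch; that alternative is the open question raised in this thread and is not shown here.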

ioanatia

🚀 🚀 🚀
good work!

afoucret added >non-issue :Analytics/ES|QL AKA ESQL v9.1.0 labels Apr 25, 2025

afoucret force-pushed the esql-completion-inference-operator branch from 54e4c85 to 42e14c1 Compare May 7, 2025 12:35

afoucret force-pushed the esql-completion-inference-operator branch from 5f9829e to e0a14ae Compare May 21, 2025 08:31

afoucret and others added 25 commits May 22, 2025 17:25

InferenceOperator refactoring.

b85e704

CompletionOperator skeleton.

e6ac175

CSV tests inference refactoring

7fe8adc

Draft CompletionOperator.

757fbe4

Draft CompletionOperator.

39ad919

Move inference result type check to the InferenceOperator

6f5a8b3

Refactored inference operator.

bbd1f69

Restore removed code.

71ad3b8

Refactored bulk inference execution.

33a289d

Rollback muted tests changes.

d5797d3

Fix some tests for the rerank operator.

62f39eb

Bulk inference refactoring.

2229c94

[CI] Auto commit changes from spotless

3a59b96

BulkInferenceExecutor unit tests.

7423cfc

Fix a memory leak in the InferenceOperator

7acd6dd

Lint.

1c1c003

Finished refactoring the inference operator implementation.

505dbdc

[CI] Auto commit changes from spotless

815d479

Fixing some tests.

1657d2c

Improving BulkInferenceExecutorTests

3d48b05

Another refactoring

06b99ed

One more refactoring.

66563b6

Code simplification.

5384db0

Lint

04f0d36

afoucret commented May 30, 2025

afoucret requested review from tteofili, carlosdelest, ioanatia, ChrisHegarty and kderusso May 30, 2025 14:04

afoucret mentioned this pull request May 30, 2025

ES|QL - Completion command #124405

Open

11 tasks

kderusso reviewed May 30, 2025

afoucret requested a review from dnhatn May 30, 2025 19:27

Minor improvements

0a6ef3f

afoucret force-pushed the esql-completion-inference-operator branch from 0e009c7 to 0a6ef3f Compare June 2, 2025 07:38

Merge branch 'main' of https://github.com/elastic/elasticsearch into …

cc7d8fa

…esql-completion-inference-operator

carlosdelest approved these changes Jun 2, 2025

afoucret requested a review from svilen-mihaylov-elastic June 2, 2025 13:06

ioanatia reviewed Jun 3, 2025

afoucret added 2 commits June 3, 2025 16:45

Adding a test case for multivalued prompt.

04191cb

Merge branch 'main' of https://github.com/elastic/elasticsearch into …

7582956

…esql-completion-inference-operator

afoucret requested a review from ioanatia June 3, 2025 16:19

Merge branch 'main' into esql-completion-inference-operator

2f751cc

afoucret mentioned this pull request Jun 4, 2025

[8.19] Manual backport of ES|QL inference features (RERANK, COMPLETION). #128907

Merged

17 tasks

ioanatia approved these changes Jun 4, 2025

afoucret merged commit 993090d into elastic:main Jun 5, 2025
18 checks passed

afoucret deleted the esql-completion-inference-operator branch June 5, 2025 06:45

afoucret added a commit to afoucret/elasticsearch that referenced this pull request Jun 5, 2025

[ES|QL] COMPLETION command - Inference Operator implementation (elast…

29a41cc

…ic#127409)

davidkyle mentioned this pull request Jun 6, 2025

[ML] Fix InferenceGetServicesIT#testGetServicesWithCompletionTaskType #129052

Open
