Add Ibm Granite Completion and Chat Completion support #129146

Evgenii-Kazannik · 2025-06-09T12:54:38Z

Extend the watsonx ai service with completion and chat completion tasks so that the corresponding IBM Granite models can be used.

Jan-Kazlouski-elastic

Left a few suggestions.

Jan-Kazlouski-elastic · 2025-06-10T09:51:14Z

...va/org/elasticsearch/xpack/inference/services/ibmwatsonx/action/IbmWatsonxActionCreator.java

+     * @return A formatted error message.
+     */
+    public static String buildErrorMessage(TaskType requestType, String inferenceId) {
+        return format("Failed to send Ibm Watsonx %s request from inference entity id [%s]", requestType.toString(), inferenceId);


Suggested change

-        return format("Failed to send Ibm Watsonx %s request from inference entity id [%s]", requestType.toString(), inferenceId);
+        return format("Failed to send IBM Watsonx %s request from inference entity id [%s]", requestType.toString(), inferenceId);

Done. Thank you

Jan-Kazlouski-elastic · 2025-06-10T09:51:39Z

...va/org/elasticsearch/xpack/inference/services/ibmwatsonx/action/IbmWatsonxActionCreator.java

    protected IbmWatsonxEmbeddingsRequestManager getEmbeddingsRequestManager(
        IbmWatsonxEmbeddingsModel model,
        Truncator truncator,
        ThreadPool threadPool
    ) {
        return new IbmWatsonxEmbeddingsRequestManager(model, truncator, threadPool);
    }
+
+    /**
+     * Builds an error message for Ibm Watsonx actions.


Suggested change

-     * Builds an error message for Ibm Watsonx actions.
+     * Builds an error message for IBM Watsonx actions.

Thanks. Applied as suggested.

Jan-Kazlouski-elastic · 2025-06-10T09:56:43Z

...va/org/elasticsearch/xpack/inference/services/ibmwatsonx/action/IbmWatsonxActionVisitor.java

 import org.elasticsearch.xpack.inference.services.ibmwatsonx.embeddings.IbmWatsonxEmbeddingsModel;
 import org.elasticsearch.xpack.inference.services.ibmwatsonx.rerank.IbmWatsonxRerankModel;

 import java.util.Map;

+/**
+ * Interface for creating {@link ExecutableAction} instances for Watsonx models.


IMHO, from here on, logs and Javadoc should use "IBM Watsonx" instead of "Ibm Watsonx" so that the text is human-readable. Class and variable names should stay camel-cased.

Updated. Thank you

Jan-Kazlouski-elastic · 2025-06-10T10:02:59Z

...sticsearch/xpack/inference/services/ibmwatsonx/completion/IbmWatsonxChatCompletionModel.java

+
+    /**
+     * Accepts a visitor to create an executable action. The returned action will not return documents in the response.
+     * @param visitor _


I have seen that an underscore was used for some other param descriptions earlier, but it doesn't provide any useful information. I think it should be replaced with a proper description.

Thank you, Jan. Done

Jan-Kazlouski-elastic · 2025-06-10T10:06:38Z

.../xpack/inference/services/ibmwatsonx/completion/IbmWatsonxChatCompletionServiceSettings.java

+    /**
+     * Rate limits are defined at
+     * <a href="https://www.ibm.com/docs/en/watsonx/saas?topic=learning-watson-machine-plans">Watson Machine Learning plans</a>.
+     * For Lite plan, you've 120 requests per minute.


Original wording seems a bit off to me. I'd change rerank one as well.

Suggested change

-     * For Lite plan, you've 120 requests per minute.
+     * For the Lite plan, the limit is 120 requests per minute.

Thank you. I looked through and applied the suggestions.

unify naming: IBM Watsonx

use suggested comments

replace a visitor param underscore ( _ ) with a definition

Jan-Kazlouski-elastic · 2025-06-10T13:38:48Z

...va/org/elasticsearch/xpack/inference/services/ibmwatsonx/action/IbmWatsonxActionCreator.java

 public class IbmWatsonxActionCreator implements IbmWatsonxActionVisitor {
    private final Sender sender;
    private final ServiceComponents serviceComponents;

+    static final String COMPLETION_REQUEST_TYPE = "IBM WatsonX completions";


After a discussion in DMs, we found that the platform is called Watsonx, not WatsonX. Could you please unify the naming? Thank you!

Thanks. Unified the naming as IBM Watsonx

A bit of a nitpick but the platform seems to be called IBM watsonx (from their website) but this change updates it everywhere to IBM Watsonx. Can we keep consistent with IBM's capitalization to avoid confusion?

elasticsearchmachine · 2025-06-10T14:13:17Z

Pinging @elastic/search-experiences-team (Team:Search - Experiences)

elasticsearchmachine · 2025-06-10T14:13:17Z

Pinging @elastic/search-eng (Team:SearchOrg)

elasticsearchmachine · 2025-06-10T14:23:35Z

Pinging @elastic/ml-core (Team:ML)


dan-rubinstein

Great work. I'd also like to manually test this change. Could you add some information to the PR description about how you manually tested it, such as some example API calls, to help me get started with the testing?

dan-rubinstein · 2025-06-16T15:18:25Z

...nce/src/main/java/org/elasticsearch/xpack/inference/services/ibmwatsonx/IbmWatsonxModel.java

+
+    @Override
+    public int rateLimitGroupingHash() {
+        return Objects.hash(uri);


Why does this not need to include the rateLimitServiceSettings?
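To make the question concrete, here is a minimal sketch of the two hashing options. `SketchRateLimitSettings` and `SketchRateLimitGroupingModel` are hypothetical stand-ins invented for illustration, not the Elasticsearch classes; only `java.util.Objects.hash` and `java.net.URI` are real APIs here.

```java
import java.net.URI;
import java.util.Objects;

// Hypothetical stand-in for the service's rate limit settings (not the real class).
record SketchRateLimitSettings(long requestsPerMinute) {}

// Hypothetical, simplified model used only to compare the two hashing options.
final class SketchRateLimitGroupingModel {
    private final URI uri;
    private final SketchRateLimitSettings rateLimitSettings;

    SketchRateLimitGroupingModel(URI uri, SketchRateLimitSettings rateLimitSettings) {
        this.uri = Objects.requireNonNull(uri);
        this.rateLimitSettings = Objects.requireNonNull(rateLimitSettings);
    }

    // Same shape as the hunk under review: only the URI feeds the hash.
    int hashFromUriOnly() {
        return Objects.hash(uri);
    }

    // Alternative raised in the question: the rate limit settings also feed the hash.
    int hashFromUriAndSettings() {
        return Objects.hash(uri, rateLimitSettings);
    }
}
```

In this sketch, two models that share a URI but are configured with different limits get the same value from `hashFromUriOnly()`, while `hashFromUriAndSettings()` would generally tell them apart.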

dan-rubinstein · 2025-06-16T15:20:34Z

...nce/src/main/java/org/elasticsearch/xpack/inference/services/ibmwatsonx/IbmWatsonxModel.java


    private final IbmWatsonxRateLimitServiceSettings rateLimitServiceSettings;

+    protected URI uri;


Is URI only going to be used for completion/chat completion use cases? If yes, can it be in the completion model implementation instead?

dan-rubinstein · 2025-06-16T15:23:48Z

...nce/src/main/java/org/elasticsearch/xpack/inference/services/ibmwatsonx/IbmWatsonxModel.java

 import java.util.Map;
 import java.util.Objects;

-public abstract class IbmWatsonxModel extends Model {
+public abstract class IbmWatsonxModel extends RateLimitGroupingModel {


Can you clarify why this needs to be a RateLimitGroupingModel?

dan-rubinstein · 2025-06-16T15:34:29Z

...va/org/elasticsearch/xpack/inference/services/ibmwatsonx/action/IbmWatsonxActionCreator.java

 public class IbmWatsonxActionCreator implements IbmWatsonxActionVisitor {
    private final Sender sender;
    private final ServiceComponents serviceComponents;

+    static final String COMPLETION_REQUEST_TYPE = "IBM WatsonX completions";


A bit of a nitpick but the platform seems to be called IBM watsonx (from their website) but this change updates it everywhere to IBM Watsonx. Can we keep consistent with IBM's capitalization to avoid confusion?

dan-rubinstein · 2025-06-16T15:40:13Z

...ain/java/org/elasticsearch/xpack/inference/services/voyageai/rerank/VoyageAIRerankModel.java

@@ -109,8 +109,8 @@ public DefaultSecretSettings getSecretSettings() {

    /**
     * Accepts a visitor to create an executable action. The returned action will not return documents in the response.
-     * @param visitor _
-     * @param taskSettings _
+     * @param visitor          Interface for creating {@link ExecutableAction} instances for IBM Voyage AI models.


I believe this should just be Voyage AI instead of IBM Voyage AI.

dan-rubinstein · 2025-06-16T18:37:42Z

...search/xpack/inference/services/ibmwatsonx/request/IbmWatsonxChatCompletionRequestTests.java

+    private static final String API_COMPLETIONS_PATH = "https://abc.com/ml/v1/text/chat?version=apiVersion";
+
+    public void testCreateRequest_WithStreaming() throws IOException, URISyntaxException {
+        var request = createRequest("secret", randomAlphaOfLength(15), "model", true);


Can we use randomized strings when possible? (ex. "secret", "model", etc)
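As a rough illustration of the idea, the sketch below generates the values that are hard-coded above. `SketchRandomStrings.randomAlpha` is a hypothetical stand-in written for this example; in the actual tests the framework helper already used in the hunk (`randomAlphaOfLength`) would presumably play this role.

```java
import java.util.Random;

// Hypothetical helper that mimics a test framework's random-string utility.
final class SketchRandomStrings {
    private static final String ALPHABET = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ";

    private SketchRandomStrings() {}

    // Builds a string of the requested length from ASCII letters only.
    static String randomAlpha(Random random, int length) {
        StringBuilder builder = new StringBuilder(length);
        for (int i = 0; i < length; i++) {
            builder.append(ALPHABET.charAt(random.nextInt(ALPHABET.length())));
        }
        return builder.toString();
    }
}
```

A test would then assert against the generated values (for example, that the request body contains the generated model name) instead of against literals such as "secret" or "model".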

dan-rubinstein · 2025-06-16T18:37:54Z

...search/xpack/inference/services/ibmwatsonx/request/IbmWatsonxChatCompletionRequestTests.java

+    private static final String AUTH_HEADER_VALUE = "foo";
+    private static final String API_COMPLETIONS_PATH = "https://abc.com/ml/v1/text/chat?version=apiVersion";
+
+    public void testCreateRequest_WithStreaming() throws IOException, URISyntaxException {


Should there be a test for creating a request without streaming?

dan-rubinstein · 2025-06-16T18:40:16Z

...search/xpack/inference/services/ibmwatsonx/request/IbmWatsonxChatCompletionRequestTests.java

+        return new IbmWatsonxChatCompletionWithoutAuthRequest(new UnifiedChatInput(List.of(input), "user", stream), chatCompletionModel);
+    }
+
+    private static class IbmWatsonxChatCompletionWithoutAuthRequest extends IbmWatsonxChatCompletionRequest {


Can you clarify why we need to create a WithoutAuth version of the request?

dan-rubinstein · 2025-06-16T18:43:50Z

...k/inference/services/ibmwatsonx/completion/IbmWatsonxChatCompletionServiceSettingsTests.java

+import static org.elasticsearch.xpack.inference.MatchersUtils.equalToIgnoringWhitespaceInJsonString;
+import static org.hamcrest.Matchers.is;
+
+public class IbmWatsonxChatCompletionServiceSettingsTests extends AbstractWireSerializingTestCase<IbmWatsonxChatCompletionServiceSettings> {


Can we add tests for fromMap for the non-happy cases (ex. modelId missing, projectId missing, URL missing, etc.)? The same goes for cases where optional values aren't set (ex. falling back to default rate limit settings when none are provided).
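The cases listed above could look roughly like the following sketch. `SketchChatCompletionSettings`, its map keys (`url`, `model_id`, `project_id`, `requests_per_minute`) and its error message are assumptions made for illustration, not the real service settings class; the default of 120 is borrowed from the Lite plan limit quoted earlier in this review.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified settings type; the keys and the default are assumptions.
record SketchChatCompletionSettings(String url, String modelId, String projectId, long requestsPerMinute) {

    // Assumed default, taken from the Lite plan limit quoted earlier in this review.
    static final long DEFAULT_REQUESTS_PER_MINUTE = 120;

    // Parses the map, reporting every missing required field in a single error.
    static SketchChatCompletionSettings fromMap(Map<String, ?> map) {
        List<String> missing = new ArrayList<>();
        String url = requiredString(map, "url", missing);
        String modelId = requiredString(map, "model_id", missing);
        String projectId = requiredString(map, "project_id", missing);
        if (missing.isEmpty() == false) {
            throw new IllegalArgumentException("Missing required settings: " + missing);
        }
        // Optional value: fall back to the default when it is absent or not numeric.
        long requestsPerMinute = map.get("requests_per_minute") instanceof Number n
            ? n.longValue()
            : DEFAULT_REQUESTS_PER_MINUTE;
        return new SketchChatCompletionSettings(url, modelId, projectId, requestsPerMinute);
    }

    // Returns the non-blank string stored under key, or records the key as missing.
    private static String requiredString(Map<String, ?> map, String key, List<String> missing) {
        if (map.get(key) instanceof String s && s.isBlank() == false) {
            return s;
        }
        missing.add(key);
        return null;
    }
}
```

Each non-happy case then becomes one short test: remove a single required key, expect the exception, and check that its message names that key. The fallback case omits `requests_per_minute` and checks for the default.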

dan-rubinstein · 2025-06-16T18:57:07Z

...icsearch/xpack/inference/services/ibmwatsonx/action/IbmWatsonxChatCompletionActionTests.java

+import static org.mockito.Mockito.doThrow;
+import static org.mockito.Mockito.mock;
+
+public class IbmWatsonxChatCompletionActionTests extends ESTestCase {


This test class seems identical or almost identical to some of our other ...ChatCompletionActionTests classes, except for the createAction function and the responseJson we define (example). To reduce duplication, can we create a base class ChatCompletionActionTests extends ESTestCase with all the shared code (ex. the tests), and have each service provide a subclass such as IbmWatsonxChatCompletionActionTests extends ChatCompletionActionTests that overrides createAction, createResponseJson, etc.?
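A minimal sketch of the proposed structure follows, written without the test framework so that it stands alone. Every name in it is hypothetical, the action is modeled as a plain function from input text to response body, and the JSON shape is an assumption; the real base class would extend ESTestCase and exercise the real actions.

```java
import java.util.function.Function;

// Hypothetical shared base: holds the tests and defers service specifics to subclasses.
abstract class SketchChatCompletionActionTestBase {

    // Each service supplies its own action, modeled here as input text -> raw response body.
    protected abstract Function<String, String> createAction();

    // Each service supplies the response JSON it expects for a given completion text.
    protected abstract String createResponseJson(String content);

    // Shared test body, written once for every service.
    final void testExecuteReturnsServiceResponse() {
        String actual = createAction().apply("hello");
        String expected = createResponseJson("hello");
        if (expected.equals(actual) == false) {
            throw new AssertionError("expected " + expected + " but got " + actual);
        }
    }
}

// Hypothetical per-service subclass: only the overrides differ between services.
final class SketchIbmWatsonxChatCompletionActionTests extends SketchChatCompletionActionTestBase {

    @Override
    protected Function<String, String> createAction() {
        // Stand-in for a real action that would talk to a mock web server.
        return input -> "{\"choices\":[{\"message\":{\"content\":\"" + input + "\"}}]}";
    }

    @Override
    protected String createResponseJson(String content) {
        return "{\"choices\":[{\"message\":{\"content\":\"" + content + "\"}}]}";
    }
}
```

With this layout, adding another service means writing only the two overrides, while the shared test methods live in one place.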

Add Ibm Granite Completion and Chat Completion support

78ab1da

elasticsearchmachine added needs:triage Requires assignment of a team area label v9.1.0 external-contributor Pull request authored by a developer outside the Elasticsearch team labels Jun 9, 2025

Evgenii-Kazannik mentioned this pull request Jun 10, 2025

Update Inference specification for Watsonx's completion and chat comp… elastic/elasticsearch-specification#4505

Merged

AI-IshanBhatt added the :SearchOrg/Experiences Label for the Search Experiences team label Jun 10, 2025

Jan-Kazlouski-elastic reviewed Jun 10, 2025


Apply suggestions

f92f348

Samiul-TheSoccerFan added Team:ML Meta label for the ML team and removed :SearchOrg/Experiences Label for the Search Experiences team labels Jun 10, 2025

elasticsearchmachine removed the Team:ML Meta label for the ML team label Jun 10, 2025

Samiul-TheSoccerFan added :ml Machine learning Team:ML Meta label for the ML team and removed needs:triage Requires assignment of a team area label labels Jun 10, 2025

Samiul-TheSoccerFan added the >enhancement label Jun 10, 2025

Merge branch 'main' into Add-IBM-Granite-support-for-completion-and-c…

510e3c5

…hat-completion # Conflicts: # server/src/main/java/org/elasticsearch/TransportVersions.java

dan-rubinstein reviewed Jun 16, 2025

