enable customize http client parameters #1558
Conversation
Signed-off-by: zane-neo <zaniu@amazon.com>
Codecov Report (Attention)

@@              Coverage Diff              @@
##                 2.x    #1558      +/-   ##
=============================================
- Coverage      80.28%   80.17%    -0.12%
- Complexity      2526     2531        +5
=============================================
  Files            199      200        +1
  Lines          10001    10056       +55
  Branches        1002     1004        +2
=============================================
+ Hits            8029     8062       +33
- Misses          1499     1522       +23
+ Partials         473      472        -1

Flags with carried forward coverage won't be shown.
Signed-off-by: zane-neo <zaniu@amazon.com>
@@ -177,4 +177,25 @@ private MLCommonsSettings() {}
    // Feature flag for enabling search processors for Retrieval Augmented Generation using OpenSearch and Remote Inference.
    public static final Setting<Boolean> ML_COMMONS_RAG_PIPELINE_FEATURE_ENABLED =
        GenerativeQAProcessorConstants.RAG_PIPELINE_FEATURE_ENABLED;

    public static final Setting<Integer> ML_COMMONS_HTTP_CLIENT_CONNECTION_TIMEOUT_IN_MILLI_SECOND = Setting
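For context, here is a minimal sketch of how cluster settings like the one in this hunk are typically declared in an OpenSearch plugin. Only the connection-timeout constant name appears in the diff above; the read-timeout and max-connection settings, the setting keys, and all default values are assumptions for illustration.

```java
import org.opensearch.common.settings.Setting;

public final class HttpClientSettingsSketch {

    // Connection timeout in milliseconds; the constant name matches the diff,
    // but the setting key and default value here are assumed.
    public static final Setting<Integer> ML_COMMONS_HTTP_CLIENT_CONNECTION_TIMEOUT_IN_MILLI_SECOND = Setting
        .intSetting(
            "plugins.ml_commons.http_client_connection_timeout_in_milli_second",
            10_000,
            0,
            Setting.Property.NodeScope,
            Setting.Property.Dynamic
        );

    // Hypothetical companion setting: read (socket) timeout in milliseconds.
    public static final Setting<Integer> ML_COMMONS_HTTP_CLIENT_READ_TIMEOUT_IN_MILLI_SECOND = Setting
        .intSetting(
            "plugins.ml_commons.http_client_read_timeout_in_milli_second",
            10_000,
            0,
            Setting.Property.NodeScope,
            Setting.Property.Dynamic
        );

    // Hypothetical companion setting: maximum pooled HTTP connections.
    public static final Setting<Integer> ML_COMMONS_HTTP_CLIENT_MAX_CONNECTION = Setting
        .intSetting(
            "plugins.ml_commons.http_client_max_connection",
            30,
            1,
            Setting.Property.NodeScope,
            Setting.Property.Dynamic
        );

    private HttpClientSettingsSketch() {}
}
```

Such settings would also need to be registered in the plugin's getSettings() list; that part is omitted here.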
What if a customer wants timeout = 60 ms for remote model A and timeout = 80 ms for remote model B?
@austintlee do you think this could be a valid use case?
Maybe the customer could provide these parameters when setting up the connector? We'd keep default values, but the customer could supply their own if they want.
60ms and 80ms may not be a perfect example, but I think this is a valid use case: different LLMs have different latency characteristics. I agree that we can add these HTTP client configurations to the connector. @austintlee please share your thoughts as well.
Why not include these three params in the request body of the "Predict" call to each LLM? If they are absent from the request, we can fall back to default values like these settings.
This can be implemented, but we'd better make those changes in a different PR; the goal of this PR is to let customers set these parameters to avoid exceptions.
@zane-neo So we have three options now:
- Add a cluster setting, as this PR does
- Add these parameters to the connector configuration, from @dhrubo-os
- Let the user set these parameters in the request body, from @Zhangxunmt
From my understanding, we already have a lot of cluster settings, so maybe let's not add more. Adding them at the connector level seems more flexible, and if we add them as connector parameters, we can also support option 3 by default.
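A minimal sketch of how options 2 and 3 could compose, assuming a per-request value overrides a connector-level value, which in turn overrides a plugin default. The class, method, map, and parameter names are hypothetical, not the actual ml-commons API.

```java
import java.util.Map;

public final class TimeoutResolver {

    private static final int DEFAULT_CONNECTION_TIMEOUT_MS = 10_000; // assumed default

    // Resolve the effective connection timeout with request > connector > default precedence.
    public static int resolveConnectionTimeout(Map<String, String> requestParams,
                                               Map<String, String> connectorParams) {
        String key = "connection_timeout"; // assumed parameter name
        if (requestParams != null && requestParams.containsKey(key)) {
            return Integer.parseInt(requestParams.get(key));   // option 3: per-request value
        }
        if (connectorParams != null && connectorParams.containsKey(key)) {
            return Integer.parseInt(connectorParams.get(key)); // option 2: per-connector value
        }
        return DEFAULT_CONNECTION_TIMEOUT_MS;                  // fallback default
    }

    private TimeoutResolver() {}
}
```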
+1 on this.
Makes sense. This configuration only applies to the remote inference case, so we can put these in the connector configuration. The only drawback is that the user has to configure this for each connector, which is somewhat verbose.
...thms/src/main/java/org/opensearch/ml/engine/algorithms/remote/AbstractConnectorExecutor.java
Signed-off-by: zane-neo <zaniu@amazon.com>
Description
Currently, remote inference uses CloseableHttpClient with default configurations for both the AWS protocol and the HTTP protocol. The drawback is that customers can't configure the client based on their own use case, and exceptions are sometimes thrown from the HTTP client. This feature enables customers to configure their own parameters for the HTTP client.
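For illustration, a minimal sketch (using the Apache HttpClient 4.x builder API) of what configuring these parameters could look like instead of relying on library defaults. The class name and concrete values are assumptions, not the code in this PR.

```java
import org.apache.http.client.config.RequestConfig;
import org.apache.http.impl.client.CloseableHttpClient;
import org.apache.http.impl.client.HttpClients;

public final class ConfiguredHttpClientSketch {

    // Build a CloseableHttpClient with explicit timeouts and connection-pool limits.
    public static CloseableHttpClient build(int connectionTimeoutMs, int readTimeoutMs, int maxConnections) {
        RequestConfig requestConfig = RequestConfig.custom()
            .setConnectTimeout(connectionTimeoutMs)   // time allowed to establish the connection
            .setSocketTimeout(readTimeoutMs)          // time allowed to wait for response data
            .build();
        return HttpClients.custom()
            .setDefaultRequestConfig(requestConfig)
            .setMaxConnTotal(maxConnections)          // total pooled connections
            .setMaxConnPerRoute(maxConnections)       // pooled connections per route
            .build();
    }

    private ConfiguredHttpClientSketch() {}
}
```

The values passed to build(...) would come from whichever mechanism is chosen (cluster setting, connector configuration, or request body).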
Issues Resolved
#1470
#1537
Check List
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.