Enable customizing HTTP client parameters #1558

Conversation

@zane-neo (Collaborator) commented Oct 27, 2023

Description

Remote inference currently uses CloseableHttpClient with default configurations for both the AWS protocol and the HTTP protocol. The drawback is that customers cannot configure the client for their own use cases, and exceptions are sometimes thrown from the HTTP client. This feature lets customers configure their own parameters for the HTTP client.
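
To illustrate what configuring the client could look like, here is a minimal sketch assuming Apache HttpClient 4.x (the library behind CloseableHttpClient); the parameter names below are placeholders for illustration, not the exact setting names introduced by this PR:

```java
import org.apache.http.client.config.RequestConfig;
import org.apache.http.impl.client.CloseableHttpClient;
import org.apache.http.impl.client.HttpClients;

public final class ConfigurableHttpClientFactory {

    private ConfigurableHttpClientFactory() {}

    // Build a CloseableHttpClient from user-supplied values instead of the library defaults.
    public static CloseableHttpClient create(int connectionTimeoutMillis, int readTimeoutMillis, int maxConnections) {
        RequestConfig requestConfig = RequestConfig
            .custom()
            .setConnectTimeout(connectionTimeoutMillis) // time allowed to establish the connection
            .setSocketTimeout(readTimeoutMillis)        // time allowed to wait for response data
            .build();
        return HttpClients
            .custom()
            .setDefaultRequestConfig(requestConfig)
            .setMaxConnTotal(maxConnections)            // upper bound on pooled connections
            .build();
    }
}
```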

Issues Resolved

#1470
#1537

Check List

  • New functionality includes testing.
    • All tests pass
  • New functionality has been documented.
    • New functionality has javadoc added
  • Commits are signed per the DCO using --signoff

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

codecov bot commented Oct 27, 2023

Codecov Report

Attention: 17 lines in your changes are missing coverage. Please review.

Comparison is base (693a53c) 80.28% compared to head (347fbfc) 80.17%.
Report is 17 commits behind head on 2.x.

❗ Current head 347fbfc differs from pull request most recent head b5f2ed4. Consider uploading reports for the commit b5f2ed4 to get more accurate results

Files Patch % Lines
...engine/algorithms/remote/AwsConnectorExecutor.java 0.00% 13 Missing ⚠️
...ine/algorithms/remote/RemoteConnectorExecutor.java 0.00% 4 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##                2.x    #1558      +/-   ##
============================================
- Coverage     80.28%   80.17%   -0.12%     
- Complexity     2526     2531       +5     
============================================
  Files           199      200       +1     
  Lines         10001    10056      +55     
  Branches       1002     1004       +2     
============================================
+ Hits           8029     8062      +33     
- Misses         1499     1522      +23     
+ Partials        473      472       -1     
Flag Coverage Δ
ml-commons 80.17% <70.68%> (-0.12%) ⬇️

Flags with carried forward coverage won't be shown.


@@ -177,4 +177,25 @@ private MLCommonsSettings() {}
// Feature flag for enabling search processors for Retrieval Augmented Generation using OpenSearch and Remote Inference.
public static final Setting<Boolean> ML_COMMONS_RAG_PIPELINE_FEATURE_ENABLED =
GenerativeQAProcessorConstants.RAG_PIPELINE_FEATURE_ENABLED;

public static final Setting<Integer> ML_COMMONS_HTTP_CLIENT_CONNECTION_TIMEOUT_IN_MILLI_SECOND = Setting
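
For context, a fully spelled-out declaration of a cluster setting like this typically looks like the sketch below; the key string, default value, and properties here are assumptions for illustration rather than what this PR ultimately merges:

```java
import org.opensearch.common.settings.Setting;

public final class ExampleHttpClientSettings {

    private ExampleHttpClientSettings() {}

    // Connection timeout for the remote-inference http client, in milliseconds.
    // The setting key and default value below are hypothetical.
    public static final Setting<Integer> ML_COMMONS_HTTP_CLIENT_CONNECTION_TIMEOUT_IN_MILLI_SECOND = Setting
        .intSetting(
            "plugins.ml_commons.http_client_connection_timeout_in_millis",
            10_000,
            0,
            Setting.Property.NodeScope,
            Setting.Property.Dynamic
        );
}
```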
@dhrubo-os (Collaborator) commented Oct 27, 2023
What if a customer wants timeout = 60 ms for remote model A and timeout = 80 ms for remote model B?

@austintlee do you think this could be a valid use case?

Maybe the customer can provide these parameters when setting up the connector? We'd have some default values, but customers could provide their own values if they want.

@zane-neo (Collaborator, PR author) replied:

60 ms and 80 ms may not be a perfect example, but I think this is a valid use case: different LLMs have different latency characteristics. I agree that we can add these httpclient configurations to the connector. @austintlee please share your thoughts as well.

@Zhangxunmt (Collaborator) commented:

Why not include these three params in the request body of the "Predict" call to each LLM? If they are absent in the request, we can use the default values like these settings.

@zane-neo (Collaborator, PR author) replied:

This can be implemented, but we'd better make that change in a different PR; the target of this PR is to enable customers to set these parameters to avoid exceptions.

@ylwu-amzn (Collaborator) commented Dec 7, 2023

@zane-neo So we have three options now:

  1. Add a cluster setting like what this PR does
  2. Add these parameters to connector configuration, from @dhrubo-os
  3. User can set these parameters in request body, from @Zhangxunmt

From my understanding, we already have a lot of cluster settings, so maybe let's not add more. Adding these at the connector level seems more flexible, and if we add them as connector parameters, we can also support option 3 by default.

A collaborator replied:

+1 on this.

@zane-neo (Collaborator, PR author) replied Dec 12, 2023

Makes sense. This configuration only applies to the remote inference case, so we can put these in the connector configuration. The only drawback is that the user has to configure this for each connector, which is somewhat verbose.
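
If the connector-level approach is adopted, one way it could work is for each connector to optionally carry these values and fall back to a cluster-wide default when they are absent. A hypothetical sketch (the parameter name "connection_timeout_ms" and the helper class are illustrative only, not part of this PR):

```java
import java.util.Map;

public final class HttpClientParamResolver {

    private HttpClientParamResolver() {}

    // Return the connector-level timeout when present and valid, otherwise the cluster default.
    public static int resolveConnectionTimeoutMillis(Map<String, String> connectorParameters, int clusterDefaultMillis) {
        String raw = connectorParameters == null ? null : connectorParameters.get("connection_timeout_ms");
        if (raw == null || raw.isEmpty()) {
            return clusterDefaultMillis;
        }
        try {
            return Integer.parseInt(raw);
        } catch (NumberFormatException e) {
            return clusterDefaultMillis; // malformed value: fall back rather than fail the request
        }
    }
}
```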
