auto deployment for remote models #2206

Merged: 4 commits, Mar 19, 2024

@Zhangxunmt (Collaborator) commented Mar 14, 2024

Description

This PR implements the minimum needed for automatic deployment of remote models. When a prediction is run against a remote model:

  1. If the remote model is not deployed, deploy it to the cluster. Deploying to all nodes according to the deployment plan takes time.
  2. If the deployment has not yet reached the local node, deploy to the local node without waiting.
  3. The worker nodes are stored in a concurrent hash map in the ModelCache, so updating them is thread safe.
  4. Once the deployment is done, from either the local deploy or the cluster deploy, kick off the prediction.
  5. TTL for undeployment and unit tests (UTs) will come in a separate PR.
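
The flow in steps 1 to 4 can be sketched roughly as below. This is a simplified, single-process illustration and not the code in this PR: the class AutoDeploySketch and all of its methods are hypothetical names, the cluster deployment is simulated by a pending set that is completed by hand, and the prediction itself is a placeholder string rather than a call to a remote endpoint.

```java
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical, simplified model of the flow; none of these names are ml-commons APIs.
final class AutoDeploySketch {
    // modelId -> ids of the nodes the model is deployed on (thread-safe map and sets).
    private final Map<String, Set<String>> workerNodes = new ConcurrentHashMap<>();
    // Models whose cluster-wide deployment was requested but has not finished yet.
    private final Set<String> pendingClusterDeploys = ConcurrentHashMap.newKeySet();
    private final String localNodeId;
    private final Set<String> clusterNodeIds;

    AutoDeploySketch(String localNodeId, Set<String> clusterNodeIds) {
        this.localNodeId = localNodeId;
        this.clusterNodeIds = Set.copyOf(clusterNodeIds);
    }

    // Snapshot of the nodes currently holding the model.
    Set<String> deployedNodes(String modelId) {
        return Set.copyOf(workerNodes.getOrDefault(modelId, Set.of()));
    }

    boolean isDeployedLocally(String modelId) {
        return deployedNodes(modelId).contains(localNodeId);
    }

    boolean hasPendingClusterDeploy(String modelId) {
        return pendingClusterDeploys.contains(modelId);
    }

    // Step 1: request a cluster-wide deployment; it completes some time later.
    private void requestClusterDeploy(String modelId) {
        pendingClusterDeploys.add(modelId);
    }

    // Simulates the deployment plan eventually reaching every node.
    void completeClusterDeploy(String modelId) {
        if (pendingClusterDeploys.remove(modelId)) {
            nodesFor(modelId).addAll(clusterNodeIds);
        }
    }

    // Step 2: deploy on this node only, without waiting for the cluster.
    private void deployLocally(String modelId) {
        nodesFor(modelId).add(localNodeId);
    }

    // Step 3: computeIfAbsent keeps concurrent first-time registrations safe.
    private Set<String> nodesFor(String modelId) {
        return workerNodes.computeIfAbsent(modelId, id -> ConcurrentHashMap.newKeySet());
    }

    // Step 4: deploy if needed, then run the (placeholder) prediction.
    String predict(String modelId, String input) {
        if (deployedNodes(modelId).isEmpty()) {
            requestClusterDeploy(modelId);
        }
        if (!isDeployedLocally(modelId)) {
            deployLocally(modelId);
        }
        return localNodeId + ":" + modelId + ":" + input;
    }

    public static void main(String[] args) {
        AutoDeploySketch node = new AutoDeploySketch("node-1", Set.of("node-1", "node-2"));
        System.out.println(node.predict("model-a", "hello"));
        System.out.println(node.hasPendingClusterDeploy("model-a"));
        node.completeClusterDeploy("model-a");
        System.out.println(node.deployedNodes("model-a").size());
    }
}
```

In this sketch the first predict call does not block on the cluster deployment: it records the request, deploys on the local node, and answers immediately, which mirrors the intent of step 2. How the real implementation dispatches the deploy actions and listeners is not shown here.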

Issues Resolved

[List any issues this PR will resolve]

Check List

  • New functionality includes testing.
    • All tests pass
  • New functionality has been documented.
    • New functionality has javadoc added
  • Commits are signed per the DCO using --signoff

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

codecov bot commented Mar 14, 2024

Codecov Report

Attention: Patch coverage is 14.89362%; 80 lines in your changes are missing coverage. Please review.

Project coverage is 81.64%. Comparing base (189f2a2) to head (bb102c4).
The report is 1 commit behind head on main.

❗ Current head bb102c4 differs from pull request most recent head 2fecabd. Consider uploading reports for the commit 2fecabd to get more accurate results

Files                                                  Patch %  Lines
...tion/prediction/TransportPredictionTaskAction.java  7.27%    50 Missing and 1 partial ⚠️
...n/java/org/opensearch/ml/model/MLModelManager.java  0.00%    23 Missing ⚠️
...h/ml/action/deploy/TransportDeployModelAction.java  62.50%   2 Missing and 1 partial ⚠️
...va/org/opensearch/ml/model/MLModelCacheHelper.java  25.00%   3 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##               main    #2206      +/-   ##
============================================
- Coverage     81.90%   81.64%   -0.26%     
- Complexity     5719     5720       +1     
============================================
  Files           547      547              
  Lines         23075    23148      +73     
  Branches       2378     2382       +4     
============================================
  Hits          18900    18900              
- Misses         3230     3303      +73     
  Partials        945      945              
Flag        Coverage Δ
ml-commons  81.64% <14.89%> (-0.26%) ⬇️


@Zhangxunmt Zhangxunmt temporarily deployed to ml-commons-cicd-env March 14, 2024 23:22 — with GitHub Actions Inactive
@Zhangxunmt Zhangxunmt had a problem deploying to ml-commons-cicd-env March 14, 2024 23:22 — with GitHub Actions Failure
@Zhangxunmt Zhangxunmt had a problem deploying to ml-commons-cicd-env March 18, 2024 21:06 — with GitHub Actions Failure
@Zhangxunmt Zhangxunmt temporarily deployed to ml-commons-cicd-env March 18, 2024 21:06 — with GitHub Actions Inactive
@Zhangxunmt Zhangxunmt temporarily deployed to ml-commons-cicd-env March 18, 2024 21:34 — with GitHub Actions Inactive
@@ -32,10 +32,13 @@
@Log4j2
public class MLModelCacheHelper {
    private final Map<String, MLModelCache> modelCaches;

    private final Set<String> localDeployedModels;
@Zhangxunmt (Collaborator, Author):

Will refresh in the next commit.

Signed-off-by: Xun Zhang <xunzh@amazon.com>
@Zhangxunmt Zhangxunmt merged commit b037032 into opensearch-project:main Mar 19, 2024
4 of 10 checks passed
opensearch-trigger-bot bot pushed a commit that referenced this pull request Mar 19, 2024
* auto deployment for remote models

Signed-off-by: Xun Zhang <xunzh@amazon.com>

* add auto deploy feature flag

Signed-off-by: Xun Zhang <xunzh@amazon.com>

* add eligible node check and avoid over-deployment

Signed-off-by: Xun Zhang <xunzh@amazon.com>

* dispatch local deploy

Signed-off-by: Xun Zhang <xunzh@amazon.com>

---------

Signed-off-by: Xun Zhang <xunzh@amazon.com>
(cherry picked from commit b037032)
Zhangxunmt added a commit that referenced this pull request Mar 19, 2024
Signed-off-by: Xun Zhang <xunzh@amazon.com>
(cherry picked from commit b037032)

Co-authored-by: Xun Zhang <xunzh@amazon.com>
Zhangxunmt added a commit to Zhangxunmt/ml-commons that referenced this pull request Mar 21, 2024
Signed-off-by: Xun Zhang <xunzh@amazon.com>
5 participants