Fixes #2317 predict api not working with asymmetric models #2318

Merged
merged 5 commits into from
Apr 25, 2024

@br3no br3no commented Apr 12, 2024

Description

When using the _predict API for text embeddings, the instantiation of the MLInput object happens via the MLCommonsClassLoader here:

MLInput mlInput = MLCommonsClassLoader.initMLInput(algorithm, new Object[]{parser, algorithm}, XContentParser.class, FunctionName.class);

The reflection logic ends up calling the TextDocsMLInput constructor, which does not parse any parameters supplied through the XContentParser. This causes the error reported in #2317. This PR adds a case to the parsing logic so that the parameters are parsed, replicating the behavior of the MLInput parse method.
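
A minimal sketch of the pattern described above, not the actual ml-commons code. It shows a constructor that is located reflectively and that only picks up request parameters because it has an explicit case for them. `SketchInput`, `ReflectiveInputFactory`, the `Map`-based request body, and the field names `text_docs` and `parameters` are assumptions made for illustration; the real constructor reads from an XContentParser.

```java
import java.lang.reflect.Constructor;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for TextDocsMLInput; a Map replaces the XContentParser.
final class SketchInput {
    final String algorithm;
    final List<String> textDocs;
    final Map<String, Object> parameters; // stays empty if no case handles the field

    // Constructor with a (body, algorithm) shape, found via reflection below.
    @SuppressWarnings("unchecked")
    public SketchInput(Map<String, Object> body, String algorithm) {
        List<String> docs = List.of();
        Map<String, Object> params = new LinkedHashMap<>();
        for (Map.Entry<String, Object> field : body.entrySet()) {
            switch (field.getKey()) {
                case "text_docs": // assumed field name
                    docs = (List<String>) field.getValue();
                    break;
                case "parameters": // the kind of case the PR describes adding
                    params.putAll((Map<String, Object>) field.getValue());
                    break;
                default:
                    break; // unknown fields are skipped
            }
        }
        this.algorithm = algorithm;
        this.textDocs = docs;
        this.parameters = params;
    }
}

// Hypothetical factory: looks up the constructor by parameter types and invokes it.
final class ReflectiveInputFactory {
    static SketchInput create(Map<String, Object> body, String algorithm) {
        try {
            Constructor<SketchInput> ctor =
                SketchInput.class.getDeclaredConstructor(Map.class, String.class);
            return ctor.newInstance(body, algorithm);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("could not instantiate input", e);
        }
    }
}
```

Without the `parameters` case, a body that carries parameters would still produce an object, but its `parameters` map would be empty. That matches the symptom described here, where the parameters in the request are not parsed.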

Issues Resolved

#2317

Check List

  • New functionality includes testing.
    • All tests pass
  • New functionality has been documented.
    • New functionality has javadoc added
  • Commits are signed per the DCO using --signoff

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

… models

Signed-off-by: br3no <breno@veltefaria.de>
@br3no br3no temporarily deployed to ml-commons-cicd-env April 12, 2024 11:20 — with GitHub Actions Inactive
@br3no br3no mentioned this pull request Apr 12, 2024
@br3no
Contributor Author

br3no commented Apr 12, 2024

It would be nice to add an integration test for this, but I couldn't find one for the text embeddings I could easily extend.

@br3no br3no temporarily deployed to ml-commons-cicd-env April 12, 2024 12:11 — with GitHub Actions Inactive
@dhrubo-os
Collaborator

It would be nice to add an integration test for this, but I couldn't find one for the text embeddings I could easily extend.

Maybe we can get some ideas from here?

Signed-off-by: br3no <breno@veltefaria.de>
@br3no
Contributor Author

br3no commented Apr 16, 2024

@br3no Did you test the fix in multi-node cluster? If not could you please do that?

@dhrubo-os this was a very good hint.

While testing in the multi-node cluster setup, I ran into some very unexpected class-not-found errors. It turns out all MLAlgoParameters must live in the "org.opensearch.ml.common.input.parameter" package. Cf.

Reflections reflections = new Reflections("org.opensearch.ml.common.input.parameter");

This is very obscure and I can't say why the errors only appeared in a multi-node setting. But now it works.
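
A rough sketch of why a package-scoped scan makes the package of a class matter. This is not the MLCommonsClassLoader implementation. It assumes a registry that only records candidate classes located under a fixed package prefix, so a class outside that prefix can never be looked up. `PackageScopedRegistry` and its methods are hypothetical names, and JDK classes stand in for the parameter classes. The sketch does not explain why the errors only showed up in a multi-node setup.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical registry that mimics a scan restricted to one package prefix.
final class PackageScopedRegistry {
    private final Map<String, Class<?>> bySimpleName = new HashMap<>();

    PackageScopedRegistry(String packagePrefix, List<Class<?>> candidates) {
        for (Class<?> candidate : candidates) {
            String pkg = candidate.getPackageName();
            // Keep only classes in the package itself or in one of its subpackages.
            if (pkg.equals(packagePrefix) || pkg.startsWith(packagePrefix + ".")) {
                bySimpleName.put(candidate.getSimpleName(), candidate);
            }
        }
    }

    // Fails for any class that lives outside the scanned package.
    Class<?> lookup(String simpleName) {
        Class<?> found = bySimpleName.get(simpleName);
        if (found == null) {
            throw new IllegalArgumentException("no class registered for " + simpleName);
        }
        return found;
    }
}
```

For example, a registry built with the prefix `java.util` over the candidates `java.util.ArrayList` and `java.lang.StringBuilder` resolves `ArrayList` but rejects `StringBuilder`. Moving a class into the scanned package, as the refactoring commit in this PR does for AsymmetricTextEmbeddingParameters, is what makes it discoverable.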

Let me know if anything else is still open.

@intrafindBreno intrafindBreno temporarily deployed to ml-commons-cicd-env April 16, 2024 22:20 — with GitHub Actions Inactive
@dhrubo-os
Collaborator

Thanks for looking into this. Approved.

@br3no
Contributor Author

br3no commented Apr 23, 2024

@ylwu-amzn mind having a look? Only one review required to merge this...

@dhrubo-os, or maybe there is someone else available?

Collaborator

@ylwu-amzn ylwu-amzn left a comment

Thanks for fixing this!

@ylwu-amzn ylwu-amzn merged commit 8425a65 into opensearch-project:main Apr 25, 2024
10 of 13 checks passed
opensearch-trigger-bot bot pushed a commit that referenced this pull request Apr 25, 2024
* Fixes #2317 predict api not working with asymmetric models

Signed-off-by: br3no <breno@veltefaria.de>

* Adding unit test code path for the parsing of the parameter.

Signed-off-by: br3no <breno@veltefaria.de>

* Removing involuntary import of guava

Signed-off-by: br3no <breno@veltefaria.de>

* Refactor package of AsymmetricTextEmbeddingParameters

The MLCommonsClassLoader expects all MLAlgoParameters to be in the
"org.opensearch.ml.common.input.parameter" package.

Signed-off-by: br3no <breno@veltefaria.de>

* fixing unit test after package refactoring

Signed-off-by: br3no <breno@veltefaria.de>

---------

Signed-off-by: br3no <breno@veltefaria.de>
(cherry picked from commit 8425a65)
ylwu-amzn pushed a commit that referenced this pull request Apr 25, 2024
…2359)


Co-authored-by: Breno Faria <breno@veltefaria.de>
@mingshl mingshl added the bug Something isn't working label Apr 30, 2024
dhrubo-os pushed a commit to dhrubo-os/ml-commons that referenced this pull request May 17, 2024
… models (opensearch-project#2318) (opensearch-project#2359)


Co-authored-by: Breno Faria <breno@veltefaria.de>
Labels
backport 2.x bug Something isn't working