Skip to content

Conversation

@rithin-pullela-aws
Copy link
Contributor

Description

Adds extract JSON processor so when a model produces extra text around the generated DSL, it can be ignored.

Example:
if model produces:

"Here is your query: {\"query\":{\"match\":{\"title\":\"test\"}}}"

The actual JSON will be extracted:

{\"query\":{\"match\":{\"title\":\"test\"}}}

Related Issues

Resolves #[Issue number to be closed when this PR is merged]

Check List

  • New functionality includes testing.
  • New functionality has been documented.
  • API changes companion pull request created.
  • Commits are signed per the DCO using --signoff.
  • Public documentation issue/PR created.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

@rithin-pullela-aws rithin-pullela-aws requested a deployment to ml-commons-cicd-env-require-approval October 28, 2025 07:20 — with GitHub Actions Waiting
@rithin-pullela-aws rithin-pullela-aws requested a deployment to ml-commons-cicd-env-require-approval October 28, 2025 07:20 — with GitHub Actions Waiting
@rithin-pullela-aws rithin-pullela-aws requested a deployment to ml-commons-cicd-env-require-approval October 28, 2025 07:20 — with GitHub Actions Waiting
@rithin-pullela-aws rithin-pullela-aws requested a deployment to ml-commons-cicd-env-require-approval October 28, 2025 07:20 — with GitHub Actions Waiting
@rithin-pullela-aws rithin-pullela-aws temporarily deployed to ml-commons-cicd-env-require-approval October 28, 2025 07:24 — with GitHub Actions Inactive
@rithin-pullela-aws rithin-pullela-aws temporarily deployed to ml-commons-cicd-env-require-approval October 28, 2025 07:24 — with GitHub Actions Inactive
@rithin-pullela-aws rithin-pullela-aws temporarily deployed to ml-commons-cicd-env-require-approval October 28, 2025 07:24 — with GitHub Actions Inactive
@rithin-pullela-aws rithin-pullela-aws temporarily deployed to ml-commons-cicd-env-require-approval October 28, 2025 07:24 — with GitHub Actions Inactive
@pyek-bot
Copy link
Collaborator

Will this work if user wants to add another output parser on top of this default parser? Ideally you would register output parser during tool registration. I get the idea that we need to parse it here to return raw tool response. Just make sure that if other parsers are used it can work with this.

@rithin-pullela-aws
Copy link
Contributor Author

Will this work if user wants to add another output parser on top of this default parser? Ideally you would register output parser during tool registration. I get the idea that we need to parse it here to return raw tool response. Just make sure that if other parsers are used it can work with this.

@pyek-bot yes, users can add another parser on top of the default one.

// Create the default extract_json processor config
Map<String, Object> extractJsonConfig = new HashMap<>();
extractJsonConfig.put("type", "extract_json");
extractJsonConfig.put("extract_type", "object"); // Extract JSON objects only
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this will silently fall into default search template if json not found. how do you plan to notify this change? is this the expected behaviour? does it make it better controlling by a flag like ignoreParsingFailure?
at least adding debug logging?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

another thing, if you want to further validate if this is a valid query, use opensearch query parsers. If it's json and not valid query, you can still fall back to default search template. IMO, this might be better validation.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks Mingshi, these are valid points. We are planning to work on this in the next versions of Agentic Search: #4359

The current behavior causes a random error when the query is run against OpenSearch Index without any clear explanation to the end user. And it has been established in 3.3 release (code ref) that when the LLM does not have enough context or fails to produce an output we return the default match all query. Continuing the same behavior (while not ideal) we are returning the default match all query when the JSON is malformed.

We will need to discuss further in the next release how to change this behavior. I would like to keep the behavior same across 3.3 minor versions. And we can introduce a change in 3.4

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please create issue to track the enhancement, many of these offline discussions are going away while we just merging quick fix

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I attached the created issue in the above message

@rithin-pullela-aws rithin-pullela-aws had a problem deploying to ml-commons-cicd-env-require-approval October 28, 2025 19:56 — with GitHub Actions Error
@rithin-pullela-aws rithin-pullela-aws had a problem deploying to ml-commons-cicd-env-require-approval October 28, 2025 19:56 — with GitHub Actions Failure
@rithin-pullela-aws
Copy link
Contributor Author

The failing tests are not related to the change I made:
These are probably flaky since they did not fail on other runs

MLModelGroupRestIT > test_search_MatchAllQuery_For_ModelGroups FAILED
    java.lang.AssertionError: expected null, but was:<org.opensearch.client.ResponseException: method [GET], host [https://localhost:9200], URI [/_plugins/_ml/model_groups/_search], status line [HTTP/2.0 500 Internal Server Error]
    {"error":{"root_cause":[{"type":"status_exception","reason":"Fail to search"}],"type":"status_exception","reason":"Fail to search"},"status":500}>
        at __randomizedtesting.SeedInfo.seed([DF82DC920B47ACA4:A2154E7F85406A33]:0)
        at org.junit.Assert.fail(Assert.java:89)
        at org.junit.Assert.failNotNull(Assert.java:756)
        at org.junit.Assert.assertNull(Assert.java:738)
        at org.junit.Assert.assertNull(Assert.java:748)
        at org.opensearch.ml.rest.MLModelGroupRestIT.test_search_MatchAllQuery_For_ModelGroups(MLModelGroupRestIT.java:1186)

pyek-bot
pyek-bot previously approved these changes Oct 28, 2025
dhrubo-os
dhrubo-os previously approved these changes Oct 28, 2025
@dhrubo-os dhrubo-os requested a deployment to ml-commons-cicd-env-require-approval October 28, 2025 23:33 — with GitHub Actions Waiting
@dhrubo-os dhrubo-os requested a deployment to ml-commons-cicd-env-require-approval October 28, 2025 23:33 — with GitHub Actions Waiting
@dhrubo-os dhrubo-os requested a deployment to ml-commons-cicd-env-require-approval October 28, 2025 23:33 — with GitHub Actions Waiting
@dhrubo-os dhrubo-os requested a deployment to ml-commons-cicd-env-require-approval October 28, 2025 23:33 — with GitHub Actions Waiting
Signed-off-by: rithin-pullela-aws <rithinp@amazon.com>
Signed-off-by: rithin-pullela-aws <rithinp@amazon.com>
Signed-off-by: rithin-pullela-aws <rithinp@amazon.com>
@rithin-pullela-aws rithin-pullela-aws dismissed stale reviews from dhrubo-os and pyek-bot via 7a368c9 October 28, 2025 23:59
@rithin-pullela-aws rithin-pullela-aws had a problem deploying to ml-commons-cicd-env-require-approval October 29, 2025 00:01 — with GitHub Actions Failure
@rithin-pullela-aws rithin-pullela-aws temporarily deployed to ml-commons-cicd-env-require-approval October 29, 2025 00:01 — with GitHub Actions Inactive
@rithin-pullela-aws rithin-pullela-aws had a problem deploying to ml-commons-cicd-env-require-approval October 29, 2025 00:01 — with GitHub Actions Error
@rithin-pullela-aws rithin-pullela-aws temporarily deployed to ml-commons-cicd-env-require-approval October 29, 2025 00:01 — with GitHub Actions Inactive
@rithin-pullela-aws
Copy link
Contributor Author

Raised a documentation PR describing the default behavior: opensearch-project/documentation-website#11446

@pyek-bot
Copy link
Collaborator

Thanks for raising the doc PR and mentioning the default value!

@rithin-pullela-aws rithin-pullela-aws temporarily deployed to ml-commons-cicd-env-require-approval October 29, 2025 01:09 — with GitHub Actions Inactive
@rithin-pullela-aws
Copy link
Contributor Author

Failing because of unrelated changes:

MLModelAutoReDeployerIT > testModelAutoRedeploy FAILED
REPRODUCE WITH: ./gradlew ':opensearch-ml-plugin:integTest' --tests 'org.opensearch.ml.autoredeploy.MLModelAutoReDeployerIT.testModelAutoRedeploy' -Dtests.seed=B374BA3F198CB8FE -Dtests.security.manager=false -Dtests.locale=pcm-Latn-NG -Dtests.timezone=America/Asuncion -Druntime.java=21
    java.lang.AssertionError: method [POST], host [http://127.0.0.1:42527/], URI [/_plugins/_ml/models/null/_deploy], status line [HTTP/1.1 404 Not Found]


Suite: Test class org.opensearch.ml.autoredeploy.MLModelAutoReDeployerIT
    {"error":{"root_cause":[{"type":"status_exception","reason":"Failed to find model"}],"type":"status_exception","reason":"Failed to find model"},"status":404}
  2> Ọkt 28, 2025 9:34:19 FỌ ÍVNIN org.opensearch.client.RestClient logResponse
        at __randomizedtesting.SeedInfo.seed([B374BA3F198CB8FE:ADF94E4B10A2EF02]:0)
        at org.junit.Assert.fail(Assert.java:89)
        at org.opensearch.ml.autoredeploy.MLModelAutoReDeployerIT.lambda$prepareModel$0(MLModelAutoReDeployerIT.java:52)
        at org.opensearch.ml.rest.MLCommonsRestTestCase.verifyResponse(MLCommonsRestTestCase.java:686)
        at org.opensearch.ml.rest.MLCommonsRestTestCase.getTask(MLCommonsRestTestCase.java:650)

@pyek-bot pyek-bot merged commit d206ffd into opensearch-project:main Oct 29, 2025
16 of 20 checks passed
opensearch-trigger-bot bot pushed a commit that referenced this pull request Oct 29, 2025
mingshl pushed a commit that referenced this pull request Oct 29, 2025
) (#4363)

(cherry picked from commit d206ffd)

Co-authored-by: Rithin Pullela <rithinp@amazon.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants