-
Notifications
You must be signed in to change notification settings - Fork 186
[Agentic Search]Add extract JSON processor in Query Planning Tool #4356
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Agentic Search]Add extract JSON processor in Query Planning Tool #4356
Conversation
302cf72 to
19eeb7b
Compare
|
Will this work if user wants to add another output parser on top of this default parser? Ideally you would register output parser during tool registration. I get the idea that we need to parse it here to return raw tool response. Just make sure that if other parsers are used it can work with this. |
@pyek-bot yes, users can add another parser on top of the default one. |
| // Create the default extract_json processor config | ||
| Map<String, Object> extractJsonConfig = new HashMap<>(); | ||
| extractJsonConfig.put("type", "extract_json"); | ||
| extractJsonConfig.put("extract_type", "object"); // Extract JSON objects only |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this will silently fall into default search template if json not found. how do you plan to notify this change? is this the expected behaviour? does it make it better controlling by a flag like ignoreParsingFailure?
at least adding debug logging?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
another thing, if you want to further validate if this is a valid query, use opensearch query parsers. If it's json and not valid query, you can still fall back to default search template. IMO, this might be better validation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks Mingshi, these are valid points. We are planning to work on this in the next versions of Agentic Search: #4359
The current behavior causes a random error when the query is run against OpenSearch Index without any clear explanation to the end user. And it has been established in 3.3 release (code ref) that when the LLM does not have enough context or fails to produce an output we return the default match all query. Continuing the same behavior (while not ideal) we are returning the default match all query when the JSON is malformed.
We will need to discuss further in the next release how to change this behavior. I would like to keep the behavior same across 3.3 minor versions. And we can introduce a change in 3.4
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please create issue to track the enhancement, many of these offline discussions are going away while we just merging quick fix
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I attached the created issue in the above message
1909f8b to
80123b3
Compare
|
The failing tests are not related to the change I made: |
Signed-off-by: rithin-pullela-aws <rithinp@amazon.com>
Signed-off-by: rithin-pullela-aws <rithinp@amazon.com>
Signed-off-by: rithin-pullela-aws <rithinp@amazon.com>
7a368c9
3860456 to
7a368c9
Compare
|
Raised a documentation PR describing the default behavior: opensearch-project/documentation-website#11446 |
|
Thanks for raising the doc PR and mentioning the default value! |
|
Failing because of unrelated changes: |
) (cherry picked from commit d206ffd)
Description
Adds extract JSON processor so when a model produces extra text around the generated DSL, it can be ignored.
Example:
if model produces:
The actual JSON will be extracted:
Related Issues
Resolves #[Issue number to be closed when this PR is merged]
Check List
--signoff.By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.