Skip to content

Conversation

@rithin-pullela-aws
Copy link
Contributor

@rithin-pullela-aws rithin-pullela-aws commented Aug 3, 2025

Description

This change adds a default system prompt to the query planner tool

Create Model:

{
    "name": "My OpenAI model: gpt-4o",
    "function_name": "remote",
    "description": "test model",
    "connector": {
        "name": "My openai connector: gpt-4",
        "description": "The connector to openai chat model",
        "version": 1,
        "protocol": "http",
        "parameters": {
            "model": "gpt-4o"
        },
        "credential": {
            "openAI_key": <YOUR API KEY>
        },
        "actions": [
            {
                "action_type": "predict",
                "method": "POST",
                "url": "https://api.openai.com/v1/chat/completions",
                "headers": {
                    "Authorization": "Bearer ${credential.openAI_key}"
                },
                "request_body": "{ \"model\": \"${parameters.model}\", \"messages\": [{\"role\":\"system\",\"content\":\"${parameters.system_prompt}\"},{\"role\":\"user\",\"content\":\"${parameters.query_text}\"}] }"

            }
        ]
    }
}
  1. Create a Flow Agent with query planner tool:
{
    "name": "Test agent for embedding model",
    "type": "flow",
    "description": "this is a test agent",
    "tools": [
        {
            "type": "QueryPlanningTool",
            "description": "A general query generation tool to answer any question",
            "parameters": {
                "type": "llmGenerated",
                "model_id": "0NGhbpgBL0QLh2nyEyTN"
            }
        }
    ]
}
  1. Execute the model
{
    "parameters": {
        "query_text": "Show all products that cost more than 50 dollars.",
        "index_mapping": {
            "properties": {
                "price": {
                    "type": "float"
                },
                "name":{
                    "type": "text"
                }
            }
        },
        "response_filter": "$.choices[0].message.content",
        "query_fields": ["price"]

    }
}

Response:

{
    "inference_results": [
        {
            "output": [
                {
                    "name": "response",
                    "result": "{\"query\":{\"range\":{\"price\":{\"gt\":50.0}}}}"
                }
            ]
        }
    ]
}

The Default System Prompt:

You are an OpenSearch DSL expert. Your job is to convert natural‑language questions into strict JSON OpenSearch search query bodies. 
Follow every rule: Use only the provided index mapping to decide which fields exist and their types, pay close attention to index mapping. 
Do not use fields that not present in mapping. 

Choose query types based on user intent and fields: 
match: single-token full‑text searches on analyzed text fields, 
match_phrase: multi-token phrases on analyzed text fields (search string contains a space, hyphen, comma, etc.), 
term / terms:exact match on keyword, numeric, boolean, 
range:numeric/date comparisons (gt, lt, gte, lte), 
bool with must, should, must_not, filter: AND/OR/NOT logic, 
wildcard / prefix on keyword:"starts with", "contains", 
exists:field presence/absence, 
nested query / nested agg:Never wrap a field in nested unless the mapping for that exact path (or one of its parents) explicitly says "type": "nested". 
Otherwise use a normal query on the flattened field. 
Aggregations (when asked for counts, averages, "top N", distributions): 
terms on field.keyword or numeric for grouping / top N, 
Metric aggs (avg, min, max, sum, stats, cardinality) on numeric fields, 
date_histogram, histogram, range for distributions, 
Always set "size": 0 when only aggregations are needed, 
Use sub‑aggregations + order for "top N by metric", 
If grouping by a text field, use its .keyword sub‑field.
 
 Output format: Output only a valid escaped JSON string or the literal 
{"size":10,"query":{"match_all":{}}} 
Return exactly one JSON object. Output nothing before or after it — no code fences/backticks (`), angle brackets (< >), hash marks (#), asterisks (*), pipes (|), tildes (~), ellipses (… or ...), emojis, typographic quotes (" "), non-breaking spaces (U+00A0), zero-width characters (U+200B, U+FEFF), or any other markup/control characters. Use valid JSON only (standard double quotes "; no comments; no trailing commas). This applies to formatting only, string values inside the JSON may contain any needed Unicode characters. 
Follow the examples below. 
When Query Fields are provided, prioritize incorporating them into the generated query.Fallback: If the request cannot be fulfilled with the mapping (missing field, unsupported feature, etc.), 
output the literal string: {"size":10,"query":{"match_all":{}}} 
 
EXAMPLES: Example 1 — numeric range 
Input: Show all products that cost more than 50 dollars. 
Mapping: { "properties": { "price": { "type": "float" }, "cost": { "type": "float" } } }
query_fields: [price]Output: "{ "query": { "range": { "price": { "gt": 50 } } } }" 
Example 2 — text match + exact filter 
Input: Find employees in London who are active. 
Mapping: "{ "properties": { "city": { "type": "text", "fields": { "keyword": { "type": "keyword" } } }, "status": { "type": "keyword" } } }" 
query_fields: [city, status]Output: "{ "query": { "bool": { "must": [ { "match": { "city": "London" } } ], "filter": [ { "term": { "status": "active" } } ] } } }" 
Example 3 — match_phrase (use when search string contains a space, hyphen, comma, etc. here "new york city" has space) 
Input: Find employees who are active and located in New York City 
Mapping: "{ "properties": { "city": { "type": "text", "fields": { "keyword": { "type": "keyword" } } }, "status": { "type": "keyword" } } }" 
Output: "{ "query": { "bool": { "must": [ { "match_phrase": { "city": "New York City" } } ], "filter": [ { "term": { "status": "active" } } ] } } }" 
Example 4 — bool with SHOULD 
Input: Search articles about "machine learning" that are research papers or blogs. 
Mapping: "{ "properties": { "content": { "type": "text" }, "type": { "type": "keyword" } } }" 
Output: "{ "query": { "bool": { "must": [ { "match": { "content": "machine learning" } } ], "should": [ { "term": { "type": "research paper" } }, { "term": { "type": "blog" } } ], "minimum_should_match": 1 } } }" 
Example 5 — MUST NOT 
Input: List customers who have not made a purchase in 2023. 
Mapping: "{ "properties": { "last_purchase_date": { "type": "date" } } }" 
Output: "{ "query": { "bool": { "must_not": [ { "range": { "last_purchase_date": { "gte": "2023-01-01", "lte": "2023-12-31" } } } ] } } }" 
Example 6 — wildcard 
Input: Find files with names starting with "report_". 
Mapping: "{ "properties": { "filename": { "type": "keyword" } } }" 
Output: "{ "query": { "wildcard": { "filename": "report_*" } } }" 
Example 7 — nested query (note the index mapping says "type": "nested", do not use it for other types) 
Input: Find books where an authors first_name is John AND last_name is Doe. 
Mapping: "{ "properties": { "author": { "type": "nested", "properties": { "first_name": { "type": "text", "fields": { "keyword": { "type": "keyword" } } }, "last_name": { "type": "text", "fields": { "keyword": { "type": "keyword" } } } } } } }" 
Output: "{ "query": { "nested": { "path": "author", "query": { "bool": { "must": [ { "term": { "author.first_name.keyword": "John" } }, { "term": { "author.last_name.keyword": "Doe" } } ] } } } } }" 
Example 8 — terms aggregation 
Input: Show the number of orders per status. 
Mapping: "{ "properties": { "status": { "type": "keyword" } } }" 
Output: "{ "size": 0, "aggs": { "orders_by_status": { "terms": { "field": "status" } } } }" 
Example 9 — metric aggregation with filter 
Input: What is the average price of electronics products? 
Mapping: "{ "properties": { "category": { "type": "keyword" }, "price": { "type": "float" } } }" 
Output: "{ "size": 0, "query": { "term": { "category": "electronics" } }, "aggs": { "avg_price": { "avg": { "field": "price" } } } }" 
Example 10 — top N by metric 
Input: List the top 3 categories by total sales volume. 
Mapping: "{ "properties": { "category": { "type": "text", "fields": { "keyword": { "type": "keyword" } } }, "sales": { "type": "float" } } }" 
Output: "{ "size": 0, "aggs": { "top_categories": { "terms": { "field": "category.keyword", "size": 3, "order": { "total_sales": "desc" } }, "aggs": { "total_sales": { "sum": { "field": "sales" } } } } } }" 
Example 11 — fallback 
Input: Find employees who speak Klingon fluently. 
Mapping: "{ "properties": { "name": { "type": "text" }, "role": { "type": "keyword" } } }" 
Output: {"size":10,"query":{"match_all":{}}}
 
 GIVE THE OUTPUT PART ONLY IN YOUR RESPONSE 
Question: asked by user 
Mapping :${parameters.index_mapping:-} 
Query Fields: ${parameters.query_fields:-} Output:

Related Issues

Resolves #4005

Check List

  • New functionality includes testing.
  • New functionality has been documented.
  • API changes companion pull request created.
  • Commits are signed per the DCO using --signoff.
  • Public documentation issue/PR created.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

@rithin-pullela-aws rithin-pullela-aws had a problem deploying to ml-commons-cicd-env-require-approval August 3, 2025 06:53 — with GitHub Actions Error
@rithin-pullela-aws rithin-pullela-aws had a problem deploying to ml-commons-cicd-env-require-approval August 3, 2025 06:53 — with GitHub Actions Error
@rithin-pullela-aws rithin-pullela-aws had a problem deploying to ml-commons-cicd-env-require-approval August 3, 2025 06:53 — with GitHub Actions Failure
@rithin-pullela-aws rithin-pullela-aws had a problem deploying to ml-commons-cicd-env-require-approval August 3, 2025 06:53 — with GitHub Actions Failure
@dhrubo-os
Copy link
Collaborator

Is this ready for review?

@rithin-pullela-aws
Copy link
Contributor Author

Is this ready for review?

Not yet, I need to collect feedback from @mingshl and @owaiskazi19. Will mark it as ready once it is ready

@rithin-pullela-aws rithin-pullela-aws temporarily deployed to ml-commons-cicd-env-require-approval August 3, 2025 20:57 — with GitHub Actions Inactive
@rithin-pullela-aws rithin-pullela-aws temporarily deployed to ml-commons-cicd-env-require-approval August 3, 2025 20:57 — with GitHub Actions Inactive
@rithin-pullela-aws rithin-pullela-aws temporarily deployed to ml-commons-cicd-env-require-approval August 3, 2025 20:57 — with GitHub Actions Inactive
@rithin-pullela-aws rithin-pullela-aws temporarily deployed to ml-commons-cicd-env-require-approval August 3, 2025 20:57 — with GitHub Actions Inactive
@rithin-pullela-aws rithin-pullela-aws marked this pull request as ready for review August 3, 2025 21:01
@rithin-pullela-aws rithin-pullela-aws had a problem deploying to ml-commons-cicd-env-require-approval August 3, 2025 21:01 — with GitHub Actions Failure
@rithin-pullela-aws rithin-pullela-aws had a problem deploying to ml-commons-cicd-env-require-approval August 3, 2025 21:01 — with GitHub Actions Error
@rithin-pullela-aws rithin-pullela-aws had a problem deploying to ml-commons-cicd-env-require-approval August 3, 2025 21:01 — with GitHub Actions Failure
@rithin-pullela-aws
Copy link
Contributor Author

https://github.com/opensearch-project/skills/blob/df667b7703cf60543e6b74ad3a1c36787fd10cb6/src/main/java/org/opensearch/agent/tools/PPLTool.java#L95

@xinyual
Currently we don't aim to perform query execution using this tool. As the name suggests this is the query planner tool whose job is to just give the DSL query given a nlq. But we can introduce it in the future based on requirements

@codecov
Copy link

codecov bot commented Aug 4, 2025

Codecov Report

❌ Patch coverage is 87.50000% with 1 line in your changes missing coverage. Please review.
✅ Project coverage is 80.94%. Comparing base (7f4252b) to head (d3350e6).
⚠️ Report is 1 commits behind head on main.

Files with missing lines Patch % Lines
...h/ml/engine/tools/QueryPlanningPromptTemplate.java 0.00% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff            @@
##               main    #4046   +/-   ##
=========================================
  Coverage     80.93%   80.94%           
- Complexity     8256     8258    +2     
=========================================
  Files           712      713    +1     
  Lines         35960    35963    +3     
  Branches       4040     4042    +2     
=========================================
+ Hits          29106    29111    +5     
+ Misses         5086     5083    -3     
- Partials       1768     1769    +1     
Flag Coverage Δ
ml-commons 80.94% <87.50%> (+<0.01%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@rithin-pullela-aws rithin-pullela-aws temporarily deployed to ml-commons-cicd-env-require-approval August 5, 2025 03:45 — with GitHub Actions Inactive
@rithin-pullela-aws rithin-pullela-aws temporarily deployed to ml-commons-cicd-env-require-approval August 5, 2025 03:45 — with GitHub Actions Inactive
@rithin-pullela-aws rithin-pullela-aws temporarily deployed to ml-commons-cicd-env-require-approval August 5, 2025 03:45 — with GitHub Actions Inactive
@rithin-pullela-aws rithin-pullela-aws had a problem deploying to ml-commons-cicd-env-require-approval August 5, 2025 03:45 — with GitHub Actions Failure
@rithin-pullela-aws rithin-pullela-aws temporarily deployed to ml-commons-cicd-env-require-approval August 5, 2025 19:49 — with GitHub Actions Inactive
@rithin-pullela-aws rithin-pullela-aws had a problem deploying to ml-commons-cicd-env-require-approval August 5, 2025 19:49 — with GitHub Actions Error
@rithin-pullela-aws rithin-pullela-aws temporarily deployed to ml-commons-cicd-env-require-approval August 5, 2025 19:49 — with GitHub Actions Inactive
@rithin-pullela-aws rithin-pullela-aws had a problem deploying to ml-commons-cicd-env-require-approval August 5, 2025 19:49 — with GitHub Actions Failure
Copy link
Collaborator

@mingshl mingshl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. In the next PR, please provide some evaluation data to show improvements on the default prompt performances.

@rithin-pullela-aws rithin-pullela-aws had a problem deploying to ml-commons-cicd-env-require-approval August 5, 2025 22:44 — with GitHub Actions Failure
rithin-pullela-aws and others added 7 commits August 5, 2025 15:45
Signed-off-by: rithin-pullela-aws <rithinp@amazon.com>
…ryPlanningPromptTemplate.java

Co-authored-by: Owais Kazi <owaiskazi19@gmail.com>
Signed-off-by: Rithin Pullela <rithinp@amazon.com>
Signed-off-by: rithin-pullela-aws <rithinp@amazon.com>
Signed-off-by: rithin-pullela-aws <rithinp@amazon.com>
Signed-off-by: rithin-pullela-aws <rithinp@amazon.com>
Signed-off-by: rithin-pullela-aws <rithinp@amazon.com>
Signed-off-by: rithin-pullela-aws <rithinp@amazon.com>
@rithin-pullela-aws rithin-pullela-aws temporarily deployed to ml-commons-cicd-env-require-approval August 5, 2025 22:47 — with GitHub Actions Inactive
@rithin-pullela-aws rithin-pullela-aws temporarily deployed to ml-commons-cicd-env-require-approval August 5, 2025 22:47 — with GitHub Actions Inactive
@rithin-pullela-aws rithin-pullela-aws temporarily deployed to ml-commons-cicd-env-require-approval August 5, 2025 22:47 — with GitHub Actions Inactive
@rithin-pullela-aws rithin-pullela-aws temporarily deployed to ml-commons-cicd-env-require-approval August 5, 2025 22:47 — with GitHub Actions Inactive
@dhrubo-os dhrubo-os merged commit 6cd0beb into opensearch-project:main Aug 5, 2025
9 checks passed
@rithin-pullela-aws rithin-pullela-aws had a problem deploying to ml-commons-cicd-env-require-approval August 5, 2025 23:54 — with GitHub Actions Failure
@rithin-pullela-aws rithin-pullela-aws had a problem deploying to ml-commons-cicd-env-require-approval August 5, 2025 23:54 — with GitHub Actions Failure
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[RFC] Agentic Search in OpenSearch

9 participants