-
Notifications
You must be signed in to change notification settings - Fork 181
Push down filters on nested fields as nested queries #4825
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
| nestedPaths)); | ||
| } | ||
| if (!nestedPaths.isEmpty()) { | ||
| return nestedQuery(nestedPaths.get(0), scriptQuery, ScoreMode.None); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since we have supported Agg script and Sort script now, I think we need to identify whether it's a filter script before wrapping it with nested query. And please add a test on agg script with nested fields.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- Script sort does not make use of
PredicateAnalyzer.QueryExpression, thus does not interleave with the current change:
sql/opensearch/src/main/java/org/opensearch/sql/opensearch/storage/scan/CalciteLogicalIndexScan.java
Line 499 in c31227c
() -> SortBuilders.scriptSort(scriptExpr.getScript(), sortType).order(order)); - The same is with field sort
sql/opensearch/src/main/java/org/opensearch/sql/opensearch/storage/scan/CalciteLogicalIndexScan.java
Line 476 in c31227c
() -> SortBuilders.fieldSort(digest.getFieldName()).order(order).missing(missing));
However, they still need nested queries to access nested fields. As a result, the following queries does not work (author is a nested field)
- script sort:
source=opensearch-sql_test_index_cascaded_nested | eval lower_name = lower(author.name) | sort lower_name - field sort:
source=opensearch-sql_test_index_cascaded_nested | sort author.name
Should I fix them in this PR or another one?
For agg script, I added a test case. Yet I doubt I still miss many cases? Originally I thought they were only handled in AggregateAnalyzer, so I could raise another PR for nested fields in aggregations.
|
This PR is stalled because it has been open for 2 weeks with no activity. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This PR doesn't count in the case that the filter is not pushed down.
source=t | head 10000 | where items.name = "xx"
Above query will return incorrect results.
This issue is similar to the #3696 which I am working on refactor to avoid the correctness issue in non-pushdown case.
|
This PR is stalled because it has been open for 2 weeks with no activity. |
Signed-off-by: Yuanchun Shen <yuanchu@amazon.com> # Conflicts: # integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteExplainIT.java # opensearch/src/main/java/org/opensearch/sql/opensearch/storage/scan/CalciteLogicalIndexScan.java
Signed-off-by: Yuanchun Shen <yuanchu@amazon.com>
Signed-off-by: Yuanchun Shen <yuanchu@amazon.com>
Signed-off-by: Yuanchun Shen <yuanchu@amazon.com>
…scripts Signed-off-by: Yuanchun Shen <yuanchu@amazon.com>
Signed-off-by: Yuanchun Shen <yuanchu@amazon.com>
Signed-off-by: Yuanchun Shen <yuanchu@amazon.com>
Signed-off-by: Yuanchun Shen <yuanchu@amazon.com>
...-test/src/test/resources/expectedOutput/calcite/explain_dedup_expr4_alternative_mutated.yaml
Outdated
Show resolved
Hide resolved
integ-test/src/test/resources/expectedOutput/calcite/explain_dedup_with_expr4_mutated.yaml
Outdated
Show resolved
Hide resolved
Signed-off-by: Yuanchun Shen <yuanchu@amazon.com>
Signed-off-by: Yuanchun Shen <yuanchu@amazon.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In
`@opensearch/src/main/java/org/opensearch/sql/opensearch/request/PredicateAnalyzer.java`:
- Around line 1525-1532: Add a unit test in PredicateAnalyzerTest.java that
constructs a predicate/script expression which references multiple nested fields
under different hierarchies, invokes the PredicateAnalyzer analysis path that
triggers the nestedPaths logic in PredicateAnalyzer, and asserts that an
UnsupportedScriptException is thrown; also assert the exception message contains
the nestedPaths information (or substring like "Accessing multiple nested
fields") to validate the exact error scenario handled by the nestedPaths.size()
> 1 branch. Ensure the test uses the same helper/setup used by other
PredicateAnalyzer tests so it isolates the error case and runs fast.
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
opensearch/src/main/java/org/opensearch/sql/opensearch/request/PredicateAnalyzer.java
🧰 Additional context used
📓 Path-based instructions (1)
**/*.java
📄 CodeRabbit inference engine (.rules/REVIEW_GUIDELINES.md)
**/*.java: UsePascalCasefor class names (e.g.,QueryExecutor)
UsecamelCasefor method and variable names (e.g.,executeQuery)
UseUPPER_SNAKE_CASEfor constants (e.g.,MAX_RETRY_COUNT)
Keep methods under 20 lines with single responsibility
All public classes and methods must have proper JavaDoc
Use specific exception types with meaningful messages for error handling
PreferOptional<T>for nullable returns in Java
Avoid unnecessary object creation in loops
UseStringBuilderfor string concatenation in loops
Validate all user inputs, especially queries
Sanitize data before logging to prevent injection attacks
Use try-with-resources for proper resource cleanup in Java
Maintain Java 11 compatibility when possible for OpenSearch 2.x
Document Calcite-specific workarounds in code
Files:
opensearch/src/main/java/org/opensearch/sql/opensearch/request/PredicateAnalyzer.java
⚙️ CodeRabbit configuration file
**/*.java: - Flag methods >50 lines as potentially too complex - suggest refactoring
- Flag classes >500 lines as needing organization review
- Check for dead code, unused imports, and unused variables
- Identify code reuse opportunities across similar implementations
- Assess holistic maintainability - is code easy to understand and modify?
- Flag code that appears AI-generated without sufficient human review
- Verify Java naming conventions (PascalCase for classes, camelCase for methods/variables)
- Check for proper JavaDoc on public classes and methods
- Flag redundant comments that restate obvious code
- Ensure proper error handling with specific exception types
- Check for Optional usage instead of null returns
- Validate proper use of try-with-resources for resource management
Files:
opensearch/src/main/java/org/opensearch/sql/opensearch/request/PredicateAnalyzer.java
🧠 Learnings (2)
📓 Common learnings
Learnt from: CR
Repo: opensearch-project/sql PR: 0
File: .rules/REVIEW_GUIDELINES.md:0-0
Timestamp: 2025-12-02T17:27:55.938Z
Learning: Test SQL generation and optimization paths for Calcite integration changes
📚 Learning: 2025-12-02T17:27:55.938Z
Learnt from: CR
Repo: opensearch-project/sql PR: 0
File: .rules/REVIEW_GUIDELINES.md:0-0
Timestamp: 2025-12-02T17:27:55.938Z
Learning: Follow existing patterns in `CalciteRelNodeVisitor` and `CalciteRexNodeVisitor` for Calcite integration
Applied to files:
opensearch/src/main/java/org/opensearch/sql/opensearch/request/PredicateAnalyzer.java
🔇 Additional comments (6)
opensearch/src/main/java/org/opensearch/sql/opensearch/request/PredicateAnalyzer.java (6)
48-99: LGTM on new imports.The added imports are necessary for the nested query functionality and follow standard conventions.
1209-1218: LGTM on nested query wrapping in SimpleQueryExpression.The logic correctly wraps the builder with a nested query only when a non-empty nested path exists, using appropriate null-safety checks.
1473-1474: LGTM on new fields in ScriptQueryExpression.The
fieldTypesandreferredFieldsfields properly support nested path resolution for script queries.
1516-1537: LGTM on ScriptQueryExpression.builder() nested handling.The implementation correctly:
- Identifies nested paths from referred fields
- Throws a clear exception when multiple nested hierarchies are accessed (unsupported)
- Wraps with a nested query when exactly one nested path exists
- Returns a plain script query otherwise
This aligns with OpenSearch's requirement that nested fields must be accessed via nested queries.
1612-1639: LGTM on NamedFieldExpression changes.The changes properly:
- Add Lombok annotations for cleaner code
- Consistently resolve
nestedPathusingUtils.resolveNestedPathwhere field type information is available- Use
nullas the default fornestedPath(aligning with the commit message intent andStrings.isNullOrEmpty()checks)- Mark
nestedPathas@Nullablefor documentation
1498-1503: This concern is misdirected. IfrowTypewere null, the NPE would occur at line 1490 (rowType.getFieldList()) before reaching lines 1498–1503. Additionally, in the Calcite framework,getRowType()is a fundamental contract method onRelNodethat must never return null. No evidence shows thatanalyzeExpressionreceives null for therowTypeparameter in any actual code path.Likely an incorrect or invalid review comment.
✏️ Tip: You can disable this entire section by setting review_details to false in your review settings.
| if (nestedPaths.size() > 1) { | ||
| throw new UnsupportedScriptException( | ||
| String.format( | ||
| Locale.ROOT, | ||
| "Accessing multiple nested fields under different hierarchies in script is not" | ||
| + " supported: %s", | ||
| nestedPaths)); | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Search for existing tests covering the multi-nested-path error scenario
rg -n "multiple nested" --type java
echo "---"
rg -n "different hierarchies" --type java
echo "---"
# Search for test files related to PredicateAnalyzer
find . -type f -name "*PredicateAnalyzerTest*" -o -name "*PredicateAnalyzer*Test*"
echo "---"
# Search for UnsupportedScriptException usage in test files
rg -n "UnsupportedScriptException" --type java | grep -i testRepository: opensearch-project/sql
Length of output: 1432
🏁 Script executed:
#!/bin/bash
# Check PredicateAnalyzerTest.java for nested path tests
wc -l ./opensearch/src/test/java/org/opensearch/sql/opensearch/request/PredicateAnalyzerTest.java
echo "---"
# Search within PredicateAnalyzerTest.java for nested-related tests
rg -n "nested|UnsupportedScriptException" ./opensearch/src/test/java/org/opensearch/sql/opensearch/request/PredicateAnalyzerTest.java
echo "---"
# Check CalciteWhereCommandIT.java around line 141 for the test
sed -n '130,155p' ./integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteWhereCommandIT.javaRepository: opensearch-project/sql
Length of output: 1144
🏁 Script executed:
#!/bin/bash
# Check what test methods exist in PredicateAnalyzerTest.java related to script/filter
rg -n "void test|void \w+\(" ./opensearch/src/test/java/org/opensearch/sql/opensearch/request/PredicateAnalyzerTest.java | head -30
echo "---"
# Search for any mentions of script or multiple nested in PredicateAnalyzerTest
rg -n "script|Script" ./opensearch/src/test/java/org/opensearch/sql/opensearch/request/PredicateAnalyzerTest.java | head -20Repository: opensearch-project/sql
Length of output: 3501
Add unit test for multi-nested-path error scenario to PredicateAnalyzerTest.java.
The exception for accessing multiple nested fields under different hierarchies is currently validated only by integration tests. Add a dedicated unit test in PredicateAnalyzerTest.java to isolate and verify this error case, improving test execution speed and maintainability.
🤖 Prompt for AI Agents
In
`@opensearch/src/main/java/org/opensearch/sql/opensearch/request/PredicateAnalyzer.java`
around lines 1525 - 1532, Add a unit test in PredicateAnalyzerTest.java that
constructs a predicate/script expression which references multiple nested fields
under different hierarchies, invokes the PredicateAnalyzer analysis path that
triggers the nestedPaths logic in PredicateAnalyzer, and asserts that an
UnsupportedScriptException is thrown; also assert the exception message contains
the nestedPaths information (or substring like "Accessing multiple nested
fields") to validate the exact error scenario handled by the nestedPaths.size()
> 1 branch. Ensure the test uses the same helper/setup used by other
PredicateAnalyzer tests so it isolates the error case and runs fast.
| .filter(Predicate.not(Strings::isNullOrEmpty)) | ||
| .distinct() | ||
| .collect(Collectors.toUnmodifiableList()); | ||
| if (nestedPaths.size() > 1) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we have test for this case?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems like a very corner case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, testScriptFilterOnDifferentNestedHierarchyShouldThrow tests this case. It happens when users try to operate on two nested fields of different levels.
* lewSupport basic nested queries when applying filter on a nested filed Signed-off-by: Yuanchun Shen <yuanchu@amazon.com> # Conflicts: # integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteExplainIT.java # opensearch/src/main/java/org/opensearch/sql/opensearch/storage/scan/CalciteLogicalIndexScan.java * Test filter on both nested and root level fields Signed-off-by: Yuanchun Shen <yuanchu@amazon.com> * Support nested queries for simple queries: range, term, terms, etc Signed-off-by: Yuanchun Shen <yuanchu@amazon.com> * Add integration tests for filtering on nested Signed-off-by: Yuanchun Shen <yuanchu@amazon.com> * Add a error case for accessing different levels of nested objects in scripts Signed-off-by: Yuanchun Shen <yuanchu@amazon.com> * Add a test for accessing nested filter in filter in aggregation Signed-off-by: Yuanchun Shen <yuanchu@amazon.com> * Chores: remove unnecessary comments Signed-off-by: Yuanchun Shen <yuanchu@amazon.com> * Update plans Signed-off-by: Yuanchun Shen <yuanchu@amazon.com> * Remove unused plans Signed-off-by: Yuanchun Shen <yuanchu@amazon.com> * Use null instead of empty string as a default to nest path attribute Signed-off-by: Yuanchun Shen <yuanchu@amazon.com> --------- Signed-off-by: Yuanchun Shen <yuanchu@amazon.com> (cherry picked from commit ff82c67) Signed-off-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
…ies (#5051) * Push down filters on nested fields as nested queries (#4825) * lewSupport basic nested queries when applying filter on a nested filed Signed-off-by: Yuanchun Shen <yuanchu@amazon.com> # Conflicts: # integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteExplainIT.java # opensearch/src/main/java/org/opensearch/sql/opensearch/storage/scan/CalciteLogicalIndexScan.java * Test filter on both nested and root level fields Signed-off-by: Yuanchun Shen <yuanchu@amazon.com> * Support nested queries for simple queries: range, term, terms, etc Signed-off-by: Yuanchun Shen <yuanchu@amazon.com> * Add integration tests for filtering on nested Signed-off-by: Yuanchun Shen <yuanchu@amazon.com> * Add a error case for accessing different levels of nested objects in scripts Signed-off-by: Yuanchun Shen <yuanchu@amazon.com> * Add a test for accessing nested filter in filter in aggregation Signed-off-by: Yuanchun Shen <yuanchu@amazon.com> * Chores: remove unnecessary comments Signed-off-by: Yuanchun Shen <yuanchu@amazon.com> * Update plans Signed-off-by: Yuanchun Shen <yuanchu@amazon.com> * Remove unused plans Signed-off-by: Yuanchun Shen <yuanchu@amazon.com> * Use null instead of empty string as a default to nest path attribute Signed-off-by: Yuanchun Shen <yuanchu@amazon.com> --------- Signed-off-by: Yuanchun Shen <yuanchu@amazon.com> (cherry picked from commit ff82c67) Signed-off-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com> * Downgrade to java 11 Signed-off-by: Yuanchun Shen <yuanchu@amazon.com> --------- Signed-off-by: Yuanchun Shen <yuanchu@amazon.com> Signed-off-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com> Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com> Co-authored-by: Yuanchun Shen <yuanchu@amazon.com>
Description
Because nested fields are indexed as hidden documents, we cannot query them directly. Instead, we have to use the nested query to access them.
For example, for the following mapping:
{ "mappings": { "properties": { "id": {"type": "keyword"}, "items": { "type": "nested", "properties": { "name": {"type": "keyword"} } } } } }if we want to use term query to match those items with name
banana, instead of using:{ "query": { "term": { "items.name": { "value": "banana", "boost": 1 } } } }We should use
{ "query": { "nested": { "path": "items", "query": { "term": { "items.name": { "value": "banana", "boost": 1 } } } } } }Here, the
nestedclause steps down into the nesteditemsfield. It no longer has access to fields in the root document, nor fields in any other nested document.Work items
Support filters on both nested and root objects, where they can be separated. E.g.Already implemented.| where items.name = 'banana' and id = 2can be separated to two filters:items.name = 'banana'stays under a nested query, whileid = 2stays on a normal term query, combined with aboolquery.Related Issues
Resolves #4508
Check List
--signoffor-s.By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.