@songkant-aws
Contributor

@songkant-aws songkant-aws commented Jan 26, 2026

Description

Fix the bug discovered in #5054. See root cause description in #5054 (comment)

Related Issues

Resolves #5054

Check List

  • New functionality includes testing.
  • New functionality has been documented.
  • New functionality has javadoc added.
  • New functionality has a user manual doc added.
  • New PPL command checklist all confirmed.
  • API changes companion pull request created.
  • Commits are signed per the DCO using --signoff or -s.
  • Public documentation issue/PR created.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

Signed-off-by: Songkan Tang <songkant@amazon.com>
@coderabbitai
Contributor

coderabbitai bot commented Jan 26, 2026

📝 Walkthrough

Summary by CodeRabbit

  • Bug Fixes

    • Boolean comparisons (TRUE/FALSE, NOT, !=) now generate correct term and mustNot filters for pushdown, preserving null/missing semantics and improving aggregation filter behavior.
  • Tests

    • Added unit and integration tests plus expected-plan fixtures covering query_string with true/false, NOT and != pushdown scenarios.
    • Updated YAML integration tests to enable the Calcite plugin during setup/teardown and to use concise length-based validations for sample counts.

Walkthrough

Convert boolean field predicates earlier in Calcite traversal and predicate analysis to emit exact boolean term or negated-term queries; add unit and integration tests, REST YAML test, and expected explain-plan YAMLs covering boolean pushdown cases.

Changes

| Cohort | File(s) | Summary |
|---|---|---|
| Predicate analysis & boolean helpers | opensearch/src/main/java/org/opensearch/sql/opensearch/request/PredicateAnalyzer.java | Add boolean-type detection on NamedFieldExpression and new QueryExpression boolean methods (isFalse, isNotFalse, isNotTrue); short-circuit boolean NamedFieldExpression into term/must_not term queries and extend postfix operator handling (IS_FALSE, IS_NOT_TRUE, IS_NOT_FALSE). |
| Calcite boolean rewrites | core/src/main/java/org/opensearch/sql/calcite/CalciteRexNodeVisitor.java | Introduce private helpers to detect boolean field comparisons and rewrite NOT and !=/<> patterns into IS_NOT_TRUE/IS_NOT_FALSE forms during Rex traversal. |
| Opensearch unit tests | opensearch/src/test/java/org/opensearch/sql/opensearch/request/PredicateAnalyzerTest.java, opensearch/src/test/java/org/opensearch/sql/opensearch/request/AggregateAnalyzerTest.java | Add boolean field to test schema and tests asserting IS_TRUE generates TermQuery and combinations produce BoolQuery; update aggregate test expectation to use a term filter for boolean true. |
| Calcite integration tests | integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteExplainIT.java | Add six explain tests validating boolean pushdown with query_string, TRUE/'TRUE', false, NOT true, and != true variants, comparing against expected YAML plans. |
| Calcite expected explain plans | integ-test/src/test/resources/expectedOutput/calcite/... explain_filter_query_string_with_boolean.yaml, explain_filter_query_string_with_boolean_false.yaml, explain_filter_query_string_with_boolean_not_true.yaml | Add expected logical/physical explain YAMLs showing pushed-down boolean term filters (must / must_not) combined with query_string in PushDownContext/OpenSearchRequestBuilder. |
| YAML REST integration tests | integ-test/src/yamlRestTest/resources/rest-api-spec/test/issues/5054.yml | Add REST test that creates an index with a boolean field, bulk-inserts documents, toggles Calcite plugin in hooks, and asserts results for is_internal=true/false and NOT variants. |
| Test expectation tweaks | integ-test/src/yamlRestTest/resources/rest-api-spec/test/issues/4866.yml | Simplify expectations to length-based assertions for datarows and adjust schema/total matches. |
| Calcite explain IT additions | integ-test/src/test/resources/expectedOutput/calcite/* | Add three new expected explain-plan YAMLs corresponding to the new explainIT tests. |
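
The "Calcite boolean rewrites" row can be pictured with a small sketch. This is not the code in CalciteRexNodeVisitor: the expression types below (`Field`, `Literal`, `Equals`, `NotEquals`, `Not`, `IsNotTrue`, `IsNotFalse`) are hypothetical stand-ins for Calcite Rex nodes, and the exact mapping (negating a comparison with `true` yields IS_NOT_TRUE, with `false` yields IS_NOT_FALSE) is an assumption based on the summary above.

```java
class BooleanRewriteSketch {
  // Hypothetical mini expression tree; the real visitor operates on Calcite RexNode.
  interface Expr {}
  record Field(String name) implements Expr {}
  record Literal(boolean value) implements Expr {}
  record Equals(Expr left, Expr right) implements Expr {}
  record NotEquals(Expr left, Expr right) implements Expr {}
  record Not(Expr operand) implements Expr {}
  record IsNotTrue(Field field) implements Expr {}
  record IsNotFalse(Field field) implements Expr {}

  // Assumed mapping: "not equal to true" becomes IS_NOT_TRUE, "not equal to false" becomes IS_NOT_FALSE.
  static Expr negated(Field field, boolean literal) {
    return literal ? new IsNotTrue(field) : new IsNotFalse(field);
  }

  // Rewrites NOT(field = literal) and field != literal; every other shape is returned unchanged.
  static Expr rewrite(Expr e) {
    if (e instanceof Not n
        && n.operand() instanceof Equals eq
        && eq.left() instanceof Field f
        && eq.right() instanceof Literal lit) {
      return negated(f, lit.value());
    }
    if (e instanceof NotEquals ne
        && ne.left() instanceof Field f
        && ne.right() instanceof Literal lit) {
      return negated(f, lit.value());
    }
    return e;
  }
}
```

Under these assumptions, both `NOT is_internal = true` and `is_internal != true` reduce to the same IS_NOT_TRUE node, so the later pushdown step only has to handle one shape.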

Sequence Diagram(s)

sequenceDiagram
    participant Client as Client
    participant Planner as CalcitePlanner
    participant Rex as CalciteRexNodeVisitor
    participant Analyzer as PredicateAnalyzer
    participant QExpr as QueryExpression
    participant DSL as OpenSearch DSL

    Client->>Planner: submit SQL with boolean predicate
    Planner->>Rex: translate Rex nodes (compare / NOT)
    Rex->>Planner: rewrite != / NOT -> IS_NOT_* when applicable
    Planner->>Analyzer: analyzeExpression(filter)
    Analyzer->>Analyzer: detect NamedFieldExpression.isBooleanType()
    Analyzer->>QExpr: convert boolean field -> isTrue()/isFalse()/isNotTrue()/isNotFalse()
    QExpr->>DSL: emit TermQuery or must_not TermQuery (combined with query_string)
    DSL-->>Planner: return pushed-down DSL
    Planner-->>Client: explain/execute with pushed-down boolean term
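
The last steps of the diagram (boolean operator to term or must_not term query) can be sketched as a plain mapping to query DSL text. The class and method names here are hypothetical, and the sketch builds JSON strings by hand rather than using the OpenSearch query builder classes the PR touches.

```java
class BooleanPushdownSketch {
  // The four postfix boolean operators named in the walkthrough.
  enum BoolOp { IS_TRUE, IS_FALSE, IS_NOT_TRUE, IS_NOT_FALSE }

  // Renders a term query on a boolean field as OpenSearch query DSL JSON.
  static String term(String field, boolean value) {
    return "{\"term\":{\"" + field + "\":{\"value\":" + value + "}}}";
  }

  // Wraps a query in bool.must_not, which also matches documents lacking the field.
  static String mustNot(String inner) {
    return "{\"bool\":{\"must_not\":[" + inner + "]}}";
  }

  // Exact matches use a term query; negated forms use must_not around the opposite term.
  static String toDsl(String field, BoolOp op) {
    return switch (op) {
      case IS_TRUE -> term(field, true);
      case IS_FALSE -> term(field, false);
      case IS_NOT_TRUE -> mustNot(term(field, true));
      case IS_NOT_FALSE -> mustNot(term(field, false));
    };
  }
}
```

For example, `toDsl("is_internal", BoolOp.IS_NOT_TRUE)` produces a `must_not` clause around a `term` query for `true`, which is the shape the expected explain plans are described as containing.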

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

Suggested labels

calcite

Suggested reviewers

  • ps48
  • kavithacm
  • derek-ho
  • joshuali925
  • penghuo
  • anirudha
  • GumpacG
  • Swiddis
🚥 Pre-merge checks | ✅ 2 | ❌ 1
❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 15.00% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (2 passed)

| Check name | Status | Explanation |
|---|---|---|
| Title check | ✅ Passed | The title clearly references fixing a bug related to boolean comparison conditions being simplified to fields, which aligns with the core changes adding boolean field pushdown logic and corresponding tests. |
| Description check | ✅ Passed | The description is related to the changeset, referencing issue #5054 and explaining that the PR fixes a discovered bug with a link to the root cause analysis. |


Signed-off-by: Songkan Tang <songkant@amazon.com>
@penghuo added the bugFix and PPL (Piped processing language) labels on Jan 26, 2026
    Content-Type: 'application/json'
  ppl:
    body:
      query: source=test-boolean | where is_internal=true | fields name
Collaborator

Consider using the failed query from #5054: source=test url=http | where is_internal=true

Comment on lines 582 to 586
// Handle NOT(IS_TRUE(boolean_field)) - convert to term query with false value
// This covers cases where IS_TRUE was explicitly applied
if (expr instanceof SimpleQueryExpression simpleExpr && simpleExpr.isBooleanFieldIsTrue()) {
  return QueryExpression.create(simpleExpr.rel).isFalse();
}
Collaborator

  • (NOT boolean_field = true) returns every document whose value is not true, including documents where the field is null or missing,
  • but boolean_field=false returns only documents whose field has the value false.
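
The difference the reviewer describes can be checked with a small simulation. This sketch does not call OpenSearch: `Doc`, `term`, and `mustNotTerm` are hypothetical names that model a term filter and a `must_not` term filter over a nullable boolean, where `null` stands for a null or missing field.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class BooleanFilterSemantics {
  // Hypothetical document: a name plus a nullable boolean (null models null/missing).
  record Doc(String name, Boolean isInternal) {}

  // Models a term filter: matches only documents whose value equals the target.
  static List<String> term(List<Doc> docs, boolean target) {
    List<String> hits = new ArrayList<>();
    for (Doc d : docs) {
      if (d.isInternal() != null && d.isInternal() == target) {
        hits.add(d.name());
      }
    }
    return hits;
  }

  // Models bool.must_not(term): matches every document the term filter rejects.
  static List<String> mustNotTerm(List<Doc> docs, boolean target) {
    List<String> excluded = term(docs, target);
    List<String> hits = new ArrayList<>();
    for (Doc d : docs) {
      if (!excluded.contains(d.name())) {
        hits.add(d.name());
      }
    }
    return hits;
  }

  static List<Doc> sampleDocs() {
    return Arrays.asList(new Doc("a", true), new Doc("b", false), new Doc("c", null));
  }

  public static void main(String[] args) {
    List<Doc> docs = sampleDocs();
    System.out.println(term(docs, false));       // prints [b]
    System.out.println(mustNotTerm(docs, true)); // prints [b, c]
  }
}
```

Document `c` has no value, so it appears only in the `must_not` result. That is why rewriting NOT(field = true) into a term query for `false` would drop rows.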

// generate a term query with value true.
// When called on an already-evaluated predicate (builder already set),
// return as-is.
if (builder == null) {
Collaborator

Is it possible to override the isTrue and not APIs for NamedFieldExpression instead of changing SimpleQueryExpression?

Signed-off-by: Songkan Tang <songkant@amazon.com>
Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🤖 Fix all issues with AI agents
In `@integ-test/src/yamlRestTest/resources/rest-api-spec/test/issues/5054.yml`:
- Around line 1-15: The test uses an index named "test" and currently doesn't
clean it up; update the YAML to ensure index isolation by adding explicit delete
steps for the "test" index in both the setup and teardown blocks (or replace
"test" with a generated unique name), e.g., add a do: delete index action before
the test runs and another delete after the test completes so the index cannot
leak state or conflict with other tests; reference the existing setup/teardown
blocks and the index name "test" when making these changes.

Comment on lines +1 to +15
setup:
  - do:
      query.settings:
        body:
          transient:
            plugins.calcite.enabled: true

---
teardown:
  - do:
      query.settings:
        body:
          transient:
            plugins.calcite.enabled: false

Contributor

⚠️ Potential issue | 🟠 Major

Ensure index isolation by cleaning up test before/after use.
Right now the test can fail or leak state if an index named test already exists or is reused. Add a cleanup step (or use a unique index name) to keep this test independent.

🧹 Suggested cleanup (align with existing YAML REST test patterns)
 setup:
   - do:
       query.settings:
         body:
           transient:
             plugins.calcite.enabled: true
+  - do:
+      indices.delete:
+        index: test
+        ignore: 404

 ---
 teardown:
   - do:
       query.settings:
         body:
           transient:
             plugins.calcite.enabled: false
+  - do:
+      indices.delete:
+        index: test
+        ignore: 404

As per coding guidelines: Tests must not rely on execution order; ensure test independence.

Also applies to: 23-34

Signed-off-by: Songkan Tang <songkant@amazon.com>
if (operand instanceof NamedFieldExpression namedField && namedField.isBooleanType()) {
  return booleanOp.apply(QueryExpression.create(namedField));
}
// IS_TRUE on a predicate (already evaluated QueryExpression) is allowed
Collaborator

Will other operations such as IS_FALSE, IS_NOT_TRUE, and IS_NOT_FALSE also apply to QueryExpression?

Signed-off-by: Songkan Tang <songkant@amazon.com>
Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🤖 Fix all issues with AI agents
In
`@opensearch/src/main/java/org/opensearch/sql/opensearch/request/PredicateAnalyzer.java`:
- Around line 603-626: The boolean postfix handling currently calls
QueryExpression.isFalse()/isNotFalse()/isNotTrue() which overwrite any existing
builder; change the branch that handles operand instanceof QueryExpression qe to
avoid calling those mutators and instead preserve/wrap the existing builder: for
IS_TRUE/IS_NOT_FALSE return qe as-is, and for IS_FALSE/IS_NOT_TRUE return a
negated form of qe (implement QueryExpression.negate() or wrap qe.getBuilder()
into a BoolQuery with mustNot) so predicates like (age > 30) IS FALSE are
expressed by negating the existing predicate builder rather than replacing it
with a term query; keep the existing boolean-field handling
(NamedFieldExpression) unchanged.
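
The suggestion above (keep or negate the existing builder instead of replacing it) can be sketched as follows. `Query`, `Kind`, and `applyPostfix` are hypothetical names for illustration; `negate()` stands in for the proposed QueryExpression.negate(), and the sketch carries DSL text rather than real query builders.

```java
class PostfixOnPredicateSketch {
  enum Kind { IS_TRUE, IS_FALSE, IS_NOT_TRUE, IS_NOT_FALSE }

  // Hypothetical stand-in for an already-built predicate query; holds DSL text only.
  record Query(String dsl) {
    // Wraps the existing query in bool.must_not instead of discarding it.
    Query negate() {
      return new Query("{\"bool\":{\"must_not\":[" + dsl + "]}}");
    }
  }

  // IS_TRUE / IS_NOT_FALSE keep the predicate; IS_FALSE / IS_NOT_TRUE negate it.
  static Query applyPostfix(Kind kind, Query existing) {
    return switch (kind) {
      case IS_TRUE, IS_NOT_FALSE -> existing;
      case IS_FALSE, IS_NOT_TRUE -> existing.negate();
    };
  }
}
```

With a range query for `age > 30` as the existing predicate, IS_FALSE yields a `must_not` around that range query, so the original condition survives. One caveat of this approach: `must_not` also matches documents where `age` is missing, so it treats IS_FALSE and IS_NOT_TRUE identically.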

Signed-off-by: Songkan Tang <songkant@amazon.com>
Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🤖 Fix all issues with AI agents
In
`@opensearch/src/main/java/org/opensearch/sql/opensearch/request/PredicateAnalyzer.java`:
- Around line 603-625: The boolean operator handling in PredicateAnalyzer
currently calls booleanOp.apply(qe) for any QueryExpression operand, but
CompoundQueryExpression does not implement isTrue/isNotTrue and thus calling
those methods throws; update the branch that handles operand instanceof
QueryExpression to detect CompoundQueryExpression (or other predicate
QueryExpression subclasses) and directly route IS_TRUE and IS_NOT_TRUE to the
predicate-handling path instead of invoking qe.isTrue/ qe.isNotTrue;
specifically, inside the if (operand instanceof QueryExpression qe) block check
if qe is a CompoundQueryExpression (or predicate-type) and for call.getKind() ==
IS_TRUE / IS_NOT_TRUE return the appropriate predicate query (the same output
produced for NamedFieldExpression boolean predicates) or otherwise fall back to
booleanOp.apply(qe) for supported QueryExpression implementations.
🧹 Nitpick comments (1)
opensearch/src/main/java/org/opensearch/sql/opensearch/request/PredicateAnalyzer.java (1)

229-235: Consider recording analyzed nodes for the top-level boolean-field shortcut.

This keeps analyzedNodes consistent with the tryAnalyzeOperand path and improves downstream partial-pushdown bookkeeping.

🔧 Suggested tweak
-      if (result instanceof NamedFieldExpression namedField && namedField.isBooleanType()) {
-        return QueryExpression.create(namedField).isTrue();
-      }
+      if (result instanceof NamedFieldExpression namedField && namedField.isBooleanType()) {
+        QueryExpression qe = QueryExpression.create(namedField).isTrue();
+        qe.updateAnalyzedNodes(expression);
+        return qe;
+      }

Comment on lines +603 to +625
// Handle boolean field operators: IS_TRUE, IS_FALSE, IS_NOT_TRUE, IS_NOT_FALSE
// These generate term queries for exact boolean value matching or mustNot queries
// for negated matching (which includes null/missing documents).
Function<QueryExpression, QueryExpression> booleanOp =
    switch (call.getKind()) {
      case IS_TRUE -> QueryExpression::isTrue;
      case IS_FALSE -> QueryExpression::isFalse;
      case IS_NOT_TRUE -> QueryExpression::isNotTrue;
      case IS_NOT_FALSE -> QueryExpression::isNotFalse;
      default -> null;
    };

if (booleanOp != null) {
  Expression operand = call.getOperands().get(0).accept(this);
  if (operand instanceof NamedFieldExpression namedField && namedField.isBooleanType()) {
    return booleanOp.apply(QueryExpression.create(namedField));
  }
  // Boolean operators on a predicate (already evaluated QueryExpression) are allowed
  if (operand instanceof QueryExpression qe) {
    return booleanOp.apply(qe);
  }
  throw new PredicateAnalyzerException(
      call.getKind() + " can only be applied to boolean fields or predicates");
Contributor

⚠️ Potential issue | 🟡 Minor

Handle IS_TRUE / IS_NOT_TRUE on compound predicates to avoid unexpected exceptions.

CompoundQueryExpression doesn’t override isTrue/isNotTrue, so booleanOp.apply(qe) will throw and fall back to scripts even though predicates are “allowed” here. Consider routing these two operators directly for predicate operands.

🛠️ Suggested adjustment
       if (booleanOp != null) {
         Expression operand = call.getOperands().get(0).accept(this);
         if (operand instanceof NamedFieldExpression namedField && namedField.isBooleanType()) {
           return booleanOp.apply(QueryExpression.create(namedField));
         }
-        // Boolean operators on a predicate (already evaluated QueryExpression) are allowed
-        if (operand instanceof QueryExpression qe) {
-          return booleanOp.apply(qe);
-        }
+        if (operand instanceof QueryExpression qe) {
+          return switch (call.getKind()) {
+            case IS_TRUE -> qe;
+            case IS_NOT_TRUE -> qe.not();
+            case IS_FALSE, IS_NOT_FALSE ->
+                throw new PredicateAnalyzerException(
+                    call.getKind() + " can only be applied to boolean fields");
+            default -> booleanOp.apply(qe);
+          };
+        }
         throw new PredicateAnalyzerException(
             call.getKind() + " can only be applied to boolean fields or predicates");
       }
Labels

bugFix PPL Piped processing language

Development

Successfully merging this pull request may close these issues.

[BUG] PPL where command does not work as expected.

3 participants