Add a new row count estimation mechanism for CalciteIndexScan #3605

Merged 11 commits on May 14, 2025
@@ -31,6 +31,8 @@ public enum Key {
CALCITE_ENGINE_ENABLED("plugins.calcite.enabled"),
CALCITE_FALLBACK_ALLOWED("plugins.calcite.fallback.allowed"),
CALCITE_PUSHDOWN_ENABLED("plugins.calcite.pushdown.enabled"),
CALCITE_PUSHDOWN_ROWCOUNT_ESTIMATION_FACTOR(
"plugins.calcite.pushdown.rowcount.estimation.factor"),

/** Query Settings. */
FIELD_TYPE_TOLERANCE("plugins.query.field_type_tolerance"),
36 changes: 36 additions & 0 deletions docs/user/admin/settings.rst
@@ -757,3 +757,39 @@ This setting is present from 3.0.0-beta. You can enable Calcite as new query opt

Check `introduce v3 engine <../../../dev/intro-v3-engine.md>`_ for more details.
Check `join doc <../../ppl/cmd/join.rst>`_ for example.

plugins.calcite.fallback.allowed
================================

Description
-----------

This setting is available from 3.0.0-beta. When Calcite is enabled, it controls whether queries that the v3 engine does not support are allowed to fall back to the v2 engine.

1. The default value is true in 3.0.0-beta.
2. This setting is node scope.
3. This setting can be updated dynamically.

plugins.calcite.pushdown.enabled
================================

Description
-----------

This setting is available from 3.0.0-beta. When Calcite is enabled, it controls whether the operator pushdown optimization is enabled for the v3 engine.

1. The default value is true in 3.0.0-beta.
2. This setting is node scope.
3. This setting can be updated dynamically.

plugins.calcite.pushdown.rowcount.estimation.factor
===================================================

Description
-----------

This setting is available from 3.1.0. When Calcite pushdown optimization is enabled, it is used to estimate the row count of the query plan: the row count of the table scan is multiplied by this factor to obtain the estimated row count.

1. The default value is 0.9 in 3.1.0.
2. This setting is node scope.
3. This setting can be updated dynamically.
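
The expected values in `testExplainCommandCost` later in this change are consistent with the factor being applied once for each operation pushed into the scan. The logical filter estimates 10000 × 0.15 = 1500 rows, and the physical scan with `PROJECT` and `FILTER` pushed down reports 1500 × 0.9 × 0.9 = 1215. The sketch below reproduces that arithmetic only. It is an inference from the test numbers, not the scan's actual implementation, and the class `RowCountEstimateSketch` with its method and parameter names is hypothetical.

```java
import java.util.Locale;

/** Hypothetical sketch: reproduces the row count arithmetic seen in the explain-cost test. */
class RowCountEstimateSketch {

  /**
   * Assumption (inferred from the test output, not confirmed by the source): the factor is
   * applied once per pushed-down operation.
   *
   * @param inputRows estimated rows before the factor is applied
   * @param pushedDownOps number of operations pushed into the scan
   * @param factor value of plugins.calcite.pushdown.rowcount.estimation.factor
   */
  static double estimate(double inputRows, int pushedDownOps, double factor) {
    double rows = inputRows;
    for (int i = 0; i < pushedDownOps; i++) {
      rows *= factor;
    }
    return rows;
  }

  public static void main(String[] args) {
    double scanRows = 10000.0; // rowcount of CalciteLogicalIndexScan in the test
    double filterSelectivity = 0.15; // implied by the LogicalFilter rowcount of 1500.0
    double afterFilter = scanRows * filterSelectivity;
    // PROJECT and FILTER are both pushed down, so the factor is applied twice.
    double estimated = estimate(afterFilter, 2, 0.9);
    System.out.println(String.format(Locale.ROOT, "%.1f", estimated)); // prints 1215.0
  }
}
```

With the factor set to 1.0 the sketch leaves the estimate unchanged, so under this reading smaller factors make plans with more pushed-down operations look cheaper to the planner.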
@@ -621,12 +621,12 @@ public void testSumGroupByNullValue() throws IOException {
verifySchema(response, schema("a", null, "long"), schema("age", null, "integer"));
verifyDataRows(
response,
rows(null, null),
rows(isPushdownEnabled() ? 0 : null, null),
rows(32838, 28),
rows(39225, 32),
rows(4180, 33),
rows(48086, 34),
rows(null, 36));
rows(isPushdownEnabled() ? 0 : null, 36));
}

@Test
@@ -60,8 +60,8 @@ public void testExplainCommandCost() {
String result = explainQuery("explain cost source=test | where age = 20 | fields name, age");
assertTrue(
result.contains(
"CalciteEnumerableIndexScan(table=[[OpenSearch, test]]): rowcount = 100.0, cumulative"
+ " cost = {100.0 rows, 101.0 cpu, 0.0 io}"));
"CalciteEnumerableIndexScan(table=[[OpenSearch, test]]): rowcount = 10000.0, cumulative"
+ " cost = {10000.0 rows, 10001.0 cpu, 0.0 io}"));
}

@Test
@@ -25,10 +25,9 @@ public void testExplainCommand() {
+ " LogicalFilter(condition=[=($1, 20)])\\n"
+ " CalciteLogicalIndexScan(table=[[OpenSearch, test]])\\n"
+ "\",\n"
+ " \"physical\": \"EnumerableCalc(expr#0..7=[{inputs}], proj#0..1=[{exprs}])\\n"
+ " CalciteEnumerableIndexScan(table=[[OpenSearch, test]],"
+ " PushDownContext=[[FILTER->=($1, 20)],"
+ " OpenSearchRequestBuilder(sourceBuilder={\\\"from\\\":0,\\\"timeout\\\":\\\"1m\\\",\\\"query\\\":{\\\"term\\\":{\\\"age\\\":{\\\"value\\\":20,\\\"boost\\\":1.0}}},\\\"sort\\\":[{\\\"_doc\\\":{\\\"order\\\":\\\"asc\\\"}}]},"
+ " \"physical\": \"CalciteEnumerableIndexScan(table=[[OpenSearch, test]],"
+ " PushDownContext=[[PROJECT->[name, age], FILTER->=($1, 20)],"
+ " OpenSearchRequestBuilder(sourceBuilder={\\\"from\\\":0,\\\"timeout\\\":\\\"1m\\\",\\\"query\\\":{\\\"term\\\":{\\\"age\\\":{\\\"value\\\":20,\\\"boost\\\":1.0}}},\\\"_source\\\":{\\\"includes\\\":[\\\"name\\\",\\\"age\\\"],\\\"excludes\\\":[]},\\\"sort\\\":[{\\\"_doc\\\":{\\\"order\\\":\\\"asc\\\"}}]},"
+ " requestedTotalSize=200, pageSize=null, startFrom=0)])\\n"
+ "\"\n"
+ " }\n"
@@ -40,12 +39,24 @@ public void testExplainCommand() {
@Test
public void testExplainCommandCost() {
String result = explainQuery("explain cost source=test | where age = 20 | fields name, age");
assertTrue(
result.contains(
"CalciteEnumerableIndexScan(table=[[OpenSearch, test]], PushDownContext=[[FILTER->=($1,"
+ " 20)],"
+ " OpenSearchRequestBuilder(sourceBuilder={\\\"from\\\":0,\\\"timeout\\\":\\\"1m\\\",\\\"query\\\":{\\\"term\\\":{\\\"age\\\":{\\\"value\\\":20,\\\"boost\\\":1.0}}},\\\"sort\\\":[{\\\"_doc\\\":{\\\"order\\\":\\\"asc\\\"}}]},"
+ " requestedTotalSize=200, pageSize=null, startFrom=0)]): rowcount = 100.0,"
+ " cumulative cost = {100.0 rows, 101.0 cpu, 0.0 io}"));
String expected =
"{\n"
+ " \"calcite\": {\n"
+ " \"logical\": \"LogicalProject(name=[$0], age=[$1]): rowcount = 1500.0,"
+ " cumulative cost = {13000.0 rows, 23001.0 cpu, 0.0 io}, id = *\\n"
+ " LogicalFilter(condition=[=($1, 20)]): rowcount = 1500.0, cumulative cost ="
+ " {11500.0 rows, 20001.0 cpu, 0.0 io}, id = *\\n"
+ " CalciteLogicalIndexScan(table=[[OpenSearch, test]]): rowcount = 10000.0,"
+ " cumulative cost = {10000.0 rows, 10001.0 cpu, 0.0 io}, id = *\\n"
+ "\",\n"
+ " \"physical\": \"CalciteEnumerableIndexScan(table=[[OpenSearch, test]],"
+ " PushDownContext=[[PROJECT->[name, age], FILTER->=($1, 20)],"
+ " OpenSearchRequestBuilder(sourceBuilder={\\\"from\\\":0,\\\"timeout\\\":\\\"1m\\\",\\\"query\\\":{\\\"term\\\":{\\\"age\\\":{\\\"value\\\":20,\\\"boost\\\":1.0}}},\\\"_source\\\":{\\\"includes\\\":[\\\"name\\\",\\\"age\\\"],\\\"excludes\\\":[]},\\\"sort\\\":[{\\\"_doc\\\":{\\\"order\\\":\\\"asc\\\"}}]},"
+ " requestedTotalSize=200, pageSize=null, startFrom=0)]): rowcount = 1215.0,"
+ " cumulative cost = {1215.0 rows, 1216.0 cpu, 0.0 io}, id = *\\n"
+ "\"\n"
+ " }\n"
+ "}";
assertEquals(expected, result.replaceAll("id = \\d+", "id = *"));
}
}
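
The `id = \\d+` replacement in the assertion above masks RelNode ids, which differ between runs, so the rest of the plan text can be compared exactly. A minimal standalone sketch of that normalization follows. The class name `PlanIdMaskSketch` and the sample plan line are made up for illustration.

```java
/** Hypothetical sketch of the id masking applied before comparing explain output. */
class PlanIdMaskSketch {

  /** Replaces every "id = <digits>" with "id = *", as the test does before comparing. */
  static String maskIds(String plan) {
    return plan.replaceAll("id = \\d+", "id = *");
  }

  public static void main(String[] args) {
    // Illustrative plan line; the id value 42 is arbitrary.
    String line = "LogicalFilter(condition=[=($1, 20)]): rowcount = 1500.0, id = 42";
    System.out.println(maskIds(line));
    // prints: LogicalFilter(condition=[=($1, 20)]): rowcount = 1500.0, id = *
  }
}
```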
@@ -114,6 +114,7 @@ private Settings defaultSettings() {
.put(Key.CALCITE_ENGINE_ENABLED, true)
.put(Key.CALCITE_FALLBACK_ALLOWED, false)
.put(Key.CALCITE_PUSHDOWN_ENABLED, false)
.put(Key.CALCITE_PUSHDOWN_ROWCOUNT_ESTIMATION_FACTOR, 0.9)
.put(Key.DEFAULT_PATTERN_METHOD, "SIMPLE_PATTERN")
.build();

@@ -139,6 +140,7 @@ protected Settings enablePushdown() {
.put(Key.CALCITE_ENGINE_ENABLED, true)
.put(Key.CALCITE_FALLBACK_ALLOWED, false)
.put(Key.CALCITE_PUSHDOWN_ENABLED, true)
.put(Key.CALCITE_PUSHDOWN_ROWCOUNT_ESTIMATION_FACTOR, 0.9)
.put(Key.DEFAULT_PATTERN_METHOD, "SIMPLE_PATTERN")
.build();

23 changes: 11 additions & 12 deletions integ-test/src/test/java/org/opensearch/sql/ppl/ExplainIT.java
@@ -6,7 +6,7 @@
package org.opensearch.sql.ppl;

import static org.hamcrest.Matchers.containsString;
import static org.opensearch.sql.util.MatcherUtils.assertJsonEquals;
import static org.opensearch.sql.util.MatcherUtils.assertJsonEqualsIgnoreRelId;

import com.google.common.io.Resources;
import java.io.IOException;
@@ -31,7 +31,7 @@ public void testExplain() throws Exception {
isCalciteEnabled()
? loadFromFile("expectedOutput/calcite/explain_output.json")
: loadFromFile("expectedOutput/ppl/explain_output.json");
assertJsonEquals(
assertJsonEqualsIgnoreRelId(
expected,
explainQueryToString(
"source=opensearch-sql_test_index_account"
@@ -51,7 +51,7 @@ public void testFilterPushDownExplain() throws Exception {
? loadFromFile("expectedOutput/calcite/explain_filter_push.json")
: loadFromFile("expectedOutput/ppl/explain_filter_push.json");

assertJsonEquals(
assertJsonEqualsIgnoreRelId(
expected,
explainQueryToString(
"source=opensearch-sql_test_index_account"
@@ -63,13 +63,12 @@ public void testFilterPushDownExplain() throws Exception {

@Test
public void testFilterAndAggPushDownExplain() throws Exception {
// TODO check why the agg pushdown doesn't work in calcite
String expected =
isCalciteEnabled()
? loadFromFile("expectedOutput/calcite/explain_filter_agg_push.json")
: loadFromFile("expectedOutput/ppl/explain_filter_agg_push.json");

assertJsonEquals(
assertJsonEqualsIgnoreRelId(
expected,
explainQueryToString(
"source=opensearch-sql_test_index_account"
@@ -85,7 +84,7 @@ public void testSortPushDownExplain() throws Exception {
? loadFromFile("expectedOutput/calcite/explain_sort_push.json")
: loadFromFile("expectedOutput/ppl/explain_sort_push.json");

assertJsonEquals(
assertJsonEqualsIgnoreRelId(
expected,
explainQueryToString(
"source=opensearch-sql_test_index_account"
@@ -102,7 +101,7 @@ public void testLimitPushDownExplain() throws Exception {
? loadFromFile("expectedOutput/calcite/explain_limit_push.json")
: loadFromFile("expectedOutput/ppl/explain_limit_push.json");

assertJsonEquals(
assertJsonEqualsIgnoreRelId(
expected,
explainQueryToString(
"source=opensearch-sql_test_index_account"
@@ -115,7 +114,7 @@ public void testLimitPushDownExplain() throws Exception {
public void testFillNullPushDownExplain() throws Exception {
String expected = loadFromFile("expectedOutput/ppl/explain_fillnull_push.json");

assertJsonEquals(
assertJsonEqualsIgnoreRelId(
expected,
explainQueryToString(
"source=opensearch-sql_test_index_account"
@@ -126,7 +125,7 @@ public void testFillNullPushDownExplain() throws Exception {
public void testTrendlinePushDownExplain() throws Exception {
String expected = loadFromFile("expectedOutput/ppl/explain_trendline_push.json");

assertJsonEquals(
assertJsonEqualsIgnoreRelId(
expected,
explainQueryToString(
"source=opensearch-sql_test_index_account"
@@ -139,7 +138,7 @@ public void testTrendlinePushDownExplain() throws Exception {
public void testTrendlineWithSortPushDownExplain() throws Exception {
String expected = loadFromFile("expectedOutput/ppl/explain_trendline_sort_push.json");

assertJsonEquals(
assertJsonEqualsIgnoreRelId(
expected,
explainQueryToString(
"source=opensearch-sql_test_index_account"
@@ -173,7 +172,7 @@ public void testPatternsWithoutAggExplain() throws Exception {
? loadFromFile("expectedOutput/calcite/explain_patterns.json")
: loadFromFile("expectedOutput/ppl/explain_patterns.json");

assertJsonEquals(
assertJsonEqualsIgnoreRelId(
expected,
explainQueryToString("source=opensearch-sql_test_index_account | patterns email"));
}
@@ -186,7 +185,7 @@ public void testPatternsWithAggPushDownExplain() throws Exception {
? loadFromFile("expectedOutput/calcite/explain_patterns_agg_push.json")
: loadFromFile("expectedOutput/ppl/explain_patterns_agg_push.json");

assertJsonEquals(
assertJsonEqualsIgnoreRelId(
expected,
explainQueryToString(
"source=opensearch-sql_test_index_account"
@@ -394,4 +394,13 @@ public static Matcher<String> equalToIgnoreCaseAndWhiteSpace(String expectedStri
public static void assertJsonEquals(String expected, String actual) {
assertEquals(JsonParser.parseString(expected), JsonParser.parseString(actual));
}

/** Asserts that two JSON strings are equal, ignoring RelNode ids in the Calcite plan. */
public static void assertJsonEqualsIgnoreRelId(String expected, String actual) {
assertJsonEquals(eliminateRelId(expected), eliminateRelId(actual));
}

private static String eliminateRelId(String s) {
return s.replaceAll("rel#\\d+", "rel#").replaceAll("RelSubset#\\d+", "RelSubset#");
}
}
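
As a standalone illustration of what `eliminateRelId` strips, the sketch below copies its two replacements into a hypothetical class `RelIdSketch` and applies them to a fragment shortened from the aggregation pushdown plan in this change. It demonstrates the regular expressions only and is not part of the PR.

```java
/** Hypothetical standalone copy of the two replacements used by eliminateRelId. */
class RelIdSketch {

  /** Drops the numeric suffix after "rel#" and "RelSubset#" so plans compare equal across runs. */
  static String eliminateRelId(String s) {
    return s.replaceAll("rel#\\d+", "rel#").replaceAll("RelSubset#\\d+", "RelSubset#");
  }

  public static void main(String[] args) {
    // Fragment shortened from the expected physical plan in explain_filter_agg_push.json.
    String fragment =
        "AGGREGATION->rel#12051:LogicalAggregate.NONE.[](input=RelSubset#12050,group={0, 1})";
    System.out.println(eliminateRelId(fragment));
    // prints: AGGREGATION->rel#:LogicalAggregate.NONE.[](input=RelSubset#,group={0, 1})
  }
}
```

Because both sides of the comparison pass through the same function, two plans that differ only in these ids normalize to the same string.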
@@ -1,6 +1,6 @@
{
"calcite": {
"logical": "LogicalProject(avg_age=[$2], state=[$1], city=[$0])\n LogicalAggregate(group=[{5, 7}], avg_age=[AVG($8)])\n LogicalFilter(condition=[>($8, 30)])\n CalciteLogicalIndexScan(table=[[OpenSearch, opensearch-sql_test_index_account]])\n",
"physical": "EnumerableCalc(expr#0..3=[{inputs}], expr#4=[0], expr#5=[=($t3, $t4)], expr#6=[null:BIGINT], expr#7=[CASE($t5, $t6, $t2)], expr#8=[CAST($t7):DOUBLE], expr#9=[/($t8, $t3)], avg_age=[$t9], state=[$t1], city=[$t0])\n EnumerableAggregate(group=[{5, 7}], agg#0=[$SUM0($8)], agg#1=[COUNT($8)])\n CalciteEnumerableIndexScan(table=[[OpenSearch, opensearch-sql_test_index_account]], PushDownContext=[[FILTER->>($8, 30)], OpenSearchRequestBuilder(sourceBuilder={\"from\":0,\"timeout\":\"1m\",\"query\":{\"range\":{\"age\":{\"from\":30,\"to\":null,\"include_lower\":false,\"include_upper\":true,\"boost\":1.0}}},\"sort\":[{\"_doc\":{\"order\":\"asc\"}}]}, requestedTotalSize=10000, pageSize=null, startFrom=0)])\n"
"physical": "EnumerableCalc(expr#0..2=[{inputs}], avg_age=[$t2], state=[$t1], city=[$t0])\n CalciteEnumerableIndexScan(table=[[OpenSearch, opensearch-sql_test_index_account]], PushDownContext=[[PROJECT->[city, state, age], FILTER->>($2, 30), AGGREGATION->rel#12051:LogicalAggregate.NONE.[](input=RelSubset#12050,group={0, 1},avg_age=AVG($2))], OpenSearchRequestBuilder(sourceBuilder={\"from\":0,\"size\":0,\"timeout\":\"1m\",\"query\":{\"range\":{\"age\":{\"from\":30,\"to\":null,\"include_lower\":false,\"include_upper\":true,\"boost\":1.0}}},\"_source\":{\"includes\":[\"city\",\"state\",\"age\"],\"excludes\":[]},\"sort\":[{\"_doc\":{\"order\":\"asc\"}}],\"aggregations\":{\"composite_buckets\":{\"composite\":{\"size\":1000,\"sources\":[{\"city\":{\"terms\":{\"field\":\"city.keyword\",\"missing_bucket\":true,\"missing_order\":\"first\",\"order\":\"asc\"}}},{\"state\":{\"terms\":{\"field\":\"state.keyword\",\"missing_bucket\":true,\"missing_order\":\"first\",\"order\":\"asc\"}}}]},\"aggregations\":{\"avg_age\":{\"avg\":{\"field\":\"age\"}}}}}}, requestedTotalSize=10000, pageSize=null, startFrom=0)])\n"
}
}
}
@@ -1,6 +1,6 @@
{
"calcite":{
"logical":"LogicalProject(age=[$8])\n LogicalFilter(condition=[>($3, 10000)])\n LogicalFilter(condition=[<($8, 40)])\n LogicalFilter(condition=[>($8, 30)])\n CalciteLogicalIndexScan(table=[[OpenSearch, opensearch-sql_test_index_account]])\n",
"physical":"EnumerableCalc(expr#0..16=[{inputs}], age=[$t8])\n CalciteEnumerableIndexScan(table=[[OpenSearch, opensearch-sql_test_index_account]], PushDownContext=[[FILTER->>($8, 30), FILTER->AND(<($8, 40), >($3, 10000))], OpenSearchRequestBuilder(sourceBuilder={\"from\":0,\"timeout\":\"1m\",\"query\":{\"bool\":{\"filter\":[{\"range\":{\"age\":{\"from\":30,\"to\":null,\"include_lower\":false,\"include_upper\":true,\"boost\":1.0}}},{\"bool\":{\"must\":[{\"range\":{\"age\":{\"from\":null,\"to\":40,\"include_lower\":true,\"include_upper\":false,\"boost\":1.0}}},{\"range\":{\"balance\":{\"from\":10000,\"to\":null,\"include_lower\":false,\"include_upper\":true,\"boost\":1.0}}}],\"adjust_pure_negative\":true,\"boost\":1.0}}],\"adjust_pure_negative\":true,\"boost\":1.0}},\"sort\":[{\"_doc\":{\"order\":\"asc\"}}]}, requestedTotalSize=10000, pageSize=null, startFrom=0)])\n"
"calcite": {
"logical": "LogicalProject(age=[$8])\n LogicalFilter(condition=[>($3, 10000)])\n LogicalFilter(condition=[<($8, 40)])\n LogicalFilter(condition=[>($8, 30)])\n CalciteLogicalIndexScan(table=[[OpenSearch, opensearch-sql_test_index_account]])\n",
"physical": "CalciteEnumerableIndexScan(table=[[OpenSearch, opensearch-sql_test_index_account]], PushDownContext=[[PROJECT->[balance, age], FILTER->>($1, 30), FILTER-><($1, 40), FILTER->>($0, 10000), PROJECT->[age]], OpenSearchRequestBuilder(sourceBuilder={\"from\":0,\"timeout\":\"1m\",\"query\":{\"bool\":{\"filter\":[{\"range\":{\"age\":{\"from\":30,\"to\":null,\"include_lower\":false,\"include_upper\":true,\"boost\":1.0}}},{\"range\":{\"age\":{\"from\":null,\"to\":40,\"include_lower\":true,\"include_upper\":false,\"boost\":1.0}}},{\"range\":{\"balance\":{\"from\":10000,\"to\":null,\"include_lower\":false,\"include_upper\":true,\"boost\":1.0}}}],\"adjust_pure_negative\":true,\"boost\":1.0}},\"_source\":{\"includes\":[\"age\"],\"excludes\":[]},\"sort\":[{\"_doc\":{\"order\":\"asc\"}}]}, requestedTotalSize=10000, pageSize=null, startFrom=0)])\n"
}
}
@@ -1,6 +1,6 @@
{
"calcite":{
"logical":"LogicalProject(ageMinus=[$17])\n LogicalSort(fetch=[5])\n LogicalProject(account_number=[$0], firstname=[$1], address=[$2], balance=[$3], gender=[$4], city=[$5], employer=[$6], state=[$7], age=[$8], email=[$9], lastname=[$10], _id=[$11], _index=[$12], _score=[$13], _maxscore=[$14], _sort=[$15], _routing=[$16], ageMinus=[-($8, 30)])\n CalciteLogicalIndexScan(table=[[OpenSearch, opensearch-sql_test_index_account]])\n",
"physical":"EnumerableCalc(expr#0..16=[{inputs}], expr#17=[30], expr#18=[-($t8, $t17)], ageMinus=[$t18])\n EnumerableLimit(fetch=[5])\n CalciteEnumerableIndexScan(table=[[OpenSearch, opensearch-sql_test_index_account]])\n"
"calcite": {
"logical": "LogicalProject(ageMinus=[$17])\n LogicalSort(fetch=[5])\n LogicalProject(account_number=[$0], firstname=[$1], address=[$2], balance=[$3], gender=[$4], city=[$5], employer=[$6], state=[$7], age=[$8], email=[$9], lastname=[$10], _id=[$11], _index=[$12], _score=[$13], _maxscore=[$14], _sort=[$15], _routing=[$16], ageMinus=[-($8, 30)])\n CalciteLogicalIndexScan(table=[[OpenSearch, opensearch-sql_test_index_account]])\n",
"physical": "EnumerableCalc(expr#0=[{inputs}], expr#1=[30], expr#2=[-($t0, $t1)], $f0=[$t2])\n EnumerableLimit(fetch=[5])\n CalciteEnumerableIndexScan(table=[[OpenSearch, opensearch-sql_test_index_account]], PushDownContext=[[PROJECT->[age]], OpenSearchRequestBuilder(sourceBuilder={\"from\":0,\"timeout\":\"1m\",\"_source\":{\"includes\":[\"age\"],\"excludes\":[]}}, requestedTotalSize=10000, pageSize=null, startFrom=0)])\n"
}
}