[RFC] Explain endpoint output for V3

**Problem Statement**
The explain endpoint is used very often when user want to get insight how the query is executed in the engine.

Here is an example output of explain endpoint for v2
Query:
```
POST /_plugins/_ppl/_explain
{
  "query" : """
  source = state_country | where country = 'USA' OR country = 'England'
  | stats count() by country
  """
}
```
Result:
```
{
  "root": {
    "name": "ProjectOperator",
    "description": {
      "fields": "[count(), country]"
    },
    "children": [
      {
        "name": "OpenSearchIndexScan",
        "description": {
          "request": """OpenSearchQueryRequest(indexName=state_country, sourceBuilder={"from":0,"size":10000,"timeout":"1m","query":{"bool":{"should":[{"term":{"country":{"value":"USA","boost":1.0}}},{"term":{"country":{"value":"England","boost":1.0}}}],"adjust_pure_negative":true,"boost":1.0}},"sort":[{"_doc":{"order":"asc"}}],"aggregations":{"composite_buckets":{"composite":{"size":1000,"sources":[{"country":{"terms":{"field":"country","missing_bucket":true,"missing_order":"first","order":"asc"}}}]},"aggregations":{"count()":{"value_count":{"field":"_index"}}}}}}, needClean=true, searchDone=false, pitId=null, cursorKeepAlive=null, searchAfter=null, searchResponse=null)"""
        },
        "children": []
      }
    ]
  }
}
```

The output structured by JSON.
The physical plan constructed as a nested JSON
```
  "root": {
    "name": "ProjectOperator",
    ...,
    "children": [
      {
        "name": "OpenSearchIndexScan",
        ...,
```


Meanwhile, we can see the pushdowns:
- filter pushdown:
```
    "query": {
        "bool": {
            "should": [
                {
                    "term": {
                        "country": {
                            "value": "USA",
                            "boost": 1
                        }
                    }
                },
                {
                    "term": {
                        "country": {
                            "value": "England",
                            "boost": 1
                        }
                    }
                }
            ],
            "adjust_pure_negative": true,
            "boost": 1
        }
    }
```
- aggregation pushdown:
```
    "aggregations": {
        "composite_buckets": {
            "composite": {
                "size": 1000,
                "sources": [
                    {
                        "country": {
                            "terms": {
                                "field": "country",
                                "missing_bucket": true,
                                "missing_order": "first",
                                "order": "asc"
                            }
                        }
                    }
                ]
            },
            "aggregations": {
                "count()": {
                    "value_count": {
                        "field": "_index"
                    }
                }
            }
        }
    }
```


**Current State**
In V3, we leveraged Calcite to plan and execute PPL queries. The logical and physical plans between v2 and v3 are totally different. We won't reuse the logical and physical plans of v2.

**Goals**
* Show logical and physical plans
* Show the pushdown context and DSL
* Show extended information, such as codegen

**Proposal**
1. Standard output
  - logical and physical plan
  - pushdown context
Query:
```
POST /_plugins/_ppl/_explain
{
  "query" : """
  source = state_country | where country = 'USA' OR country = 'England'
  | stats count() by country
  """
}
```
or `POST /_plugins/_ppl/_explain?format=standard`

Output:
```
{
  "calcite": {
    "logical": """LogicalProject(count()=[$1], country=[$0])
  LogicalAggregate(group=[{1}], count()=[COUNT()])
    LogicalFilter(condition=[SEARCH($1, Sarg['England', 'USA':CHAR(7)]:CHAR(7))])
      CalciteLogicalIndexScan(table=[[OpenSearch, state_country]])
""",
    "physical": """EnumerableCalc(expr#0..1=[{inputs}], count()=[$t1], country=[$t0])
  CalciteEnumerableIndexScan(table=[[OpenSearch, state_country]], PushDownContext=[[FILTER->SEARCH($1, Sarg['England', 'USA':CHAR(7)]:CHAR(7)), AGGREGATION->rel#53:LogicalAggregate.NONE.[](input=RelSubset#43,group={1},count()=COUNT())], OpenSearchRequestBuilder(sourceBuilder={"from":0,"size":0,"timeout":"1m","query":{"terms":{"country":["England","USA"],"boost":1.0}},"sort":[{"_doc":{"order":"asc"}}],"aggregations":{"composite_buckets":{"composite":{"size":1000,"sources":[{"country":{"terms":{"field":"country","missing_bucket":true,"missing_order":"first","order":"asc"}}}]},"aggregations":{"count()":{"value_count":{"field":"_index"}}}}}}, requestedTotalSize=10000, pageSize=null, startFrom=0)])
"""
  }
}
```
In this new output of v3. We can see the logical plan tree:
```
LogicalProject(count()=[$1], country=[$0])
  LogicalAggregate(group=[{1}], count()=[COUNT()])
    LogicalFilter(condition=[SEARCH($1, Sarg['England', 'USA':CHAR(7)]:CHAR(7))])
      CalciteLogicalIndexScan(table=[[OpenSearch, state_country]])
```
and physical plan tree:
```
EnumerableCalc(expr#0..1=[{inputs}], count()=[$t1], country=[$t0])
  CalciteEnumerableIndexScan(table=[[OpenSearch, state_country]])
```
In the operator `CalciteEnumerableIndexScan`, there are filter and aggregation pushdown information:
```
FILTER:SEARCH($1, Sarg['England', 'USA':CHAR(7)]:CHAR(7))
AGGREGATION:rel#212:LogicalAggregate.NONE.[](input=RelSubset#202,group={1},count()=COUNT())
```
And pushdown DSL:
```
OpenSearchRequestBuilder(sourceBuilder={"from":0,"size":0,"timeout":"1m","query":{"terms":{"country":["England","USA"],"boost":1.0}},"sort":[{"_doc":{"order":"asc"}}],"aggregations":{"composite_buckets":{"composite":{"size":1000,"sources":[{"country":{"terms":{"field":"country","missing_bucket":true,"missing_order":"first","order":"asc"}}}]},"aggregations":{"count()":{"value_count":{"field":"_index"}}}}}}, requestedTotalSize=10000, pageSize=null, startFrom=0)
```
2. Simple output
Query:
```
POST /_plugins/_ppl/_explain?format=simple
{
  "query" : """
  source = state_country | where country = 'USA' OR country = 'England'
  | stats count() by country
  """
}
```
Output:
```
{
  "logical": """LogicalProject
  LogicalAggregate
    LogicalFilter
      CalciteLogicalIndexScan
"""
}
```

3. Codegen output
Query:
```
POST /_plugins/_ppl/_explain?format=codegen
{
  "query" : """
  source = state_country | where country = 'USA' OR country = 'England'
  | stats count() by country
  """
}
```
Output:
```
{
  "logical": """LogicalProject(count()=[$1], country=[$0])
  LogicalAggregate(group=[{1}], count()=[COUNT()])
    LogicalFilter(condition=[SEARCH($1, Sarg['England', 'USA':CHAR(7)]:CHAR(7))])
      CalciteLogicalIndexScan(table=[[OpenSearch, state_country]])
""",
  "physical": """EnumerableCalc(expr#0..1=[{inputs}], count()=[$t1], country=[$t0])
  CalciteEnumerableIndexScan(table=[[OpenSearch, state_country]], PushDownContext=[[FILTER->SEARCH($1, Sarg['England', 'USA':CHAR(7)]:CHAR(7)), AGGREGATION->rel#53:LogicalAggregate.NONE.[](input=RelSubset#43,group={1},count()=COUNT())], OpenSearchRequestBuilder(sourceBuilder={"from":0,"size":0,"timeout":"1m","query":{"terms":{"country":["England","USA"],"boost":1.0}},"sort":[{"_doc":{"order":"asc"}}],"aggregations":{"composite_buckets":{"composite":{"size":1000,"sources":[{"country":{"terms":{"field":"country","missing_bucket":true,"missing_order":"first","order":"asc"}}}]},"aggregations":{"count()":{"value_count":{"field":"_index"}}}}}}, requestedTotalSize=10000, pageSize=null, startFrom=0)])
""",
  "extended": """public org.apache.calcite.linq4j.Enumerable bind(final org.apache.calcite.DataContext root) {
  final org.opensearch.sql.opensearch.storage.scan.CalciteEnumerableIndexScan v1stashed = (org.opensearch.sql.opensearch.storage.scan.CalciteEnumerableIndexScan) root.get("v1stashed");
  final org.apache.calcite.linq4j.Enumerable _inputEnumerable = v1stashed.scan();
  return new org.apache.calcite.linq4j.AbstractEnumerable(){
      public org.apache.calcite.linq4j.Enumerator enumerator() {
        return new org.apache.calcite.linq4j.Enumerator(){
            public final org.apache.calcite.linq4j.Enumerator inputEnumerator = _inputEnumerable.enumerator();
            public void reset() {
              inputEnumerator.reset();
            }

            public boolean moveNext() {
              return inputEnumerator.moveNext();
            }

            public void close() {
              inputEnumerator.close();
            }

            public Object current() {
              final Object[] current = (Object[]) inputEnumerator.current();
              final Object input_value = current[1];
              final Object input_value0 = current[0];
              return new Object[] {
                  input_value,
                  input_value0};
            }

          };
      }

    };
}


public Class getElementType() {
  return java.lang.Object[].class;
}


"""
}
```

**Approach**
Details the steps or strategy to implement the proposal.

**Alternative**
Alternative that can serve as workarounds to partially or temporarily solve the problem.

**Implementation Discussion**
Outlines point the discussion regarding the proposed implementation.


Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[RFC] Explain endpoint output for V3 #3519

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

[RFC] Explain endpoint output for V3 #3519

Description

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions