Skip to content

[RFC] Explain endpoint output for V3 #3519

@LantaoJin

Description

@LantaoJin

Problem Statement
The explain endpoint is used very often when user want to get insight how the query is executed in the engine.

Here is an example output of explain endpoint for v2
Query:

POST /_plugins/_ppl/_explain
{
  "query" : """
  source = state_country | where country = 'USA' OR country = 'England'
  | stats count() by country
  """
}

Result:

{
  "root": {
    "name": "ProjectOperator",
    "description": {
      "fields": "[count(), country]"
    },
    "children": [
      {
        "name": "OpenSearchIndexScan",
        "description": {
          "request": """OpenSearchQueryRequest(indexName=state_country, sourceBuilder={"from":0,"size":10000,"timeout":"1m","query":{"bool":{"should":[{"term":{"country":{"value":"USA","boost":1.0}}},{"term":{"country":{"value":"England","boost":1.0}}}],"adjust_pure_negative":true,"boost":1.0}},"sort":[{"_doc":{"order":"asc"}}],"aggregations":{"composite_buckets":{"composite":{"size":1000,"sources":[{"country":{"terms":{"field":"country","missing_bucket":true,"missing_order":"first","order":"asc"}}}]},"aggregations":{"count()":{"value_count":{"field":"_index"}}}}}}, needClean=true, searchDone=false, pitId=null, cursorKeepAlive=null, searchAfter=null, searchResponse=null)"""
        },
        "children": []
      }
    ]
  }
}

The output structured by JSON.
The physical plan constructed as a nested JSON

  "root": {
    "name": "ProjectOperator",
    ...,
    "children": [
      {
        "name": "OpenSearchIndexScan",
        ...,

Meanwhile, we can see the pushdowns:

  • filter pushdown:
    "query": {
        "bool": {
            "should": [
                {
                    "term": {
                        "country": {
                            "value": "USA",
                            "boost": 1
                        }
                    }
                },
                {
                    "term": {
                        "country": {
                            "value": "England",
                            "boost": 1
                        }
                    }
                }
            ],
            "adjust_pure_negative": true,
            "boost": 1
        }
    }
  • aggregation pushdown:
    "aggregations": {
        "composite_buckets": {
            "composite": {
                "size": 1000,
                "sources": [
                    {
                        "country": {
                            "terms": {
                                "field": "country",
                                "missing_bucket": true,
                                "missing_order": "first",
                                "order": "asc"
                            }
                        }
                    }
                ]
            },
            "aggregations": {
                "count()": {
                    "value_count": {
                        "field": "_index"
                    }
                }
            }
        }
    }

Current State
In V3, we leveraged Calcite to plan and execute PPL queries. The logical and physical plans between v2 and v3 are totally different. We won't reuse the logical and physical plans of v2.

Goals

  • Show logical and physical plans
  • Show the pushdown context and DSL
  • Show extended information, such as codegen

Proposal

  1. Standard output
  • logical and physical plan
  • pushdown context
    Query:
POST /_plugins/_ppl/_explain
{
  "query" : """
  source = state_country | where country = 'USA' OR country = 'England'
  | stats count() by country
  """
}

or POST /_plugins/_ppl/_explain?format=standard

Output:

{
  "calcite": {
    "logical": """LogicalProject(count()=[$1], country=[$0])
  LogicalAggregate(group=[{1}], count()=[COUNT()])
    LogicalFilter(condition=[SEARCH($1, Sarg['England', 'USA':CHAR(7)]:CHAR(7))])
      CalciteLogicalIndexScan(table=[[OpenSearch, state_country]])
""",
    "physical": """EnumerableCalc(expr#0..1=[{inputs}], count()=[$t1], country=[$t0])
  CalciteEnumerableIndexScan(table=[[OpenSearch, state_country]], PushDownContext=[[FILTER->SEARCH($1, Sarg['England', 'USA':CHAR(7)]:CHAR(7)), AGGREGATION->rel#53:LogicalAggregate.NONE.[](input=RelSubset#43,group={1},count()=COUNT())], OpenSearchRequestBuilder(sourceBuilder={"from":0,"size":0,"timeout":"1m","query":{"terms":{"country":["England","USA"],"boost":1.0}},"sort":[{"_doc":{"order":"asc"}}],"aggregations":{"composite_buckets":{"composite":{"size":1000,"sources":[{"country":{"terms":{"field":"country","missing_bucket":true,"missing_order":"first","order":"asc"}}}]},"aggregations":{"count()":{"value_count":{"field":"_index"}}}}}}, requestedTotalSize=10000, pageSize=null, startFrom=0)])
"""
  }
}

In this new output of v3. We can see the logical plan tree:

LogicalProject(count()=[$1], country=[$0])
  LogicalAggregate(group=[{1}], count()=[COUNT()])
    LogicalFilter(condition=[SEARCH($1, Sarg['England', 'USA':CHAR(7)]:CHAR(7))])
      CalciteLogicalIndexScan(table=[[OpenSearch, state_country]])

and physical plan tree:

EnumerableCalc(expr#0..1=[{inputs}], count()=[$t1], country=[$t0])
  CalciteEnumerableIndexScan(table=[[OpenSearch, state_country]])

In the operator CalciteEnumerableIndexScan, there are filter and aggregation pushdown information:

FILTER:SEARCH($1, Sarg['England', 'USA':CHAR(7)]:CHAR(7))
AGGREGATION:rel#212:LogicalAggregate.NONE.[](input=RelSubset#202,group={1},count()=COUNT())

And pushdown DSL:

OpenSearchRequestBuilder(sourceBuilder={"from":0,"size":0,"timeout":"1m","query":{"terms":{"country":["England","USA"],"boost":1.0}},"sort":[{"_doc":{"order":"asc"}}],"aggregations":{"composite_buckets":{"composite":{"size":1000,"sources":[{"country":{"terms":{"field":"country","missing_bucket":true,"missing_order":"first","order":"asc"}}}]},"aggregations":{"count()":{"value_count":{"field":"_index"}}}}}}, requestedTotalSize=10000, pageSize=null, startFrom=0)
  1. Simple output
    Query:
POST /_plugins/_ppl/_explain?format=simple
{
  "query" : """
  source = state_country | where country = 'USA' OR country = 'England'
  | stats count() by country
  """
}

Output:

{
  "logical": """LogicalProject
  LogicalAggregate
    LogicalFilter
      CalciteLogicalIndexScan
"""
}
  1. Codegen output
    Query:
POST /_plugins/_ppl/_explain?format=codegen
{
  "query" : """
  source = state_country | where country = 'USA' OR country = 'England'
  | stats count() by country
  """
}

Output:

{
  "logical": """LogicalProject(count()=[$1], country=[$0])
  LogicalAggregate(group=[{1}], count()=[COUNT()])
    LogicalFilter(condition=[SEARCH($1, Sarg['England', 'USA':CHAR(7)]:CHAR(7))])
      CalciteLogicalIndexScan(table=[[OpenSearch, state_country]])
""",
  "physical": """EnumerableCalc(expr#0..1=[{inputs}], count()=[$t1], country=[$t0])
  CalciteEnumerableIndexScan(table=[[OpenSearch, state_country]], PushDownContext=[[FILTER->SEARCH($1, Sarg['England', 'USA':CHAR(7)]:CHAR(7)), AGGREGATION->rel#53:LogicalAggregate.NONE.[](input=RelSubset#43,group={1},count()=COUNT())], OpenSearchRequestBuilder(sourceBuilder={"from":0,"size":0,"timeout":"1m","query":{"terms":{"country":["England","USA"],"boost":1.0}},"sort":[{"_doc":{"order":"asc"}}],"aggregations":{"composite_buckets":{"composite":{"size":1000,"sources":[{"country":{"terms":{"field":"country","missing_bucket":true,"missing_order":"first","order":"asc"}}}]},"aggregations":{"count()":{"value_count":{"field":"_index"}}}}}}, requestedTotalSize=10000, pageSize=null, startFrom=0)])
""",
  "extended": """public org.apache.calcite.linq4j.Enumerable bind(final org.apache.calcite.DataContext root) {
  final org.opensearch.sql.opensearch.storage.scan.CalciteEnumerableIndexScan v1stashed = (org.opensearch.sql.opensearch.storage.scan.CalciteEnumerableIndexScan) root.get("v1stashed");
  final org.apache.calcite.linq4j.Enumerable _inputEnumerable = v1stashed.scan();
  return new org.apache.calcite.linq4j.AbstractEnumerable(){
      public org.apache.calcite.linq4j.Enumerator enumerator() {
        return new org.apache.calcite.linq4j.Enumerator(){
            public final org.apache.calcite.linq4j.Enumerator inputEnumerator = _inputEnumerable.enumerator();
            public void reset() {
              inputEnumerator.reset();
            }

            public boolean moveNext() {
              return inputEnumerator.moveNext();
            }

            public void close() {
              inputEnumerator.close();
            }

            public Object current() {
              final Object[] current = (Object[]) inputEnumerator.current();
              final Object input_value = current[1];
              final Object input_value0 = current[0];
              return new Object[] {
                  input_value,
                  input_value0};
            }

          };
      }

    };
}


public Class getElementType() {
  return java.lang.Object[].class;
}


"""
}

Approach
Details the steps or strategy to implement the proposal.

Alternative
Alternative that can serve as workarounds to partially or temporarily solve the problem.

Implementation Discussion
Outlines point the discussion regarding the proposed implementation.

Metadata

Metadata

Assignees

No one assigned

    Labels

    calcitecalcite migration releated

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions