Skip to content

Explain analyze doesn't (fully) optimize queries #917

@Dandandan

Description

@Dandandan

Describe the bug

A simple query like select max(l_partkey) from lineitem has a different plan when using explain analyze than what is executed when trying a normal select.

The output from explain analyze is slower than normal and hasn't the statistics-based optimization enabled:

+-------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| plan_type         | plan                                                                                                                                                                                                                                                                                                             |
+-------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Plan with Metrics | ProjectionExec: expr=[MAX(lineitem.l_partkey)@0 as MAX(l_partkey)], metrics=[]                                                                                                                                                                                                                                   |
|                   |   HashAggregateExec: mode=Final, gby=[], aggr=[MAX(l_partkey)], metrics=[outputRows=0]                                                                                                                                                                                                                           |
|                   |     CoalescePartitionsExec, metrics=[]                                                                                                                                                                                                                                                                           |
|                   |       HashAggregateExec: mode=Partial, gby=[], aggr=[MAX(l_partkey)], metrics=[outputRows=0]                                                                                                                                                                                                                     |
|                   |         RepartitionExec: partitioning=RoundRobinBatch(16), metrics=[fetchTime=51231297, repartitionTime=0, sendTime=120310]                                                                                                                                                                                      |
|                   |           ParquetExec: batch_size=8192, limit=None, partitions=[../benchmarks/parquet/lineitem/part-0.parquet], metrics=[numPredicateCreationErrors=0, numPredicateEvaluationErrors for ../benchmarks/parquet/lineitem/part-0.parquet=0, numRowGroupsPruned for ../benchmarks/parquet/lineitem/part-0.parquet=0] |
+-------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row in set. Query took 0.071 seconds.
> explain select max(l_partkey) from lineitem;
+---------------+--------------------------------------------------------------------+
| plan_type     | plan                                                               |
+---------------+--------------------------------------------------------------------+
| logical_plan  | Projection: #MAX(lineitem.l_partkey)                               |
|               |   Projection: Int32(200000) AS MAX(l_partkey)                      |
|               |     EmptyRelation                                                  |
| physical_plan | ProjectionExec: expr=[MAX(lineitem.l_partkey)@0 as MAX(l_partkey)] |
|               |   RepartitionExec: partitioning=RoundRobinBatch(16)                |
|               |     ProjectionExec: expr=[200000 as MAX(l_partkey)]                |
|               |       EmptyExec: produce_one_row=true                              |
+---------------+--------------------------------------------------------------------+
2 rows in set. Query took 0.002 seconds.

To Reproduce
run some queries with explain vs explain analyze.

Expected behavior
explain analyze should have same plans

Additional context

Metadata

Metadata

Assignees

Labels

bugSomething isn't working

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions