Major performance regression in reading partitioned Parquet data on master #1363

Queries that read (partitioned) Parquet data have become slower on master.

The regression appears to have started with #1010; the commit before it does not show the slowdown.

This shows up in the TPC-H benchmarks. For example, query 6 became much slower.

I am not sure what the cause is. It would help to find the exact commit where the slowdown was introduced.

> Looking back at the earlier results I posted, it looks like the main difference is that the original `arrow` / `parquet` got slower. I am not sure what the cause is.

_Originally posted by @Dandandan in https://github.com/apache/arrow-datafusion/issues/68#issuecomment-979077578_
