Major performance regression in reading partitioned Parquet data on master #1363

@Dandandan

Reading (partitioned) Parquet data got slower when executing queries over Parquet data.

It seems to have started with #1010 - the commit before doesn't have the performance regression.

This is visible when running the TPC-H benchmarks; for example, query 6 became much slower.

I am not sure what the cause is - it would help to find the commit where the "slowness" was introduced.
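One way to narrow this down is to bisect the commit range between a known-fast and a known-slow revision. The sketch below is only an illustration of that idea: the function `first_slow_commit`, the commit names, the timings, and the 150 ms threshold are all hypothetical placeholders, not measurements from this issue. It assumes the commits are ordered oldest to newest, that the first one is fast, the last one is slow, and that every commit after the regression stays slow.

```python
from typing import Callable, Sequence


def first_slow_commit(commits: Sequence[str], is_slow: Callable[[str], bool]) -> str:
    """Binary-search for the first commit that is_slow reports as slow.

    Assumes commits[0] is fast, commits[-1] is slow, and that once a
    commit is slow every later commit is slow too.
    """
    lo, hi = 0, len(commits) - 1  # lo: known fast, hi: known slow
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_slow(commits[mid]):
            hi = mid  # regression is at mid or earlier
        else:
            lo = mid  # regression is after mid
    return commits[hi]


# Hypothetical commit names and made-up timings, purely for illustration.
commits = ["c0", "c1", "c2", "c3", "c4", "c5"]
timings_ms = {"c0": 100, "c1": 105, "c2": 98, "c3": 240, "c4": 250, "c5": 245}

culprit = first_slow_commit(commits, lambda c: timings_ms[c] > 150)
print(culprit)
```

In practice `is_slow` would check out the commit, build, run the benchmark query, and compare the runtime against a threshold; `git bisect` automates the same search over the real history.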

Looking back at the earlier results I posted, it looks like the main difference is that the original arrow / parquet got slower. I am not sure what the cause is.

Originally posted by @Dandandan in #68 (comment)

Labels: performance (Make DataFusion faster)
