Issue 6544 - Query performance improvements on large parquet files #6557
…rge-parquet-files
…les' of github.com:gchq/sleeper into 6544-query-performance-improvements-on-large-parquet-files
...ction-datafusion/src/main/java/sleeper/compaction/datafusion/DataFusionCompactionRunner.java
...y-datafusion/src/main/java/sleeper/query/datafusion/DataFusionLeafPartitionRowRetriever.java
patchwork01 left a comment
There are failing tests and Checkstyle problems.
    );
    // Disable page indexes since we won't benefit from them as we are reading large contiguous file regions
    cfg.options_mut().execution.parquet.enable_page_index = false;
    cfg.options_mut().execution.parquet.enable_page_index = self.config.read_page_indexes();
The issue states that disabling reading page indexes in DataFusion isn't possible with the cached reader.
It looks like you've worked around this in rust/sleeper_core/src/datafusion/metadata_cache.rs?
Please can you explain in a comment here how these two things relate to one another, and what was intended?
Is there a way we can bring the code for this together into one place? It makes it a bit difficult to read with the read_page_indexes setting checked in multiple places.
Added more comments to code.
That helps, but I think the important part is the relationship between this line, the other check of read_page_indexes below, and the code in rust/sleeper_core/src/datafusion/metadata_cache.rs. From reading the comment here you wouldn't know that we have worked around the problem with DataFusion, or that those other pieces of code exist.
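One possible way to bring the checks together, offered only as a rough sketch and not as the code in this PR: derive every page-index decision from the setting once, in a single documented type, and have both the session configuration and the metadata cache read from it. Only the accessor name `read_page_indexes` comes from the diff above. `QueryConfig`, `PageIndexPolicy` and both of its fields are hypothetical names, and the assumption that the metadata cache needs the same flag is a guess based on this discussion.

```rust
/// Hypothetical stand-in for the query configuration. Only the accessor
/// name `read_page_indexes` is taken from the diff under review.
#[derive(Debug, Clone, Copy)]
pub struct QueryConfig {
    read_page_indexes: bool,
}

impl QueryConfig {
    pub fn new(read_page_indexes: bool) -> Self {
        Self { read_page_indexes }
    }

    pub fn read_page_indexes(&self) -> bool {
        self.read_page_indexes
    }
}

/// Hypothetical type that holds every decision derived from the
/// `read_page_indexes` setting, so the relationship between them is
/// documented in one place instead of at each call site.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct PageIndexPolicy {
    /// Value to assign to `execution.parquet.enable_page_index`.
    pub enable_page_index: bool,
    /// Assumption: whether the metadata cache should load page indexes
    /// when it reads Parquet metadata.
    pub load_in_metadata_cache: bool,
}

impl PageIndexPolicy {
    /// Derive both decisions from the single configuration setting.
    pub fn from_config(config: &QueryConfig) -> Self {
        let read = config.read_page_indexes();
        Self {
            enable_page_index: read,
            load_in_metadata_cache: read,
        }
    }
}
```

With something like this, the session setup would assign `policy.enable_page_index` to the DataFusion option and the metadata cache would consult `policy.load_in_metadata_cache`, so the explanation of the workaround could live on `PageIndexPolicy` alone.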