Description
Is your feature request related to a problem or challenge?
We are trying to improve the speed of DataFusion when running the ClickBench partitioned test (which has 100 files), so the per-file overhead is important to reduce.
One structure that has non-trivial overhead is the Statistics structure, as it has a ScalarValue for each column of each file, so there are at least 100 * (number of columns) * 2 ScalarValues.
Describe the solution you'd like
It would be great to reduce the overhead of passing around these values.
Describe alternatives you've considered
One way to do so is to avoid copying them when the underlying ParquetExec is copied, by using an Option<Arc<Statistics>> here:
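To illustrate the idea independently of the DataFusion code base, here is a minimal, self-contained sketch. The `ColumnStats`, `Statistics`, and `FileConfig` types, the `example_config` helper, and the file name are simplified hypothetical stand-ins, not DataFusion's actual definitions (which use `ScalarValue` for the per-column values). It only shows that cloning a struct holding an `Option<Arc<Statistics>>` bumps a reference count instead of deep-copying every per-column value.

```rust
use std::sync::Arc;

/// Hypothetical stand-in for per-column statistics.
/// (DataFusion stores these as `ScalarValue`s; `i64` keeps the sketch small.)
#[derive(Debug, Clone, PartialEq)]
pub struct ColumnStats {
    pub min: Option<i64>,
    pub max: Option<i64>,
}

/// Hypothetical stand-in for the per-file statistics structure.
#[derive(Debug, Clone, PartialEq)]
pub struct Statistics {
    pub num_rows: Option<usize>,
    pub columns: Vec<ColumnStats>,
}

/// Hypothetical per-file configuration that gets cloned along with the plan node.
/// Holding the statistics behind an `Arc` makes `clone()` a pointer copy
/// plus a reference-count increment, rather than a deep copy of every column.
#[derive(Debug, Clone)]
pub struct FileConfig {
    pub path: String,
    pub statistics: Option<Arc<Statistics>>,
}

/// Builds an example config with `num_columns` columns of made-up statistics.
pub fn example_config(num_columns: usize) -> FileConfig {
    let columns = (0..num_columns)
        .map(|i| ColumnStats {
            min: Some(0),
            max: Some(i as i64),
        })
        .collect();

    FileConfig {
        // Made-up file name, for illustration only.
        path: "file_0.parquet".to_string(),
        statistics: Some(Arc::new(Statistics {
            num_rows: Some(1000),
            columns,
        })),
    }
}
```

With this layout, `example_config(3).clone()` yields a second `FileConfig` whose `statistics` field points at the same allocation as the original, which can be checked with `Arc::ptr_eq`. The trade-off is that shared statistics are immutable; code that needs to modify them would have to go through something like `Arc::make_mut`, which copies only when the value is actually shared.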
Additional context
Interestingly @Rachelint
#11802 (comment)