Is your feature request related to a problem or challenge?
No response
Describe the solution you'd like
Currently, we use total_byte_size to store the total byte size.
That isn't good enough; a better approach is to put avg_data_size or total_data_size into ColumnStatistics.
A plan-level total_byte_size is useless, because it is hard to propagate during statistics derivation.
If we instead keep a per-column avg_data_size, we can propagate it to other plans.
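To make the propagation argument concrete, here is a minimal sketch, not the project's actual API. The `ColumnStats` and `PlanStats` types and their methods are hypothetical names invented for illustration. The sketch assumes that a filter leaves per-column average sizes unchanged and only scales the row count. Under that assumption, the total byte size no longer needs to be stored: it can be derived at any plan node from the row count and the per-column averages.

```rust
/// Hypothetical per-column statistics (illustration only, not an existing type).
#[derive(Debug, Clone, PartialEq)]
struct ColumnStats {
    /// Estimated average size of one value in bytes, if known.
    avg_data_size: Option<f64>,
}

/// Hypothetical plan-level statistics built from per-column statistics.
#[derive(Debug, Clone, PartialEq)]
struct PlanStats {
    num_rows: Option<usize>,
    columns: Vec<ColumnStats>,
}

impl PlanStats {
    /// Derived total: rows * sum of per-column averages.
    /// Returns `None` if the row count or any column average is unknown.
    fn total_byte_size(&self) -> Option<f64> {
        let rows = self.num_rows? as f64;
        let mut row_width = 0.0;
        for column in &self.columns {
            row_width += column.avg_data_size?;
        }
        Some(rows * row_width)
    }

    /// Projection keeps the row count and selects columns by index.
    fn project(&self, indices: &[usize]) -> PlanStats {
        PlanStats {
            num_rows: self.num_rows,
            columns: indices.iter().map(|&i| self.columns[i].clone()).collect(),
        }
    }

    /// Filter scales the row count by an estimated selectivity.
    /// Assumption: per-column average sizes are unaffected by the filter.
    fn filter(&self, selectivity: f64) -> PlanStats {
        PlanStats {
            num_rows: self
                .num_rows
                .map(|n| (n as f64 * selectivity).round() as usize),
            columns: self.columns.clone(),
        }
    }
}
```

For example, a scan of 1000 rows with column averages of 8 and 24 bytes derives a total of 32000 bytes. Projecting only the second column derives 24000 bytes, which a single stored total could not tell us without knowing how the bytes split across columns.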
I know this is a pretty old issue, but I would also be interested in it and capable of doing the implementation if there's interest by the maintainers.
@AdamGS +1 from me.
The average data size sounds most logical from the optimizer's perspective
(I was involved in the introduction of ColumnStatistics.dataSize in Presto/Trino, but the reality is the value is so often divided by #rows ...)
Other systems cited for comparison: Spark and Presto.
Describe alternatives you've considered
No response
Additional context
No response