-
Notifications
You must be signed in to change notification settings - Fork 1.8k
collect table stats by default for listing table #1347
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
Thanks for spotting this @houqp. I plan on working on the default values soon, as I find them to be a bit confused. They are sourced from multiple places and thus hard to document / understand. I don't mind switching the default for
Sadly things are not that simple 😅. There are some cases where this wouldn't be true. The |
alamb
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks like a fine change to me -- I wonder if there is some way to test it (or perhaps that will be included in @rdettai 's default work)
|
Thanks @rdettai for the confirmation. Yes, I was thinking of enabling stats in the context of query optimization.
One thing I have noticed in existing system is to leverage distributed compute for this kind of metadata gathering during the planing phase. Similar to the datafusion in datafusion setup we have right now, but balista in balista ;) Perhaps we could have a distributed version of Listing Table implementation in ballista in the future. |
That would be a possibility yes, but I would rather prioritize good support for table formats like Delta and Iceberg 😉, which will solve this in a much more cost effective fashion. |
Rationale for this change
I noticed this when I bump datafusion to in roapi and one of our tests started to fail. Two reasons that I think changing to true might be better:
@rdettai curious what's your thought on this.
I will add a test case to guard against regression if everyone agrees with the direction.
What changes are included in this PR?
Default value for
collect_statchanged totrue.Are there any user-facing changes?
Yes, reverting back to the 5.x behavior.