[SPARK-26952][SQL] Row count statistics should respect the data reported by data source #23853
Hi, @cloud-fan would you mind taking a look? Thanks in advance.
ok to test
Test build #102585 has finished for PR 23853 at commit
} else {
  None
}
Statistics(
Shall we create a util method to convert v2 statistics to Spark statistics?
OK, will add it later.
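A minimal sketch of what such a util method could look like, not the code that was merged. To stay self-contained it uses stand-in types: `V2Stats` mimics the v2 reader statistics interface (two `OptionalLong` getters) and `PlanStats` mimics the planner-side statistics. Both names, as well as `StatsConversion.toPlanStats` and its `defaultSizeInBytes` parameter, are hypothetical and chosen for illustration only.

```scala
import java.util.OptionalLong

// Stand-in for the v2 reader statistics interface (hypothetical, simplified).
trait V2Stats {
  def sizeInBytes(): OptionalLong
  def numRows(): OptionalLong
}

// Stand-in for the statistics consumed by the planner (hypothetical, simplified).
final case class PlanStats(sizeInBytes: BigInt, rowCount: Option[BigInt])

object StatsConversion {
  // Hypothetical helper: use the values the source reports when present.
  // A missing size falls back to the given default; a missing row count stays None.
  def toPlanStats(v2: V2Stats, defaultSizeInBytes: Long): PlanStats = {
    val rowCount: Option[BigInt] =
      if (v2.numRows().isPresent) Some(BigInt(v2.numRows().getAsLong)) else None
    PlanStats(
      sizeInBytes = BigInt(v2.sizeInBytes().orElse(defaultSizeInBytes)),
      rowCount = rowCount)
  }
}
```

Keeping the conversion in one place means the relation only has to call the helper, and the fallback rules for unreported values are not repeated at each call site.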
Test build #102776 has finished for PR 23853 at commit
thanks, merging to master!
Thanks @cloud-fan.
… by data source

## What changes were proposed in this pull request?

In data source v2, if the data source scan implements `SupportsReportStatistics`, `DataSourceV2Relation` should respect the row count reported by the data source.

## How was this patch tested?

New unit test.

Closes apache#23853 from ConeyLiu/report-row-count.

Authored-by: Xianyang Liu <xianyang.liu@intel.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
What changes were proposed in this pull request?
In data source v2, if the data source scan implements `SupportsReportStatistics`, `DataSourceV2Relation` should respect the row count reported by the data source.
How was this patch tested?
New unit test.