
[SPARK-26952][SQL] Row count statistics should respect the data reported by data source #23853


Closed
ConeyLiu wants to merge 4 commits into apache:master from ConeyLiu:report-row-count


ConeyLiu
Contributor

What changes were proposed in this pull request?

In Data Source V2, if a data source's scan implements `SupportsReportStatistics`, `DataSourceV2Relation` should respect the row count reported by the data source instead of ignoring it.

How was this patch tested?

Added a new unit test.
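The following is an illustrative, self-contained Scala sketch of the intended behavior, under stated assumptions. It does not depend on Spark. `V2Statistics`, `Scan`, `SupportsReportStatistics`, `PlanStatistics`, and `RelationStats.computeStats` are hypothetical, simplified stand-ins for the DSv2 `Statistics` interface (which exposes `OptionalLong sizeInBytes()` and `OptionalLong numRows()`), the `SupportsReportStatistics` mix-in, and the logical-plan `Statistics` class. The `Long.MaxValue` fallback is an assumption that mirrors the default of `spark.sql.defaultSizeInBytes`. This sketch is not the merged code.

```scala
import java.util.OptionalLong

// Hypothetical stand-in for the DSv2 statistics interface: both values are optional.
trait V2Statistics {
  def sizeInBytes(): OptionalLong
  def numRows(): OptionalLong
}

// Hypothetical stand-ins for a scan, and for a scan that can report statistics.
trait Scan
trait SupportsReportStatistics extends Scan {
  def estimateStatistics(): V2Statistics
}

// Hypothetical stand-in for optimizer-side statistics (size is mandatory, row count is not).
case class PlanStatistics(sizeInBytes: BigInt, rowCount: Option[BigInt] = None)

object RelationStats {
  // Assumed fallback size, mirroring the default of spark.sql.defaultSizeInBytes.
  val DefaultSizeInBytes: Long = Long.MaxValue

  def computeStats(scan: Scan): PlanStatistics = scan match {
    case r: SupportsReportStatistics =>
      val s = r.estimateStatistics()
      // Keep the reported row count instead of dropping it.
      val rowCount =
        if (s.numRows().isPresent) Some(BigInt(s.numRows().getAsLong)) else None
      PlanStatistics(
        sizeInBytes = BigInt(s.sizeInBytes().orElse(DefaultSizeInBytes)),
        rowCount = rowCount)
    case _ =>
      // Nothing reported: fall back to the default size and an unknown row count.
      PlanStatistics(sizeInBytes = BigInt(DefaultSizeInBytes))
  }
}
```

With this shape, the optimizer sees a concrete `rowCount` whenever the source provides one. Scans that do not mix in the trait keep the old behavior of reporting only a default size.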

@ConeyLiu
Contributor Author

Hi @cloud-fan, would you mind taking a look? Thanks in advance.

@cloud-fan
Contributor

ok to test

@SparkQA

SparkQA commented Feb 21, 2019

Test build #102585 has finished for PR 23853 at commit 02a7cb3.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

```scala
    } else {
      None
    }
    Statistics(
```
Contributor


Shall we create a util method to turn v2 statistics into Spark statistics?

Contributor Author


OK, will add it later.
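The reviewer's suggestion could take the form of a small, hypothetical helper that converts v2 statistics into plan statistics in a single place. The sketch below uses simplified stand-in types (`V2Statistics`, `PlanStatistics`), not Spark's classes. The helper name `toPlanStatistics` and its `defaultRowCount` and `defaultSizeInBytes` parameters are illustrative assumptions, not the merged API.

```scala
import java.util.OptionalLong

// Hypothetical stand-in for DSv2 statistics (both values optional).
trait V2Statistics {
  def sizeInBytes(): OptionalLong
  def numRows(): OptionalLong
}

// Hypothetical stand-in for the optimizer's statistics.
case class PlanStatistics(sizeInBytes: BigInt, rowCount: Option[BigInt] = None)

object StatsUtils {
  // Converts a java.util.OptionalLong into an Option[BigInt].
  private def toOption(v: OptionalLong): Option[BigInt] =
    if (v.isPresent) Some(BigInt(v.getAsLong)) else None

  // Single conversion point: each value the source did not report
  // is replaced by the caller-supplied default.
  def toPlanStatistics(
      v2: V2Statistics,
      defaultRowCount: Option[BigInt],
      defaultSizeInBytes: Long): PlanStatistics =
    PlanStatistics(
      sizeInBytes = toOption(v2.sizeInBytes()).getOrElse(BigInt(defaultSizeInBytes)),
      rowCount = toOption(v2.numRows()).orElse(defaultRowCount))
}
```

Keeping the `OptionalLong` handling in one helper means each call site only chooses its defaults and does not repeat the fallback logic.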

@SparkQA

SparkQA commented Feb 26, 2019

Test build #102776 has finished for PR 23853 at commit 3b004aa.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@cloud-fan
Contributor

thanks, merging to master!

@cloud-fan cloud-fan closed this in bc03c8b Feb 26, 2019
@ConeyLiu
Contributor Author

Thanks @cloud-fan.

@ConeyLiu ConeyLiu deleted the report-row-count branch February 26, 2019 06:39
mccheah pushed a commit to palantir/spark that referenced this pull request May 15, 2019
… by data source

## What changes were proposed in this pull request?

In Data Source V2, if a data source's scan implements `SupportsReportStatistics`, `DataSourceV2Relation` should respect the row count reported by the data source instead of ignoring it.

## How was this patch tested?

Added a new unit test.

Closes apache#23853 from ConeyLiu/report-row-count.

Authored-by: Xianyang Liu <xianyang.liu@intel.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
3 participants