Skip to content

Conversation

@Comonut
Copy link

@Comonut Comonut commented Aug 13, 2020

What changes were proposed in this pull request?

In the docs concerning the approx_count_distinct I have changed the description of the rsd parameter from maximum estimation error allowed to maximum relative standard deviation allowed

Why are the changes needed?

Maximum estimation error allowed can be misleading. You can set the target relative standard deviation, which affects the estimation error, but on given runs the estimation error can still be above the rsd parameter.

Does this PR introduce any user-facing change?

This PR should make it easier for users reading the docs to understand that the rsd parameter in approx_count_distinct doesn't cap the estimation error, but just sets the target deviation instead,

How was this patch tested?

No tests, as no code changes were made.

@maropu
Copy link
Member

maropu commented Aug 13, 2020

ok to test

@maropu
Copy link
Member

maropu commented Aug 13, 2020

Looks okay.

@maropu maropu changed the title [MINOR] Fixed approx_count_distinct rsd param description [SQL][MINOR] Fixed approx_count_distinct rsd param description Aug 13, 2020
@SparkQA
Copy link

SparkQA commented Aug 13, 2020

Test build #127426 has finished for PR 29424 at commit 1a405a8.

  • This patch fails Scala style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@HyukjinKwon
Copy link
Member

Looks fine. Can you address the style nits @Comonut?

@HyukjinKwon HyukjinKwon changed the title [SQL][MINOR] Fixed approx_count_distinct rsd param description [MINOR][SQL] Fixed approx_count_distinct rsd param description Aug 14, 2020
@Comonut
Copy link
Author

Comonut commented Aug 14, 2020

Thanks! Style issues should be fine now

@SparkQA
Copy link

SparkQA commented Aug 14, 2020

Test build #127448 has finished for PR 29424 at commit aab1e1a.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@maropu maropu closed this in 10edeaf Aug 14, 2020
maropu pushed a commit that referenced this pull request Aug 14, 2020
### What changes were proposed in this pull request?

In the docs concerning the approx_count_distinct I have changed the description of the rsd parameter from **_maximum estimation error allowed_** to _**maximum relative standard deviation allowed**_

### Why are the changes needed?

Maximum estimation error allowed can be misleading. You can set the target relative standard deviation, which affects the estimation error, but on given runs the estimation error can still be above the rsd parameter.

### Does this PR introduce _any_ user-facing change?

This PR should make it easier for users reading the docs to understand that the rsd parameter in approx_count_distinct doesn't cap the estimation error, but just sets the target deviation instead,

### How was this patch tested?

No tests, as no code changes were made.

Closes #29424 from Comonut/fix-approx_count_distinct-rsd-param-description.

Authored-by: alexander-daskalov <alexander.daskalov@adevinta.com>
Signed-off-by: Takeshi Yamamuro <yamamuro@apache.org>
(cherry picked from commit 10edeaf)
Signed-off-by: Takeshi Yamamuro <yamamuro@apache.org>
@maropu
Copy link
Member

maropu commented Aug 14, 2020

Thanks! Merged to master/branch-3.0.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants