Skip to content

Add returnsZeroOnEmptyInput to AggregationFunctionMetadata #17963

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed

Conversation

lukasz-stec
Copy link
Member

Description

COUNT, COUNT_IF, and APPROX_DISTINCT do have not standard
behavior when no input was supplied to the function.
They return the value 0, as opposed to standard NULL.
Before this change, OptimizeMixedDistinctAggregations relied on
matching string function name to handle this case, but
this is brittle and may cause silent correctness issues
if a new function with this characteristic is added.

Additional context and related issues

Release notes

( X) This is not user-visible or docs only and no release notes are required.
( ) Release notes are required, please propose a release note for me.
( ) Release notes are required, with the following suggested text:

# Section
* Fix some things. ({issue}`issuenumber`)

@cla-bot cla-bot bot added the cla-signed label Jun 20, 2023
@lukasz-stec lukasz-stec requested review from dain and sopel39 June 20, 2023 08:54
@lukasz-stec
Copy link
Member Author

@dain @sopel39 I'm not sure this is really necessary but it seems better than comparing strings in some rule. WDYT?

Copy link
Member

@sopel39 sopel39 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm % but I would wait for @dain review too

COUNT, COUNT_IF, and APPROX_DISTINCT do have not standard
behavior when no input was supplied to the function.
They return the value 0, as opposed to standard NULL.
Before this change, OptimizeMixedDistinctAggregations relied on
matching string function name to handle this case, but
this is brittle and may cause silent correctness issues
if a new function with this characteristic is added.
@lukasz-stec lukasz-stec force-pushed the ls/2306/004-zero-on-empty branch from f958e07 to 2195285 Compare June 26, 2023 10:23
@lukasz-stec
Copy link
Member Author

I rebased on the master and dropped the first commit with tests that landed there

@martint
Copy link
Member

martint commented Jun 26, 2023

I’m not sure this is the right abstraction. There’s nothing standard about returning 0 on empty input. What would that property mean for a function that does not return a number but returns, say, an empty array or map on no input?

@lukasz-stec
Copy link
Member Author

I’m not sure this is the right abstraction. There’s nothing standard about returning 0 on empty input. What would that property mean for a function that does not return a number but returns, say, an empty array or map on no input?

Without changes to the OptimizeMixedDistinctAggregations, a function like that would introduce a silent correctness issue in case OptimizeMixedDistinctAggregations is enabled.
The motivation for this change is to make the author of the new function think about this characteristic.

@martint
Copy link
Member

martint commented Jun 26, 2023

I understand that, and it’s fine for the optimization to be targeted at functions that return 0 on no input. I’m not sure it should be a generic attribute for every aggregation, unless we can generalize it to aggregations that don’t return numbers.

@lukasz-stec
Copy link
Member Author

lukasz-stec commented Jun 26, 2023

Generalization would overcomplicate things and be hard to test, so I wouldn't introduce it until needed.
If not this generic attribute, do you see some other way to handle

if (signatureName.equals("count") || signatureName.equals("count_if") || signatureName.equals("approx_distinct")) {

?
Or we shouldn't worry about potential bug here?

@lukasz-stec
Copy link
Member Author

As @martint does not agree it's a good approach, I'm closing the PR

@ksobolew ksobolew deleted the ls/2306/004-zero-on-empty branch July 11, 2023 09:13
@ksobolew ksobolew restored the ls/2306/004-zero-on-empty branch July 11, 2023 09:13
@ksobolew ksobolew deleted the ls/2306/004-zero-on-empty branch July 11, 2023 09:14
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Development

Successfully merging this pull request may close these issues.

3 participants