-
-
Notifications
You must be signed in to change notification settings - Fork 5.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add handling of an empty iterator for mean and var #29033
Conversation
@@ -86,6 +86,10 @@ end | |||
@test ismissing(mean([missing, NaN])) | |||
@test isequal(mean([missing 1.0; 2.0 3.0], dims=1), [missing 2.0]) | |||
@test mean(skipmissing([1, missing, 2])) === 1.5 | |||
@test isequal(mean(Complex{Float64}[]), NaN+NaN*im) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would also make sense to test that an empty Int
input gives a Float64
output, and that an error is thrown when the element type isn't known (e.g. using a generator).
BTW, you could probably use ===
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This PR is in muddy waters :) You cannot use ===
as we have different NaN
in Julia and here it would fail. Notice that:
julia> NaN === -NaN
false
and this is exactly the case here, that is why I had to use isequal
.
I will add the tests.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I must be missing something: you really expect NaN::Float64
here, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The situation is more complex (and that is why I have used this as a test):
julia> x = mean(Complex{Float64}[])
NaN + NaN*im
julia> bitstring(x.re)
"1111111111111000000000000000000000000000000000000000000000000000"
julia> bitstring(x.im)
"1111111111111000000000000000000000000000000000000000000000000000"
julia> bitstring(NaN) # see that the bitstring is different
"0111111111111000000000000000000000000000000000000000000000000000"
also you have:
julia> x = mean(Int[])
NaN
julia> x === NaN
false
For the same reason that we have different NaN
s in Julia (and as you see they actually happen in practice).
stdlib/Statistics/test/runtests.jl
Outdated
@@ -245,6 +249,9 @@ end | |||
@test ismissing(f([missing, NaN], missing)) | |||
@test f(skipmissing([1, missing, 2]), 0) === f([1, 2], 0) | |||
end | |||
|
|||
@test isequal(mean(Complex{Float64}[]), NaN) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess you intended to test var
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure. Fixed.
stdlib/Statistics/src/Statistics.jl
Outdated
@@ -148,7 +149,8 @@ var(iterable; corrected::Bool=true, mean=nothing) = _var(iterable, corrected, me | |||
function _var(iterable, corrected::Bool, mean) | |||
y = iterate(iterable) | |||
if y === nothing | |||
throw(ArgumentError("variance of empty collection undefined: $(repr(iterable))")) | |||
T = eltype(iterable) | |||
return typeof((abs2(zero(T)) + abs2(zero(T)))/2)(NaN) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So this throws an error when calling zero
if eltype
returns Any
. Maybe better call throw a more explicit error similar to the existing one?
Could use oftype
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would leave it as is, because this is how var
for AbstractArray
is implemented and we want to be non-breaking (so it should have 100% the same behavior - the same outputs and the same errors - so using array or iterator is transparent).
For sure in Julia 2.0 approach we could consider cleaning it up in both places.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just mean that instead of letting zero
throw the error, it would be clearer to check T === Any
and throw a more explicit error.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The problem is that T
does not have to be Any
. It can be an arbitrary type that is or is not a subtype of Number
. And upfront we do not know if this type:
- defines
zero
- has
abs2
defined - implements a constructor from
NaN
Any of those may or may not be true so the only way to be consistent is to repeat the same code that is used for a version of var
defined for arrays (otherwise I will be able to create a situation where array and iterator var
differs).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK. We could still print a nice error in the most common case, but then we'd lose consistency.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the point for now. As discussed in Julia 2.0 we can probably clean it up in both methods.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
personally, would use oftype
as @nalimilan suggested for clarity
b7d48b9
to
79be956
Compare
79be956
to
40b2ea7
Compare
stdlib/Statistics/test/runtests.jl
Outdated
@@ -86,6 +86,19 @@ end | |||
@test ismissing(mean([missing, NaN])) | |||
@test isequal(mean([missing 1.0; 2.0 3.0], dims=1), [missing 2.0]) | |||
@test mean(skipmissing([1, missing, 2])) === 1.5 | |||
@test isequal(mean(Complex{Float64}[]), NaN+NaN*im) | |||
@test isequal(mean(skipmissing(Complex{Float64}[])), NaN+NaN*im) | |||
@test mean(Complex{Float64}[]) isa Complex{Float64} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not also test the value? Same in other places.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The value is tested above in line 89 (unless you meant something else).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, this one is, I originally spotted mean(Int[])
. It would probably be clearer to have them side by side.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All cleaned up now (hopefully 😄, as you have a keener eye for details 👍)
720b43b
to
9e0b0ba
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, looks good to me, but another review wouldn't hurt.
One CI error seems unrelated. |
Who would be a good person to review? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
overall lgtm
There's a mean(::AbstractRange{<:Real})
method on line 134. Might be worth changing that behavior too for consistency, even though ranges can't have missings.
stdlib/Statistics/src/Statistics.jl
Outdated
@@ -148,7 +149,8 @@ var(iterable; corrected::Bool=true, mean=nothing) = _var(iterable, corrected, me | |||
function _var(iterable, corrected::Bool, mean) | |||
y = iterate(iterable) | |||
if y === nothing | |||
throw(ArgumentError("variance of empty collection undefined: $(repr(iterable))")) | |||
T = eltype(iterable) | |||
return typeof((abs2(zero(T)) + abs2(zero(T)))/2)(NaN) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
personally, would use oftype
as @nalimilan suggested for clarity
Thank you. Changed to |
0aa0683
to
0487d17
Compare
stdlib/Statistics/src/Statistics.jl
Outdated
@@ -131,8 +131,8 @@ mean(A::AbstractArray; dims=:) = _mean(A, dims) | |||
_mean(A::AbstractArray{T}, region) where {T} = mean!(Base.reducedim_init(t -> t/2, +, A, region), A) | |||
_mean(A::AbstractArray, ::Colon) = sum(A) / length(A) | |||
|
|||
function mean(r::AbstractRange{<:Real}) | |||
isempty(r) && throw(ArgumentError("mean of an empty range is undefined")) | |||
function Statistics.mean(r::AbstractRange{<:Real}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why add a module prefix?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Typo. Will fix.
How do we want to proceed with this PR? |
Just merge? |
I think just go ahead with this. We can always revert it. |
I like when everybody approves but nobody actually merges... :-) |
changes between Julia 1.0 and 1.1, including: - Custom .css-style for compat admonitions. - Information about compat annotations to CONTRIBUTING.md. - NEWS.md entry for PRs #30090, #30035, #30022, #29978, #29969, #29858, #29845, #29754, #29638, #29636, #29615, #29600, #29506, #29469, #29316, #29259, #29178, #29153, #29033, #28902, #28761, #28745, #28708, #28696, #29997, #28790, #29092, #29108, #29782 - Compat annotation for PRs #30090, #30013, #29978, #29890, #29858, #29827, #29754, #29679, #29636, #29623, #29600, #29440, #29316, #29259, #29178, #29157, #29153, #29033, #28902, #28878, #28761, #28708, #28156, #29733, #29670, #29997, #28790, #29092, #29108, #29782, #25278 - Documentation for broadcasting CartesianIndices (#30230). - Documentation for Base.julia_cmd(). - Documentation for colon constructor of CartesianIndices (#29440). - Documentation for ^(::Matrix, ::Number) and ^(::Number, ::Matrix). - Run NEWS-update.jl. Co-authored-by: Morten Piibeleht <morten.piibeleht@gmail.com> Co-authored-by: Fredrik Ekre <ekrefredrik@gmail.com>
changes between Julia 1.0 and 1.1, including: - Custom .css-style for compat admonitions. - Information about compat annotations to CONTRIBUTING.md. - NEWS.md entry for PRs #30090, #30035, #30022, #29978, #29969, #29858, #29845, #29754, #29638, #29636, #29615, #29600, #29506, #29469, #29316, #29259, #29178, #29153, #29033, #28902, #28761, #28745, #28708, #28696, #29997, #28790, #29092, #29108, #29782 - Compat annotation for PRs #30090, #30013, #29978, #29890, #29858, #29827, #29754, #29679, #29636, #29623, #29600, #29440, #29316, #29259, #29178, #29157, #29153, #29033, #28902, #28878, #28761, #28708, #28156, #29733, #29670, #29997, #28790, #29092, #29108, #29782, #25278 - Documentation for broadcasting CartesianIndices (#30230). - Documentation for Base.julia_cmd(). - Documentation for colon constructor of CartesianIndices (#29440). - Documentation for ^(::Matrix, ::Number) and ^(::Number, ::Matrix). - Run NEWS-update.jl. Co-authored-by: Morten Piibeleht <morten.piibeleht@gmail.com> Co-authored-by: Fredrik Ekre <ekrefredrik@gmail.com>
Addition of NEWS and compat admonitions for important changes between Julia 1.0 and 1.1, including: - Custom .css-style for compat admonitions. - Information about compat annotations to CONTRIBUTING.md. - NEWS.md entry for PRs #30090, #30035, #30022, #29978, #29969, #29858, #29845, #29754, #29638, #29636, #29615, #29600, #29506, #29469, #29316, #29259, #29178, #29153, #29033, #28902, #28761, #28745, #28708, #28696, #29997, #28790, #29092, #29108, #29782 - Compat annotation for PRs #30090, #30013, #29978, #29890, #29858, #29827, #29754, #29679, #29636, #29623, #29600, #29440, #29316, #29259, #29178, #29157, #29153, #29033, #28902, #28878, #28761, #28708, #28156, #29733, #29670, #29997, #28790, #29092, #29108, #29782, #25278 - Documentation for broadcasting CartesianIndices (#30230). - Documentation for Base.julia_cmd(). - Documentation for colon constructor of CartesianIndices (#29440). - Documentation for ^(::Matrix, ::Number) and ^(::Number, ::Matrix). - Run NEWS-update.jl. Co-authored-by: Morten Piibeleht <morten.piibeleht@gmail.com> Co-authored-by: Fredrik Ekre <ekrefredrik@gmail.com>
This is a follow up of #28777 fixing what is minimally needed to be consistent in Julia 1.0 between empty arrays and empty collections (needed for
skipmissing
).Major cleanup of consistency of the outputs for different scenarios of application of functions from Statistics module are left for Julia 2.0.