Description
I'm not sure if I'm understanding the behavior of cov
correctly, but it seems like the current implementation for between two vectors only works on a vector of numbers. It currently gives the dot product between the vectors, which is only equivalent to what the documentation says when the vectors only contain scalars (since it's doing an inner product). At some point before bringing it over from StatsBase.jl, this probably worked according to this comment, but right now it doesn't give the behavior I expect. For example,
julia> N = 1000;
julia> A = [randn(5) for _ in 1:N];
julia> B = [randn(5) for _ in 1:N];
julia> using Statistics
julia> cov(A,B) # This is unexpected
0.15626480407767923
julia> sum((a-mean(A))*(b-mean(B))' for (a,b) in zip(A,B))/(N-1) # This is expected
5×5 Matrix{Float64}:
9.35299e-16 -5.73004e-16 -4.02748e-16 9.26408e-16 -3.28066e-16
-7.09476e-16 1.65189e-15 2.7001e-15 -9.89532e-16 -2.72944e-16
2.08931e-16 3.56516e-16 7.28591e-16 9.61526e-16 -1.33894e-15
3.51182e-16 -1.68034e-16 -1.23136e-16 -2.57385e-16 9.51302e-16
-1.97373e-16 5.99676e-16 2.97838e-16 -1.7159e-16 -1.01532e-15
I'm not saying the above is the best possible implementation, just that it gives the answer I expect. I would propose (and be willing to put up a PR for!) a change where the current implementation of unscaled_covzm
is specialized to AbstractVector{<:Number}
and there's a different general implementation of it for AbstractVector
, similar to what I put up above.