Skip to content

cov not behaving properly for vector of vectors? #114

Open
@dannys4

Description

@dannys4

I'm not sure if I'm understanding the behavior of cov correctly, but it seems like the current implementation for between two vectors only works on a vector of numbers. It currently gives the dot product between the vectors, which is only equivalent to what the documentation says when the vectors only contain scalars (since it's doing an inner product). At some point before bringing it over from StatsBase.jl, this probably worked according to this comment, but right now it doesn't give the behavior I expect. For example,

julia> N = 1000;

julia> A = [randn(5) for _ in 1:N];

julia> B = [randn(5) for _ in 1:N];

julia> using Statistics

julia> cov(A,B) # This is unexpected
0.15626480407767923

julia> sum((a-mean(A))*(b-mean(B))' for (a,b) in zip(A,B))/(N-1) # This is expected
5×5 Matrix{Float64}:
  9.35299e-16  -5.73004e-16  -4.02748e-16   9.26408e-16  -3.28066e-16
 -7.09476e-16   1.65189e-15   2.7001e-15   -9.89532e-16  -2.72944e-16
  2.08931e-16   3.56516e-16   7.28591e-16   9.61526e-16  -1.33894e-15
  3.51182e-16  -1.68034e-16  -1.23136e-16  -2.57385e-16   9.51302e-16
 -1.97373e-16   5.99676e-16   2.97838e-16  -1.7159e-16   -1.01532e-15

I'm not saying the above is the best possible implementation, just that it gives the answer I expect. I would propose (and be willing to put up a PR for!) a change where the current implementation of unscaled_covzm is specialized to AbstractVector{<:Number} and there's a different general implementation of it for AbstractVector, similar to what I put up above.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions