Deprecate use of Mahalanobis distance #225

Merged
merged 12 commits on Jan 15, 2021
2 changes: 1 addition & 1 deletion .github/workflows/ci.yml
@@ -62,7 +62,7 @@ jobs:
- name: Format check
run: |
CHANGED="$(git diff --name-only)"
if [ ! -z $CHANGED ]; then
if [ ! -z "$CHANGED" ]; then
>&2 echo "Some files have not been formatted !!!"
echo "$CHANGED"
exit 1
1 change: 0 additions & 1 deletion docs/src/api.md
@@ -35,7 +35,6 @@ NeuralNetworkKernel
LinearKernel
PolynomialKernel
PiecewisePolynomialKernel
MahalanobisKernel
RationalQuadraticKernel
GammaRationalQuadraticKernel
spectral_mixture_kernel
17 changes: 10 additions & 7 deletions docs/src/kernels.md
@@ -153,19 +153,22 @@ where $r$ has the same dimension as $x$ and $r_i > 0$.

## Piecewise Polynomial Kernel

The [`PiecewisePolynomialKernel`](@ref) is defined for $x, x'\in \mathbb{R}^D$, a positive-definite matrix $P \in \mathbb{R}^{D \times D}$, and $V \in \{0,1,2,3\}$ as
The [`PiecewisePolynomialKernel`](@ref) of degree $v \in \{0,1,2,3\}$ is defined for
inputs $x, x' \in \mathbb{R}^d$ of dimension $d$ as
```math
k(x,x'; P, V) = \max(1 - \sqrt{x^\top P x'}, 0)^{j + V} f_V(\sqrt{x^\top P x'}, j),
k(x, x'; v) = \max(1 - \|x - x'\|, 0)^{\alpha} f_{v,d}(\|x - x'\|),
```
where $j = \lfloor \frac{D}{2}\rfloor + V + 1$, and $f_V$ are polynomials defined as follows:
where $\alpha = \lfloor \frac{d}{2}\rfloor + 2v + 1$, and $f_{v,d}$ are polynomials of
degree $v$ given by
```math
\begin{aligned}
f_0(r, j) &= 1, \\
f_1(r, j) &= 1 + (j + 1) r, \\
f_2(r, j) &= 1 + (j + 2) r + ((j^2 + 4j + 3) / 3) r^2, \\
f_3(r, j) &= 1 + (j + 3) r + ((6 j^2 + 36j + 45) / 15) r^2 + ((j^3 + 9 j^2 + 23j + 15) / 15) r^3.
f_{0,d}(r) &= 1, \\
f_{1,d}(r) &= 1 + (j + 1) r, \\
f_{2,d}(r) &= 1 + (j + 2) r + ((j^2 + 4j + 3) / 3) r^2, \\
f_{3,d}(r) &= 1 + (j + 3) r + ((6 j^2 + 36j + 45) / 15) r^2 + ((j^3 + 9 j^2 + 23j + 15) / 15) r^3,
\end{aligned}
```
where $j = \lfloor \frac{d}{2}\rfloor + v + 1$.
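
As a quick sanity check of this definition, here is a minimal sketch (for illustration only) that evaluates the degree $v = 1$ kernel by hand for inputs of dimension $d = 3$; it should agree with `PiecewisePolynomialKernel{1}(3)(x, y)` from the updated constructor:
```julia
using LinearAlgebra

x, y = rand(3), rand(3)
v, d = 1, 3
j = div(d, 2) + v + 1        # j = ⌊d/2⌋ + v + 1 = 3
α = j + v                    # α = ⌊d/2⌋ + 2v + 1 = 4
r = norm(x - y)              # Euclidean distance between the inputs
k_xy = max(1 - r, 0)^α * (1 + (j + 1) * r)   # f_{1,d}(r) = 1 + (j + 1) r
```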

## Polynomial Kernels

16 changes: 16 additions & 0 deletions docs/src/userguide.md
@@ -23,6 +23,22 @@ For example, a squared exponential kernel is created by
k = 3.0 * SqExponentialKernel()
```

!!! tip "How do I use a Mahalanobis kernel?"
The `MahalanobisKernel(; P=P)`, defined by
```math
k(x, x'; P) = \exp{\big(- (x - x')^\top P (x - x')\big)}
```
for a positive definite matrix $P = Q^\top Q$, is deprecated. Instead you can
use a squared exponential kernel together with a [`LinearTransform`](@ref) of
the inputs:
```julia
k = transform(SqExponentialKernel(), LinearTransform(sqrt(2) .* Q))
```
Analogously, you can combine other kernels such as the
[`PiecewisePolynomialKernel`](@ref) with a [`LinearTransform`](@ref) of the
inputs to obtain a kernel that is a function of the Mahalanobis distance
between inputs.
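
    For example, a sketch of the full replacement workflow (the matrix `P` below is an
    arbitrary illustrative value; any positive-definite matrix works, with its Cholesky
    factor serving as $Q$):
    ```julia
    using KernelFunctions, LinearAlgebra

    P = [2.0 0.5; 0.5 1.0]      # positive-definite matrix
    Q = cholesky(P).U           # P == Q'Q
    k = transform(SqExponentialKernel(), LinearTransform(sqrt(2) .* Q))

    x, y = rand(2), rand(2)
    k(x, y) ≈ exp(-dot(x - y, P * (x - y)))  # true
    ```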

## Using a kernel function

To evaluate the kernel function on two vectors you simply call the kernel object:
5 changes: 3 additions & 2 deletions src/KernelFunctions.jl
@@ -31,7 +31,7 @@ export FBMKernel
export MaternKernel, Matern12Kernel, Matern32Kernel, Matern52Kernel
export LinearKernel, PolynomialKernel
export RationalQuadraticKernel, GammaRationalQuadraticKernel
export MahalanobisKernel, GaborKernel, PiecewisePolynomialKernel
export GaborKernel, PiecewisePolynomialKernel
export PeriodicKernel, NeuralNetworkKernel
export KernelSum, KernelProduct
export TransformedKernel, ScaledKernel
@@ -90,7 +90,6 @@ include(joinpath("basekernels", "exponential.jl"))
include(joinpath("basekernels", "exponentiated.jl"))
include(joinpath("basekernels", "fbm.jl"))
include(joinpath("basekernels", "gabor.jl"))
include(joinpath("basekernels", "maha.jl"))
include(joinpath("basekernels", "matern.jl"))
include(joinpath("basekernels", "nn.jl"))
include(joinpath("basekernels", "periodic.jl"))
@@ -118,6 +117,8 @@ include("zygote_adjoints.jl")

include("test_utils.jl")

include("deprecated.jl")

function __init__()
@require Kronecker = "2c470bb0-bcc8-11e8-3dad-c9649493f05e" begin
include(joinpath("matrix", "kernelkroneckermat.jl"))
27 changes: 0 additions & 27 deletions src/basekernels/maha.jl

This file was deleted.

117 changes: 81 additions & 36 deletions src/basekernels/piecewisepolynomial.jl
@@ -1,56 +1,101 @@
"""
PiecewisePolynomialKernel{V}(maha::AbstractMatrix)
PiecewisePolynomialKernel(; degree::Int=0, dim::Int)
PiecewisePolynomialKernel{degree}(dim::Int)

Piecewise Polynomial covariance function with compact support, V = 0,1,2,3.
The kernel functions are 2V times continuously differentiable and the corresponding
processes are hence V times mean-square differentiable. The kernel function is:
Piecewise polynomial kernel of degree `degree` for inputs of dimension `dim` with support in
the unit ball.

# Definition

For inputs ``x, x' \\in \\mathbb{R}^d`` of dimension ``d``, the piecewise polynomial kernel
of degree ``v \\in \\{0,1,2,3\\}`` is defined as
```math
k(x, x'; v) = \\max(1 - \\|x - x'\\|, 0)^{\\alpha(v,d)} f_{v,d}(\\|x - x'\\|),
```
where ``\\alpha(v, d) = \\lfloor \\frac{d}{2}\\rfloor + 2v + 1`` and ``f_{v,d}`` are
polynomials of degree ``v`` given by
```math
κ(x, y) = max(1 - r, 0)^(j + V) * f(r, j) with j = floor(D / 2) + V + 1
\\begin{aligned}
f_{0,d}(r) &= 1, \\\\
f_{1,d}(r) &= 1 + (j + 1) r, \\\\
f_{2,d}(r) &= 1 + (j + 2) r + ((j^2 + 4j + 3) / 3) r^2, \\\\
f_{3,d}(r) &= 1 + (j + 3) r + ((6 j^2 + 36j + 45) / 15) r^2 + ((j^3 + 9 j^2 + 23j + 15) / 15) r^3,
\\end{aligned}
```
where `r` is the Mahalanobis distance mahalanobis(x,y) with `maha` as the metric.
where ``j = \\lfloor \\frac{d}{2}\\rfloor + v + 1``.

The kernel is ``2v`` times continuously differentiable and the corresponding Gaussian
process is hence ``v`` times mean-square differentiable.
"""
struct PiecewisePolynomialKernel{V,A<:AbstractMatrix{<:Real}} <: SimpleKernel
maha::A
j::Int
function PiecewisePolynomialKernel{V}(maha::AbstractMatrix{<:Real}) where {V}
V in (0, 1, 2, 3) || error("Invalid parameter V=$(V). Should be 0, 1, 2 or 3.")
LinearAlgebra.checksquare(maha)
j = div(size(maha, 1), 2) + V + 1
return new{V,typeof(maha)}(maha, j)
struct PiecewisePolynomialKernel{D,C<:Tuple} <: SimpleKernel
alpha::Int
coeffs::C

function PiecewisePolynomialKernel{D}(dim::Int) where {D}
dim > 0 || error("number of dimensions has to be positive")
j = div(dim, 2) + D + 1
alpha = j + D
coeffs = piecewise_polynomial_coefficients(Val(D), j)
return new{D,typeof(coeffs)}(alpha, coeffs)
end
end

function PiecewisePolynomialKernel(; v::Integer=0, maha::AbstractMatrix{<:Real})
return PiecewisePolynomialKernel{v}(maha)
end
# TODO: remove `maha` keyword argument in next breaking release
function PiecewisePolynomialKernel(; v::Int=-1, degree::Int=v, maha=nothing, dim::Int=-1)
if v != -1
Base.depwarn(
"keyword argument `v` is deprecated, use `degree` instead",
:PiecewisePolynomialKernel,
)
end

# Have to reconstruct the type parameter
# See also https://github.com/FluxML/Functors.jl/issues/3#issuecomment-626747663
function Functors.functor(::Type{<:PiecewisePolynomialKernel{V}}, x) where {V}
function reconstruct_kernel(xs)
return PiecewisePolynomialKernel{V}(xs.maha)
if maha !== nothing
Base.depwarn(
"keyword argument `maha` is deprecated, use a `LinearTransform` instead",
:PiecewisePolynomialKernel,
)
dim = size(maha, 1)
return transform(
PiecewisePolynomialKernel{degree}(dim), LinearTransform(cholesky(maha).U)
)
else
return PiecewisePolynomialKernel{degree}(dim)
end
return (maha=x.maha,), reconstruct_kernel
end

_f(κ::PiecewisePolynomialKernel{0}, r, j) = 1
_f(κ::PiecewisePolynomialKernel{1}, r, j) = 1 + (j + 1) * r
_f(κ::PiecewisePolynomialKernel{2}, r, j) = 1 + (j + 2) * r + (j^2 + 4 * j + 3) / 3 * r .^ 2
function _f(κ::PiecewisePolynomialKernel{3}, r, j)
return 1 +
(j + 3) * r +
(6 * j^2 + 36j + 45) / 15 * r .^ 2 +
(j^3 + 9 * j^2 + 23j + 15) / 15 * r .^ 3
piecewise_polynomial_coefficients(::Val{0}, ::Int) = (1,)
piecewise_polynomial_coefficients(::Val{1}, j::Int) = (1, j + 1)
piecewise_polynomial_coefficients(::Val{2}, j::Int) = (1, j + 2, (j^2 + 4 * j)//3 + 1)
function piecewise_polynomial_coefficients(::Val{3}, j::Int)
return (1, j + 3, (2 * j^2 + 12 * j)//5 + 3, (j^3 + 9 * j^2 + 23 * j)//15 + 1)
end
function piecewise_polynomial_coefficients(::Val{D}, ::Int) where {D}
return error("invalid degree $D, only 0, 1, 2, or 3 are supported")
end

function kappa(κ::PiecewisePolynomialKernel{V}, r) where {V}
return max(1 - r, 0)^(κ.j + V) * _f(κ, r, κ.j)
# `evalpoly` is not available on Julia < 1.4
@static if VERSION < v"1.4"
@generated function _evalpoly(r, coeffs)
N = length(coeffs.parameters)
return quote
return @evalpoly(r, $((:(coeffs[$i]) for i in 1:N)...))
end
end
else
_evalpoly(r, coeffs) = evalpoly(r, coeffs)
end

metric(κ::PiecewisePolynomialKernel) = Mahalanobis(κ.maha)
kappa(κ::PiecewisePolynomialKernel, r) = max(1 - r, 0)^κ.alpha * _evalpoly(r, κ.coeffs)
Member: What was your motivation behind changing all of this from explicit functions to storing coefficients inside the type struct? (I found the previous version of the code much easier to understand and read.)

Member Author: It seems wasteful to re-compute the fixed coefficients every time the kernel function is evaluated. Moreover, `evalpoly` uses Horner's method and hence the evaluation of the polynomials is more stable and more efficient than the naive implementation before.
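
For illustration only (hypothetical values, not part of the diff), this is what `evalpoly` computes for the degree-2 coefficient tuple, with the coefficients built once at construction time:
```julia
j = 3                                       # ⌊d/2⌋ + v + 1 for d = 3, v = 2
coeffs = (1, j + 2, (j^2 + 4j) // 3 + 1)    # coefficients stored in the kernel struct
r = 0.25
# Horner's method: evalpoly(r, (c0, c1, c2)) == c0 + r * (c1 + r * c2)
evalpoly(r, coeffs) ≈ 1 + (j + 2) * r + (j^2 + 4j + 3) / 3 * r^2  # true
```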


metric(::PiecewisePolynomialKernel) = Euclidean()

function Base.show(io::IO, κ::PiecewisePolynomialKernel{V}) where {V}
function Base.show(io::IO, κ::PiecewisePolynomialKernel{D}) where {D}
return print(
io, "Piecewise Polynomial Kernel (v = ", V, ", size(maha) = ", size(κ.maha), ")"
io,
"Piecewise Polynomial Kernel (degree = ",
D,
", ⌊dim/2⌋ = ",
κ.alpha - 1 - 2 * D,
")",
)
end
9 changes: 9 additions & 0 deletions src/deprecated.jl
@@ -0,0 +1,9 @@
# TODO: remove tests when removed
@deprecate MahalanobisKernel(; P::AbstractMatrix{<:Real}) transform(
SqExponentialKernel(), LinearTransform(sqrt(2) .* cholesky(P).U)
)

# TODO: remove keyword argument `maha` when removed
@deprecate PiecewisePolynomialKernel{V}(A::AbstractMatrix{<:Real}) where {V} transform(
PiecewisePolynomialKernel{V}(size(A, 1)), LinearTransform(cholesky(A).U)
)
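
Why the first rule is equivalent: rescaling the inputs by `sqrt(2) .* cholesky(P).U` turns the squared exponential exponent $\|x - x'\|^2 / 2$ (the convention implied by the userguide tip above) into the squared Mahalanobis distance $(x - x')^\top P (x - x')$. A stand-alone numerical check, with illustrative values only:
```julia
using LinearAlgebra

U = UpperTriangular(rand(3, 3) + 3I)   # invertible upper-triangular factor
P = U'U                                # positive definite, cholesky(P).U ≈ U
x, y = rand(3), rand(3)

# Squared exponential exponent on inputs rescaled by √2 ⋅ U ...
lhs = exp(-norm(sqrt(2) .* (U * (x - y)))^2 / 2)
# ... equals the Mahalanobis kernel exp(-(x - y)' P (x - y)).
rhs = exp(-dot(x - y, P * (x - y)))
lhs ≈ rhs  # true
```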
38 changes: 3 additions & 35 deletions test/basekernels/maha.jl
@@ -1,6 +1,5 @@
@testset "maha" begin
rng = MersenneTwister(123456)
x = 2 * rand(rng)
D_in = 3
v1 = rand(rng, D_in)
v2 = rand(rng, D_in)
@@ -9,40 +8,10 @@
P = Matrix(Cholesky(U, 'U', 0))
@assert isposdef(P)

k = MahalanobisKernel(; P=P)

@test kappa(k, x) == exp(-x)
k = @test_deprecated MahalanobisKernel(; P=P)
@test k isa TransformedKernel{SqExponentialKernel,<:LinearTransform}
@test k.transform.A ≈ sqrt(2) .* U
@test k(v1, v2) ≈ exp(-sqmahalanobis(v1, v2, P))
@test kappa(ExponentialKernel(), x) == kappa(k, x)
@test repr(k) == "Mahalanobis Kernel (size(P) = $(size(P)))"

M1, M2 = rand(rng, 3, 2), rand(rng, 3, 2)

function FiniteDifferences.to_vec(dist::SqMahalanobis)
return vec(dist.qmat), x -> SqMahalanobis(reshape(x, size(dist.qmat)...))
end
a = rand()

function test_mahakernel(U::UpperTriangular, v1::AbstractVector, v2::AbstractVector)
return MahalanobisKernel(; P=Array(U' * U))(v1, v2)
end

@test all(
FiniteDifferences.j′vp(FDM, test_mahakernel, a, U, v1, v2)[1] .≈
UpperTriangular(Zygote.pullback(test_mahakernel, U, v1, v2)[2](a)[1]),
)

function test_sqmaha(U::UpperTriangular, v1::AbstractVector, v2::AbstractVector)
return SqMahalanobis(Array(U' * U))(v1, v2)
end

@test all(
FiniteDifferences.j′vp(FDM, test_sqmaha, a, U, v1, v2)[1] .≈
UpperTriangular(Zygote.pullback(test_sqmaha, U, v1, v2)[2](a)[1]),
)

# test_ADs(U -> MahalanobisKernel(P=Array(U' * U)), U, ADs=[:Zygote])
@test_broken "Nothing passes (problem with Mahalanobis distance in Distances)"

# Standardised tests.
@testset "ColVecs" begin
@@ -57,5 +26,4 @@
x2 = RowVecs(randn(2, D_in))
TestUtils.test_interface(k, x0, x1, x2)
end
test_params(k, (P,))
end
26 changes: 18 additions & 8 deletions test/basekernels/piecewisepolynomial.jl
@@ -3,22 +3,32 @@
v1 = rand(D)
v2 = rand(D)
maha = Matrix{Float64}(I, D, D)
v = 3
k = PiecewisePolynomialKernel{v}(maha)
degree = 3

k2 = PiecewisePolynomialKernel(; v=v, maha=maha)
k = PiecewisePolynomialKernel(; degree=degree, dim=D)
k2 = PiecewisePolynomialKernel{degree}(D)
k3 = @test_deprecated PiecewisePolynomialKernel{degree}(maha)
k4 = @test_deprecated PiecewisePolynomialKernel(; degree=degree, maha=maha)
k5 = @test_deprecated PiecewisePolynomialKernel(; v=degree, dim=D)
k6 = @test_deprecated PiecewisePolynomialKernel(; v=degree, maha=maha)

@test k2(v1, v2) ≈ k(v1, v2) atol = 1e-5
@test k2(v1, v2) == k(v1, v2)
@test k3(v1, v2) ≈ k(v1, v2)
@test k4(v1, v2) ≈ k(v1, v2)
@test k5(v1, v2) ≈ k(v1, v2)
@test k6(v1, v2) ≈ k(v1, v2)

@test_throws ErrorException PiecewisePolynomialKernel{4}(maha)
@test_throws ErrorException PiecewisePolynomialKernel{4}(D)
@test_throws ErrorException PiecewisePolynomialKernel{degree}(-1)

@test repr(k) == "Piecewise Polynomial Kernel (v = $(v), size(maha) = $(size(maha)))"
@test repr(k) ==
"Piecewise Polynomial Kernel (degree = $(degree), ⌊dim/2⌋ = $(div(D, 2)))"

# Standardised tests.
TestUtils.test_interface(k, ColVecs{Float64}; dim_in=2)
TestUtils.test_interface(k, RowVecs{Float64}; dim_in=2)
# test_ADs(maha-> PiecewisePolynomialKernel(v=2, maha = maha), maha)
@test_broken "Nothing passes (problem with Mahalanobis distance in Distances)"
test_ADs(() -> PiecewisePolynomialKernel{degree}(D))

test_params(k, (maha,))
test_params(k, ())
end