GPU Attempt 2 #472

Draft: wants to merge 12 commits into base: master
14 changes: 14 additions & 0 deletions .buildkite/pipeline.yml
@@ -0,0 +1,14 @@
steps:
Member comment:
Noice!

- label: "Julia v1"
plugins:
- JuliaCI/julia#v1:
version: "1"
- JuliaCI/julia-test#v1: ~
Member comment:
Suggested change
- JuliaCI/julia-test#v1: ~
- JuliaCI/julia-test#v1: ~
- JuliaCI/julia-coverage#v1:
codecov: true

agents:
queue: "juliagpu"
cuda: "*"
if: build.message !~ /\[skip tests\]/
timeout_in_minutes: 60

env:
GROUP: CUDA
Member comment:
Suggested change
GROUP: CUDA
GROUP: CUDA
SECRET_CODECOV_TOKEN: "t5UlmhN6IORkngUHsuzYWkMJoIcAu+appM1fZJxZitCmw8gn8EA4jukHSaps23tMpVI19qgHr/F6XeS5M/SG1+lrSa/cG2e2kAD03E34vhK0+P6Fx8CZxL3RsIc1XDSX9qEs1/BGUNHP8t0B/vQumJRqPH5F+IGXhR5yRolhqquJZ5OUHwMGo2+FWY12YWehGjXOTCy/y0f0vYAysLU+TPF2Xa8xpDEJtCQlFDFVSPwtBFqB+8XD9bmtGQkDFfZOw0/5dHCWKMmr/E3Z9xXHgIk74mN91PtVJZDQjEVZOrPLOMOIheTtQFSErVXoKXBXElorcAY96oJQVuvqK61H/A==;U2FsdGVkX18ne7l2AH/iweW0yXOQV+yawNU6Ax21o+vkqTztlWAelcsyPaaMFSqpsYT0+uHAHQTPfqzuKJH7QA=="

4 changes: 3 additions & 1 deletion Project.toml
@@ -1,8 +1,9 @@
name = "KernelFunctions"
uuid = "ec8451be-7e33-11e9-00cf-bbf324bd1392"
version = "0.10.43"
version = "0.10.44"

[deps]
CUDA = "052768ef-5323-5732-b1bb-66c8b64840ba"
ChainRulesCore = "d360d2e6-b24c-11e9-a2a3-2a2ae2dbcce4"
Compat = "34da2185-b29b-5c13-b0c7-acf172513d20"
CompositionsBase = "a33af91c-f02d-484b-be07-31d278c5ca2b"
@@ -25,6 +26,7 @@ ZygoteRules = "700de1a5-db45-46bc-99cf-38207098b444"
ChainRulesCore = "1"
Compat = "3.7, 4"
CompositionsBase = "0.1"
CUDA = "3"
Distances = "0.10"
FillArrays = "0.10, 0.11, 0.12, 0.13"
Functors = "0.1, 0.2"
2 changes: 2 additions & 0 deletions src/KernelFunctions.jl
@@ -47,6 +47,7 @@ export IndependentMOKernel,
export tensor, ⊗, compose

using Compat
using CUDA
Member comment:
Unless things changed, that's quite a massive dependency, no? I saw that CuArrays.jl got deprecated; does this mean that CUDA is light now?

Member comment:
As we already depend on Requires, maybe it could be hidden behind a @require?
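A minimal sketch of what the Requires.jl route could look like (an assumption for illustration, not what this PR does; it reuses the CUDA UUID from the Project.toml diff above):

using Requires

function __init__()
    # Only load the CUDA-specific code once CUDA.jl is present in the user's environment.
    @require CUDA="052768ef-5323-5732-b1bb-66c8b64840ba" include("distances/cuda.jl")
end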

Member comment:
Alternatively, maybe the lightweight GPUArraysCore (https://github.com/JuliaGPU/GPUArrays.jl/tree/master/lib/GPUArraysCore) is sufficient?
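A hedged sketch of that alternative (an assumption, not part of this PR): dispatch on the vendor-agnostic types from GPUArraysCore rather than on CUDA's concrete array types, e.g.

using GPUArraysCore: AbstractGPUArray, AbstractGPUMatrix
using KernelFunctions: ColVecs, RowVecs

# Hypothetical signatures mirroring the CuArray-specific helpers later in this PR.
_to_cpu(x::AbstractGPUArray{<:Real}) = Array(x)
_to_cpu(x::ColVecs{<:Real,<:AbstractGPUMatrix}) = ColVecs(_to_cpu(x.X))
_to_cpu(x::RowVecs{<:Real,<:AbstractGPUMatrix}) = RowVecs(_to_cpu(x.X))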

using ChainRulesCore: ChainRulesCore, Tangent, ZeroTangent, NoTangent
using ChainRulesCore: @thunk, InplaceableThunk
using CompositionsBase
@@ -75,6 +76,7 @@ include("distances/pairwise.jl")
include("distances/dotproduct.jl")
include("distances/delta.jl")
include("distances/sinus.jl")
include("distances/cuda.jl")

include("transform/transform.jl")
include("transform/scaletransform.jl")
115 changes: 105 additions & 10 deletions src/TestUtils.jl
@@ -1,6 +1,8 @@
module TestUtils

using CUDA
using Distances
using FillArrays
using LinearAlgebra
using KernelFunctions
using Random
@@ -20,9 +22,9 @@ Run various consistency checks on `k` at the inputs `x0`, `x1`, and `x2`.
`x0` and `x1` should be of the same length with different values, while `x0` and `x2` should
be of different lengths.

These tests are intended to pick up on really substantial issues with a kernel implementation
(e.g. substantial asymmetry in the kernel matrix, large negative eigenvalues), rather than to
test the numerics in detail, which can be kernel-specific.
These tests are intended to pick up on really substantial issues with a kernel
implementation (e.g. substantial asymmetry in the kernel matrix, large negative
eigenvalues), rather than to test the numerics in detail, which can be kernel-specific.
"""
function test_interface(
k::Kernel,
@@ -118,6 +120,18 @@ function test_interface(
)
end

function test_interface(
rng::AbstractRNG, k::Kernel, ::Type{<:ColVecs{T, CuMatrix{T}}}; dim_in=2, kwargs...
) where {T<:Real}
return test_interface(
k,
_to_cuda_gpu(ColVecs(randn(rng, T, dim_in, 11))),
_to_cuda_gpu(ColVecs(randn(rng, T, dim_in, 11))),
_to_cuda_gpu(ColVecs(randn(rng, T, dim_in, 13)));
kwargs...,
)
end

function test_interface(
rng::AbstractRNG, k::Kernel, ::Type{<:RowVecs{T}}; dim_in=2, kwargs...
) where {T<:Real}
@@ -176,27 +190,108 @@ function test_interface(k::Kernel, T::Type{<:Real}=Float64; kwargs...)
return test_interface(Random.GLOBAL_RNG, k, T; kwargs...)
end

const FloatType = Union{Float32, Float64}
Member comment:
Why not AbstractFloat?


"""
example_inputs(rng::AbstractRNG, type)

Return a tuple of 4 inputs of type `type`. See `methods(example_inputs)` for information
around supported types. It is recommended that you utilise `StableRNGs.jl` for `rng` here
to ensure consistency across Julia versions.
"""
function example_inputs(rng::AbstractRNG, ::Type{Vector{Float64}})
return map(n -> randn(rng, Float64, n), (1, 2, 3, 4))
function example_inputs(rng::AbstractRNG, ::Type{Vector{T}}) where {T<:FloatType}
return map(n -> randn(rng, T, n), (1, 2, 3, 4))
end

function example_inputs(
rng::AbstractRNG, ::Type{ColVecs{Float64,Matrix{Float64}}}; dim::Int=2
)
return map(n -> ColVecs(randn(rng, dim, n)), (1, 2, 3, 4))
rng::AbstractRNG, ::Type{ColVecs{T,Matrix{T}}}; dim::Int=2
) where {T<:FloatType}
return map(n -> ColVecs(randn(rng, T, dim, n)), (1, 2, 3, 4))
end

function example_inputs(
rng::AbstractRNG, ::Type{RowVecs{Float64,Matrix{Float64}}}; dim::Int=2
rng::AbstractRNG, ::Type{RowVecs{T,Matrix{T}}}; dim::Int=2
) where {T<:FloatType}
return map(n -> RowVecs(randn(rng, T, n, dim)), (1, 2, 3, 4))
end

function example_inputs(rng::AbstractRNG, ::Type{CuVector{T}}) where {T<:FloatType}
return map(_to_cuda_gpu, example_inputs(rng, Vector{T}))
end

function example_inputs(
rng::AbstractRNG, ::Type{ColVecs{T, CuMatrix{T}}}; dim::Int=2
) where {T<:FloatType}
return map(_to_cuda_gpu, example_inputs(rng, ColVecs{T, Matrix{T}}; dim=dim))
end

function example_inputs(
rng::AbstractRNG, ::Type{RowVecs{T, CuMatrix{T}}}; dim::Int=2
) where {T<:FloatType}
return map(_to_cuda_gpu, example_inputs(rng, RowVecs{T, Matrix{T}}; dim=dim))
end

function test_gpu_against_cpu(
k::Kernel, T::Type, x1::AbstractVector, x2::AbstractVector; atol=1e-3
)
return map(n -> RowVecs(randn(rng, n, dim)), (1, 2, 3, 4))
@assert length(x1) != length(x2)
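# The differing lengths asserted above mean the cross-covariance checks below also catch
# shape/transposition bugs, not just elementwise value errors.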

k_cpu = _to_cpu(k)
x1_cpu = _to_cpu(x1)
x2_cpu = _to_cpu(x2)
let
K_cpu = kernelmatrix(k_cpu, x1_cpu)
K_gpu = kernelmatrix(k, x1)
@test size(K_cpu) == size(K_gpu)
@test eltype(K_cpu) == T
@test eltype(K_cpu) == eltype(K_gpu)
@test isapprox(K_cpu, _to_cpu(K_gpu); atol=atol)
end
let
K_cpu = kernelmatrix(k_cpu, x1_cpu, x2_cpu)
K_gpu = kernelmatrix(k, x1, x2)
@test size(K_cpu) == size(K_gpu)
@test eltype(K_cpu) == T
@test eltype(K_cpu) == eltype(K_gpu)
@test isapprox(K_cpu, _to_cpu(K_gpu); atol=atol)
end
let
K_cpu = kernelmatrix_diag(k_cpu, x1_cpu)
K_gpu = kernelmatrix_diag(k, x1)
@test size(K_cpu) == size(K_gpu)
@test eltype(K_cpu) == T
@test eltype(K_cpu) == eltype(K_gpu)
@test isapprox(K_cpu, _to_cpu(K_gpu); atol=atol)
end
let
K_cpu = kernelmatrix_diag(k_cpu, x1_cpu, x1_cpu)
K_gpu = kernelmatrix_diag(k, x1, x1)
@test size(K_cpu) == size(K_gpu)
@test eltype(K_cpu) == T
@test eltype(K_cpu) == eltype(K_gpu)
@test isapprox(K_cpu, _to_cpu(K_gpu); atol=atol)
end
end

function test_gpu_against_cpu(rng::AbstractRNG, T::Type, k::Kernel, data_type::Type)
_, x1, x2, _ = example_inputs(rng, data_type)
test_gpu_against_cpu(k, T, x1, x2)
end

_to_cpu(x::CuArray{<:Real}) = Array(x)
_to_cpu(x::ColVecs{<:Real, <:CuMatrix}) = ColVecs(_to_cpu(x.X))
Member comment:
Can't we just overload cpu on our type? (or cpu is part of Flux? I don't remember...)

Member comment:
Could we use/define Adapt.adapt? And let users pass the types they want to adapt it to?
According to the README of Adapt, we probably would want to define

Adapt.adapt_structure(to, x::ColVecs{<:Real}) = ColVecs(adapt(to, x.X))
Adapt.adapt_structure(to, x::RowVecs{<:Real}) = RowVecs(adapt(to, x.X))

And then one could use e.g. adapt(CuArray, x) or adapt(Array, x), where x could be a ColVecs{<:Real, Matrix{<:Real}}/RowVecs{<:Real, Matrix{<:Real}} or, for the other direction, a ColVecs{<:Real, <:CuMatrix}/RowVecs{<:Real, <:CuMatrix}. If the testing function allowed specifying the desired type (such as Array or CuArray), users/developers could pass it and we wouldn't have to deal with it in KernelFunctions here, maybe?
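A short usage sketch assuming the two adapt_structure overloads above were defined (they are not part of this PR):

using Adapt: adapt
using CUDA: CuArray

x_cpu = ColVecs(randn(Float32, 2, 5))
x_gpu = adapt(CuArray, x_cpu)  # ColVecs backed by a CuMatrix{Float32}
x_rt = adapt(Array, x_gpu)     # round-trips back to a Matrix-backed ColVecs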

_to_cpu(x::RowVecs{<:Real, <:CuMatrix}) = RowVecs(_to_cpu(x.X))
_to_cpu(x::FillArrays.AbstractFill{<:Real}) = x

_to_cpu(k::Kernel) = k
_to_cpu(k::TransformedKernel) = TransformedKernel(_to_cpu(k.kernel), _to_cpu(k.transform))
_to_cpu(t::Transform) = t
_to_cpu(t::LinearTransform) = LinearTransform(_to_cpu(t.A))
_to_cpu(t::ChainTransform) = ChainTransform(map(_to_cpu, t.transforms))

_to_cuda_gpu(x::Array{<:Real}) = CuArray(x)
_to_cuda_gpu(x::ColVecs{T, Matrix{T}} where {T<:Real}) = ColVecs(_to_cuda_gpu(x.X))
_to_cuda_gpu(x::RowVecs{T, Matrix{T}} where {T<:Real}) = RowVecs(_to_cuda_gpu(x.X))
_to_cuda_gpu(x::FillArrays.AbstractFill{<:Real}) = x

end # module
20 changes: 19 additions & 1 deletion src/basekernels/exponential.jl
@@ -137,10 +137,28 @@ end

@functor GammaExponentialKernel

kappa(κ::GammaExponentialKernel, d::Real) = exp(-d^only(κ.γ))
_gamma_kappa(γ::Real, d::Real) = exp(-d^γ)

kappa(κ::GammaExponentialKernel, d::Real) = _gamma_kappa(only(κ.γ), d)

metric(k::GammaExponentialKernel) = k.metric
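# The specialised methods below broadcast the scalar-argument `_gamma_kappa` over the full
# pairwise/colwise distance matrices, so the broadcast only captures the scalar value of γ
# rather than the kernel struct itself; presumably this is what keeps them CuArray-friendly.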

function kernelmatrix(k::GammaExponentialKernel, x::AbstractVector)
return _gamma_kappa.(only(k.γ), pairwise(metric(k), x))
end

function kernelmatrix(k::GammaExponentialKernel, x::AbstractVector, y::AbstractVector)
return _gamma_kappa.(only(k.γ), pairwise(metric(k), x, y))
end

function kernelmatrix_diag(k::GammaExponentialKernel, x::AbstractVector)
return _gamma_kappa.(only(k.γ), colwise(metric(k), x))
end

function kernelmatrix_diag(k::GammaExponentialKernel, x::AbstractVector, y::AbstractVector)
return _gamma_kappa.(only(k.γ), colwise(metric(k), x, y))
end

iskroncompatible(::GammaExponentialKernel) = true

function Base.show(io::IO, κ::GammaExponentialKernel)
18 changes: 15 additions & 3 deletions src/basekernels/fbm.jl
@@ -48,9 +48,19 @@ _mod(x::AbstractVector{<:AbstractVector{<:Real}}) = sum.(abs2, x)
_mod(x::ColVecs) = vec(sum(abs2, x.X; dims=1))
_mod(x::RowVecs) = vec(sum(abs2, x.X; dims=2))

_ones(el_type, x::AbstractVector, sz...) = ones(el_type, sz...)
function _ones(
el_type,
x::Union{CuVector{<:Real}, ColVecs{<:Real, <:CuMatrix}, RowVecs{<:Real, <:CuMatrix}},
sz...,
)
return CUDA.ones(el_type, sz...)
end
Member comment (on lines +51 to +58):
Could we use adapt here?
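One device-agnostic sketch (an assumption; it uses `similar`/`fill!` rather than `Adapt.adapt`, and is not part of this PR):

# Allocate the matrix of ones on whatever device the input's storage lives on.
_ones_like(el_type, storage::AbstractArray, sz...) = fill!(similar(storage, el_type, sz...), one(el_type))

_ones(el_type, x::AbstractVector, sz...) = ones(el_type, sz...)      # generic fallback
_ones(el_type, x::AbstractVector{<:Real}, sz...) = _ones_like(el_type, x, sz...)
_ones(el_type, x::ColVecs, sz...) = _ones_like(el_type, x.X, sz...)
_ones(el_type, x::RowVecs, sz...) = _ones_like(el_type, x.X, sz...)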


function kernelmatrix(κ::FBMKernel, x::AbstractVector)
modx = _mod(x)
modx_wide = modx * ones(eltype(modx), 1, length(modx)) # ad perf hack -- is unit tested
_v = _ones(eltype(modx), x, 1, length(modx))
modx_wide = modx * _v # ad perf hack -- is unit tested
modxx = pairwise(SqEuclidean(), x)
return _fbm.(modx_wide, modx_wide', modxx, only(κ.h))
end
@@ -64,8 +74,10 @@ end

function kernelmatrix(κ::FBMKernel, x::AbstractVector, y::AbstractVector)
modxy = pairwise(SqEuclidean(), x, y)
modx_wide = _mod(x) * ones(eltype(modxy), 1, length(y)) # ad perf hack -- is unit tested
mody_wide = _mod(y) * ones(eltype(modxy), 1, length(x)) # ad perf hack -- is unit tested
_vx = _ones(eltype(modxy), x, 1, length(y))
_vy = _ones(eltype(modxy), y, 1, length(x))
modx_wide = _mod(x) * _vx # ad perf hack -- is unit tested
mody_wide = _mod(y) * _vy # ad perf hack -- is unit tested
return _fbm.(modx_wide, mody_wide', modxy, only(κ.h))
end

4 changes: 2 additions & 2 deletions src/basekernels/matern.jl
@@ -77,7 +77,7 @@ end

Matern32Kernel(; metric=Euclidean()) = Matern32Kernel(metric)

kappa(::Matern32Kernel, d::Real) = (1 + sqrt(3) * d) * exp(-sqrt(3) * d)
kappa(::Matern32Kernel, d::T) where {T<:Real} = (1 + T(sqrt(3)) * d) * exp(-T(sqrt(3)) * d)
Member comment:
Can't T(sqrt(3)), or at least sqrt(3) be precalculated?

Member comment:
One could use

Suggested change
kappa(::Matern32Kernel, d::T) where {T<:Real} = (1 + T(sqrt(3)) * d) * exp(-T(sqrt(3)) * d)
function kappa(::Matern32Kernel, d::Real)
x = sqrt3 * d
return (1 + x) * exp(-x)
end

and load IrrationalConstants.sqrt3 in src/KernelFunctions.jl


metric(k::Matern32Kernel) = k.metric

@@ -108,7 +108,7 @@ end

Matern52Kernel(; metric=Euclidean()) = Matern52Kernel(metric)

kappa(::Matern52Kernel, d::Real) = (1 + sqrt(5) * d + 5 * d^2 / 3) * exp(-sqrt(5) * d)
kappa(::Matern52Kernel, d::T) where {T<:Real} = (1 + T(sqrt(5)) * d + 5 * d^2 / 3) * exp(-T(sqrt(5)) * d)
Member comment:
Same here with sqrt(5)

Member comment (devmotion, Sep 1, 2022):
Unfortunately, IrrationalConstants doesn't define sqrt5 (maybe should be added?) but one could still improve it a bit:

Suggested change
kappa(::Matern52Kernel, d::T) where {T<:Real} = (1 + T(sqrt(5)) * d + 5 * d^2 / 3) * exp(-T(sqrt(5)) * d)
function kappa(::Matern52Kernel, d::Real)
x = sqrt(oftype(d, 5)) * d
return (1 + x + 5 * d^2 / 3) * exp(-x)
end

This would also be a bit safer as it also works if d is not a floating point number.


metric(k::Matern52Kernel) = k.metric

8 changes: 4 additions & 4 deletions src/basekernels/wiener.jl
@@ -62,22 +62,22 @@ function (::WienerKernel{1})(x, y)
X = sqrt(sum(abs2, x))
Y = sqrt(sum(abs2, y))
minXY = min(X, Y)
return 1 / 3 * minXY^3 + 1 / 2 * minXY^2 * euclidean(x, y)
return minXY^3 / 3 + minXY^2 * euclidean(x, y) / 2
Member comment:
Nice catches here!

end
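# (Presumed motivation for these rewrites: `1 / 3 * x` introduces a Float64 literal and promotes
# Float32 inputs, whereas applying the integer division last preserves the input eltype, which
# matters for Float32 CuArrays.)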

function (::WienerKernel{2})(x, y)
X = sqrt(sum(abs2, x))
Y = sqrt(sum(abs2, y))
minXY = min(X, Y)
return 1 / 20 * minXY^5 + 1 / 12 * minXY^3 * euclidean(x, y) * (X + Y - 1 / 2 * minXY)
return minXY^5 / 20 + minXY^3 * euclidean(x, y) * (X + Y - minXY / 2) / 12
end

function (::WienerKernel{3})(x, y)
X = sqrt(sum(abs2, x))
Y = sqrt(sum(abs2, y))
minXY = min(X, Y)
return 1 / 252 * minXY^7 +
1 / 720 * minXY^4 * euclidean(x, y) * (5 * max(X, Y)^2 + 2 * X * Y + 3 * minXY^2)
return minXY^7 / 252 +
minXY^4 * euclidean(x, y) * (5 * max(X, Y)^2 + 2 * X * Y + 3 * minXY^2) / 720
end

function Base.show(io::IO, ::WienerKernel{I}) where {I}