GPU Attempt 2 #472

Draft, wants to merge 12 commits into base: master
Changes from 5 commits
14 changes: 14 additions & 0 deletions .buildkite/pipeline.yml
@@ -0,0 +1,14 @@
steps:
Member: Noice!

- label: "Julia v1"
plugins:
- JuliaCI/julia#v1:
version: "1"
- JuliaCI/julia-test#v1: ~
Member suggested change (replacing the line above):
- JuliaCI/julia-test#v1: ~
- JuliaCI/julia-coverage#v1:
codecov: true

agents:
queue: "juliagpu"
cuda: "*"
if: build.message !~ /\[skip tests\]/
timeout_in_minutes: 60

env:
GROUP: GPU
2 changes: 2 additions & 0 deletions Project.toml
@@ -3,6 +3,7 @@ uuid = "ec8451be-7e33-11e9-00cf-bbf324bd1392"
version = "0.10.43"

[deps]
CUDA = "052768ef-5323-5732-b1bb-66c8b64840ba"
ChainRulesCore = "d360d2e6-b24c-11e9-a2a3-2a2ae2dbcce4"
Compat = "34da2185-b29b-5c13-b0c7-acf172513d20"
CompositionsBase = "a33af91c-f02d-484b-be07-31d278c5ca2b"
@@ -25,6 +26,7 @@ ZygoteRules = "700de1a5-db45-46bc-99cf-38207098b444"
ChainRulesCore = "1"
Compat = "3.7, 4"
CompositionsBase = "0.1"
CUDA = "3"
Distances = "0.10"
FillArrays = "0.10, 0.11, 0.12, 0.13"
Functors = "0.1, 0.2"
2 changes: 2 additions & 0 deletions src/KernelFunctions.jl
@@ -47,6 +47,7 @@ export IndependentMOKernel,
export tensor, ⊗, compose

using Compat
using CUDA
Member: Unless things changed, that's quite a massive dependency no? I saw CuArrays.jl got deprecated, does this mean that CUDA is light now?

Member: As we already depend on Requires, maybe it could be hidden behind a @require?
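
A minimal sketch of that route (hypothetical, not part of this PR; it reuses CUDA's UUID from the Project.toml above and assumes the GPU-specific methods stay in src/distances/cuda.jl):

using Requires

function __init__()
    @require CUDA = "052768ef-5323-5732-b1bb-66c8b64840ba" begin
        include("distances/cuda.jl")
    end
end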

Member: Alternatively, maybe the lightweight GPUArraysCore (https://github.com/JuliaGPU/GPUArrays.jl/tree/master/lib/GPUArraysCore) is sufficient?
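
A hypothetical sketch of that alternative (not part of this PR), dispatching on the vendor-agnostic abstract array types inside the KernelFunctions module:

using GPUArraysCore: AbstractGPUMatrix, AbstractGPUVector

# Vendor-agnostic counterparts of the CUDA-specific aliases defined in src/distances/cuda.jl.
const GPUColVecs{T} = ColVecs{T,<:AbstractGPUMatrix{T}}
const GPURowVecs{T} = RowVecs{T,<:AbstractGPUMatrix{T}}

pairwise(::SqEuclidean, x::AbstractGPUVector{<:Real}) = (x .- x') .^ 2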

using ChainRulesCore: ChainRulesCore, Tangent, ZeroTangent, NoTangent
using ChainRulesCore: @thunk, InplaceableThunk
using CompositionsBase
@@ -75,6 +76,7 @@ include("distances/pairwise.jl")
include("distances/dotproduct.jl")
include("distances/delta.jl")
include("distances/sinus.jl")
include("distances/cuda.jl")

include("transform/transform.jl")
include("transform/scaletransform.jl")
112 changes: 101 additions & 11 deletions src/TestUtils.jl
@@ -1,10 +1,13 @@
module TestUtils

using CUDA
using Distances
using FillArrays
using LinearAlgebra
using KernelFunctions
using Random
using Test
using InteractiveUtils

"""
test_interface(
@@ -20,9 +23,9 @@ Run various consistency checks on `k` at the inputs `x0`, `x1`, and `x2`.
`x0` and `x1` should be of the same length with different values, while `x0` and `x2` should
be of different lengths.

-These tests are intended to pick up on really substantial issues with a kernel implementation
-(e.g. substantial asymmetry in the kernel matrix, large negative eigenvalues), rather than to
-test the numerics in detail, which can be kernel-specific.
+These tests are intended to pick up on really substantial issues with a kernel
+implementation (e.g. substantial asymmetry in the kernel matrix, large negative
+eigenvalues), rather than to test the numerics in detail, which can be kernel-specific.
"""
function test_interface(
k::Kernel,
@@ -118,6 +121,18 @@ function test_interface(
)
end
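
# A hypothetical usage sketch (the GPU-typed call assumes a CUDA-capable device and that
# StableRNGs.jl is available):
#   test_interface(SqExponentialKernel(), Float64)
#   test_interface(StableRNG(1), SqExponentialKernel(), ColVecs{Float64,CuMatrix{Float64}})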

function test_interface(
rng::AbstractRNG, k::Kernel, ::Type{<:ColVecs{T, CuMatrix{T}}}; dim_in=2, kwargs...
) where {T<:Real}
return test_interface(
k,
_to_cuda_gpu(ColVecs(randn(rng, T, dim_in, 11))),
_to_cuda_gpu(ColVecs(randn(rng, T, dim_in, 11))),
_to_cuda_gpu(ColVecs(randn(rng, T, dim_in, 13)));
kwargs...,
)
end

function test_interface(
rng::AbstractRNG, k::Kernel, ::Type{<:RowVecs{T}}; dim_in=2, kwargs...
) where {T<:Real}
@@ -176,27 +191,102 @@ function test_interface(k::Kernel, T::Type{<:Real}=Float64; kwargs...)
return test_interface(Random.GLOBAL_RNG, k, T; kwargs...)
end

const FloatType = Union{Float32, Float64}
Member: Why not AbstractFloat?


"""
example_inputs(rng::AbstractRNG, type)

Return a tuple of 4 inputs of type `type`. See `methods(example_inputs)` for information
around supported types. It is recommended that you utilise `StableRNGs.jl` for `rng` here
to ensure consistency across Julia versions.
"""
-function example_inputs(rng::AbstractRNG, ::Type{Vector{Float64}})
-return map(n -> randn(rng, Float64, n), (1, 2, 3, 4))
+function example_inputs(rng::AbstractRNG, ::Type{Vector{T}}) where {T<:FloatType}
+return map(n -> randn(rng, T, n), (1, 2, 3, 4))
end
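
# A hypothetical usage sketch (assumes StableRNGs.jl is available; the CuVector case also
# needs a CUDA-capable device):
#   using StableRNGs
#   xs = example_inputs(StableRNG(123456), Vector{Float64})        # 4 CPU vectors, lengths 1-4
#   xs_gpu = example_inputs(StableRNG(123456), CuVector{Float32})  # 4 CuVectors, lengths 1-4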

function example_inputs(
-rng::AbstractRNG, ::Type{ColVecs{Float64,Matrix{Float64}}}; dim::Int=2
-)
-return map(n -> ColVecs(randn(rng, dim, n)), (1, 2, 3, 4))
+rng::AbstractRNG, ::Type{ColVecs{T,Matrix{T}}}; dim::Int=2
+) where {T<:FloatType}
+return map(n -> ColVecs(randn(rng, T, dim, n)), (1, 2, 3, 4))
end

function example_inputs(
-rng::AbstractRNG, ::Type{RowVecs{Float64,Matrix{Float64}}}; dim::Int=2
-)
-return map(n -> RowVecs(randn(rng, n, dim)), (1, 2, 3, 4))
+rng::AbstractRNG, ::Type{RowVecs{T,Matrix{T}}}; dim::Int=2
+) where {T<:FloatType}
+return map(n -> RowVecs(randn(rng, T, n, dim)), (1, 2, 3, 4))
end

function example_inputs(rng::AbstractRNG, ::Type{CuVector{T}}) where {T<:FloatType}
return map(_to_cuda_gpu, example_inputs(rng, Vector{T}))
end

function example_inputs(
rng::AbstractRNG, ::Type{ColVecs{T, CuMatrix{T}}}; dim::Int=2
) where {T<:FloatType}
return map(_to_cuda_gpu, example_inputs(rng, ColVecs{T, Matrix{T}}; dim=dim))
end

function example_inputs(
rng::AbstractRNG, ::Type{RowVecs{T, CuMatrix{T}}}; dim::Int=2
) where {T<:FloatType}
return map(_to_cuda_gpu, example_inputs(rng, RowVecs{T, Matrix{T}}; dim=dim))
end

function test_gpu_against_cpu(k::Kernel, x1::AbstractVector, x2::AbstractVector; atol=1e-6)
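# Compare kernelmatrix / kernelmatrix_diag computed on the GPU against a CPU reference
# obtained by moving `k`, `x1`, and `x2` to the CPU: sizes, eltypes, and values (up to
# `atol`) must agree.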
@assert length(x1) != length(x2)

k_cpu = _to_cpu(k)
x1_cpu = _to_cpu(x1)
x2_cpu = _to_cpu(x2)
let
K_cpu = kernelmatrix(k_cpu, x1_cpu)
K_gpu = kernelmatrix(k, x1)
@test size(K_cpu) == size(K_gpu)
@test eltype(K_cpu) == eltype(K_gpu)
@test isapprox(K_cpu, _to_cpu(K_gpu); atol=atol)
end
let
K_cpu = kernelmatrix(k_cpu, x1_cpu, x2_cpu)
K_gpu = kernelmatrix(k, x1, x2)
@test size(K_cpu) == size(K_gpu)
@test eltype(K_cpu) == eltype(K_gpu)
@test isapprox(K_cpu, _to_cpu(K_gpu); atol=atol)
end
let
K_cpu = kernelmatrix_diag(k_cpu, x1_cpu)
K_gpu = kernelmatrix_diag(k, x1)
@test size(K_cpu) == size(K_gpu)
@test eltype(K_cpu) == eltype(K_gpu)
@test isapprox(K_cpu, _to_cpu(K_gpu); atol=atol)
end
let
K_cpu = kernelmatrix_diag(k_cpu, x1_cpu, x1_cpu)
K_gpu = kernelmatrix_diag(k, x1, x1)
@test size(K_cpu) == size(K_gpu)
@test eltype(K_cpu) == eltype(K_gpu)
@test isapprox(K_cpu, _to_cpu(K_gpu); atol=atol)
end
end

function test_gpu_against_cpu(rng::AbstractRNG, k::Kernel, data_type::Type)
_, x1, x2, _ = example_inputs(rng, data_type)
test_gpu_against_cpu(k, x1, x2)
end

_to_cpu(x::CuArray{<:Real}) = Array(x)
_to_cpu(x::ColVecs{<:Real, <:CuMatrix}) = ColVecs(_to_cpu(x.X))
Member: Can't we just overload cpu on our type? (or cpu is part of Flux? I don't remember...)

Member: Could we use/define Adapt.adapt? And let users pass the types they want to adapt it to?
According to the README of Adapt, we probably would want to define

Adapt.adapt_structure(to, x::ColVecs{<:Real}) = ColVecs(adapt(to, x.X))
Adapt.adapt_structure(to, x::RowVecs{<:Real}) = RowVecs(adapt(to, x.X))

And then one could use e.g. adapt(CuArray, x) or adapt(Array, x), where x could be a ColVecs{<:Real, Matrix{<:Real}}/RowVecs{<:Real,Matrix{<:Real}} or a ColVecs{<:Real,<:CuMatrix}/RowVecs{<:Real,<:CuMatrix} (for the other direction). If the testing function allowed the desired type (such as Array or CuArray) to be specified, users/developers could pass it and we wouldn't have to deal with it in KernelFunctions here, maybe?
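
For concreteness, a minimal sketch of that Adapt-based round trip (hypothetical; neither this PR nor KernelFunctions currently defines these methods):

using Adapt, CUDA, KernelFunctions

Adapt.adapt_structure(to, x::ColVecs{<:Real}) = ColVecs(adapt(to, x.X))
Adapt.adapt_structure(to, x::RowVecs{<:Real}) = RowVecs(adapt(to, x.X))

x_cpu = ColVecs(randn(2, 5))
x_gpu = adapt(CuArray, x_cpu)  # ColVecs backed by a CuMatrix
x_back = adapt(Array, x_gpu)   # back to a Matrix-backed ColVecs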

_to_cpu(x::RowVecs{<:Real, <:CuMatrix}) = RowVecs(_to_cpu(x.X))
_to_cpu(x::FillArrays.AbstractFill{<:Real}) = x

_to_cpu(k::Kernel) = k
_to_cpu(k::TransformedKernel) = TransformedKernel(_to_cpu(k.kernel), _to_cpu(k.transform))
_to_cpu(t::Transform) = t
_to_cpu(t::LinearTransform) = LinearTransform(_to_cpu(t.A))
_to_cpu(t::ChainTransform) = ChainTransform(map(_to_cpu, t.transforms))

_to_cuda_gpu(x::Array{<:Real}) = CuArray(x)
_to_cuda_gpu(x::ColVecs{T, Matrix{T}} where {T<:Real}) = ColVecs(_to_cuda_gpu(x.X))
_to_cuda_gpu(x::RowVecs{T, Matrix{T}} where {T<:Real}) = RowVecs(_to_cuda_gpu(x.X))
_to_cuda_gpu(x::FillArrays.AbstractFill{<:Real}) = x

end # module
20 changes: 19 additions & 1 deletion src/basekernels/exponential.jl
@@ -137,10 +137,28 @@

@functor GammaExponentialKernel

-kappa(κ::GammaExponentialKernel, d::Real) = exp(-d^only(κ.γ))
+_gamma_kappa(γ::Real, d::Real) = exp(-d^γ)
+
+kappa(κ::GammaExponentialKernel, d::Real) = _gamma_kappa(only(κ.γ), d)

metric(k::GammaExponentialKernel) = k.metric

function kernelmatrix(k::GammaExponentialKernel, x::AbstractVector)
return _gamma_kappa.(only(k.γ), pairwise(metric(k), x))
end

function kernelmatrix(k::GammaExponentialKernel, x::AbstractVector, y::AbstractVector)
return _gamma_kappa.(only(k.γ), pairwise(metric(k), x, y))
end

function kernelmatrix_diag(k::GammaExponentialKernel, x::AbstractVector)
return _gamma_kappa.(only(k.γ), colwise(metric(k), x))
end

function kernelmatrix_diag(k::GammaExponentialKernel, x::AbstractVector, y::AbstractVector)
return _gamma_kappa.(only(k.γ), colwise(metric(k), x, y))
end

iskroncompatible(::GammaExponentialKernel) = true

function Base.show(io::IO, κ::GammaExponentialKernel)
18 changes: 15 additions & 3 deletions src/basekernels/fbm.jl
@@ -48,9 +48,19 @@ _mod(x::AbstractVector{<:AbstractVector{<:Real}}) = sum.(abs2, x)
_mod(x::ColVecs) = vec(sum(abs2, x.X; dims=1))
_mod(x::RowVecs) = vec(sum(abs2, x.X; dims=2))

_ones(el_type, x::AbstractVector, sz...) = ones(el_type, sz...)
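# For CUDA-backed inputs (a CuVector, or ColVecs/RowVecs wrapping a CuMatrix), build the
# ones array on the device with CUDA.ones so the broadcasted products below stay on the GPU.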
function _ones(
el_type,
x::Union{CuVector{<:Real}, ColVecs{<:Real, <:CuMatrix}, RowVecs{<:Real, <:CuMatrix}},
sz...,
)
return CUDA.ones(el_type, sz...)
end
Comment on lines +51 to +58
Member: Could we use adapt here?


function kernelmatrix(κ::FBMKernel, x::AbstractVector)
modx = _mod(x)
-modx_wide = modx * ones(eltype(modx), 1, length(modx)) # ad perf hack -- is unit tested
+_v = _ones(eltype(modx), x, 1, length(modx))
+modx_wide = modx * _v # ad perf hack -- is unit tested
modxx = pairwise(SqEuclidean(), x)
return _fbm.(modx_wide, modx_wide', modxx, only(κ.h))
end
@@ -64,8 +74,10 @@

function kernelmatrix(κ::FBMKernel, x::AbstractVector, y::AbstractVector)
modxy = pairwise(SqEuclidean(), x, y)
-modx_wide = _mod(x) * ones(eltype(modxy), 1, length(y)) # ad perf hack -- is unit tested
-mody_wide = _mod(y) * ones(eltype(modxy), 1, length(x)) # ad perf hack -- is unit tested
+_vx = _ones(eltype(modxy), x, 1, length(y))
+_vy = _ones(eltype(modxy), y, 1, length(x))
+modx_wide = _mod(x) * _vx # ad perf hack -- is unit tested
+mody_wide = _mod(y) * _vy # ad perf hack -- is unit tested
return _fbm.(modx_wide, mody_wide', modxy, only(κ.h))
end

123 changes: 123 additions & 0 deletions src/distances/cuda.jl
@@ -0,0 +1,123 @@

const CuSubArray{T} = SubArray{T, <:Any, <:CuArray}
const CuColVecs{T} = ColVecs{T, <:Union{CuMatrix{T}, CuSubArray{T}}}
const CuRowVecs{T} = RowVecs{T, <:Union{CuMatrix{T}, CuSubArray{T}}}

const _CuVector{T} = Union{CuVector{T}, SubArray{T, 1, <:CuArray}}
Comment on lines +2 to +6
Member: Do we need to restrict it to these types? Could we just define our pairwise (IIRC we own it?) in this way for generic AbstractVector/ColVecs/RowVecs, and only forward the Vector{<:Real}/ColVecs{<:Real,Matrix{<:Real}}/... to Distances?
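
A hypothetical sketch of that alternative, assuming the `pairwise` being extended here is KernelFunctions' own function rather than Distances.pairwise:

# Generic broadcast-based definition: also covers CuVector and SubArrays of CuArrays.
pairwise(::SqEuclidean, x::AbstractVector{<:Real}) = (x .- x') .^ 2

# Plain CPU-backed inputs keep going through Distances' optimised routines.
pairwise(d::SqEuclidean, x::ColVecs{<:Real,<:Matrix}) = Distances.pairwise(d, x.X; dims=2)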


# SqEuclidean

# Vector{<:Real}

pairwise(::SqEuclidean, x::_CuVector{<:Real}) = (x .- x').^2

pairwise(::SqEuclidean, x::_CuVector{T}, y::_CuVector{T}) where {T<:Real} = (x .- y').^2

colwise(::SqEuclidean, x::_CuVector{T}) where {T<:Real} = CUDA.zeros(T, length(x))

colwise(::SqEuclidean, x::_CuVector{T}, y::_CuVector{T}) where {T<:Real} = (x .- y).^2

# ColVecs

function pairwise(::SqEuclidean, x::CuColVecs{<:Real})
X = CuArray(x.X) # needed to handle subarrays
X_ = sum(abs2, X; dims=1)
return X_ .+ X_' .- 2 .* (X' * X)
end

function pairwise(::SqEuclidean, x::CuColVecs{T}, y::CuColVecs{T}) where {T<:Real}
X = CuArray(x.X) # needed to handle subarrays
Y = CuArray(y.X) # needed to handle subarrays
X_ = sum(abs2, X; dims=1)
Y_ = sum(abs2, Y; dims=1)
return X_' .+ Y_ .- 2 .* (X' * Y)
end

function colwise(::SqEuclidean, x::CuColVecs{T}) where {T<:Real}
return CUDA.zeros(T, length(x))
end

function colwise(::SqEuclidean, x::CuColVecs{T}, y::CuColVecs{T}) where {T<:Real}
return vec(sum(abs2, x.X - y.X; dims=1))
end

# RowVecs

function pairwise(::SqEuclidean, x::CuRowVecs{<:Real})
X = CuArray(x.X)
X_ = sum(abs2, X; dims=2)
return X_' .+ X_ .- 2 .* (X * X')
end

function pairwise(::SqEuclidean, x::CuRowVecs{T}, y::CuRowVecs{T}) where {T<:Real}
X = CuArray(x.X) # needed to handle subarrays
Y = CuArray(y.X) # needed to handle subarrays
X_ = sum(abs2, X; dims=2)
Y_ = sum(abs2, Y; dims=2)
return X_ .+ Y_' .- 2 .* (X * Y')
end

colwise(::SqEuclidean, x::CuRowVecs{T}) where {T<:Real} = CUDA.zeros(T, length(x))

function colwise(::SqEuclidean, x::CuRowVecs{T}, y::CuRowVecs{T}) where {T<:Real}
return vec(sum(abs2, x.X - y.X; dims=2))
end

# Euclidean: Derive from SqEuclidean implementations.

const GPUVecType{T} = Union{_CuVector{T}, CuColVecs{T}, CuRowVecs{T}}

pairwise(::Euclidean, x::GPUVecType{<:Real}) = sqrt.(pairwise(SqEuclidean(), x))

function pairwise(::Euclidean, x::GPUVecType{T}, y::GPUVecType{T}) where {T<:Real}
return sqrt.(pairwise(SqEuclidean(), x, y))
end

colwise(::Euclidean, x::GPUVecType{T}) where {T<:Real} = CUDA.zeros(T, length(x))

function colwise(::Euclidean, x::GPUVecType{T}, y::GPUVecType{T}) where {T<:Real}
return sqrt.(colwise(SqEuclidean(), x, y))
end

#
# DotProduct
#

# Vector
pairwise(::DotProduct, x::_CuVector{<:Real}) = x * x'

pairwise(::DotProduct, x::_CuVector{T}, y::_CuVector{T}) where {T<:Real} = x * y'

colwise(::DotProduct, x::_CuVector{<:Real}) = x.^2

colwise(::DotProduct, x::_CuVector{T}, y::_CuVector{T}) where {T<:Real} = x .* y

# ColVecs
pairwise(::DotProduct, x::CuColVecs{<:Real}) = x.X' * x.X

function pairwise(::DotProduct, x::CuColVecs{T}, y::CuColVecs{T}) where {T<:Real}
return x.X' * y.X
end

function colwise(::DotProduct, x::CuColVecs{<:Real})
return vec(sum(x.X.^2; dims=1))
end

function colwise(::DotProduct, x::CuColVecs{T}, y::CuColVecs{T}) where {T<:Real}
return vec(sum(x.X .* y.X; dims=1))
end

# RowVecs
pairwise(::DotProduct, x::CuRowVecs{<:Real}) = x.X * x.X'

function pairwise(::DotProduct, x::CuRowVecs{T}, y::CuRowVecs{T}) where {T<:Real}
return x.X * y.X'
end

function colwise(::DotProduct, x::CuRowVecs{<:Real})
return vec(sum(x.X.^2; dims=2))
end

function colwise(::DotProduct, x::CuRowVecs{T}, y::CuRowVecs{T}) where {T<:Real}
return vec(sum(x.X .* y.X; dims=2))
end
1 change: 1 addition & 0 deletions src/kernels/kernelproduct.jl
@@ -46,6 +46,7 @@ Base.length(k::KernelProduct) = length(k.kernels)
(κ::KernelProduct)(x, y) = prod(k(x, y) for k in κ.kernels)

function kernelmatrix(κ::KernelProduct, x::AbstractVector)
# return reduce(hadamard, map(Base.Fix2(kernelmatrix, x), κ.kernels))
return reduce(hadamard, kernelmatrix(k, x) for k in κ.kernels)
end
