Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add CUDA/GPU support #17

Merged
merged 10 commits into from
May 7, 2021
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
1 change: 1 addition & 0 deletions Project.toml
Original file line number Diff line number Diff line change
Expand Up @@ -4,6 +4,7 @@ authors = ["S J Palmer <samuelpalmer94@gmail.com>"]
version = "0.1.1"

[deps]
CUDA = "052768ef-5323-5732-b1bb-66c8b64840ba"
FFTW = "7a1cc6ca-52ef-59f5-83cd-3a7055c09341"
LinearAlgebra = "37e2e46d-f89d-539d-b4ee-838fcccc9c8e"
PyPlot = "d330b81b-6aea-500a-939a-2ce795aea3ee"
Expand Down
8 changes: 7 additions & 1 deletion README.md
Original file line number Diff line number Diff line change
Expand Up @@ -29,11 +29,11 @@ Solve for...
* Inhomogeneous permittivity and/or permeability
* Chern numbers of topological photonic crystals using [built-in Wilson loop methods](https://sp94.github.io/Peacock.jl/dev/how-tos/wilson_loops)


Focused on ease of use
* [Install](https://sp94.github.io/Peacock.jl/dev/tutorials/getting_started/#getting_started_installation-1) with one line in Julia's package manager
* Simple visualisation of [geometry](https://sp94.github.io/Peacock.jl/dev/tutorials/getting_started/#getting_started_geometry-1), [fields](https://sp94.github.io/Peacock.jl/dev/tutorials/getting_started/#getting_started_modes-1), and [fully labelled band diagrams](https://sp94.github.io/Peacock.jl/dev/tutorials/getting_started/#getting_started_bands-1)
* Reproduce and extend existing photonic crystals in the [`Peacock.Zoo`](https://sp94.github.io/Peacock.jl/dev/how-tos/zoo/#how_to_zoo-1) submodule
* Easily [accelerate calculations on CUDA-compatible GPUs](https://sp94.github.io/Peacock.jl/dev/how-tos/zoo/#how_to_GPU-1)


## Limitations
Expand All @@ -42,6 +42,12 @@ Focused on ease of use
* Like all methods that solve Maxwell's equations in Fourier space, the Plane Wave Expansion Method converges slowly for high contrast materials such as metals (ϵ < 0)


## Contributors

* @sp94 (core)
* @kabume (GPU support)


## Referencing

If you use `Peacock.jl` in your work, please consider citing us as
Expand Down
54 changes: 54 additions & 0 deletions benchmark/benchmark_GPU.jl
Original file line number Diff line number Diff line change
@@ -0,0 +1,54 @@
using PyPlot
using Peacock

function epf(x,y)
# equation of a circle with radius 0.2a
if x^2+y^2 <= 0.2^2
# dielectric inside the circle
return 8.9
else
# air outside the circle
return 1
end
end

# Permeability is unity everywhere
function muf(x,y)
return 1
end

a1 = [1, 0] # first lattice vector
a2 = [0, 1] # second lattice vector
d1 = 0.01 # resolution along first lattice vector
d2 = 0.01 # resolution along second lattice vector
geometry = Geometry(epf, muf, a1, a2, d1, d2)
fourier_space_cutoff = 9
solver_CPU = Solver(geometry, fourier_space_cutoff)
solver_GPU = Solver(geometry, fourier_space_cutoff, GPU=true)
G = BrillouinZoneCoordinate( 0, 0, "Γ")
X = BrillouinZoneCoordinate(1/2, 0, "X")
M = BrillouinZoneCoordinate(1/2, 1/2, "M")
ks = [G,X,M,G]

@show fourier_space_cutoff
@show typeof(solver_CPU.epc)
@show typeof(solver_GPU.muc)


function test_bands(solver, dk)
figure(figsize=(4,3))
plot_band_diagram(solver, ks, TE, color="red",
bands=1:4, dk=dk, frequency_scale=1/2pi)
plot_band_diagram(solver, ks, TM, color="blue",
bands=1:4, dk=dk, frequency_scale=1/2pi)
ylim(0,0.8)
end

# call once to make sure functions are compiled
test_bands(solver_CPU, 2)
test_bands(solver_GPU, 2)

close("all")
@time test_bands(solver_CPU, 0.1)
@time test_bands(solver_GPU, 0.1)
show()
Binary file added docs/src/figures/CPU_vs_GPU_benchmark.png
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
77 changes: 77 additions & 0 deletions docs/src/how-tos/gpu.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,77 @@
# [How to accelerate calculations using CUDA-compatible GPUs](@id how_to_GPU)

By default, `Peacock.jl` uses the CPU. However, you may be able to accelerate your calculations if you have a CUDA-compatible GPU.

Let's begin by defining a simple photonic crystal.
```julia
using Peacock, PyPlot

function epf(x,y)
# equation of a circle with radius 0.2a
if x^2+y^2 <= 0.2^2
# dielectric inside the circle
return 8.9
else
# air outside the circle
return 1
end
end

# Permeability is unity everywhere
function muf(x,y)
return 1
end

a1 = [1, 0] # first lattice vector
a2 = [0, 1] # second lattice vector
d1 = 0.01 # resolution along first lattice vector
d2 = 0.01 # resolution along second lattice vector
geometry = Geometry(epf, muf, a1, a2, d1, d2)
```

When we construct the `Solver` from the `Geometry`, we can pass the `GPU` flag. By default, `GPU=false`.
```julia
fourier_space_cutoff = 9 # larger = more accurate, slower
solver_CPU = Solver(geometry, fourier_space_cutoff)
solver_GPU = Solver(geometry, fourier_space_cutoff, GPU=true)
```

The fields of the `solver_CPU` are standard Julia arrays, but the fields of the `solver_GPU` are CUDA arrays which will utilise the GPU.
```julia
typeof(solver_CPU.epc) == Array{Complex{Float64},2}
typeof(solver_GPU.epc) == CUDA.CuArray{Complex{Float64},2}
```

Now, let's compare the time to solve and plot some bands with and without the GPU.
```julia
function plot_example_bands(solver, dk)
G = BrillouinZoneCoordinate( 0, 0, "Γ")
X = BrillouinZoneCoordinate(1/2, 0, "X")
M = BrillouinZoneCoordinate(1/2, 1/2, "M")
ks = [G,X,M,G]
figure(figsize=(4,3))
plot_band_diagram(solver, ks, TE, color="red",
bands=1:4, dk=dk, frequency_scale=1/2pi)
plot_band_diagram(solver, ks, TM, color="blue",
bands=1:4, dk=dk, frequency_scale=1/2pi)
ylim(0,0.8)
end

# call once to make sure functions are compiled
plot_example_bands(solver_CPU, 2)
plot_example_bands(solver_GPU, 2)

# time CPU vs GPU
close("all")
@time plot_example_bands(solver_CPU, 0.1)
@time plot_example_bands(solver_GPU, 0.1)
show()
```

We find a significant speed up using the GPU - ~13.6 seconds vs ~3.6 seconds.
![](../figures/CPU_vs_GPU_benchmark.png)


## Further reading

- [CUDA.jl](https://github.com/JuliaGPU/CUDA.jl): CUDA programming in Julia
1 change: 1 addition & 0 deletions src/Peacock.jl
Original file line number Diff line number Diff line change
Expand Up @@ -2,6 +2,7 @@ module Peacock

using PyPlot
using LinearAlgebra, FFTW
using CUDA

include("utils.jl")

Expand Down
48 changes: 33 additions & 15 deletions src/solver.jl
Original file line number Diff line number Diff line change
Expand Up @@ -3,30 +3,39 @@ Transverse electric (TE) or transverse magnetic (TM) polarisation.
"""
@enum Polarisation TE TM


"""
Solver(basis::PlaneWaveBasis, epc::Matrix{ComplexF64}, muc::Matrix{ComplexF64})

A plane-wave expansion method solver, where `epc` and `muc` are the convolution
matrices of the permittivity and permeability, respectively, for the given
`basis` of plane waves.
"""
struct Solver
struct Solver{T}
basis::PlaneWaveBasis
epc::Matrix{ComplexF64}
muc::Matrix{ComplexF64}
epc::T
muc::T
Kx::T
Ky::T
end


"""
Solver(geometry::Geometry, basis::PlaneWaveBasis)
Solver(geometry::Geometry, basis::PlaneWaveBasis; GPU::Bool=false)

Approximate the geometry using the given `basis` of plane waves.
"""
function Solver(geometry::Geometry, basis::PlaneWaveBasis)
function Solver(geometry::Geometry, basis::PlaneWaveBasis; GPU=false)
epc = convmat(geometry.ep, basis)
muc = convmat(geometry.mu, basis)
return Solver(basis, epc, muc)
Kx = diagm(Complex.(basis.kxs))
Ky = diagm(Complex.(basis.kys))
if GPU
epc = CuArray(epc)
muc = CuArray(muc)
Kx = CuArray(Kx)
Ky = CuArray(Ky)
end
return Solver(basis, epc, muc, Kx, Ky)
end


Expand All @@ -39,9 +48,9 @@ The circle has a diameter of `cutoff` Brillouin zones. Increasing the `cutoff`
will increase the number of plane waves leading to a more accurate solution.
It is assumed that `norm(b1) == norm(b2)`.
"""
function Solver(geometry::Geometry, cutoff::Int)
function Solver(geometry::Geometry, cutoff::Int; GPU=false)
basis = PlaneWaveBasis(geometry, cutoff)
return Solver(geometry, basis)
return Solver(geometry, basis, GPU=GPU)
end


Expand All @@ -53,9 +62,9 @@ Approximate the geometry using a basis of plane waves truncated in a rhombus.
The rhombus has lengths `cutoff_b1` and `cutoff_b2` in the `b1` and `b2`
directions, respectively.
"""
function Solver(geometry::Geometry, cutoff_b1::Int, cutoff_b2::Int)
function Solver(geometry::Geometry, cutoff_b1::Int, cutoff_b2::Int; GPU=false)
basis = PlaneWaveBasis(geometry, cutoff_b1, cutoff_b2)
return Solver(geometry, basis)
return Solver(geometry, basis, GPU=GPU)
end


Expand Down Expand Up @@ -91,11 +100,13 @@ end
Calculate the eigenmodes of a photonic crystal at position `k` in reciprocal space.
"""
function solve(solver::Solver, k::AbstractVector{<:Real}, polarisation::Polarisation; bands=:)
# Get left and right hand sides of the generalised eigenvalue problem

basis = solver.basis
epc, muc = solver.epc, solver.muc
Kx = DiagonalMatrix(basis.kxs) + k[1]*I
Ky = DiagonalMatrix(basis.kys) + k[2]*I
Kx = solver.Kx + k[1]*I
Ky = solver.Ky + k[2]*I

# Get left and right hand sides of the generalised eigenvalue problem
if polarisation == TE
LHS = Kx/epc*Kx + Ky/epc*Ky
RHS = muc
Expand All @@ -105,6 +116,12 @@ function solve(solver::Solver, k::AbstractVector{<:Real}, polarisation::Polarisa
RHS = epc
label = L"E_z"
end

# LHS and RHS may be CUDA (GPU) arrays
# Convert back to standard Julia arrays to perform eigen on the CPU
LHS = Array(LHS)
RHS = Array(RHS)

# Sometimes the generalised eigenvalue problem solver
# fails near Γ when the crystals are symmetric.
# In these cases, rewrite as a regular eigenvalue problem
Expand All @@ -113,6 +130,7 @@ function solve(solver::Solver, k::AbstractVector{<:Real}, polarisation::Polarisa
catch
eigen(RHS \ LHS)
end

freqs = sqrt.(freqs_squared)
# Sort by increasing frequency
idx = sortperm(freqs, by=real)
Expand All @@ -134,7 +152,7 @@ end

Calculate the eigenmodes of a photonic crystal at position `k=x.p*b1 + x.q*b2` in reciprocal space.
"""
function solve(solver::Solver, x::BrillouinZoneCoordinate, polarisation::Polarisation, bands=:)
function solve(solver::Solver, x::BrillouinZoneCoordinate, polarisation::Polarisation; bands=:)
k = get_k(x, solver.basis)
return solve(solver, k, polarisation, bands=bands)
end
20 changes: 10 additions & 10 deletions src/utils.jl
Original file line number Diff line number Diff line change
Expand Up @@ -25,13 +25,13 @@ function bs_to_as(b1, b2)
end


"""
DiagonalMatrix(diag::AbstractVector{ComplexF64})

A sparse diagonal matrix that can be used in left division (D \\ X)
"""
struct DiagonalMatrix <: AbstractMatrix{ComplexF64}
diag::AbstractVector{ComplexF64}
end
Base.size(A::DiagonalMatrix) = (length(A.diag), length(A.diag))
Base.getindex(A::DiagonalMatrix, I::Vararg{Int,2}) = I[1]==I[2] ? A.diag[I[1]] : 0
# """
# DiagonalMatrix(diag::AbstractVector{ComplexF64})

# A sparse diagonal matrix that can be used in left division (D \\ X)
# """
# struct DiagonalMatrix <: AbstractMatrix{ComplexF64}
# diag::AbstractVector{ComplexF64}
# end
# Base.size(A::DiagonalMatrix) = (length(A.diag), length(A.diag))
# Base.getindex(A::DiagonalMatrix, I::Vararg{Int,2}) = I[1]==I[2] ? A.diag[I[1]] : 0