Skip to content

Commit

Permalink
more documentation
Browse files Browse the repository at this point in the history
  • Loading branch information
hypermania committed Aug 4, 2024
1 parent 50cba1a commit 9db53fb
Show file tree
Hide file tree
Showing 8 changed files with 234 additions and 43 deletions.
146 changes: 137 additions & 9 deletions doc/writing_new_equation.md
Original file line number Diff line number Diff line change
@@ -1,4 +1,4 @@
# Writing your own equation
# Implementing your own equation

Here we give an example of adding a field equation with \f$ \kappa \varphi^6 \f$ interaction to the codebase.
\f[ \ddot{\varphi} - \nabla^2 \varphi + m^2 \varphi + \kappa \varphi^5 = 0 \f]
Expand Down Expand Up @@ -27,8 +27,8 @@ The equation class also has a reference to a `workspace`, so that it has access


The most important function here is the `operator()`.
When this function is called, it computes the time derivative of the state vector \f$ x \f$ at time \f$ t \f$, and stores it to \f$ dxdt \f$.
Implementing this function is the minimal requirement for a class to work with odeint.
When this function is called, it computes the time derivative of the state vector `x` at time `t`, and stores it to `dxdt`.
Implementing this function is the minimal requirement for a class to work with the odeint library.
To do this, we can simply copy the implementation for `KleinGordonEquation::operator()` and add a \f$ \kappa \varphi^5 \f$ term to it:
```{.cpp}
void KappaEquation::operator()(const State &x, State &dxdt, const double t)
Expand Down Expand Up @@ -66,21 +66,149 @@ Note the extra line `kappa * pow(x(seqN(IDX_OF(N, a, b, 0), N)), 5)` giving the


## Adding the coupling parameter in workspace
The code given above won't compile since `workspace.kappa` doesn't exist yet.
The code given above won't compile yet since `workspace.kappa` doesn't exist.
To make the code compile, add a new field in `WorkspaceGeneric`:
```{.cpp}
template<typename Vector>
struct WorkspaceGeneric {
// Stuff
double kappa;
// Stuff
};
```
As a general paradigm, we put data (e.g. coupling parameters, temporary variables) needed to solve the equation in a `Workspace`.
Note that different equations use the same `Workspace`, and the field names (e.g. `kappa`) can mean different things for different equations.
You are responsible for ensuring that your modification on `Workspace` doesn't introduce bugs for other equations.
To avoid accidently introducing a bug, you are advised to add new fields for new parameters / temporary objects.


## Setting `workspace.kappa` from a parameter struct
Now suppose you define a new parameter class:
```{.cpp}
struct KappaParam {
// The usual
double kappa;
};
KappaParam param;
```
If you try calling the constructor `Workspace(param, initializer)`, `workspace.kappa` would not be automatically set to `param.kappa`.
To resolve this, add the following in `workspace.hpp`:
```{.cpp}
template<typename Param>
concept HasKappa = requires (Param param) { TYPE_REQUIREMENT(param.kappa, double) };
// ...
## Adding a new parameter
You will need to define a new parameter struct that contains the new \f$ \kappa \f$ parameter.
WorkspaceGeneric(const Param &param, auto &initializer) :
N(param.N), L(param.L), fft_wrapper(param.N)
{
if constexpr(HasKappa<Param>) { kappa = param.kappa; }
// ...
}
```
This piece of code uses concept `HasKappa` to detect if `param.kappa` exists or not, and set `workspace.kappa` to `param.kappa` in the case it exists.
Having `KappaParam` is useful since it works with the utilities in `param.h`.


## Add a function to compute energy density
In order to save density spectrum, you would also want to implement a function to calculate energy density profile.
Again we can imitate the implementation for `KleinGordonEquation`:
```{.cpp}
struct KappaEquation {
// ...
static Vector compute_energy_density(const Workspace &workspace, const double t);
};
KappaEquation::Vector KappaEquation::compute_energy_density(const Workspace &workspace, const double t)
{
using namespace Eigen;
const long long int N = workspace.N;
const double L = workspace.L;
const double m = workspace.m;
const double kappa = workspace.kappa;
const double inv_h_sqr = 1.0 / ((L / N) * (L / N));
VectorXd rho(workspace.state.size() / 2);
for(long long int a = 0; a < N; ++a){
for(long long int b = 0; b < N; ++b){
rho(seqN(IDX_OF(N, a, b, 0), N)) = 0.5 *
( workspace.state(seqN(N*N*N+IDX_OF(N, a, b, 0), N)).cwiseAbs2()
+ m * m * workspace.state(seqN(IDX_OF(N, a, b, 0), N)).cwiseAbs2()
+ (1.0 / 6.0) * kappa * pow(workspace.state(seqN(IDX_OF(N, a, b, 0), N)), 6)
+ 0.25 * inv_h_sqr *
( (workspace.state(seqN(IDX_OF(N, (a+1)%N, b, 0), N))
- workspace.state(seqN(IDX_OF(N, (a+N-1)%N, b, 0), N))).cwiseAbs2()
+ (workspace.state(seqN(IDX_OF(N, a, (b+1)%N, 0), N))
- workspace.state(seqN(IDX_OF(N, a, (b+N-1)%N, 0), N))).cwiseAbs2() )
);
rho(seqN(IDX_OF(N, a, b, 1), N-2)) += 0.5 * 0.25 * inv_h_sqr *
(workspace.state(seqN(IDX_OF(N, a, b, 2), N-2))
- workspace.state(seqN(IDX_OF(N, a, b, 0), N-2))).cwiseAbs2();
rho(IDX_OF(N, a, b, 0)) += 0.5 * 0.25 * inv_h_sqr *
pow(workspace.state(IDX_OF(N, a, b, 1)) - workspace.state(IDX_OF(N, a, b, N-1)), 2);
rho(IDX_OF(N, a, b, N-1)) += 0.5 * 0.25 * inv_h_sqr *
pow(workspace.state(IDX_OF(N, a, b, 0)) - workspace.state(IDX_OF(N, a, b, N-2)), 2);
}
}
return rho;
}
```
Note the extra \f$ \kappa \varphi^6 / 6\f$ term in the function above.
Now `ConstIntervalObserver` knows how to compute the energy density for this theory.

## Using CUDA
You want your equation to work on GPU memory, so the state vector would be:
If you want your equation to run on GPU memory, then in `KappaEquation` the vector type should be set to:
```{.cpp}
typedef thrust::device_vector<double> Vector;
```
Your `operator()` should modify the state vector.
To do that, the easiest way is to write your own CUDA kernel.
Here, `thrust::device_vector<double>` is a class in the `thrust` library representing a double floating point array on the GPU.
Much like `std::vector<double>`, the class `thrust::device_vector<double>` takes care of GPU memory allocation / deallocation in an RAII manner, so that you don't have to call CUDA memory management API directly.
See [thrust documentation](https://nvidia.github.io/cccl/thrust/api/classthrust_1_1device__vector.html) for more details.

Your `operator()` and `compute_energy_density` functions must now work on GPU device vectors.
A straightforward way to do this is to write your own CUDA kernel.
Here's an example on how to do it:
```{.cpp}
__global__
void kappa_equation_kernel(const double *x, double *dxdt,
const double m, const double kappa,
const double inv_h_sqr,
const long long int N)
{
int a = blockIdx.x;
int b = blockIdx.y;
int c = threadIdx.x;
dxdt[IDX_OF(N, a, b, c)] = x[N*N*N+IDX_OF(N, a, b, c)];
dxdt[N*N*N+IDX_OF(N, a, b, c)] =
- m * m * x[IDX_OF(N, a, b, c)]
- kappa * x[IDX_OF(N, a, b, c)] * x[IDX_OF(N, a, b, c)] * x[IDX_OF(N, a, b, c)] * x[IDX_OF(N, a, b, c)] * x[IDX_OF(N, a, b, c)]
+ inv_h_sqr * (-6.0 * x[IDX_OF(N, a, b, c)]
+ x[IDX_OF(N, (a+1)%N, b, c)]
+ x[IDX_OF(N, (a+N-1)%N, b, c)]
+ x[IDX_OF(N, a, (b+1)%N, c)]
+ x[IDX_OF(N, a, (b+N-1)%N, c)]
+ x[IDX_OF(N, a, b, (c+1)%N)]
+ x[IDX_OF(N, a, b, (c+N-1)%N)]);
}
void KappaEquation::operator()(const State &x, State &dxdt, const double t)
{
const long long int N = workspace.N;
const double L = workspace.L;
const double m = workspace.m;
const double kappa = workspace.kappa;
const double inv_h_sqr = 1.0 / ((L / N) * (L / N));
dim3 threadsPerBlock((int)N, 1);
dim3 numBlocks((int)N, (int)N);
kappa_equation_kernel<<<numBlocks, threadsPerBlock>>>(thrust::raw_pointer_cast(x.data()), thrust::raw_pointer_cast(dxdt.data()), m, kappa, inv_h_sqr, N);
}
```
Here `kappa_equation_kernel` is the CUDA kernel, and the `__global__` specifier means this function runs on the GPU.
`KappaEquation::operator()` invokes the kernel via `kappa_equation_kernel<<<numBlocks, threadsPerBlock>>>`.
Given the execution configuration `threadsPerBlock` and `numBlocks`, the function `kappa_equation_kernel` is executed once for each `a,b,c`, with `0 <= a,b,c < N`.
Depending on the kernel, different execution configurations could result in varying performance, or even introduce bugs.
See the [CUDA C programming guide](https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#kernels) for more details.
Binary file modified documentation.pdf
Binary file not shown.
24 changes: 11 additions & 13 deletions src/cuda_wrapper.cuh
Original file line number Diff line number Diff line change
Expand Up @@ -51,9 +51,7 @@ void copy_vector(Eigen::VectorXd &out, const thrust::device_vector<double> &in);

void show_gpu_memory_usage(void);

/*!
\brief Wrapper for 3D cufftPlan3d. Performs double to complex double FFT for a \f$ N^3 \f$ grid.
*/
// Wrapper for 3D cufftPlan3d. Performs double to complex double FFT for a \f$ N^3 \f$ grid.
struct cufftWrapperD2Z {
int N;
cufftHandle plan;
Expand All @@ -68,10 +66,7 @@ struct cufftWrapperD2Z {
};


/*!
\brief Wrapper for 3D cufftPlanMany.
Performs two double to complex double FFT for a \f$ N^3 \f$ grid.
*/
// Wrapper for 3D cufftPlanMany. Performs two double to complex double FFT for a \f$ N^3 \f$ grid.
struct cufftWrapperBatchedD2Z {
int N;
cufftHandle plan;
Expand All @@ -85,9 +80,10 @@ struct cufftWrapperBatchedD2Z {
cufftWrapperBatchedD2Z &operator=(cufftWrapperBatchedD2Z &&) = delete;
};

/*
\brief Wrapper for various cufft functions for a \f$ N^3 \f$ grid.
Different cufft plans share the same work area so that GPU memory usage is minimized.
/*!
\brief Wrapper for various cufft functions for a \f$ N^3 \f$ grid. Similar to fftwWrapper.
See <https://docs.nvidia.com/cuda/cufft/index.html>.
*/
struct cufftWrapper {
int N;
Expand All @@ -108,9 +104,11 @@ struct cufftWrapper {
cufftWrapper &operator=(cufftWrapper &&) = delete;
};

/*
\brief Wrapper for various cufft functions for a \f$ N^3 \f$ grid.
Different cufft plans share the same work area so that GPU memory usage is minimized.
/*!
\brief Wrapper for various cufft functions for a \f$ N^3 \f$ grid. Similar to fftwWrapper.
Uses less GPU memory than cufftWrapper.
See <https://docs.nvidia.com/cuda/cufft/index.html>.
*/
struct cufftWrapperNoBatching {
int N;
Expand Down
20 changes: 9 additions & 11 deletions src/fdm3d_cuda.cu
Original file line number Diff line number Diff line change
Expand Up @@ -122,7 +122,7 @@ void cutoff_fourier_kernel(const double *in, double *out, const long long int N,
int b = blockDim.y * blockIdx.y + threadIdx.y;
int c = blockDim.z * blockIdx.z + threadIdx.z;

double scale_diff = N / M;
double scale_diff = static_cast<double>(N) / static_cast<double>(M);
double scaling = 1.0 / (scale_diff * scale_diff * scale_diff);

int half_M = M/2;
Expand Down Expand Up @@ -218,14 +218,12 @@ thrust::device_vector<double> compute_cutoff_fouriers(const long long int N, con
return cutoff_fft;
}

void compute_inverse_laplacian_test(const long long int N, const double L,
thrust::device_vector<double> &fft)
{
assert(N % 8 == 0);

dim3 threadsPerBlock(8, 4, 4);
dim3 numBlocks((int)N/8, (int)N/4, (int)N/4);
compute_inverse_laplacian_kernel<<<numBlocks, threadsPerBlock>>>(thrust::raw_pointer_cast(fft.data()), N, L);
}

// void compute_inverse_laplacian_test(const long long int N, const double L,
// thrust::device_vector<double> &fft)
// {
// assert(N % 8 == 0);

// dim3 threadsPerBlock(8, 4, 4);
// dim3 numBlocks((int)N/8, (int)N/4, (int)N/4);
// compute_inverse_laplacian_kernel<<<numBlocks, threadsPerBlock>>>(thrust::raw_pointer_cast(fft.data()), N, L);
// }
16 changes: 14 additions & 2 deletions src/fdm3d_cuda.cuh
Original file line number Diff line number Diff line change
Expand Up @@ -16,24 +16,36 @@

#include "fdm3d.hpp"

/*!
\brief CUDA version of identically named function in fdm3d.hpp.
*/
thrust::device_vector<double> compute_mode_power_spectrum(const long long int N, const double L, const double m, const double a_t,
thrust::device_vector<double> &state,
fftWrapperDispatcher<thrust::device_vector<double>>::Generic &fft_wrapper);

/*!
\brief CUDA version of identically named function in fdm3d.hpp.
*/
thrust::device_vector<double> compute_power_spectrum(const long long int N,
thrust::device_vector<double> &f,
fftWrapperDispatcher<thrust::device_vector<double>>::Generic &fft_wrapper);

thrust::device_vector<double> compute_laplacian(const long long int N, const double L,
thrust::device_vector<double> &f);

/*!
\brief CUDA version of identically named function in fdm3d.hpp.
*/
thrust::device_vector<double> compute_inverse_laplacian(const long long int N, const double L,
thrust::device_vector<double> &f,
fftWrapperDispatcher<thrust::device_vector<double>>::Generic &fft_wrapper);

/*!
\brief CUDA version of identically named function in fdm3d.hpp.
*/
thrust::device_vector<double> compute_cutoff_fouriers(const long long int N, const long long int M,
const thrust::device_vector<double> &fft);

void compute_inverse_laplacian_test(const long long int N, const double L,
thrust::device_vector<double> &fft);
// void compute_inverse_laplacian_test(const long long int N, const double L,
// thrust::device_vector<double> &fft);
#endif
38 changes: 36 additions & 2 deletions src/fftw_wrapper.hpp
Original file line number Diff line number Diff line change
Expand Up @@ -14,19 +14,53 @@

/*!
\brief Wrapper for various FFTW functions for a \f$ N^3 \f$ grid.
See <https://www.fftw.org/fftw3_doc/Multi_002dDimensional-DFTs-of-Real-Data.html> for details.
*/
struct fftwWrapper {
int N;
fftw_plan plan_d2z;
fftw_plan plan_z2d;
fftw_plan plan_inplace_z2d;
explicit fftwWrapper(int N_);
explicit fftwWrapper(int N_); /*!< Constructor for grid size \f$ N \f$. */
~fftwWrapper();


/*!
\brief (Double floating point) Real to complex transform.
\param in A real vector of size \f$ N^3 \f$.
\return A real vector of size \f$ 2 N^2 (N/2+1) \f$ (or a complex vector of size \f$ N^2 (N/2+1) \f$), containing the discrete Fourier transform of input.
*/
Eigen::VectorXd execute_d2z(Eigen::VectorXd &in);

/*!
\brief (Double floating point) Real to complex transform.
\param in A real vector of size \f$ 2 N^3 \f$.
\return A real vector of size \f$ 4 N^2 (N/2+1) \f$ (or a complex vector of size \f$ 2 N^2 (N/2+1) \f$), containing discrete Fourier transforms of input. The first \f$ 2 N^2 (N/2+1) \f$ entries of the output are the DFT of the first \f$ N^3 \f$ entries of the input, and similar for the rest.
*/
Eigen::VectorXd execute_batched_d2z(Eigen::VectorXd &in);

/*!
\brief (Double floating point) Complex to real transform.
\param in A real vector of size \f$ 2 N^2 (N/2+1) \f$ (or a complex vector of size \f$ N^2 (N/2+1) \f$).
\return A real vector of size \f$ N^3 \f$, containing the inverse discrete Fourier transform of input.
\note This function destroys the information in the input `in`.
See FFTW's documentation <https://www.fftw.org/fftw3_doc/Planner-Flags.html>.
*/
Eigen::VectorXd execute_z2d(Eigen::VectorXd &in);

/*!
\brief No-return version of the complex to real transform.
\note This version is useful if you want to reuse the same memory location for the output; doing this can reduce unnecessary memory allocation / deallocation, saving lots of time.
Like the other version, this function destroys the data in input `in`.
*/
void execute_z2d(Eigen::VectorXd &in, Eigen::VectorXd &out);

/*!
\brief In-place version of the complex to real transform.
\param inout A real vector of size \f$ 2 N^2 (N/2+1) \f$ (or a complex vector of size \f$ N^2 (N/2+1) \f$). After the function call the data in `inout` is changed its inverse DFT. The vector still has size \f$ 2 N^2 (N/2+1) \f$, but only \f$ N^3 \f$ of the entries are meaningful. The entries are in FFTW padded format.
\note Make sure to access the elements inside with PADDED_IDX_OF (instead of IDX_OF).
See <https://www.fftw.org/fftw3_doc/Multi_002dDimensional-DFTs-of-Real-Data.html> for details of the padded format.
*/
void execute_inplace_z2d(Eigen::VectorXd &inout);

fftwWrapper(const fftwWrapper &) = delete;
Expand Down
5 changes: 5 additions & 0 deletions src/io.hpp
Original file line number Diff line number Diff line change
@@ -1,3 +1,8 @@
/*!
\file io.hpp
\author Siyang Ling
\brief Input/output utilities.
*/
#ifndef IO_HPP
#define IO_HPP
#include <cstdlib>
Expand Down
Loading

0 comments on commit 9db53fb

Please sign in to comment.