more documentation

hypermania · Aug 4, 2024 · 9db53fb · 9db53fb
1 parent 50cba1a
commit 9db53fb
Show file tree

Hide file tree

Showing 8 changed files with 234 additions and 43 deletions.
diff --git a/doc/writing_new_equation.md b/doc/writing_new_equation.md
@@ -1,4 +1,4 @@
-# Writing your own equation
+# Implementing your own equation
 
 Here we give an example of adding a field equation with \f$ \kappa \varphi^6 \f$ interaction to the codebase.
 \f[ \ddot{\varphi} - \nabla^2 \varphi + m^2 \varphi + \kappa \varphi^5 = 0 \f]
@@ -27,8 +27,8 @@ The equation class also has a reference to a `workspace`, so that it has access
 
 
 The most important function here is the `operator()`.
-When this function is called, it computes the time derivative of the state vector \f$ x \f$ at time \f$ t \f$, and stores it to \f$ dxdt \f$.
-Implementing this function is the minimal requirement for a class to work with odeint.
+When this function is called, it computes the time derivative of the state vector `x` at time `t`, and stores it to `dxdt`.
+Implementing this function is the minimal requirement for a class to work with the odeint library.
 To do this, we can simply copy the implementation for `KleinGordonEquation::operator()` and add a \f$ \kappa \varphi^5 \f$ term to it:
 ```{.cpp}
 void KappaEquation::operator()(const State &x, State &dxdt, const double t)
@@ -66,21 +66,149 @@ Note the extra line `kappa * pow(x(seqN(IDX_OF(N, a, b, 0), N)), 5)` giving the
 
 
 ## Adding the coupling parameter in workspace
-The code given above won't compile since `workspace.kappa` doesn't exist yet.
+The code given above won't compile yet since `workspace.kappa` doesn't exist.
+To make the code compile, add a new field in `WorkspaceGeneric`:
+```{.cpp}
+template<typename Vector>
+struct WorkspaceGeneric {
+	// Stuff
+	double kappa;
+	// Stuff
+};
+```
+As a general paradigm, we put data (e.g. coupling parameters, temporary variables) needed to solve the equation in a `Workspace`.
+Note that different equations use the same `Workspace`, and the field names (e.g. `kappa`) can mean different things for different equations.
+You are responsible for ensuring that your modification on `Workspace` doesn't introduce bugs for other equations.
+To avoid accidently introducing a bug, you are advised to add new fields for new parameters / temporary objects.
+
+
+## Setting `workspace.kappa` from a parameter struct
+Now suppose you define a new parameter class:
+```{.cpp}
+struct KappaParam {
+	// The usual
+	double kappa;
+};
+
+KappaParam param;
+```
+If you try calling the constructor `Workspace(param, initializer)`, `workspace.kappa` would not be automatically set to `param.kappa`.
+To resolve this, add the following in `workspace.hpp`:
+```{.cpp}
+template<typename Param>
+concept HasKappa = requires (Param param) { TYPE_REQUIREMENT(param.kappa, double) };
 
+// ...
 
-## Adding a new parameter
-You will need to define a new parameter struct that contains the new \f$ \kappa \f$ parameter.
+    WorkspaceGeneric(const Param &param, auto &initializer) :
+    N(param.N), L(param.L), fft_wrapper(param.N)
+  {
+    if constexpr(HasKappa<Param>) { kappa = param.kappa; }
+	// ...
+  }
+```
+This piece of code uses concept `HasKappa` to detect if `param.kappa` exists or not, and set `workspace.kappa` to `param.kappa` in the case it exists.
+Having `KappaParam` is useful since it works with the utilities in `param.h`.
 
 
 ## Add a function to compute energy density
 In order to save density spectrum, you would also want to implement a function to calculate energy density profile.
+Again we can imitate the implementation for `KleinGordonEquation`:
+```{.cpp}
+struct KappaEquation {
+	// ...
+	static Vector compute_energy_density(const Workspace &workspace, const double t);
+};
 
+KappaEquation::Vector KappaEquation::compute_energy_density(const Workspace &workspace, const double t)
+{
+  using namespace Eigen;
+  const long long int N = workspace.N;
+  const double L = workspace.L;
+  const double m = workspace.m;
+  const double kappa = workspace.kappa;
+  const double inv_h_sqr = 1.0 / ((L / N) * (L / N));
+  
+  VectorXd rho(workspace.state.size() / 2);
+    
+  for(long long int a = 0; a < N; ++a){
+    for(long long int b = 0; b < N; ++b){
+      rho(seqN(IDX_OF(N, a, b, 0), N)) = 0.5 *
+	( workspace.state(seqN(N*N*N+IDX_OF(N, a, b, 0), N)).cwiseAbs2()
+	  + m * m * workspace.state(seqN(IDX_OF(N, a, b, 0), N)).cwiseAbs2()
+	  + (1.0 / 6.0) * kappa * pow(workspace.state(seqN(IDX_OF(N, a, b, 0), N)), 6)
+	  + 0.25 * inv_h_sqr *
+	  ( (workspace.state(seqN(IDX_OF(N, (a+1)%N, b, 0), N))
+	     - workspace.state(seqN(IDX_OF(N, (a+N-1)%N, b, 0), N))).cwiseAbs2()
+	    + (workspace.state(seqN(IDX_OF(N, a, (b+1)%N, 0), N))
+	       - workspace.state(seqN(IDX_OF(N, a, (b+N-1)%N, 0), N))).cwiseAbs2() )
+	  );
+      rho(seqN(IDX_OF(N, a, b, 1), N-2)) += 0.5 * 0.25 * inv_h_sqr *
+	(workspace.state(seqN(IDX_OF(N, a, b, 2), N-2))
+	 - workspace.state(seqN(IDX_OF(N, a, b, 0), N-2))).cwiseAbs2();
+      rho(IDX_OF(N, a, b, 0)) += 0.5 * 0.25 * inv_h_sqr *
+	pow(workspace.state(IDX_OF(N, a, b, 1)) - workspace.state(IDX_OF(N, a, b, N-1)), 2);
+      rho(IDX_OF(N, a, b, N-1)) += 0.5 * 0.25 * inv_h_sqr *
+	pow(workspace.state(IDX_OF(N, a, b, 0)) - workspace.state(IDX_OF(N, a, b, N-2)), 2);
+    }
+  }
+  return rho;
+}
+```
+Note the extra \f$ \kappa \varphi^6 / 6\f$ term in the function above.
+Now `ConstIntervalObserver` knows how to compute the energy density for this theory.
 
 ## Using CUDA
-You want your equation to work on GPU memory, so the state vector would be:
+If you want your equation to run on GPU memory, then in `KappaEquation` the vector type should be set to:
 ```{.cpp}
 typedef thrust::device_vector<double> Vector;
 ```
-Your `operator()` should modify the state vector.
-To do that, the easiest way is to write your own CUDA kernel.
+Here, `thrust::device_vector<double>` is a class in the `thrust` library representing a double floating point array on the GPU.
+Much like `std::vector<double>`, the class `thrust::device_vector<double>` takes care of GPU memory allocation / deallocation in an RAII manner, so that you don't have to call CUDA memory management API directly.
+See [thrust documentation](https://nvidia.github.io/cccl/thrust/api/classthrust_1_1device__vector.html) for more details.
+
+Your `operator()` and `compute_energy_density` functions must now work on GPU device vectors.
+A straightforward way to do this is to write your own CUDA kernel.
+Here's an example on how to do it:
+```{.cpp}
+__global__
+void kappa_equation_kernel(const double *x, double *dxdt,
+	const double m, const double kappa,
+	const double inv_h_sqr,
+	const long long int N)
+{
+  int a = blockIdx.x;
+  int b = blockIdx.y;
+  int c = threadIdx.x;
+  
+  dxdt[IDX_OF(N, a, b, c)] = x[N*N*N+IDX_OF(N, a, b, c)];
+  dxdt[N*N*N+IDX_OF(N, a, b, c)] =
+    - m * m * x[IDX_OF(N, a, b, c)]
+	- kappa * x[IDX_OF(N, a, b, c)] * x[IDX_OF(N, a, b, c)] * x[IDX_OF(N, a, b, c)] * x[IDX_OF(N, a, b, c)] * x[IDX_OF(N, a, b, c)]
+    + inv_h_sqr * (-6.0 * x[IDX_OF(N, a, b, c)]
+		    + x[IDX_OF(N, (a+1)%N, b, c)]
+		    + x[IDX_OF(N, (a+N-1)%N, b, c)]
+		    + x[IDX_OF(N, a, (b+1)%N, c)]
+		    + x[IDX_OF(N, a, (b+N-1)%N, c)]
+		    + x[IDX_OF(N, a, b, (c+1)%N)]
+		    + x[IDX_OF(N, a, b, (c+N-1)%N)]);
+}
+
+void KappaEquation::operator()(const State &x, State &dxdt, const double t)
+{
+  const long long int N = workspace.N;
+  const double L = workspace.L;
+  const double m = workspace.m;
+  const double kappa = workspace.kappa;
+  const double inv_h_sqr = 1.0 / ((L / N) * (L / N));
+
+  dim3 threadsPerBlock((int)N, 1);
+  dim3 numBlocks((int)N, (int)N);
+  kappa_equation_kernel<<<numBlocks, threadsPerBlock>>>(thrust::raw_pointer_cast(x.data()), thrust::raw_pointer_cast(dxdt.data()), m, kappa, inv_h_sqr, N);
+}
+```
+Here `kappa_equation_kernel` is the CUDA kernel, and the `__global__` specifier means this function runs on the GPU.
+`KappaEquation::operator()` invokes the kernel via `kappa_equation_kernel<<<numBlocks, threadsPerBlock>>>`. 
+Given the execution configuration `threadsPerBlock` and `numBlocks`, the function `kappa_equation_kernel` is executed once for each `a,b,c`, with `0 <= a,b,c < N`.
+Depending on the kernel, different execution configurations could result in varying performance, or even introduce bugs.
+See the [CUDA C programming guide](https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#kernels) for more details.
diff --git a/documentation.pdf b/documentation.pdf
diff --git a/src/cuda_wrapper.cuh b/src/cuda_wrapper.cuh
@@ -51,9 +51,7 @@ void copy_vector(Eigen::VectorXd &out, const thrust::device_vector<double> &in);
 
 void show_gpu_memory_usage(void);
 
-/*!
-  \brief Wrapper for 3D cufftPlan3d. Performs double to complex double FFT for a \f$ N^3 \f$ grid.
-*/
+// Wrapper for 3D cufftPlan3d. Performs double to complex double FFT for a \f$ N^3 \f$ grid.
 struct cufftWrapperD2Z {
   int N;
   cufftHandle plan;
@@ -68,10 +66,7 @@ struct cufftWrapperD2Z {
 };
 
 
-/*!
-  \brief Wrapper for 3D cufftPlanMany.
-  Performs two double to complex double FFT for a \f$ N^3 \f$ grid.
-*/
+// Wrapper for 3D cufftPlanMany. Performs two double to complex double FFT for a \f$ N^3 \f$ grid.
 struct cufftWrapperBatchedD2Z {
   int N;
   cufftHandle plan;
@@ -85,9 +80,10 @@ struct cufftWrapperBatchedD2Z {
   cufftWrapperBatchedD2Z &operator=(cufftWrapperBatchedD2Z &&) = delete;
 };
 
-/*
-  \brief Wrapper for various cufft functions for a \f$ N^3 \f$ grid.
-  Different cufft plans share the same work area so that GPU memory usage is minimized.
+/*!
+  \brief Wrapper for various cufft functions for a \f$ N^3 \f$ grid. Similar to fftwWrapper.
+
+  See <https://docs.nvidia.com/cuda/cufft/index.html>.
 */
 struct cufftWrapper {
   int N;
@@ -108,9 +104,11 @@ struct cufftWrapper {
   cufftWrapper &operator=(cufftWrapper &&) = delete;
 };
 
-/*
-  \brief Wrapper for various cufft functions for a \f$ N^3 \f$ grid.
-  Different cufft plans share the same work area so that GPU memory usage is minimized.
+/*!
+  \brief Wrapper for various cufft functions for a \f$ N^3 \f$ grid. Similar to fftwWrapper.
+
+  Uses less GPU memory than cufftWrapper.
+  See <https://docs.nvidia.com/cuda/cufft/index.html>.
 */
 struct cufftWrapperNoBatching {
   int N;

diff --git a/src/fdm3d_cuda.cu b/src/fdm3d_cuda.cu
@@ -122,7 +122,7 @@ void cutoff_fourier_kernel(const double *in, double *out, const long long int N,
   int b = blockDim.y * blockIdx.y + threadIdx.y;
   int c = blockDim.z * blockIdx.z + threadIdx.z;
 
-  double scale_diff = N / M;
+  double scale_diff = static_cast<double>(N) / static_cast<double>(M);
   double scaling = 1.0 / (scale_diff * scale_diff * scale_diff);
 
   int half_M = M/2;
@@ -218,14 +218,12 @@ thrust::device_vector<double> compute_cutoff_fouriers(const long long int N, con
   return cutoff_fft;
 }
 
-void compute_inverse_laplacian_test(const long long int N, const double L,
-				    thrust::device_vector<double> &fft)
-{
-  assert(N % 8 == 0);
-
-  dim3 threadsPerBlock(8, 4, 4);
-  dim3 numBlocks((int)N/8, (int)N/4, (int)N/4);
-  compute_inverse_laplacian_kernel<<<numBlocks, threadsPerBlock>>>(thrust::raw_pointer_cast(fft.data()), N, L);
-}
-
+// void compute_inverse_laplacian_test(const long long int N, const double L,
+// 				    thrust::device_vector<double> &fft)
+// {
+//   assert(N % 8 == 0);
 
+//   dim3 threadsPerBlock(8, 4, 4);
+//   dim3 numBlocks((int)N/8, (int)N/4, (int)N/4);
+//   compute_inverse_laplacian_kernel<<<numBlocks, threadsPerBlock>>>(thrust::raw_pointer_cast(fft.data()), N, L);
+// }
diff --git a/src/fdm3d_cuda.cuh b/src/fdm3d_cuda.cuh
@@ -16,24 +16,36 @@
 
 #include "fdm3d.hpp"
 
+/*!
+  \brief CUDA version of identically named function in fdm3d.hpp.
+*/
 thrust::device_vector<double> compute_mode_power_spectrum(const long long int N, const double L, const double m, const double a_t,
 							  thrust::device_vector<double> &state,
 							  fftWrapperDispatcher<thrust::device_vector<double>>::Generic &fft_wrapper);
 
+/*!
+  \brief CUDA version of identically named function in fdm3d.hpp.
+*/
 thrust::device_vector<double> compute_power_spectrum(const long long int N,
 						     thrust::device_vector<double> &f,
 						     fftWrapperDispatcher<thrust::device_vector<double>>::Generic &fft_wrapper);
 
 thrust::device_vector<double> compute_laplacian(const long long int N, const double L,
 						thrust::device_vector<double> &f);
 
+/*!
+  \brief CUDA version of identically named function in fdm3d.hpp.
+*/
 thrust::device_vector<double> compute_inverse_laplacian(const long long int N, const double L,
 							thrust::device_vector<double> &f,
 							fftWrapperDispatcher<thrust::device_vector<double>>::Generic &fft_wrapper);
 
+/*!
+  \brief CUDA version of identically named function in fdm3d.hpp.
+*/
 thrust::device_vector<double> compute_cutoff_fouriers(const long long int N, const long long int M,
 						      const thrust::device_vector<double> &fft);
 
-void compute_inverse_laplacian_test(const long long int N, const double L,
-				    thrust::device_vector<double> &fft);
+// void compute_inverse_laplacian_test(const long long int N, const double L,
+// 				    thrust::device_vector<double> &fft);
 #endif
diff --git a/src/fftw_wrapper.hpp b/src/fftw_wrapper.hpp
@@ -14,19 +14,53 @@
 
 /*!
   \brief Wrapper for various FFTW functions for a \f$ N^3 \f$ grid.
+
+  See <https://www.fftw.org/fftw3_doc/Multi_002dDimensional-DFTs-of-Real-Data.html> for details.
 */
 struct fftwWrapper {
   int N;
   fftw_plan plan_d2z;
   fftw_plan plan_z2d;
   fftw_plan plan_inplace_z2d;
-  explicit fftwWrapper(int N_);
+  explicit fftwWrapper(int N_); /*!< Constructor for grid size \f$ N \f$. */
   ~fftwWrapper();
-
+
+  /*!
+    \brief (Double floating point) Real to complex transform.
+    \param in A real vector of size \f$ N^3 \f$.
+    \return A real vector of size \f$ 2 N^2 (N/2+1) \f$ (or a complex vector of size \f$ N^2 (N/2+1) \f$), containing the discrete Fourier transform of input.
+  */
   Eigen::VectorXd execute_d2z(Eigen::VectorXd &in);
+
+  /*!
+    \brief (Double floating point) Real to complex transform.
+    \param in A real vector of size \f$ 2 N^3 \f$.
+    \return A real vector of size \f$ 4 N^2 (N/2+1) \f$ (or a complex vector of size \f$ 2 N^2 (N/2+1) \f$), containing discrete Fourier transforms of input. The first \f$ 2 N^2 (N/2+1) \f$ entries of the output are the DFT of the first \f$ N^3 \f$ entries of the input, and similar for the rest.
+  */
   Eigen::VectorXd execute_batched_d2z(Eigen::VectorXd &in);
+
+  /*!
+    \brief (Double floating point) Complex to real transform.
+    \param in A real vector of size \f$ 2 N^2 (N/2+1) \f$ (or a complex vector of size \f$ N^2 (N/2+1) \f$).
+    \return A real vector of size \f$ N^3 \f$, containing the inverse discrete Fourier transform of input.
+    \note This function destroys the information in the input `in`.
+    See FFTW's documentation <https://www.fftw.org/fftw3_doc/Planner-Flags.html>.
+  */
   Eigen::VectorXd execute_z2d(Eigen::VectorXd &in);
+
+  /*! 
+    \brief No-return version of the complex to real transform.
+    \note This version is useful if you want to reuse the same memory location for the output; doing this can reduce unnecessary memory allocation / deallocation, saving lots of time.
+    Like the other version, this function destroys the data in input `in`.
+  */
   void execute_z2d(Eigen::VectorXd &in, Eigen::VectorXd &out);
+
+  /*!
+    \brief In-place version of the complex to real transform.
+    \param inout A real vector of size \f$ 2 N^2 (N/2+1) \f$ (or a complex vector of size \f$ N^2 (N/2+1) \f$). After the function call the data in `inout` is changed its inverse DFT. The vector still has size \f$ 2 N^2 (N/2+1) \f$, but only \f$ N^3 \f$ of the entries are meaningful. The entries are in FFTW padded format.
+    \note Make sure to access the elements inside with PADDED_IDX_OF (instead of IDX_OF).
+    See <https://www.fftw.org/fftw3_doc/Multi_002dDimensional-DFTs-of-Real-Data.html> for details of the padded format.
+  */
   void execute_inplace_z2d(Eigen::VectorXd &inout);
 
   fftwWrapper(const fftwWrapper &) = delete;

diff --git a/src/io.hpp b/src/io.hpp
@@ -1,3 +1,8 @@
+/*! 
+  \file io.hpp
+  \author Siyang Ling
+  \brief Input/output utilities.
+*/
 #ifndef IO_HPP
 #define IO_HPP
 #include <cstdlib>