This code numerically approximates solutions to the 1D heat equation. It does so using the Forwards-Time Centered-Space (FTCS) method. I chose to use the FTCS method, since it is so conducive to the fine grain parallelism, or data level parallelism, that CUDA excels at.
This is because the FTCS method results in a recurrence relation which gives the temperature at each point in terms of temperatures at the previous time step. Therefore each CUDA thread can be assigned a specific point, and at each time step the threads can simultaneously calculate the temperatures at their points.
The simulation really consists of two parts. The first part is a CUDA program which actually numerically approximates the solution. This code, which is in heatEquation.hpp and main.cu, was written to run on UW-Madison's Euler cluster. The code which performs the numerical approximation runs on a NVIDIA GPU. After this code completes, the results are written to a file.
The second part is a Python script, animate.py, which is used to visualize the solution. The script uses matplotlib, and reads the data generated by the CUDA program to create an animation of the heat diffusing over time. The script was developed to be run on someone's local Linux or Mac machine using Python 2.7.
The last component of this project is the Python script plotTimmings.py. It uses matplotlib to graph the run time of the simulation versus the number of time steps for which the simulation performed calculations.
To make building the CUDA program easier, this project includes a Makefile written to be run on Euler. It also includes a SLURM job submission script, slurmscript, which was used to run the CUDA simulation on Euler.
The code which actually carry out the numerical approximation are in heatEquation.hpp. In order to numerically approximate the solution to a specific problem, the user of this code calls the function solveProblemInstance.
solveProblemInstance takes two structs and a file name. The first struct, of type HeatProblem1d, specifies the instance of the heat equation to be solved. That is it gives the length of the rod, the thermal diffusivity of the rod (referred to as alpha), the boundary conditions, and a functor to calculate the initial temperature at any point on the rod. The functor was inspired by Thrust, and the struct type is templated to allow for any functor (though it's operator() must take a float and return a float).
The second struct is of type SimulationParams1D, and tells the simulation functions how the numerical approximation should to be carried out. It contains the difference between points in time, the difference between points in space, and the number of points in time for which the simulation will perform calculations, excluding time t = 0. It also includes the number of time steps between lines of the output file.
To give an explicit example of how the time based simulation parameters interact, let's say that difference between points in time is 2, the number of points in time for which the simulation will perform calculations (excluding time = 0) is 86, and periodOfRecordings is 3. The program will perform calculations at times time = 0, time = 2, time = 4, ..., time = 172. The program would write temperatures to the output file for time = 0, time = 6, time = 12, ..., time = 168 (since the number of time intervals is not evenly divisible by 3, the temperatures at the last time step aren't written to the file).
I chose to pass this information in as structs, because I felt with so many parameters (9 to be exact) setting struct members was much more readable then long parameter lists. It also made it easy for me to pass the information on to helper functions.
I chose to have two structs since I thought one might want to solve the same problem with multiple different simulation parameters (e.g. time step sizes), and decoupling the problem from the simulation parameters would better facilitate this.
main.cu currently does two things. It numerically approximates a problem which I found on page 143 of "The Mathematics of Diffusion" by J. Crank. Crank presents the numerical solution to this problem, as was obtained using the FTCS method, in table 8.1 on page 389 (though the book is from 1975 and does not include code). I used this problem as a quick "sanity check", since I could compare my results to those presented by Crank.
main.cu also currently runs some timing code. It finds a solution to the same problem, with an increasingly great number of time steps. It outputs the inclusive time taken each time, as well as the number of iterations, to stdout. When main.cu is run using slurmscript, this output ends up in a file called sbatch.out, which can then be used to generate the timing plot using plotTimmings.py.
In the following discussion I refer to UINT_MAX which is defined in , and is the largest value which can represented by an unsigned int.
- The number of iterations must be strictly less than UINT_MAX.
- The ceiling of the length of the rod divided by delta X must be strictly less than 1024.
- The number of temperatures produced in the output of a simulation must be less than or equal to UINT_MAX + 1.
- r, or (deltaT * alpha) / (deltaX^2), must be less than or equal to 1/2.
In the function solveProblemInstanceDevice, I keep track of the temperatures at the previous time step in an array in global memory. Clearly this results in way too many trips to global memory, and the array should be in shared memory. I didn't use shared memory in the initial development to simplify things, and the course (as well as my access to Euler) ended before I got around to fixing it.
Also, it would be good to eliminate the constraint on the number of discrete points used in the simulation. This could be done by having some threads be assigned more than one point if the number of points is greater than 1024. Again, the course and my access to Euler ended before I got around to eliminating that restriction.
- Clone this repository on Euler.
- Run
make
in the top level directory of this project (that is to say the the directory that this README is in). - Run
mkdir simData
- The generated executable is named
generate_data
. This is the CUDA program which will be run on Euler to generate the data. - Run
./generate_data
either via a slurm script or in interactive mode - The generated data is now in a folder called
simData
- In order to copy this data to your local machine you will have to compress it.
- Run
zip -r simData simData
- After the compression completes, there will be a file called
simData.zip
- Copy
simData.zip
andanimate.py
to your local machine. I trust you already have some way you usually copy files back and forth.
IMPORTANT: Even if you use sshfs or sftpnetdrive, you will still have to manually
move the compressed file over to your local machine. You can drag and drop it if
you'd like, you can use scp
if you'd like, but simData.zip
needs to be on your
local machine's hard drive.
Unfortunately the animation requires matplotlib, which can't be used on Euler at the moment. You will need to install matplotlib if you don't already have it.
The best way to install matplotlib is to install the Anaconda package manager.
- Go to https://www.anaconda.com/download/
- Download the Python 2.7 version of Anaconda. This comes with everything you need.
cd
into the directory withsimData.zip
andanimate.py
- Unzip
simData.zip
animate.py
takes a file name as its first argument. For its second argument you can optionally give it the time in milliseconds that the animation should pause between frames (the default value is 200). The minimum value for the second argument is 1.- For each file
aFile
in the unzipedsimData
runpython animate.py aFile
. If you want to speed things up or slow them down you can runpython animate.py aFile timeStep
.