Explain the implementations of Floyd-Warshall and Dijkstra algorithms in their sequential and parallel versions using CUDA to solve the problem of calculating the distances of all shortest paths in a given input graph represented by its adjacency matrix.
Note
The parallel programs were executed and compiled in a Windows environment using CUDA Toolkit 12.3 utilizing the dedicated NVIDIA GeForce MX250 graphics card for executing the algorithms on the GPU.
To solve the All-Pairs Shortest Path (APSP) problem using Dijkstra's algorithm, one can execute
Important
In the CUDA code, the source is partitioned in such a way that each thread in the kernel is responsible for processing a specific vertex of the graph, thus calculating the shortest path from this vertex to all others. In other words, it executes Dijkstra's algorithm in its variation for SSSP.
Unlike Dijkstra's algorithm, Floyd-Warshall naturally computes the All-Pairs Shortest Path (APSP) without requiring adaptation. In its parallel version, a mapping strategy using 2D blocks was adopted, where each thread in the GPU is responsible for calculating the shortest distance between pairs of vertices
The block and grid sizes in both CUDA algorithms were configured according to the problem dimensions and the size of the matrices to effectively distribute the work among threads. In the Dijkstra algorithm, for example, utilizing the partitioned source strategy allows for an exact number of blocks equivalent to the number of vertices for parallelization. This approach helps optimize the workload distribution among the threads.
The figures
Caution
It's important to note that the graph does not display the average execution time of multiple results for each matrix size. The values in the plot represent the outcome of a single, entirely random test concerning the graph generation for each quantity of vertices. This justifies, for instance, the varying slopes or inclinations of the line in the Floyd graph.
The project files were divided into folders named after their respective algorithm implementations. Both folders contain the sequential and parallel solutions, as well as a combined solution that tests both the sequential and parallel approaches for the same adjacency matrix. The results are saved in a log file, which is utilized by a Python program in a .ipynb file to generate the graph based on the written information.
Tip
The project includes a file named clean_exe.py that can be executed to clean the executables and libraries generated by nvcc and gcc. The files with the extensions .c and .cuda can be compiled using the following:
DIJKSTRA
'DIJKSTRA SERIAL' $ gcc -o dijkstra_serial .\dijkstra_serial.c -std=c99 -lm -Wall -Wextra -g3
'DIJKSTRA PARALLEL' $ nvcc -o dijkstra_parallel .\dijkstra_parallel.cu
'DIJKSTRA COMPARISON' $ nvcc -o dijkstra_comparison .\dijkstra_comparison.cu
FLOYD-WARSHALL
'FLOYD SERIAL' $ gcc -o floyd_serial .\floyd_serial.c -std=c99 -lm -Wall -Wextra -g3
'FLOYD PARALLEL' $ nvcc -o floyd_parallel .\floyd_parallel.cu
'FLOYD COMPARISON' $ nvcc -o floyd_comparison .\floyd_comparison.cu
- Giovane Hashinokuti Iwamoto - Computer Science student at UFMS - Brazil - MS
I am always open to receiving constructive criticism and suggestions for improvement in my developed code. I believe that feedback is an essential part of the learning and growth process, and I am eager to learn from others and make my code the best it can be. Whether it's a minor tweak or a major overhaul, I am willing to consider all suggestions and implement the changes that will benefit my code and its users.