CUDA SHORTEST PATHS

OVERVIEW

Explain the implementations of Floyd-Warshall and Dijkstra algorithms in their sequential and parallel versions using CUDA to solve the problem of calculating the distances of all shortest paths in a given input graph represented by its adjacency matrix.

Note

The parallel programs were executed and compiled in a Windows environment using CUDA Toolkit 12.3 utilizing the dedicated NVIDIA GeForce MX250 graphics card for executing the algorithms on the GPU.

DIJKSTRA ALGORITHM

To solve the All-Pairs Shortest Path (APSP) problem using Dijkstra's algorithm, one can execute $n$ instances of the Single-Source Shortest Path (SSSP) problem for each of the $n$ source vertices. Both in the sequential and parallel versions, this adaptation of the algorithm mentioned earlier is used to solve the APSP. However, the main difference in executions and performance lies in the utilization of a parallelization strategy adopted by the parallel version.

Important

In the CUDA code, the source is partitioned in such a way that each thread in the kernel is responsible for processing a specific vertex of the graph, thus calculating the shortest path from this vertex to all others. In other words, it executes Dijkstra's algorithm in its variation for SSSP.

FLOYD-WARSHALL ALGORITHM

Unlike Dijkstra's algorithm, Floyd-Warshall naturally computes the All-Pairs Shortest Path (APSP) without requiring adaptation. In its parallel version, a mapping strategy using 2D blocks was adopted, where each thread in the GPU is responsible for calculating the shortest distance between pairs of vertices $(i, j)$ for a given value of $k$. The kernel is launched in a 2D grid of threads, and after each iteration of the intermediate loop, the code synchronizes the GPU execution, ensuring that all kernel iterations have been completed.

BLOCKS AND GRIDS

The block and grid sizes in both CUDA algorithms were configured according to the problem dimensions and the size of the matrices to effectively distribute the work among threads. In the Dijkstra algorithm, for example, utilizing the partitioned source strategy allows for an exact number of blocks equivalent to the number of vertices for parallelization. This approach helps optimize the workload distribution among the threads.

COMPARISON

The figures $1$ and $2$ depict comparisons between the implementations of the algorithms for the sequential formulation executed on the CPU and the parallel formulation using CUDA on the GPU. For each algorithm, a different adjacency matrix was generated with $v$ vertices, and this same matrix was used in both the sequential and parallel formulations to calculate the execution time, ensuring a more suitable and precise comparison between different solutions for the same dimension.

Caution

It's important to note that the graph does not display the average execution time of multiple results for each matrix size. The values in the plot represent the outcome of a single, entirely random test concerning the graph generation for each quantity of vertices. This justifies, for instance, the varying slopes or inclinations of the line in the Floyd graph.

FILES AND COMPILE

The project files were divided into folders named after their respective algorithm implementations. Both folders contain the sequential and parallel solutions, as well as a combined solution that tests both the sequential and parallel approaches for the same adjacency matrix. The results are saved in a log file, which is utilized by a Python program in a .ipynb file to generate the graph based on the written information.

Tip

The project includes a file named clean_exe.py that can be executed to clean the executables and libraries generated by nvcc and gcc. The files with the extensions .c and .cuda can be compiled using the following:

DIJKSTRA

'DIJKSTRA SERIAL' $ gcc -o dijkstra_serial .\dijkstra_serial.c -std=c99 -lm -Wall -Wextra -g3
'DIJKSTRA PARALLEL' $ nvcc -o dijkstra_parallel .\dijkstra_parallel.cu
'DIJKSTRA COMPARISON' $ nvcc -o dijkstra_comparison .\dijkstra_comparison.cu

FLOYD-WARSHALL

'FLOYD SERIAL' $ gcc -o floyd_serial .\floyd_serial.c -std=c99 -lm -Wall -Wextra -g3
'FLOYD PARALLEL' $ nvcc -o floyd_parallel .\floyd_parallel.cu
'FLOYD COMPARISON' $ nvcc -o floyd_comparison .\floyd_comparison.cu

AUTHOR

Giovane Hashinokuti Iwamoto - Computer Science student at UFMS - Brazil - MS

I am always open to receiving constructive criticism and suggestions for improvement in my developed code. I believe that feedback is an essential part of the learning and growth process, and I am eager to learn from others and make my code the best it can be. Whether it's a minor tweak or a major overhaul, I am willing to consider all suggestions and implement the changes that will benefit my code and its users.

Name		Name	Last commit message	Last commit date
Latest commit History 39 Commits
apsp_dijkstra		apsp_dijkstra
apsp_floyd		apsp_floyd
images		images
.gitignore		.gitignore
LICENSE		LICENSE
Makefile		Makefile
README.md		README.md
clean_exe.py		clean_exe.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

CUDA SHORTEST PATHS

OVERVIEW

DIJKSTRA ALGORITHM

FLOYD-WARSHALL ALGORITHM

BLOCKS AND GRIDS

COMPARISON

FILES AND COMPILE

AUTHOR

About

Releases

Packages

Languages

License

GiovaneIwamoto/cuda-shortest-paths

Folders and files

Latest commit

History

Repository files navigation

CUDA SHORTEST PATHS

OVERVIEW

DIJKSTRA ALGORITHM

FLOYD-WARSHALL ALGORITHM

BLOCKS AND GRIDS

COMPARISON

FILES AND COMPILE

AUTHOR

About

Topics

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages