Commit e2ba9b5

Author: Brett VanderHaar
Commit message: design document
1 parent 0167205, commit e2ba9b5

3 files changed: +23 -1 lines changed

README.md

Lines changed: 23 additions & 1 deletion
@@ -1,3 +1,25 @@
1 1
# floyds-algorithm
2 2

3-
start
3+
## Abstract
4+
OpenMP and non-OpenMP versions of a C++ implementation of Floyd's algorithm were tested and compared. Results indicate a dramatic speedup with OpenMP.
5+
6+
## High Level Design
7+
The single-threaded version goes row by row through the matrix and calculates the score. The multi-threaded version uses parallel OpenMP for loops.
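The two versions might look roughly like the sketch below. It assumes "Floyd's algorithm" means the Floyd-Warshall all-pairs shortest-path algorithm with non-negative edge weights. The names `Matrix`, `INF`, `floyd_serial`, and `floyd_openmp` are hypothetical and are not taken from this repository's source.

```cpp
#include <cassert>
#include <limits>
#include <vector>

// Hypothetical names throughout; a sketch, not the repository's code.
using Matrix = std::vector<std::vector<long long>>;

// Sentinel for "no edge". Half of the maximum so INF + INF cannot overflow.
constexpr long long INF = std::numeric_limits<long long>::max() / 2;

// Single-threaded version: for each intermediate vertex k, walk the
// matrix row by row and keep the shorter of the two candidate paths.
void floyd_serial(Matrix& dist) {
    const long n = static_cast<long>(dist.size());
    for (const auto& row : dist) assert(static_cast<long>(row.size()) == n);
    for (long k = 0; k < n; ++k)
        for (long i = 0; i < n; ++i)
            for (long j = 0; j < n; ++j)
                if (dist[i][k] + dist[k][j] < dist[i][j])
                    dist[i][j] = dist[i][k] + dist[k][j];
}

// Multi-threaded version: the k loop must stay sequential, but for a
// fixed k the rows are independent (assuming non-negative weights, row k
// and column k do not change), so the i loop is split across threads.
void floyd_openmp(Matrix& dist) {
    const long n = static_cast<long>(dist.size());
    for (const auto& row : dist) assert(static_cast<long>(row.size()) == n);
    for (long k = 0; k < n; ++k) {
        #pragma omp parallel for
        for (long i = 0; i < n; ++i)
            for (long j = 0; j < n; ++j)
                if (dist[i][k] + dist[k][j] < dist[i][j])
                    dist[i][j] = dist[i][k] + dist[k][j];
    }
}
```

Without the `-fopenmp` flag GCC ignores the pragma, so `floyd_openmp` then behaves like the single-threaded version.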
8+
9+
## Implementation
10+
The C++14 standard library with GCC 5.2 was chosen because OpenMP is easy to use with it. The `OMP_NUM_THREADS` environment variable was used to control the number of OpenMP threads. Level 2 optimizations (GCC's `-O2`) were also compared in the tests because of anecdotal evidence that they reduce processing time.
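As a rough illustration of how these settings fit together, the sketch below reports the thread count the OpenMP runtime will use. The file name `floyd.cpp`, the binary name, and the helper functions `max_threads` and `omp_num_threads_env` are assumptions for illustration; the repository's actual build commands are not shown in this commit.

```cpp
// Hypothetical build and run commands (file and binary names are assumptions):
//   g++ -std=c++14 -O2 -fopenmp floyd.cpp -o floyd
//   OMP_NUM_THREADS=4 ./floyd
#include <cassert>
#include <cstdlib>
#include <string>

#ifdef _OPENMP
#include <omp.h>  // only available when compiling with -fopenmp
#endif

// Upper bound on the threads a parallel region may use.
// Falls back to 1 when the program is built without OpenMP.
int max_threads() {
#ifdef _OPENMP
    const int n = omp_get_max_threads();
#else
    const int n = 1;
#endif
    assert(n >= 1);
    return n;
}

// Raw value of OMP_NUM_THREADS, or "(unset)" if the variable is absent.
std::string omp_num_threads_env() {
    const char* value = std::getenv("OMP_NUM_THREADS");
    return value ? std::string(value) : std::string("(unset)");
}
```

Logging both values at the start of each run is one way to confirm that a measurement used the intended thread count.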
11+
12+
## Testing Methodology
13+
A MacBook Pro (Early 2015) with a 2.7 GHz Core i5 processor and 8 GB of DDR3 RAM was used for compilation and test runs. Compilation and test runs took place in an Arch Linux VM with GCC 5.2.
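This commit does not say how processing time was measured. One possible approach, sketched here with a hypothetical helper named `time_seconds`, is to wrap the computation in a wall-clock timer from `<chrono>`.

```cpp
#include <cassert>
#include <chrono>
#include <functional>

// Hypothetical helper: wall-clock seconds taken by one call to `work`.
// steady_clock is used because it never jumps backwards.
double time_seconds(const std::function<void()>& work) {
    assert(static_cast<bool>(work));  // reject an empty std::function
    const auto start = std::chrono::steady_clock::now();
    work();
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double>(stop - start).count();
}
```

Wall-clock time is the relevant measure for a parallel run, since CPU time sums the work of all threads. Passing a lambda that runs the computation under test returns its elapsed time in seconds.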
14+
15+
## Discussion
16+
The raw data from the test runs:
17+
![Raw Data](https://raw.githubusercontent.com/bvanderhaar/floyds-algorithm/master/docs/raw-data.png)
18+
19+
The same data, graphed for comparison:
20+
![Comparison](https://raw.githubusercontent.com/bvanderhaar/floyds-algorithm/master/docs/runtime-graph.png)
21+
22+
Both level 2 optimizations and OpenMP have a profound effect on processing time.
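One common way to put a number on such an effect is speedup, the baseline time divided by the improved time. The helper names `speedup` and `efficiency` below are hypothetical, and no measured values from the test runs are used.

```cpp
#include <cassert>

// Speedup of a run relative to a baseline; values above 1 mean the run was faster.
double speedup(double baseline_seconds, double run_seconds) {
    assert(baseline_seconds > 0.0 && run_seconds > 0.0);
    return baseline_seconds / run_seconds;
}

// Parallel efficiency: speedup divided by the number of threads used.
// A value of 1 would mean perfect scaling.
double efficiency(double baseline_seconds, double run_seconds, int threads) {
    assert(threads >= 1);
    return speedup(baseline_seconds, run_seconds) / threads;
}
```

With made-up times of 8 seconds single-threaded and 2 seconds on 4 threads, `speedup` gives 4 and `efficiency` gives 1.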
23+
24+
## Conclusion
25+
OpenMP reduced the processing time in all cases. Level 2 optimizations also help with nested for loops such as those in Floyd's algorithm.

docs/raw-data.png (76.2 KB)

docs/runtime-graph.png (11.3 KB)
