This project implements the PDDP (Principal Direction Divisive Partitioning) algorithm.
The algorithm takes as input a matrix whose vectors represent the elements of the data set. It uses the power iteration method to compute its output, stops once it has converged, and returns a clustering of the input data.
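To make the description above more concrete, here is a minimal CPU-side C++ sketch of a single PDDP split. It is not the project's CUDA code. It assumes a row-major matrix with one data point per row, a fixed start vector, and a simple convergence test on the change of the direction vector. The names `Matrix`, `pddp_split`, `max_iter`, and `tol` are hypothetical and chosen only for illustration.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical container: row-major n x d matrix, one data point per row (assumed layout).
struct Matrix {
    std::size_t n, d;
    std::vector<double> a;
    double at(std::size_t i, std::size_t j) const { return a[i * d + j]; }
};

// One PDDP split (illustrative): label each point by the sign of its projection
// onto the principal direction of the centered data.
std::vector<int> pddp_split(const Matrix& X, int max_iter = 1000, double tol = 1e-10) {
    assert(X.n > 0 && X.d > 0 && X.a.size() == X.n * X.d);
    const std::size_t n = X.n, d = X.d;

    // Centroid of the data set.
    std::vector<double> mean(d, 0.0);
    for (std::size_t i = 0; i < n; ++i)
        for (std::size_t j = 0; j < d; ++j) mean[j] += X.at(i, j);
    for (std::size_t j = 0; j < d; ++j) mean[j] /= static_cast<double>(n);

    // Start vector (assumption): normalized all-ones vector.
    // Caveat: a start vector orthogonal to the principal direction will not converge to it.
    std::vector<double> v(d, 1.0 / std::sqrt(static_cast<double>(d))), w(d), proj(n);

    // proj = C v, where C is the centered data (X minus the centroid in every row).
    auto project = [&]() {
        for (std::size_t i = 0; i < n; ++i) {
            double s = 0.0;
            for (std::size_t j = 0; j < d; ++j) s += (X.at(i, j) - mean[j]) * v[j];
            proj[i] = s;
        }
    };

    // Power iteration on C^T C without forming C^T C explicitly.
    for (int it = 0; it < max_iter; ++it) {
        project();
        w.assign(d, 0.0);
        for (std::size_t i = 0; i < n; ++i)          // w = C^T proj
            for (std::size_t j = 0; j < d; ++j) w[j] += (X.at(i, j) - mean[j]) * proj[i];

        double norm = 0.0;
        for (std::size_t j = 0; j < d; ++j) norm += w[j] * w[j];
        norm = std::sqrt(norm);
        if (norm == 0.0) break;                      // degenerate case: nothing to iterate on

        double diff = 0.0;
        for (std::size_t j = 0; j < d; ++j) {
            w[j] /= norm;
            diff += (w[j] - v[j]) * (w[j] - v[j]);
        }
        v = w;
        if (std::sqrt(diff) < tol) break;            // converged: direction stopped changing
    }

    // Split by the sign of the projection onto the final direction.
    project();
    std::vector<int> labels(n);
    for (std::size_t i = 0; i < n; ++i) labels[i] = proj[i] > 0.0 ? 1 : 0;
    return labels;
}
```

Calling `pddp_split` on a matrix yields one 0/1 label per data point. A full PDDP run would repeat such splits on the resulting clusters, which this sketch leaves out.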
The CUDA implementation uses shared memory, coalesced memory accesses, and atomic operations.