Parallel GPU implementations of K-means clustering using CUDA and Thrust.
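For orientation, a minimal sketch of the assignment step that the CUDA versions parallelize: one thread per point, scanning all k centroids for the nearest one by squared Euclidean distance. Names and the row-major layout are illustrative assumptions, not the repo's actual kernel signatures.

#include <float.h>

// Illustrative assignment kernel: one thread per point (hypothetical names).
__global__ void assign_labels(const float *points, const float *centroids,
                              int *labels, int n, int d, int k) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;

    float best = FLT_MAX;
    int best_c = 0;
    for (int c = 0; c < k; ++c) {
        // Squared Euclidean distance from point i to centroid c.
        float dist = 0.0f;
        for (int j = 0; j < d; ++j) {
            float diff = points[i * d + j] - centroids[c * d + j];
            dist += diff * diff;
        }
        if (dist < best) { best = dist; best_c = c; }
    }
    labels[i] = best_c;
}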
Build:
make clean && make

Example commands for different datasets:
2k dataset (n=2048, d=16, k=16):
./bin/kmeans -v 1 -k 16 -d 16 -i input/random-n2048-d16-c16.txt -m 150 -t 1e-5 -s 8675309

16k dataset (n=16384, d=24, k=16):
./bin/kmeans -v 1 -k 16 -d 24 -i input/random-n16384-d24-c16.txt -m 150 -t 1e-5 -s 8675309

64k dataset (n=65536, d=32, k=16):
./bin/kmeans -v 1 -k 16 -d 32 -i input/random-n65536-d32-c16.txt -m 150 -t 1e-5 -s 8675309

Arguments:
-v: version (0=CPU, 1=CUDA Basic, 2=CUDA Shared, 3=Thrust)
-k: number of clusters
-d: dimension of points
-i: input file
-m: max iterations
-t: convergence threshold
-s: random seed
-c: output centroids (optional; default: output assignments)
Run correctness tests:
./run_correctness_test.sh [version]
# Example: ./run_correctness_test.sh 1 # Test CUDA Basic
# Or test all: ./run_correctness_test.sh 0 1 2 3

Run performance tests:
./run_performance_test.sh

Generate all graphs:
./generate_all_graphs.sh

Or generate individual graphs:
python3 graph_speedup.py
python3 graph_efficiency.py
python3 graph_data_transfer.py

Generate report PDF:
./generate_pdf.sh

Performance Summary (64k dataset):
- CUDA Basic: 13.4x speedup, 63.8% memory efficiency, 0.29% compute efficiency
- CUDA Shared: 13.1x speedup, 62.5% memory efficiency, 0.28% compute efficiency
- Thrust: 5.7x speedup, 27.2% memory efficiency, 0.12% compute efficiency
Key Findings:
- Workload is memory-bound (not compute-bound)
- CUDA Basic outperforms Shared Memory (no benefit from caching small centroids; see the kernel sketch after this list)
- Data transfer overhead is negligible (<0.04% of runtime)
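To illustrate the Basic vs. Shared Memory comparison, a hedged sketch (illustrative names, same row-major layout as the kernel above) of the shared-memory variant: the block first stages the centroid table in shared memory, then scans it. With k=16 and d<=32 the table is at most 16*32*4 = 2 KB, small enough to stay cache-resident anyway, so the extra staging and __syncthreads() do not reduce the DRAM traffic from point reads that dominates this memory-bound workload.

#include <float.h>

// Illustrative shared-memory variant; launch with k*d*sizeof(float) bytes of
// dynamic shared memory, e.g.
// assign_labels_shared<<<blocks, threads, k*d*sizeof(float)>>>(...).
__global__ void assign_labels_shared(const float *points, const float *centroids,
                                     int *labels, int n, int d, int k) {
    extern __shared__ float s_centroids[];   // k * d floats
    // Cooperatively stage the centroid table in shared memory.
    for (int t = threadIdx.x; t < k * d; t += blockDim.x)
        s_centroids[t] = centroids[t];
    __syncthreads();

    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;

    float best = FLT_MAX;
    int best_c = 0;
    for (int c = 0; c < k; ++c) {
        // Same distance scan as the basic kernel, but reading staged centroids.
        float dist = 0.0f;
        for (int j = 0; j < d; ++j) {
            float diff = points[i * d + j] - s_centroids[c * d + j];
            dist += diff * diff;
        }
        if (dist < best) { best = dist; best_c = c; }
    }
    labels[i] = best_c;
}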
