Name		Name	Last commit message	Last commit date
Latest commit History 29 Commits
data		data
result-par		result-par
result		result
.gitignore		.gitignore
DNAParallel.py		DNAParallel.py
DNASequential.py		DNASequential.py
DNAgenerator.py		DNAgenerator.py
PointGenerator.py		PointGenerator.py
PointParallel.py		PointParallel.py
PointSequential.py		PointSequential.py
README.md		README.md
dataGenerator		dataGenerator
report.pdf		report.pdf
runParallel		runParallel
runSample		runSample
runSequential		runSequential

Repository files navigation

15-440-lab4
K-Means

Boya Yang (boyay) Shikun Zhang (shikunz)

Submission incluse:

pdf file:
- report.pdf
shell scripts:
- runSample runs a sample data of size 5000 with k=5 using both algorithms (use 8 processors for parallel)
First Step: generate Data
- dataGenerator generates datasets for both DNA and Points using a combination of different k's and p's, saves file into folder /data/ k ranges from 2 to 10; p is in [10, 100, 1000, 10000, 100000] also generate the solution file, the original k centroids
- runSequential runs sequential algorithms on two types of datasets generated by the dataGenerator with profiling information
- runParallel runs parallel algorithms on two types of datasets generated by the dataGenerator by using 2, 4, 8 or 12 processors with profiling information
python files (used in the shell scripts):

-k # of clusters -e value of epsilon -i input file -o output file -p number of points aruond centroid -l length of DNA strand, default = 50 -r range of coordinates, default = 1000000
- PointGenerator.py -k -p -o [-r ]
- PointSequential.py -k -e -i -o
- PointParallel.py -k -e -i -o
- DNAGenerator.py -k -p -o [-l ]
- DNASequential.py -k -e -i -o
- DNAParallel.py -k -e -i -o ======================================================================

About

No description, website, or topics provided.

Report repository

Releases

No releases published

Packages

No packages published

Languages

Python 100.0%