Submission includes:
pdf file:
- report.pdf
shell scripts:
- runSample: runs a sample dataset of size 5000 with k=5 using both algorithms (use 8 processors for the parallel algorithm)
- dataGenerator (first step: generate the data): generates datasets for both DNA and Points using combinations of different k's and p's and saves the files into the folder /data/. k ranges from 2 to 10; p is one of [10, 100, 1000, 10000, 100000]. It also generates the solution file, which holds the original k centroids.
- runSequential: runs the sequential algorithms on the two types of datasets generated by dataGenerator, with profiling information
- runParallel: runs the parallel algorithms on the two types of datasets generated by dataGenerator, using 2, 4, 8 or 12 processors, with profiling information
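The parameter grid covered by dataGenerator can be sketched as follows. This is only an illustration, not the script itself: it assumes that the range of k includes both 2 and 10, that each combination results in one call to the matching generator with the documented -k, -p and -o options, and the output file names are hypothetical.

```python
from itertools import product

K_VALUES = range(2, 11)  # k = 2..10, assuming both ends are included
P_VALUES = [10, 100, 1000, 10000, 100000]
GENERATORS = {"points": "PointGenerator.py", "dna": "DNAGenerator.py"}


def generator_commands(data_dir="data"):
    """Build one generator command line per (dataset type, k, p) combination."""
    commands = []
    for (kind, script), k, p in product(GENERATORS.items(), K_VALUES, P_VALUES):
        # Hypothetical naming scheme; the real scripts may name files differently.
        out = f"{data_dir}/{kind}_k{k}_p{p}.txt"
        commands.append(["python", script, "-k", str(k), "-p", str(p), "-o", out])
    return commands


if __name__ == "__main__":
    cmds = generator_commands()
    print(len(cmds), "commands")  # 2 dataset types * 9 values of k * 5 values of p
    print(" ".join(cmds[0]))
```

Under these assumptions the grid has 2 x 9 x 5 = 90 combinations, so the sequential and parallel runs would each see 45 datasets per data type.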
python files (used in the shell scripts):
  -k  number of clusters
  -e  value of epsilon
  -i  input file
  -o  output file
  -p  number of points around each centroid
  -l  length of DNA strand, default = 50
  -r  range of coordinates, default = 1000000
- PointGenerator.py -k -p -o [-r ]
- PointSequential.py -k -e -i -o
- PointParallel.py -k -e -i -o
- DNAGenerator.py -k -p -o [-l ]
- DNASequential.py -k -e -i -o
- DNAParallel.py -k -e -i -o
======================================================================
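For reference, the option list for PointGenerator.py could be parsed along these lines. This is a minimal sketch using the standard argparse module, not the script's actual code: treating -k, -p and -o as required and the value types as integers are assumptions, and only the default for -r comes from the list above.

```python
import argparse


def build_parser():
    """Parser mirroring the documented PointGenerator.py options (assumed types)."""
    parser = argparse.ArgumentParser(prog="PointGenerator.py")
    parser.add_argument("-k", type=int, required=True, help="number of clusters")
    parser.add_argument("-p", type=int, required=True,
                        help="number of points around each centroid")
    parser.add_argument("-o", required=True, help="output file")
    parser.add_argument("-r", type=int, default=1000000,
                        help="range of coordinates")
    return parser


if __name__ == "__main__":
    # Example invocation with a hypothetical output file name.
    args = build_parser().parse_args(["-k", "5", "-p", "1000", "-o", "points.txt"])
    print(args.k, args.p, args.o, args.r)
```

When -r is omitted, as in the example call, the parser falls back to the documented default of 1000000.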