Clustering

This part contains a sample clustering project in Python using scikit-learn.

Run Clustering

To run the project, please follow this link.

Dataset

This section uses a different dataset. Please visit it here: Clustering Dataset Web site

Results

The goal is to group people by the comments they have made on social media. This is done with K-Means clustering and hierarchical clustering, and the two results are then compared using the Silhouette Score. As the results show, three clusters worked best in both cases, with K-Means achieving the higher score (0.35 versus 0.29).

K-Means:

N-Cluster (K)	Silhouette Score
2	0.31
3	0.35
4	0.34
5	0.29
6	0.30
7	0.27
8	0.31
9	0.31
best k = 3	0.35

Hierarchical-Clustering:

N-Cluster	Silhouette Score
2	0.24
3	0.29
4	0.25
5	0.25
6	0.25
7	0.27
8	0.29
9	0.28
best k = 3	0.29
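
The comparison in the two tables can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the helper names `silhouette_sweep` and `best_k`, the synthetic `make_blobs` data, the standardization step, and the default Ward linkage for hierarchical clustering are all assumptions made for the example.

```python
from sklearn.cluster import AgglomerativeClustering, KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler


def silhouette_sweep(X, k_values=range(2, 10), random_state=0):
    """Return {method: {k: silhouette score}} for both clustering methods.

    Hypothetical helper: fits each model for every k and scores the labels.
    """
    scores = {"kmeans": {}, "hierarchical": {}}
    for k in k_values:
        km_labels = KMeans(
            n_clusters=k, n_init=10, random_state=random_state
        ).fit_predict(X)
        # Agglomerative clustering with scikit-learn's default (Ward) linkage;
        # the linkage used in the project is not stated, so this is assumed.
        hc_labels = AgglomerativeClustering(n_clusters=k).fit_predict(X)
        scores["kmeans"][k] = silhouette_score(X, km_labels)
        scores["hierarchical"][k] = silhouette_score(X, hc_labels)
    return scores


def best_k(scores_for_method):
    """Pick the k with the highest silhouette score."""
    return max(scores_for_method, key=scores_for_method.get)


# Synthetic stand-in data with three well-separated groups.
# This is NOT the project's dataset.
X_demo, _ = make_blobs(
    n_samples=300,
    centers=[[0, 0], [10, 10], [-10, 10]],
    cluster_std=0.6,
    random_state=0,
)
X_demo = StandardScaler().fit_transform(X_demo)

demo_scores = silhouette_sweep(X_demo)
for method, by_k in demo_scores.items():
    k = best_k(by_k)
    print(f"{method}: best k = {k}, silhouette = {by_k[k]:.2f}")
```

The sweep covers k = 2 to 9 to match the tables above. Scores on the synthetic data will differ from the reported ones, since the data is only a placeholder.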

PCA plot of the clustered data for K-Means Clustering:

PCA plot of the clustered data for Hierarchical Clustering:
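
A PCA plot of clustered data could be produced along these lines. This is a sketch under assumptions: the function name `pca_cluster_plot`, the use of matplotlib, and the synthetic five-dimensional data are illustrative choices and are not taken from the project.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend, so no display is required
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.decomposition import PCA


def pca_cluster_plot(X, labels, title):
    """Project X onto its first two principal components and color by cluster.

    Hypothetical helper; returns the figure and the 2-D coordinates.
    """
    coords = PCA(n_components=2).fit_transform(X)
    fig, ax = plt.subplots()
    ax.scatter(coords[:, 0], coords[:, 1], c=labels, cmap="viridis", s=15)
    ax.set_xlabel("PC 1")
    ax.set_ylabel("PC 2")
    ax.set_title(title)
    return fig, coords


# Synthetic stand-in data (NOT the project's dataset): 5 features, 3 groups.
X_plot, _ = make_blobs(n_samples=200, n_features=5, centers=3, random_state=0)
plot_labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X_plot)

fig, coords = pca_cluster_plot(X_plot, plot_labels, "K-Means clusters (PCA projection)")
# To write the figure to disk: fig.savefig("kmeans_pca.png")
plt.close(fig)
```

The same function can be reused for the hierarchical result by passing the labels from that model instead.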
