This part contains a sample project for Clustering in Python using scikit-learn tools.
To run the project, please follow this link.
The dataset for this section is different; please visit it here: Clustering Dataset Web site
The goal is to group people by the comments they have made on social media. This is done with K-Means clustering and hierarchical clustering, and the two results are then compared using the Silhouette Score. As the results show, 3 clusters worked best in both cases, but K-Means achieved the higher score (0.35 versus 0.29).
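The comparison described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the synthetic `make_blobs` data, the `silhouette_by_k` helper, and the `random_state` and `n_init` settings are all assumptions introduced here, so the scores it prints will not match the tables.

```python
from sklearn.cluster import AgglomerativeClustering, KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score


def silhouette_by_k(X, make_model, k_values):
    """Fit one model per k and return {k: silhouette score}.

    `make_model` is a hypothetical factory: it takes k and returns an
    unfitted scikit-learn clusterer that supports fit_predict.
    """
    scores = {}
    for k in k_values:
        labels = make_model(k).fit_predict(X)
        scores[k] = silhouette_score(X, labels)
    return scores


# Placeholder data (assumption): synthetic blobs stand in for the
# features extracted from the social media comments.
X, _ = make_blobs(n_samples=300, centers=3, n_features=5, random_state=0)

k_values = range(2, 10)  # k = 2..9, the same range as in the tables

kmeans_scores = silhouette_by_k(
    X, lambda k: KMeans(n_clusters=k, n_init=10, random_state=0), k_values
)
hier_scores = silhouette_by_k(
    X, lambda k: AgglomerativeClustering(n_clusters=k), k_values
)

# Pick the k with the highest silhouette score for each method.
best_k_kmeans = max(kmeans_scores, key=kmeans_scores.get)
best_k_hier = max(hier_scores, key=hier_scores.get)

print(f"K-Means: best k = {best_k_kmeans} ({kmeans_scores[best_k_kmeans]:.2f})")
print(f"Hierarchical: best k = {best_k_hier} ({hier_scores[best_k_hier]:.2f})")
```

The Silhouette Score ranges from -1 to 1, and higher values indicate tighter, better separated clusters, which is why the largest score is used to choose k.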
K-Means:
N-Cluster (K) | Silhouette Score |
---|---|
2 | 0.31 |
3 | 0.35 |
4 | 0.34 |
5 | 0.29 |
6 | 0.30 |
7 | 0.27 |
8 | 0.31 |
9 | 0.31 |
best k = 3 | 0.35 |

Hierarchical-Clustering:
N-Cluster | Silhouette Score |
---|---|
2 | 0.24 |
3 | 0.29 |
4 | 0.25 |
5 | 0.25 |
6 | 0.25 |
7 | 0.27 |
8 | 0.29 |
9 | 0.28 |
best k = 3 | 0.29 |
This is the PCA plot of the clustered data for K-Means Clustering:

This is the PCA plot of the clustered data for Hierarchical Clustering:
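A PCA plot of clustered data could be produced along the following lines. This is only a sketch under stated assumptions: the synthetic data, the helper names `project_2d` and `plot_clusters`, and the output file name are hypothetical, and matplotlib is assumed to be installed for the plotting step.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.decomposition import PCA


def project_2d(X):
    """Project the feature matrix onto its first two principal components."""
    return PCA(n_components=2).fit_transform(X)


def plot_clusters(points_2d, labels, title, path):
    """Save a scatter plot of the 2-D projection, colored by cluster label."""
    import matplotlib
    matplotlib.use("Agg")  # render to a file without needing a display
    import matplotlib.pyplot as plt

    fig, ax = plt.subplots()
    ax.scatter(points_2d[:, 0], points_2d[:, 1], c=labels, s=10)
    ax.set_xlabel("Principal component 1")
    ax.set_ylabel("Principal component 2")
    ax.set_title(title)
    fig.savefig(path)
    plt.close(fig)


# Placeholder data (assumption): synthetic blobs stand in for the real features.
X, _ = make_blobs(n_samples=300, centers=3, n_features=5, random_state=0)

# Cluster in the original feature space, then project only for display.
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)
points_2d = project_2d(X)
```

Calling `plot_clusters(points_2d, labels, "K-Means (k = 3)", "kmeans_pca.png")` then writes the figure to disk. Swapping `KMeans` for `AgglomerativeClustering(n_clusters=3)` would give the corresponding hierarchical clustering plot.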