Status: Open
Labels: enhancement (New feature or request)
Description
📊 Feature Request: compare_silhouette_scores Method for Cluster Number Evaluation
Summary
Add a new method compare_silhouette_scores to the unsupervised module of the DataScienceUtils package. This method will help users determine the optimal number of clusters by running clustering over a specified range of cluster counts and plotting the corresponding average silhouette scores.
Motivation
Selecting the right number of clusters (k) is a critical step in clustering analysis. The silhouette score is a popular metric for assessing cluster cohesion and separation. Automating the process of:
- Running clustering algorithms across multiple k values,
- Computing silhouette scores for each k,
- Visualizing the silhouette score trend,
greatly simplifies hyperparameter tuning and improves the data scientist's workflow.
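Done by hand today, those steps amount to a short loop. A minimal sketch of the manual workflow the proposed method would automate, using scikit-learn's `KMeans` and `make_blobs` for toy data (names and parameters here are illustrative, not part of the proposal):

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

# Toy data: four well-separated blobs.
X, _ = make_blobs(n_samples=300, centers=4, random_state=42)

# Run clustering for each candidate k and collect the average silhouette score.
k_values = list(range(2, 11))
scores = []
for k in k_values:
    labels = KMeans(n_clusters=k, random_state=42, n_init=10).fit_predict(X)
    scores.append(silhouette_score(X, labels))

# The k with the highest average silhouette score is the usual pick.
best_k = k_values[int(np.argmax(scores))]
```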
API Proposal and Full Code
```python
import numpy as np
import matplotlib.pyplot as plt
from typing import Any, Optional, Tuple, Union

from sklearn.metrics import silhouette_score


def compare_silhouette_scores(
    X: Union[np.ndarray, 'pd.DataFrame'],
    k_range: range = range(2, 11),
    algorithm: Any = None,
    algorithm_params: Optional[dict] = None,
    figsize: Tuple[int, int] = (10, 6),
) -> plt.Figure:
    """
    Compare silhouette scores across different numbers of clusters.

    Parameters:
        X (array-like or pd.DataFrame): Input data for clustering.
        k_range (range): Range of cluster numbers to evaluate (default: 2 to 10).
        algorithm (class): Clustering algorithm class (must implement fit_predict
            and accept an n_clusters argument). Defaults to KMeans.
        algorithm_params (dict): Parameters for clustering algorithm initialization.
        figsize (Tuple[int, int]): Size of the matplotlib figure.

    Returns:
        plt.Figure: Matplotlib Figure object showing silhouette scores vs
            number of clusters.
    """
    from sklearn.cluster import KMeans

    if algorithm is None:
        algorithm = KMeans
    if algorithm_params is None:
        algorithm_params = {'random_state': 42, 'n_init': 10}

    silhouette_scores = []
    k_values = list(k_range)
    for k in k_values:
        clusterer = algorithm(n_clusters=k, **algorithm_params)
        cluster_labels = clusterer.fit_predict(X)
        silhouette_scores.append(silhouette_score(X, cluster_labels))

    fig, ax = plt.subplots(figsize=figsize)
    ax.plot(k_values, silhouette_scores, 'bo-', linewidth=2, markersize=8)
    ax.set_xlabel('Number of Clusters (k)', fontsize=12)
    ax.set_ylabel('Average Silhouette Score', fontsize=12)
    ax.set_title('Silhouette Score vs Number of Clusters', fontsize=14)
    ax.grid(True, alpha=0.3)

    # Highlight the best-scoring k.
    best_k = k_values[int(np.argmax(silhouette_scores))]
    best_score = max(silhouette_scores)
    ax.axvline(x=best_k, color='red', linestyle='--', alpha=0.7)
    ax.text(best_k, best_score + 0.02, f'Best: k={best_k}\nScore={best_score:.3f}',
            ha='center', va='bottom', fontsize=10,
            bbox=dict(boxstyle="round,pad=0.3", facecolor="yellow", alpha=0.7))

    # Annotate each point with its score.
    for k, score in zip(k_values, silhouette_scores):
        ax.text(k, score - 0.02, f'{score:.3f}', ha='center', va='top', fontsize=9)

    plt.tight_layout()
    return fig
```
To Do
- Implement compare_silhouette_scores as described above in unsupervised.py.
- Write unit tests to verify:
  * Correct calculation of silhouette scores across k values.
  * Handling of default and custom clustering algorithms.
- Update documentation with usage examples.
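A unit-test sketch for the second bullet. Since compare_silhouette_scores is not implemented yet, the sketch exercises the behaviours the real tests would assert on, via a hypothetical helper (`scores_over_k`) that mirrors the proposed method's core loop; `AgglomerativeClustering` stands in as the "custom" algorithm because it exposes `fit_predict` and an `n_clusters` argument:

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering, KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score


def scores_over_k(X, algorithm, k_range, **params):
    # Hypothetical helper mirroring the proposed method's core loop.
    return [silhouette_score(X, algorithm(n_clusters=k, **params).fit_predict(X))
            for k in k_range]


def test_default_algorithm():
    X, _ = make_blobs(n_samples=200, centers=3, random_state=0)
    scores = scores_over_k(X, KMeans, range(2, 6), random_state=42, n_init=10)
    assert len(scores) == 4
    assert all(-1.0 <= s <= 1.0 for s in scores)


def test_custom_algorithm():
    # Any class with fit_predict and an n_clusters kwarg qualifies.
    X, _ = make_blobs(n_samples=200, centers=3, random_state=0)
    scores = scores_over_k(X, AgglomerativeClustering, range(2, 6))
    assert len(scores) == 4


test_default_algorithm()
test_custom_algorithm()
```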