Clusty the Cluster: Patient Segmentation Analysis

📋 Overview

Clusty the Cluster is a data science project for patient segmentation. It uses unsupervised machine learning to identify distinct groups of patients based on their characteristics and medical history. These segments can inform personalized healthcare plans, resource allocation, and efforts to improve patient outcomes.

The project follows a structured data science workflow, from Exploratory Data Analysis (EDA) to detailed Cluster Profiling.

🗂️ Project Structure

The analysis is divided into four sequential stages, each documented in a dedicated Jupyter Notebook:

| Stage | Notebook | Description |
| --- | --- | --- |
| 1. Exploration | `01_eda.ipynb` | Initial data inspection, distribution analysis, and correlation checks to understand the dataset's structure. |
| 2. Preprocessing | `02_preprocessing.ipynb` | Data cleaning, handling of missing values, feature engineering, and normalization/standardization. A `preprocessor.pkl` is generated here. |
| 3. Modeling | `03_clustering.ipynb` | Application of clustering algorithms (e.g., K-Means, Hierarchical Clustering). Includes hyperparameter tuning and model selection. |
| 4. Analysis | `04_cluster_profiling.ipynb` | Interpretation of the resulting clusters: analysis of the centroids and the distribution of features within each segment to derive actionable insights. |
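As a rough illustration of how stages 2 and 3 could fit together, the sketch below builds a scikit-learn preprocessor, pickles it, and picks a cluster count for K-Means by silhouette score. It is not taken from the notebooks: the column names (`age`, `bmi`, `smoker`), the synthetic data, the median imputation, and the silhouette criterion are all assumptions made for the example.

```python
import pickle
import tempfile
from pathlib import Path

import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.metrics import silhouette_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Synthetic stand-in for the patient data; all column names are hypothetical.
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "age": rng.integers(18, 90, size=200).astype(float),
    "bmi": rng.normal(27, 5, size=200),
    "smoker": rng.choice(["yes", "no"], size=200),
})
df.loc[df.index[::25], "bmi"] = np.nan  # inject a few missing values

numeric_cols = ["age", "bmi"]
categorical_cols = ["smoker"]

# Stage 2 (assumed design): impute and scale numeric columns,
# one-hot encode categorical columns.
numeric_steps = Pipeline([
    ("impute", SimpleImputer(strategy="median")),
    ("scale", StandardScaler()),
])
preprocessor = ColumnTransformer(
    [
        ("num", numeric_steps, numeric_cols),
        ("cat", OneHotEncoder(handle_unknown="ignore"), categorical_cols),
    ],
    sparse_threshold=0,  # always return a dense array
)
X = preprocessor.fit_transform(df)

# Persist the fitted preprocessor; a temporary directory is used here so the
# example does not overwrite a real preprocessor.pkl.
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "preprocessor.pkl"
    with path.open("wb") as fh:
        pickle.dump(preprocessor, fh)
    with path.open("rb") as fh:
        restored = pickle.load(fh)

# Stage 3 (assumed design): try several cluster counts and keep the one
# with the highest silhouette score.
scores = {}
for k in range(2, 7):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    scores[k] = silhouette_score(X, labels)
best_k = max(scores, key=scores.get)
print(f"best k by silhouette: {best_k}")
```

Silhouette score is only one possible selection criterion; the elbow method or a dendrogram from hierarchical clustering would be equally reasonable choices.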

📊 Dataset

The project uses `patient_segmentation_dataset.csv`, located in the `data/` directory.

Input: Raw patient data.
Output: `clustered_data.csv`, containing the original data enriched with cluster assignments.
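The output file is presumably the raw rows plus a per-row cluster label. The sketch below shows that shape under this assumption; the feature columns, the `cluster` column name, and the choice of two clusters are hypothetical, and the CSV is written to an in-memory buffer rather than to `clustered_data.csv`.

```python
import io

import pandas as pd
from sklearn.cluster import KMeans

# Tiny hypothetical stand-in for the raw patient data.
raw = pd.DataFrame({
    "age": [25, 31, 47, 52, 68, 74],
    "visits_per_year": [1, 2, 4, 5, 9, 11],
})

# Fit a clustering model and attach one label per row.
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(raw)
clustered = raw.assign(cluster=labels)

# Write the enriched table as CSV (to a buffer here, not to disk).
buffer = io.StringIO()
clustered.to_csv(buffer, index=False)
buffer.seek(0)
round_trip = pd.read_csv(buffer)
print(round_trip.head())
```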

🛠️ Technology Stack

Language: Python
Data Manipulation: Pandas, NumPy
Machine Learning: Scikit-learn
Visualization: Matplotlib, Seaborn
Environment Management: venv (Python virtual environments)

🚀 Getting Started

Prerequisites

Ensure you have Python installed. It is recommended to use a virtual environment.

Installation

Clone the repository (if applicable) or navigate to the project directory.

Create and activate a virtual environment:

```
python -m venv .venv
source .venv/bin/activate  # On Windows use: .venv\Scripts\activate
```

Install dependencies:
```
pip install -r requirements.txt
```

Usage

Launch the Jupyter Notebook server to interact with the analysis files:

```
jupyter notebook
```

Run the notebooks in order (01 to 04) to replicate the full pipeline, since each stage builds on the output of the previous one.

📈 Results & Insights

The analysis culminates in `04_cluster_profiling.ipynb`, which provides a detailed breakdown of the identified patient profiles. These profiles help in understanding the "archetypes" present in the patient population.
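To make the idea of a cluster profile concrete, here is a minimal sketch of one common approach: per-cluster feature means, cluster sizes, and each cluster's mean relative to the overall mean. The data, the column names, and the `cluster` label column are hypothetical and do not come from the notebook.

```python
import pandas as pd

# Hypothetical clustered data: two features plus a cluster label per patient.
clustered = pd.DataFrame({
    "age": [25, 31, 47, 52, 68, 74],
    "visits_per_year": [1, 2, 4, 5, 9, 11],
    "cluster": [0, 0, 1, 1, 2, 2],
})
features = ["age", "visits_per_year"]

# Mean of each feature within each cluster (for K-Means on unscaled data,
# these would coincide with the centroids).
profile = clustered.groupby("cluster")[features].mean()
profile["n_patients"] = clustered.groupby("cluster").size()

# Ratio of each cluster mean to the population mean; values far from 1
# show which features distinguish a segment.
relative = profile[features] / clustered[features].mean()

print(profile)
print(relative.round(2))
```

Reading the `relative` table row by row gives a quick description of each archetype, for example a segment that is older than average and visits far more often.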

🤝 Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

📄 License

This project is open-source and available under the MIT License.
