DataScience-Golddiggers/Clusty-the-cluster

Clusty the Cluster: Patient Segmentation Analysis

📋 Overview

Clusty the Cluster is a data science project on patient segmentation. It uses unsupervised machine learning to identify distinct groups of patients based on their characteristics and medical history. The resulting segments can inform personalized healthcare plans, resource allocation, and efforts to improve patient outcomes.

The project follows a structured data science workflow, from Exploratory Data Analysis (EDA) to detailed Cluster Profiling.

🗂️ Project Structure

The analysis is divided into four sequential stages, each documented in a dedicated Jupyter Notebook:

| Stage | Notebook | Description |
|---|---|---|
| 1. Exploration | 01_eda.ipynb | Initial data inspection, distribution analysis, and correlation checks to understand the dataset's structure. |
| 2. Preprocessing | 02_preprocessing.ipynb | Data cleaning, handling of missing values, feature engineering, and normalization/standardization. A preprocessor.pkl is generated here. |
| 3. Modeling | 03_clustering.ipynb | Application of clustering algorithms (e.g., K-Means, Hierarchical Clustering). Includes hyperparameter tuning and model selection. |
| 4. Analysis | 04_cluster_profiling.ipynb | Interpretation of the resulting clusters: analysis of the centroids and the distribution of features within each segment to derive actionable insights. |
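
As a rough illustration of stages 2 and 3, the sketch below builds a scikit-learn preprocessor, saves it to a preprocessor.pkl file, and picks the number of K-Means clusters by silhouette score. It is not taken from the notebooks: the synthetic data, the column names (age, bmi, smoker), the use of pickle, the temporary output directory, and the silhouette criterion are all assumptions made for the example.

```python
import pickle
import tempfile
from pathlib import Path

import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.metrics import silhouette_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Synthetic stand-in for the patient data (hypothetical columns, two obvious groups).
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "age": np.concatenate([rng.normal(30, 3, 50), rng.normal(70, 3, 50)]),
    "bmi": np.concatenate([rng.normal(22, 1, 50), rng.normal(31, 1, 50)]),
    "smoker": ["no"] * 50 + ["yes"] * 50,
})
df.loc[0, "bmi"] = np.nan  # simulate a missing value

numeric_cols = ["age", "bmi"]
categorical_cols = ["smoker"]

# Stage 2: impute and standardize numeric features, one-hot encode categorical ones.
numeric_steps = Pipeline([
    ("impute", SimpleImputer(strategy="median")),
    ("scale", StandardScaler()),
])
preprocessor = ColumnTransformer(
    [
        ("num", numeric_steps, numeric_cols),
        ("cat", OneHotEncoder(handle_unknown="ignore"), categorical_cols),
    ],
    sparse_threshold=0,  # always return a dense array
)
X = preprocessor.fit_transform(df)

# Persist the fitted preprocessor; the directory is a temporary one (assumption).
pkl_path = Path(tempfile.mkdtemp()) / "preprocessor.pkl"
with open(pkl_path, "wb") as fh:
    pickle.dump(preprocessor, fh)

# Stage 3: try several cluster counts and keep the one with the best silhouette score.
scores = {}
for k in range(2, 6):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    scores[k] = silhouette_score(X, labels)
best_k = max(scores, key=scores.get)
print(f"best k by silhouette: {best_k}")
```

Wrapping the imputer, scaler, and encoder in a single ColumnTransformer means one object can be reloaded later to transform new patients in exactly the same way.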

📊 Dataset

The project utilizes the patient_segmentation_dataset.csv located in the data/ directory.

  • Input: Raw patient data.
  • Output: clustered_data.csv containing the original data enriched with the cluster assignment of each patient.
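
One way such an output file could be produced is to append the fitted cluster labels to the raw rows and write the result as CSV. This is a minimal sketch on made-up data: the column names (age, visits_per_year), the name of the label column (cluster), and the choice of K-Means with two clusters are assumptions, and the CSV is written to an in-memory buffer rather than to disk.

```python
import io

import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Made-up raw data standing in for the patient CSV (hypothetical columns).
raw = pd.DataFrame({
    "age": [25, 30, 35, 65, 70, 75],
    "visits_per_year": [1, 2, 3, 7, 8, 9],
})

# Scale the features and fit a clustering model on the scaled values.
X = StandardScaler().fit_transform(raw)
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)

# Keep the original, unscaled columns and add one label column.
clustered = raw.assign(cluster=labels)

# Write to an in-memory buffer; a real run would pass a file path instead.
buffer = io.StringIO()
clustered.to_csv(buffer, index=False)
print(buffer.getvalue())
```

Keeping the unscaled values in the output makes the later profiling step easier to read, since means are reported in the original units.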

🛠️ Technology Stack

  • Language: Python
  • Data Manipulation: Pandas, NumPy
  • Machine Learning: Scikit-learn
  • Visualization: Matplotlib, Seaborn
  • Environment Management: venv (Python virtual environments)

🚀 Getting Started

Prerequisites

Ensure you have Python installed. It is recommended to use a virtual environment.

Installation

  1. Clone the repository (if applicable) or navigate to the project directory.

  2. Create and activate a virtual environment:

    python -m venv .venv
    source .venv/bin/activate  # On Windows use: .venv\Scripts\activate

  3. Install dependencies:

    pip install -r requirements.txt

Usage

Launch the Jupyter Notebook server to interact with the analysis files:

    jupyter notebook

It is recommended to run the notebooks in order (01 to 04) to replicate the full pipeline.

📈 Results & Insights

The analysis culminates in 04_cluster_profiling.ipynb, which provides a detailed breakdown of the identified patient profiles. These profiles help in understanding the "archetypes" present in the patient population.
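
A common way to build such profiles is to average each feature per cluster and compare the result with the overall population. The sketch below shows that idea with pandas on a tiny hand-written table; the column names and values are illustrative assumptions, not data or results from this project.

```python
import pandas as pd

# Tiny illustrative table with a cluster label per patient (hypothetical values).
clustered = pd.DataFrame({
    "age": [25, 30, 35, 65, 70, 75],
    "visits_per_year": [1, 2, 3, 7, 8, 9],
    "cluster": [0, 0, 0, 1, 1, 1],
})
features = ["age", "visits_per_year"]

# Per-cluster feature means, comparable to centroids in the original units.
profile = clustered.groupby("cluster")[features].mean()
profile["n_patients"] = clustered.groupby("cluster").size()

# Ratio of each cluster mean to the overall mean: values above 1 mark
# features where the cluster sits above the population average.
relative = profile[features] / clustered[features].mean()

print(profile)
print(relative.round(2))
```

Reading the two tables side by side gives a quick description of each segment, for example a cluster whose members are older and visit more often than average.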

🤝 Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

📄 License

This project is open-source and available under the MIT License.
