This project compares two clustering techniques, K-Means and DBSCAN, on the provided dataset. The goal is to evaluate and compare how each model clusters the data under different metrics. It also applies hyperparameter tuning tools (RandomizedSearchCV, Optuna, and Hyperopt) and compares two feature-scaling methods, MinMaxScaler and StandardScaler.
The dataset contains country-level indicators such as child mortality, exports, health spending, inflation, GDP, and more. The aim is to gain insight into how countries group together based on these factors.
You can find the dataset here on Kaggle.
- K-Means Clustering: A partition-based clustering algorithm that divides the dataset into k clusters.
- DBSCAN: A density-based clustering algorithm that groups data points based on density.
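As a rough illustration of how the two algorithms and the two scalers might be compared, here is a minimal sketch using scikit-learn. It is not the project's actual code: the synthetic `make_blobs` data stands in for the country indicators, and the values of `n_clusters`, `eps`, and `min_samples` are arbitrary assumptions rather than tuned settings.

```python
import numpy as np
from sklearn.cluster import DBSCAN, KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import MinMaxScaler, StandardScaler

# Assumption: synthetic data (150 rows, 4 features) replaces the real dataset.
X, _ = make_blobs(n_samples=150, centers=3, n_features=4, random_state=42)

results = {}
for scaler in (StandardScaler(), MinMaxScaler()):
    X_scaled = scaler.fit_transform(X)

    # K-Means needs the number of clusters up front (k=3 is an assumption).
    kmeans_labels = KMeans(n_clusters=3, n_init=10, random_state=42).fit_predict(X_scaled)

    # DBSCAN infers the cluster count from density; label -1 marks noise points.
    dbscan_labels = DBSCAN(eps=0.5, min_samples=5).fit_predict(X_scaled)

    results[type(scaler).__name__] = {
        "kmeans_silhouette": float(silhouette_score(X_scaled, kmeans_labels)),
        "dbscan_clusters": len(set(dbscan_labels) - {-1}),
        "dbscan_noise": int(np.sum(dbscan_labels == -1)),
    }

for name, summary in results.items():
    print(name, summary)
```

Because `eps` is a distance threshold, the same value can behave very differently under the two scalers, which is one reason to compare them side by side.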
Before running the project, ensure you have the following tools installed:
- Python (version 3.8 or higher)
- Poetry (for managing dependencies)
Follow these steps to clone the repository and set up the environment:
- Clone the Repository
Clone this repository to your local machine by running the following command:
git clone https://github.com/yourusername/CCADMACL_PROJECT_COM222.git
- Navigate to the Project Directory
cd CCADMACL_PROJECT_COM222
- Install Dependencies using Poetry
Install the dependencies listed in the pyproject.toml file, then activate the project's virtual environment, by running:
poetry install
poetry shell