The Intelligent Data Classifier is a multi-label classification project built on K-Nearest Neighbors (KNN) and Decision Trees. Both algorithms are implemented from scratch, and their hyperparameters are tuned to improve accuracy on multi-label prediction tasks.
- K-Nearest Neighbors (KNN): Custom implementation of the KNN algorithm, allowing for adjustable parameters such as the number of neighbors and distance metrics.
- Decision Trees: Uses both the Powerset and MultiOutput formulations to handle multi-label targets.
- Hyperparameter Tuning: Detailed optimization process to enhance model performance.
- Data Analysis: Extensive exploratory data analysis with visualizations to understand data distributions and relationships.
- Python
- Jupyter Notebook
- NumPy
- Matplotlib
- Bash Scripting
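As a rough illustration of the KNN feature described above, the following is a minimal NumPy sketch of a multi-label KNN with an adjustable number of neighbors and distance metric. It is not the repository's code: the function names (`pairwise_distances`, `knn_predict`), the two supported metrics, the per-label majority vote, and the toy data are all assumptions made for this example.

```python
import numpy as np


def pairwise_distances(X_query, X_train, metric="euclidean"):
    """Return an (n_query, n_train) matrix of distances (hypothetical helper)."""
    # Broadcast to shape (n_query, n_train, n_features).
    diff = X_query[:, None, :] - X_train[None, :, :]
    if metric == "euclidean":
        return np.sqrt((diff ** 2).sum(axis=2))
    if metric == "manhattan":
        return np.abs(diff).sum(axis=2)
    raise ValueError(f"unknown metric: {metric}")


def knn_predict(X_train, Y_train, X_query, k=3, metric="euclidean"):
    """Predict a binary label matrix by per-label majority vote among k neighbors."""
    dists = pairwise_distances(X_query, X_train, metric)
    # Indices of the k closest training points for each query row.
    neighbors = np.argsort(dists, axis=1)[:, :k]
    # Y_train[neighbors] has shape (n_query, k, n_labels); average the votes.
    votes = Y_train[neighbors].mean(axis=1)
    # A label is predicted when more than half of the neighbors carry it.
    return (votes > 0.5).astype(int)


# Toy data (assumed): two clusters, two binary labels per sample.
X_train = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
                    [5.0, 5.0], [5.0, 6.0], [6.0, 5.0]])
Y_train = np.array([[1, 0], [1, 0], [1, 1],
                    [0, 1], [0, 1], [0, 1]])
X_query = np.array([[0.2, 0.1], [5.5, 5.5]])

print(knn_predict(X_train, Y_train, X_query, k=3))
```

Changing `k` or `metric` in the call is the kind of adjustment the hyperparameter tuning step would explore.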
Clone this repository:
git clone https://github.com/yourusername/intelligent-data-classifier.git
Navigate to the project directory:
cd intelligent-data-classifier
Install the required dependencies:
pip install -r requirements.txt
Start Jupyter Notebook and open the notebooks to explore the dataset and model implementation:
jupyter notebook
Execute the bash script to test the model with new data:
bash eval.sh
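For readers unfamiliar with the two decision tree formulations named in the feature list, the sketch below shows how the targets could be prepared in each case. It assumes the usual meaning of the terms: Powerset treats every distinct label combination as one class, while MultiOutput treats each label as its own binary target. The helper names and the toy label matrix are hypothetical and are not taken from the repository.

```python
import numpy as np


def powerset_encode(Y):
    """Map each distinct label combination (row of Y) to a single class id."""
    combos, class_ids = np.unique(Y, axis=0, return_inverse=True)
    # ravel() keeps the ids one-dimensional across NumPy versions.
    return combos, class_ids.ravel()


def powerset_decode(combos, class_ids):
    """Recover the binary label rows from predicted class ids."""
    return combos[class_ids]


# Toy binary label matrix (assumed): 4 samples, 3 labels.
Y = np.array([[1, 0, 1],
              [0, 1, 0],
              [1, 0, 1],
              [1, 1, 0]])

# Powerset view: one multi-class target for a single tree.
combos, class_ids = powerset_encode(Y)
print(class_ids)

# MultiOutput view: one binary target per label column.
per_label_targets = [Y[:, j] for j in range(Y.shape[1])]
print(len(per_label_targets))
```

The Powerset view can only predict label combinations seen during training, whereas the MultiOutput view predicts each label separately and so does not model dependencies between labels directly.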