This repository contains from-scratch Python implementations of the Naive Bayes and K-Nearest Neighbors (KNN) classification algorithms, built with NumPy and Pandas.
This project implements two classical supervised learning algorithms, Naive Bayes and K-Nearest Neighbors (KNN), both widely used for classification in machine learning. Its purpose is to show clearly how these algorithms work by implementing them from scratch in Python.
- NumPy: For numerical computing.
- Pandas: For data manipulation and analysis.
- Matplotlib: For data visualization.
The provided data is preprocessed before applying the classification algorithms. This includes dropping unnecessary columns, filling missing values, and converting categorical variables into numerical ones.
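
The three preprocessing steps above could look roughly like the following. This is a minimal sketch, not the repository's actual code: the function name `preprocess`, the column names (`Name`, `Ticket`, `Sex`, `Age`, `Embarked`), and the fill strategies (median and most frequent value) are all assumptions chosen for illustration.

```python
import pandas as pd


def preprocess(df: pd.DataFrame) -> pd.DataFrame:
    """Drop, fill, and encode columns. All column names are hypothetical."""
    out = df.copy()
    # Drop columns assumed to carry no useful signal for classification.
    out = out.drop(columns=["Name", "Ticket"], errors="ignore")
    # Fill missing numeric values with the column median (an assumed strategy).
    out["Age"] = out["Age"].fillna(out["Age"].median())
    # Fill missing categorical values with the most frequent value.
    out["Embarked"] = out["Embarked"].fillna(out["Embarked"].mode()[0])
    # Convert categorical variables into integer codes.
    out["Sex"] = out["Sex"].map({"male": 0, "female": 1})
    out["Embarked"] = out["Embarked"].map({"S": 0, "C": 1, "Q": 2})
    return out


# Tiny in-memory example instead of reading train.csv.
sample = pd.DataFrame({
    "Name": ["A", "B", "C"],
    "Ticket": ["t1", "t2", "t3"],
    "Sex": ["male", "female", "female"],
    "Age": [22.0, None, 30.0],
    "Embarked": ["S", None, "S"],
})
clean = preprocess(sample)
```

In practice the same function would be applied to both the training and the test data so that the two share one encoding.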
- The Naive Bayes classifier is implemented to predict the class labels for the given test data.
- It estimates the prior probability of each class, and the mean and variance of each feature within each class.
- Each test sample is assigned the class with the maximum posterior probability.
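
Because the classifier stores a mean and variance per feature and class, a Gaussian likelihood is a natural fit. The sketch below assumes that model; the class name `GaussianNaiveBayes`, the attribute names, and the small variance offset are illustrative choices, not taken from the repository. It works in log space to avoid numerical underflow when multiplying many small probabilities.

```python
import numpy as np


class GaussianNaiveBayes:
    """Minimal Gaussian Naive Bayes sketch (names are hypothetical)."""

    def fit(self, X, y):
        X = np.asarray(X, dtype=float)
        y = np.asarray(y)
        self.classes_ = np.unique(y)
        # Class priors: fraction of training samples in each class.
        self.priors_ = np.array([np.mean(y == c) for c in self.classes_])
        # Per-class, per-feature mean and variance, shape (n_classes, n_features).
        self.means_ = np.array([X[y == c].mean(axis=0) for c in self.classes_])
        # A small offset keeps the variance strictly positive (assumed safeguard).
        self.vars_ = np.array([X[y == c].var(axis=0) for c in self.classes_]) + 1e-9
        return self

    def predict(self, X):
        X = np.asarray(X, dtype=float)
        # Differences to each class mean, shape (n_samples, n_classes, n_features).
        diff = X[:, None, :] - self.means_[None, :, :]
        # Sum of per-feature Gaussian log-densities (the "naive" independence step).
        log_lik = -0.5 * np.sum(
            np.log(2.0 * np.pi * self.vars_)[None, :, :] + diff**2 / self.vars_[None, :, :],
            axis=2,
        )
        # Log posterior up to a constant: log prior + log likelihood.
        log_post = np.log(self.priors_)[None, :] + log_lik
        return self.classes_[np.argmax(log_post, axis=1)]


X_train = np.array([[1.0, 2.0], [1.2, 1.8], [0.8, 2.2],
                    [5.0, 6.0], [5.2, 5.8], [4.8, 6.2]])
y_train = np.array([0, 0, 0, 1, 1, 1])
nb_pred = GaussianNaiveBayes().fit(X_train, y_train).predict([[1.0, 2.0], [5.0, 6.0]])
```
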
- The KNN classifier is implemented to predict the class labels for the given test data.
- It calculates the distances between the test samples and the training samples.
- The predicted labels are based on the majority class among the k nearest neighbors.
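
The distance computation and majority vote can be sketched as follows. The function name `knn_predict`, the default `k=3`, and the use of Euclidean distance are assumptions; the description above does not say which distance metric the repository uses.

```python
import numpy as np


def knn_predict(X_train, y_train, X_test, k=3):
    """Minimal KNN sketch using Euclidean distance (an assumed metric)."""
    X_train = np.asarray(X_train, dtype=float)
    y_train = np.asarray(y_train)
    X_test = np.asarray(X_test, dtype=float)
    # Pairwise distances between test and training samples, shape (n_test, n_train).
    dists = np.sqrt(((X_test[:, None, :] - X_train[None, :, :]) ** 2).sum(axis=2))
    # Indices of the k closest training samples for each test sample.
    nearest = np.argsort(dists, axis=1)[:, :k]
    preds = []
    for row in y_train[nearest]:
        # Majority vote; on a tie, the smallest label wins because
        # np.unique returns sorted labels and argmax picks the first maximum.
        labels, counts = np.unique(row, return_counts=True)
        preds.append(labels[np.argmax(counts)])
    return np.array(preds)


X_train = np.array([[1.0, 2.0], [1.2, 1.8], [0.8, 2.2],
                    [5.0, 6.0], [5.2, 5.8], [4.8, 6.2]])
y_train = np.array([0, 0, 0, 1, 1, 1])
knn_pred = knn_predict(X_train, y_train, [[1.1, 2.0], [5.1, 6.0]], k=3)
```

Since distances depend on feature scale, standardizing the features before calling such a function is often worthwhile, and an odd `k` avoids ties in two-class problems.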
- Clone the repository:
  `git clone https://github.com/your_username/supervised-learning-classification.git`
- Install the required libraries:
  `pip install numpy pandas matplotlib`
- Open the notebook in Jupyter:
  `jupyter notebook FSM_SUPERVISED1.ipynb`
- main.py: Contains the main script for preprocessing data and applying classification algorithms.
- train.csv: Training dataset.
- test.csv: Test dataset.
- gender_submission.csv: Sample submission file for the test dataset.
- Accuracy of the Naive Bayes classifier: 63.6%
- Accuracy of the KNN classifier: 65.6%
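
Accuracy here means the fraction of samples whose predicted label matches the true label. A minimal sketch of that calculation is shown below; the helper name `accuracy` and the toy labels are hypothetical and are not the data behind the figures above.

```python
import numpy as np


def accuracy(y_true, y_pred):
    """Fraction of predictions that equal the true labels."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    return float(np.mean(y_true == y_pred))


# Toy labels for illustration only: 3 of 4 predictions match.
toy_score = accuracy([1, 0, 1, 1], [1, 0, 0, 1])
```
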
Contributions are welcome! If you find any issues or have suggestions for improvements, feel free to open an issue or create a pull request.
This project is licensed under the MIT License. See the LICENSE file for details.