Statistical Methods for Machine Learning: experimental project

PyCharm Badge · Python Badge · TensorFlow Badge · Keras Badge

Use Keras to train a neural network for the binary classification of muffins and Chihuahuas.

General Information

  • Python version: 3.10.5 (click on the badge)
  • requirements.txt contains all the necessary Python packages (use the command below to install them)
    pip install -r requirements.txt
  • The file "SMML_Project_Report" is the document describing the project

Structure of the project

The architecture of this project is organized into four blocks. The first two blocks are dedicated to preprocessing and data preparation, while the latter two focus on model construction: classification and evaluation.

A. Preprocessing

In the preprocessing phase, the emphasis is on refining the dataset. The process involves systematically addressing corrupted files, detecting and managing duplicates through image hashing, and conducting a thorough dataset check.
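
As a minimal sketch of this step, the snippet below removes unreadable files and flags perceptual-hash duplicates. It assumes the Pillow and imagehash packages and a generic directory layout; the project's own cleanup code may differ in detail.

    import os
    from PIL import Image
    import imagehash

    def clean_dataset(data_dir):
        """Drop corrupted image files and report perceptual-hash duplicates."""
        seen_hashes = {}
        for root, _, files in os.walk(data_dir):
            for name in files:
                path = os.path.join(root, name)
                try:
                    with Image.open(path) as img:
                        img.verify()                 # raises if the file is corrupted
                    with Image.open(path) as img:
                        h = imagehash.phash(img)     # perceptual hash of the image content
                except Exception:
                    print(f"Corrupted, removing: {path}")
                    os.remove(path)
                    continue
                if h in seen_hashes:
                    print(f"Duplicate of {seen_hashes[h]}: {path}")
                else:
                    seen_hashes[h] = path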

B. Data Preparation

In the data preparation phase, the primary focus is on loading and enhancing the dataset. This involves using Keras and TensorFlow to load training, validation, and test datasets, applying data augmentation techniques such as flip, rotation, and zoom, and normalizing pixel values. The goal is to ensure the dataset is well-prepared and suitable for subsequent steps.
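
The sketch below illustrates this step with Keras utilities: directory-based dataset loading, flip/rotation/zoom augmentation, and pixel rescaling. The image size, batch size, and the "dataset/train" and "dataset/val" paths are assumptions for illustration, not the project's exact configuration.

    import tensorflow as tf
    from tensorflow import keras

    IMG_SIZE = (224, 224)     # assumed image size
    BATCH_SIZE = 32

    # Load training and validation splits from class-per-folder directories
    # (the paths are hypothetical).
    train_ds = keras.utils.image_dataset_from_directory(
        "dataset/train", image_size=IMG_SIZE, batch_size=BATCH_SIZE, label_mode="binary")
    val_ds = keras.utils.image_dataset_from_directory(
        "dataset/val", image_size=IMG_SIZE, batch_size=BATCH_SIZE, label_mode="binary")

    # Augmentation and normalization as described: flip, rotation, zoom, rescaling.
    data_augmentation = keras.Sequential([
        keras.layers.RandomFlip("horizontal"),
        keras.layers.RandomRotation(0.1),
        keras.layers.RandomZoom(0.1),
    ])
    rescale = keras.layers.Rescaling(1.0 / 255)

    train_ds = train_ds.map(
        lambda x, y: (rescale(data_augmentation(x, training=True)), y))
    val_ds = val_ds.map(lambda x, y: (rescale(x), y))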

C. Classification

In the classification phase, an image classification pipeline is built with Keras and TensorFlow. The implementation provides configurable models, including a Multilayer Perceptron, a Convolutional Neural Network, and MobileNet. The workflow integrates hyperparameter tuning and K-fold cross-validation for model optimization.
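
As one example of the configurable models, here is a minimal sketch of a MobileNet-based binary classifier built on a frozen ImageNet-pretrained backbone. The head architecture, dropout rate, and exposed learning_rate hyperparameter are assumptions; the project's actual model definitions and tuning setup may differ.

    from tensorflow import keras

    def build_mobilenet(input_shape=(224, 224, 3), learning_rate=1e-3):
        """Binary classifier: frozen MobileNet backbone plus a small trainable head."""
        base = keras.applications.MobileNet(
            input_shape=input_shape, include_top=False, weights="imagenet")
        base.trainable = False                       # keep pretrained weights frozen

        inputs = keras.Input(shape=input_shape)
        x = base(inputs, training=False)
        x = keras.layers.GlobalAveragePooling2D()(x)
        x = keras.layers.Dropout(0.2)(x)
        outputs = keras.layers.Dense(1, activation="sigmoid")(x)  # muffin vs Chihuahua

        model = keras.Model(inputs, outputs)
        model.compile(optimizer=keras.optimizers.Adam(learning_rate),
                      loss="binary_crossentropy",
                      metrics=["accuracy"])
        return model

Hyperparameters such as learning_rate can then be searched during tuning, and K-fold cross-validation would typically rebuild the model per fold with a helper like this one.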

D. Evaluation

In the evaluation phase, model performance is assessed through metrics such as loss and accuracy. The module also generates classification reports, confusion matrices, and plots for analyzing predictions.
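
A minimal sketch of this step using scikit-learn is shown below. It assumes a trained model and an unshuffled test_ds from the earlier steps, and the class names used in the report are illustrative (alphabetical folder order), not taken from the project.

    import numpy as np
    from sklearn.metrics import classification_report, confusion_matrix

    # Overall loss and accuracy on the held-out test set.
    test_loss, test_acc = model.evaluate(test_ds)
    print(f"loss={test_loss:.3f}  accuracy={test_acc:.3%}")

    # Per-class report and confusion matrix (test_ds must not be shuffled,
    # so labels and predictions stay aligned).
    y_true = np.concatenate([y.numpy() for _, y in test_ds]).ravel()
    y_pred = (model.predict(test_ds) > 0.5).astype(int).ravel()

    print(classification_report(y_true, y_pred, target_names=["chihuahua", "muffin"]))
    print(confusion_matrix(y_true, y_pred))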

Performance Summary:

                 MLP      CNN      MobileNet
Accuracy (%)     71.537   94.510   99.493
Loss             0.573    0.222    0.019

The models exhibit varying degrees of performance. MobileNet is the standout performer, achieving near-perfect accuracy. The CNN also achieves strong results. The MLP performs worst, with higher loss and a notable rate of misclassification, indicating underfitting.