
Material Stream Identification System

The efficient and automated sorting of post-consumer waste is a critical bottleneck in achieving circular economy goals. This project develops an Automated Material Stream Identification (MSI) System using fundamental Machine Learning (ML) techniques, emphasizing mastery of the entire ML pipeline: Data Preprocessing, Feature Extraction, Classifier Training, and Performance Evaluation.

The dataset is the Garbage Classification zip archive.

Data Augmentation and Feature Extraction:

To decide whether data augmentation is needed, we check for class imbalance. Each class in our dataset contains between 200 and 400 images, so we flag an imbalance whenever class counts differ by more than 100 images; for a much larger dataset, the threshold would scale with the dataset size.

We begin by splitting the dataset into 80% train and 20% test, then perform augmentation on the training set only. The final test set contains 392 images.
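As a sketch, the 80/20 split might look like the following, assuming scikit-learn and stand-in lists for the real image paths and labels (the names here are illustrative, not the project's actual variables):

```python
# Sketch of the 80/20 split step with stand-in data.
from sklearn.model_selection import train_test_split

# Dummy stand-ins for the real image paths and class labels.
image_paths = [f"img_{i}.jpg" for i in range(100)]
labels = ["glass"] * 50 + ["plastic"] * 50

# A stratified split keeps class proportions equal in both sets.
X_train, X_test, y_train, y_test = train_test_split(
    image_paths, labels, test_size=0.2, stratify=labels, random_state=42
)

print(len(X_train), len(X_test))  # 80 20
```

Stratifying on the labels is what keeps the per-class imbalance check meaningful after the split.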

DataAugmentor Class

Images are randomly selected from each class for augmentation. For each selected image, nine augmentation techniques are applied, and the resulting images are stored in the "augmented" directory.
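For illustration, four typical techniques of this kind can be written in pure NumPy (the helper name is hypothetical, and the real class applies nine techniques):

```python
# Minimal augmentation sketch; only four example techniques are shown.
import numpy as np

def augment(image: np.ndarray) -> list[np.ndarray]:
    """Return several augmented copies of one square image."""
    rng = np.random.default_rng(0)
    return [
        np.fliplr(image),                                          # horizontal flip
        np.rot90(image),                                           # 90-degree rotation
        np.clip(image * 1.2, 0, 255),                              # brightness increase
        np.clip(image + rng.normal(0, 10, image.shape), 0, 255),   # Gaussian noise
    ]

img = np.zeros((64, 64, 3), dtype=np.uint8)
print(len(augment(img)))  # 4
```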

FeatureExtractor Class

The MobileNetV2 CNN model was used for feature extraction, along with three methods specific to our dataset:

  • Local Binary Pattern (LBP), which extracts texture features.
  • Canny edge detector, applied specifically for the trash class.
  • Color statistics, applied specifically for the plastic class.
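As a rough sketch of the hand-crafted features, here are minimal NumPy versions of an 8-neighbour LBP histogram and per-channel color statistics. The MobileNetV2 and Canny steps are not shown, and production code would typically use scikit-image or OpenCV rather than these toy implementations:

```python
import numpy as np

def lbp_histogram(gray: np.ndarray, bins: int = 256) -> np.ndarray:
    """Basic 8-neighbour Local Binary Pattern histogram (texture features)."""
    h, w = gray.shape
    center = gray[1:-1, 1:-1]
    codes = np.zeros((h - 2, w - 2), dtype=np.uint8)
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]
    for bit, (dy, dx) in enumerate(offsets):
        # Compare each neighbour against the centre pixel and pack into one bit.
        neighbour = gray[1 + dy:h - 1 + dy, 1 + dx:w - 1 + dx]
        codes |= (neighbour >= center).astype(np.uint8) << bit
    hist, _ = np.histogram(codes, bins=bins, range=(0, bins))
    return hist / hist.sum()

def color_stats(image: np.ndarray) -> np.ndarray:
    """Per-channel mean and standard deviation (six values for RGB)."""
    return np.concatenate([image.mean(axis=(0, 1)), image.std(axis=(0, 1))])

gray = np.random.default_rng(0).integers(0, 256, (32, 32)).astype(np.uint8)
rgb = np.random.default_rng(1).integers(0, 256, (32, 32, 3)).astype(np.uint8)
print(lbp_histogram(gray).shape, color_stats(rgb).shape)  # (256,) (6,)
```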

Augment Extract Pipeline

After loading the images and splitting the dataset, we perform augmentation and feature extraction on X_train and y_train, while only extracting features from X_test and y_test. The test set is stored in the "dataset_split" directory. All feature arrays and their corresponding labels are saved in the "features" directory and are later used by each machine learning model.
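A minimal sketch of the save/load round trip, assuming NumPy .npy files; the file names below are illustrative, not the pipeline's actual names:

```python
import numpy as np
from pathlib import Path

# Illustrative file names; the real pipeline writes one array per split.
features_dir = Path("features")
features_dir.mkdir(exist_ok=True)

X_train = np.zeros((10, 1314))            # stand-in feature matrix
y_train = np.array(["glass"] * 10)

np.save(features_dir / "X_train.npy", X_train)
np.save(features_dir / "y_train.npy", y_train)

# A model script later reloads the arrays.
X_loaded = np.load(features_dir / "X_train.npy")
print(X_loaded.shape)  # (10, 1314)
```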

KNN, SVM Models

The KNN and SVM models were trained and tested using the previously stored feature arrays. Both trained models are saved in the models directory for future use.

For both models, the loaded features were scaled using a StandardScaler. Principal Component Analysis (PCA) was then applied for dimensionality reduction, reducing the feature vectors from 1,314 to 217 dimensions.
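On synthetic data, the scaling and reduction step can be sketched as below; the real pipeline maps 1,314 dimensions to 217, while this toy example uses smaller numbers:

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X_train = rng.normal(size=(300, 50))      # stand-in for the stored features
X_test = rng.normal(size=(60, 50))

scaler = StandardScaler().fit(X_train)    # fit on the training set only
pca = PCA(n_components=10).fit(scaler.transform(X_train))

# Both splits are transformed with statistics learned from the train split.
X_train_red = pca.transform(scaler.transform(X_train))
X_test_red = pca.transform(scaler.transform(X_test))
print(X_train_red.shape, X_test_red.shape)  # (300, 10) (60, 10)
```

Fitting the scaler and PCA on the training set only avoids leaking test-set statistics into the models.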

Each model was evaluated on both the training and test sets, and performance metrics including accuracy, classification report, and confusion matrix were generated.

KNN

Pipeline:

  • Cross-validation with 7 folds was performed over k values ranging from 3 to 21. The KNN model used the following parameters: algorithm='auto', weights='distance', and metric='cosine', which is preferred for image classification tasks.

  • A predict-with-rejection method was implemented using a confidence threshold of 0.6. If the model’s confidence for a prediction is below this threshold, the sample is assigned to an Unknown class.
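The rejection rule can be sketched on synthetic data as follows; the KNN settings match those listed above, but the data, the fixed k, and the helper name are illustrative (the real k was chosen by cross-validation):

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)
X_train = rng.normal(size=(200, 8))
y_train = rng.integers(0, 6, size=200)

knn = KNeighborsClassifier(n_neighbors=7, weights="distance", metric="cosine")
knn.fit(X_train, y_train)

def predict_with_rejection(model, X, threshold=0.6):
    """Map low-confidence predictions to an 'Unknown' class."""
    proba = model.predict_proba(X)
    preds = model.classes_[proba.argmax(axis=1)].astype(object)
    preds[proba.max(axis=1) < threshold] = "Unknown"
    return preds

print(predict_with_rejection(knn, rng.normal(size=(5, 8))))
```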

Performance:

  • Rejection rate: 3.49%
  • Accuracy with rejection: 0.8525
  • Accuracy without rejection: 0.8713

SVM

Pipeline:

  • Class labels were encoded using LabelEncoder, and the training data was shuffled to ensure randomness.

  • An SVM with RBF kernel was trained using GridSearchCV for hyperparameter tuning (C and gamma) with 5-fold cross-validation.
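The encoding, shuffling, and grid search can be sketched on a toy dataset like this; the grid values below are illustrative, not the project's exact search space:

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.preprocessing import LabelEncoder
from sklearn.svm import SVC
from sklearn.utils import shuffle

rng = np.random.default_rng(0)
X = rng.normal(size=(120, 10))
y = np.array(["plastic", "glass", "trash"] * 40)

# Encode string labels as integers and shuffle the training data.
le = LabelEncoder()
X, y_enc = shuffle(X, le.fit_transform(y), random_state=42)

# 5-fold grid search over C and gamma for an RBF-kernel SVM.
grid = GridSearchCV(
    SVC(kernel="rbf", probability=True),
    param_grid={"C": [1, 10], "gamma": ["scale", 0.01]},
    cv=5,
)
grid.fit(X, y_enc)
print(grid.best_params_)
```

`probability=True` is what later allows a confidence threshold to be applied to the SVM's predictions.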

Performance:

  • Best CV score: 0.9578
  • Training accuracy: 0.9997
  • Testing accuracy: 0.9035

We concluded that the SVM was the best-performing model. It is used in the "test.py" script, where any set of images can be loaded by specifying the dataset path. The trained SVM model is then tested on these images. A confidence threshold of 0.5 is applied to implement the Unknown class: if the model’s prediction confidence is below this threshold, the sample is assigned to the Unknown class; otherwise, the predicted label is retrieved from the LabelEncoder.

Real-time Camera

We load the trained SVM model together with the MobileNetV2 CNN and the other feature extraction methods described above. The SVM predicts class probabilities for each frame; if the highest probability falls below the confidence threshold (0.6), the frame is labeled Unknown, otherwise the predicted class is used. The predicted material and its confidence percentage are displayed on the frame.
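The per-frame decision logic (without the camera capture and drawing code) reduces to a sketch like this; the class names and their ordering are assumed here, not taken from the repository:

```python
import numpy as np

# Assumed class ordering for illustration only.
CLASSES = ["cardboard", "glass", "metal", "paper", "plastic", "trash"]
THRESHOLD = 0.6

def label_frame(probabilities: np.ndarray) -> tuple[str, float]:
    """Return (label, confidence); low-confidence frames become Unknown."""
    confidence = float(probabilities.max())
    if confidence < THRESHOLD:
        return "Unknown", confidence
    return CLASSES[int(probabilities.argmax())], confidence

print(label_frame(np.array([0.05, 0.05, 0.1, 0.05, 0.7, 0.05])))   # ('plastic', 0.7)
print(label_frame(np.array([0.2, 0.2, 0.15, 0.15, 0.15, 0.15])))   # ('Unknown', 0.2)
```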

Getting Started

  1. Clone the Repository
    git clone https://github.com/ZiadTawfiq/Material_Classification_ML.git
    cd Material_Classification_ML

Set up your dataset in the same directory.

  2. Set Up a Virtual Environment
    python -m venv venv
    venv\Scripts\activate  # For Windows
    source venv/bin/activate  # Linux/Mac

  3. Install Dependencies
    pip install --upgrade pip
    pip install -r requirements.txt

  4. Run the augment_extract_pipeline
    python .\augment_extract_pipeline.py

  5. Run either the KNN or SVM model
    python .\knn_classifier.py
    python .\svm_classifier.py

  6. Run the deploy camera script
    python .\deploy_camera.py
