
Material Stream Identification System

The efficient and automated sorting of post-consumer waste is a critical bottleneck in achieving circular economy goals. This project develops an Automated Material Stream Identification (MSI) System using fundamental Machine Learning (ML) techniques, emphasizing mastery of the entire ML pipeline: Data Preprocessing, Feature Extraction, Classifier Training, and Performance Evaluation.

The dataset is the Garbage Classification zip archive.

Data Augmentation and Feature Extraction:

To decide whether data augmentation is needed, we check for class imbalance. Each class in our dataset contains between 200 and 400 images, so we flag an imbalance whenever class counts differ by more than 100 images; for a much larger dataset, the threshold would scale with the dataset size.

We begin by splitting the dataset into 80% train and 20% test, then perform augmentation on the training set only. The final test set contains 392 images.
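As a sketch, the 80/20 split might look like the following, assuming scikit-learn and stand-in lists for the real image paths and labels (the names here are illustrative, not the project's actual variables):

```python
# Sketch of the 80/20 split step with stand-in data.
from sklearn.model_selection import train_test_split

# Dummy stand-ins for the real image paths and class labels.
image_paths = [f"img_{i}.jpg" for i in range(100)]
labels = ["glass"] * 50 + ["plastic"] * 50

# A stratified split keeps class proportions equal in both sets.
X_train, X_test, y_train, y_test = train_test_split(
    image_paths, labels, test_size=0.2, stratify=labels, random_state=42
)

print(len(X_train), len(X_test))  # 80 20
```

Stratifying on the labels is what keeps the per-class imbalance check meaningful after the split.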

DataAugmentor Class

Images are randomly selected from each class for augmentation. For each selected image, nine augmentation techniques are applied, and the resulting images are stored in the "augmented" directory.
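For illustration, four typical techniques of this kind can be written in pure NumPy (the helper name is hypothetical, and the real class applies nine techniques):

```python
# Minimal augmentation sketch; only four example techniques are shown.
import numpy as np

def augment(image: np.ndarray) -> list[np.ndarray]:
    """Return several augmented copies of one square image."""
    rng = np.random.default_rng(0)
    return [
        np.fliplr(image),                                          # horizontal flip
        np.rot90(image),                                           # 90-degree rotation
        np.clip(image * 1.2, 0, 255),                              # brightness increase
        np.clip(image + rng.normal(0, 10, image.shape), 0, 255),   # Gaussian noise
    ]

img = np.zeros((64, 64, 3), dtype=np.uint8)
print(len(augment(img)))  # 4
```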

FeatureExtractor Class

The MobileNetV2 CNN model was used for feature extraction, along with three methods specific to our dataset:

  • Local Binary Pattern (LBP), which extracts texture features.
  • Canny edge detector, applied specifically for the trash class.
  • Color statistics, applied specifically for the plastic class.
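As a rough sketch of the hand-crafted features, here are minimal NumPy versions of an 8-neighbour LBP histogram and per-channel color statistics. The MobileNetV2 and Canny steps are not shown, and production code would typically use scikit-image or OpenCV rather than these toy implementations:

```python
import numpy as np

def lbp_histogram(gray: np.ndarray, bins: int = 256) -> np.ndarray:
    """Basic 8-neighbour Local Binary Pattern histogram (texture features)."""
    h, w = gray.shape
    center = gray[1:-1, 1:-1]
    codes = np.zeros((h - 2, w - 2), dtype=np.uint8)
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]
    for bit, (dy, dx) in enumerate(offsets):
        # Compare each neighbour against the centre pixel and pack into one bit.
        neighbour = gray[1 + dy:h - 1 + dy, 1 + dx:w - 1 + dx]
        codes |= (neighbour >= center).astype(np.uint8) << bit
    hist, _ = np.histogram(codes, bins=bins, range=(0, bins))
    return hist / hist.sum()

def color_stats(image: np.ndarray) -> np.ndarray:
    """Per-channel mean and standard deviation (six values for RGB)."""
    return np.concatenate([image.mean(axis=(0, 1)), image.std(axis=(0, 1))])

gray = np.random.default_rng(0).integers(0, 256, (32, 32)).astype(np.uint8)
rgb = np.random.default_rng(1).integers(0, 256, (32, 32, 3)).astype(np.uint8)
print(lbp_histogram(gray).shape, color_stats(rgb).shape)  # (256,) (6,)
```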

Augment Extract Pipeline

After loading the images and splitting the dataset, we perform augmentation and feature extraction on X_train and y_train, while only extracting features from X_test and y_test. The test set is stored in the "dataset_split" directory. All feature arrays and their corresponding labels are saved in the "features" directory and are later used by each machine learning model.
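A minimal sketch of the save/load round trip, assuming NumPy .npy files; the file names below are illustrative, not the pipeline's actual names:

```python
import numpy as np
from pathlib import Path

# Illustrative file names; the real pipeline writes one array per split.
features_dir = Path("features")
features_dir.mkdir(exist_ok=True)

X_train = np.zeros((10, 1314))            # stand-in feature matrix
y_train = np.array(["glass"] * 10)

np.save(features_dir / "X_train.npy", X_train)
np.save(features_dir / "y_train.npy", y_train)

# A model script later reloads the arrays.
X_loaded = np.load(features_dir / "X_train.npy")
print(X_loaded.shape)  # (10, 1314)
```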

KNN, SVM Models

The KNN and SVM models were trained and tested using the previously stored feature arrays. Both trained models are saved in the models directory for future use.

For both models, the loaded features were scaled using a StandardScaler. Principal Component Analysis (PCA) was then applied for dimensionality reduction, reducing the feature vectors from 1,314 to 217 dimensions.
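On synthetic data, the scaling and reduction step can be sketched as below; the real pipeline maps 1,314 dimensions to 217, while this toy example uses smaller numbers:

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X_train = rng.normal(size=(300, 50))      # stand-in for the stored features
X_test = rng.normal(size=(60, 50))

scaler = StandardScaler().fit(X_train)    # fit on the training set only
pca = PCA(n_components=10).fit(scaler.transform(X_train))

# Both splits are transformed with statistics learned from the train split.
X_train_red = pca.transform(scaler.transform(X_train))
X_test_red = pca.transform(scaler.transform(X_test))
print(X_train_red.shape, X_test_red.shape)  # (300, 10) (60, 10)
```

Fitting the scaler and PCA on the training set only avoids leaking test-set statistics into the models.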

Each model was evaluated on both the training and test sets, and performance metrics including accuracy, classification report, and confusion matrix were generated.

KNN

Pipeline:

  • Cross-validation with 7 folds was performed over k values ranging from 3 to 21. The KNN model used the following parameters: algorithm='auto', weights='distance', and metric='cosine', which is preferred for image classification tasks.

  • A predict-with-rejection method was implemented using a confidence threshold of 0.6. If the model’s confidence for a prediction is below this threshold, the sample is assigned to an Unknown class.
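The rejection rule can be sketched on synthetic data as follows; the KNN settings match those listed above, but the data, the fixed k, and the helper name are illustrative (the real k was chosen by cross-validation):

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)
X_train = rng.normal(size=(200, 8))
y_train = rng.integers(0, 6, size=200)

knn = KNeighborsClassifier(n_neighbors=7, weights="distance", metric="cosine")
knn.fit(X_train, y_train)

def predict_with_rejection(model, X, threshold=0.6):
    """Map low-confidence predictions to an 'Unknown' class."""
    proba = model.predict_proba(X)
    preds = model.classes_[proba.argmax(axis=1)].astype(object)
    preds[proba.max(axis=1) < threshold] = "Unknown"
    return preds

print(predict_with_rejection(knn, rng.normal(size=(5, 8))))
```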

Performance:

  • Rejection rate: 3.49%
  • Accuracy with rejection: 0.8525
  • Accuracy without rejection: 0.8713

SVM

Pipeline:

  • Class labels were encoded using LabelEncoder, and the training data was shuffled to ensure randomness.

  • An SVM with RBF kernel was trained using GridSearchCV for hyperparameter tuning (C and gamma) with 5-fold cross-validation.
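The encoding, shuffling, and grid search can be sketched on a toy dataset like this; the grid values below are illustrative, not the project's exact search space:

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.preprocessing import LabelEncoder
from sklearn.svm import SVC
from sklearn.utils import shuffle

rng = np.random.default_rng(0)
X = rng.normal(size=(120, 10))
y = np.array(["plastic", "glass", "trash"] * 40)

# Encode string labels as integers and shuffle the training data.
le = LabelEncoder()
X, y_enc = shuffle(X, le.fit_transform(y), random_state=42)

# 5-fold grid search over C and gamma for an RBF-kernel SVM.
grid = GridSearchCV(
    SVC(kernel="rbf", probability=True),
    param_grid={"C": [1, 10], "gamma": ["scale", 0.01]},
    cv=5,
)
grid.fit(X, y_enc)
print(grid.best_params_)
```

`probability=True` is what later allows a confidence threshold to be applied to the SVM's predictions.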

Performance:

  • Best CV score: 0.9578
  • Training accuracy: 0.9997
  • Testing accuracy: 0.9035

We concluded that the SVM was the best-performing model. It is used in the "test.py" script, where any set of images can be loaded by specifying the dataset path. The trained SVM model is then tested on these images. A confidence threshold of 0.5 is applied to implement the Unknown class: if the model’s prediction confidence is below this threshold, the sample is assigned to the Unknown class; otherwise, the predicted label is retrieved from the LabelEncoder.

Real-time Camera

We load the trained SVM model together with the MobileNetV2 CNN and the other feature extraction methods described above. The SVM predicts class probabilities for each frame; if the highest probability falls below the confidence threshold (0.6), the frame is labeled Unknown, otherwise the predicted class is used. The predicted material and its confidence percentage are displayed on the frame.
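The per-frame decision logic (without the camera capture and drawing code) reduces to a sketch like this; the class names and their ordering are assumed here, not taken from the repository:

```python
import numpy as np

# Assumed class ordering for illustration only.
CLASSES = ["cardboard", "glass", "metal", "paper", "plastic", "trash"]
THRESHOLD = 0.6

def label_frame(probabilities: np.ndarray) -> tuple[str, float]:
    """Return (label, confidence); low-confidence frames become Unknown."""
    confidence = float(probabilities.max())
    if confidence < THRESHOLD:
        return "Unknown", confidence
    return CLASSES[int(probabilities.argmax())], confidence

print(label_frame(np.array([0.05, 0.05, 0.1, 0.05, 0.7, 0.05])))   # ('plastic', 0.7)
print(label_frame(np.array([0.2, 0.2, 0.15, 0.15, 0.15, 0.15])))   # ('Unknown', 0.2)
```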

Getting Started

  1. Clone the Repository
    git clone https://github.com/ZiadTawfiq/Material_Classification_ML.git
    cd Material_Classification_ML

Set up your dataset in the same directory.

  2. Set Up a Virtual Environment
    python -m venv venv
    venv\Scripts\activate  # For Windows
    source venv/bin/activate  # Linux/Mac

  3. Install Dependencies
    pip install --upgrade pip
    pip install -r requirements.txt

  4. Run the augment_extract_pipeline
    python .\augment_extract_pipeline.py

  5. Run either the KNN or SVM model
    python .\knn_classifier.py
    python .\svm_classifier.py

  6. Run the deploy camera script
    python .\deploy_camera.py
