This repository contains code for an example Machine Learning (ML) project in plant health classification: building and training a simple ML model to classify healthy and unhealthy apple plant leaves. You can use this project as an example to help you build your own ML model on a plant or animal health dataset of your choice.
The data for this example were retrieved from Kaggle: https://www.kaggle.com/skeef79/plant-pathology-more-data-no-background
To run this code, you need a working Python 3 installation. The code was tested with Python 3.8.3. The following Python libraries must be installed in your Python environment: matplotlib, scikit-learn, imutils, numpy, cv2 (OpenCV) and pandas.
You can install these libraries using pip install:
- matplotlib 3.2.1
- imutils 0.5.3
- scikit-learn 0.22.1
- numpy 1.18.1
- cv2 (opencv) 4.0.1
- pandas 1.0.4
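Before opening the notebook, it can help to check which of these libraries are already present in the active environment. The following is a minimal sketch, not part of the tutorial itself: the helper name installed_versions is hypothetical, and the distribution names (in particular opencv-python for cv2) are assumed to be the PyPI ones, which may differ if the libraries were installed through conda.

```python
from importlib import metadata  # standard library from Python 3.8 onward

# Versions listed in this README, keyed by the assumed PyPI distribution name.
REQUIRED = {
    "matplotlib": "3.2.1",
    "imutils": "0.5.3",
    "scikit-learn": "0.22.1",
    "numpy": "1.18.1",
    "opencv-python": "4.0.1",  # assumption: cv2 was installed from this package
    "pandas": "1.0.4",
}


def installed_versions(packages):
    """Return {name: installed version string, or None if not installed}."""
    found = {}
    for name in packages:
        try:
            found[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            found[name] = None
    return found


if __name__ == "__main__":
    for name, installed in installed_versions(REQUIRED).items():
        print(f"{name}: listed {REQUIRED[name]}, installed: {installed or 'missing'}")
```

Any library reported as missing can then be installed with pip install before running the notebook.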
Download all the files from this repository, either as a zip folder or with git clone as shown below. Open a terminal and cd to the project folder (if you are using a Python environment, make sure to activate it first). You can then open the Jupyter notebook and run the cells as you follow the tutorial.
git clone https://github.com/bioinfoUQAM/datasets_for_ML/
cd datasets_for_ML
jupyter notebook
Then navigate to the Jupyter notebook Plant_Pathology_ML_Tutorial.ipynb in the Notebook Dashboard and click on it to open it.
You can use this code to create an ML classifier on other existing datasets. Some examples include:
- The Flowers-17 dataset, an open-source dataset containing 17 categories of flowers with 80 images per class.
- The Plant Seedlings dataset, which contains images of seedlings from 12 different plant species.
- The BeeImage dataset, which contains 5,100+ bee images annotated with location, date, time, subspecies, health condition, caste, and pollen.
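When adapting the tutorial to another image dataset, a first step is usually to pair each image file with its class label. The sketch below assumes a layout that this README does not specify: one subfolder per class, with the folder name used as the label. The function name list_labeled_images and the set of accepted file extensions are illustrative choices, not part of the tutorial code.

```python
from pathlib import Path

# Assumed image file extensions; adjust to match the dataset you download.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png"}


def list_labeled_images(root):
    """Return sorted (path, label) pairs, using each subfolder name as the label."""
    pairs = []
    for class_dir in sorted(Path(root).iterdir()):
        if not class_dir.is_dir():
            continue  # skip loose files such as a CSV or README at the top level
        for image_path in sorted(class_dir.iterdir()):
            if image_path.suffix.lower() in IMAGE_EXTENSIONS:
                pairs.append((image_path, class_dir.name))
    return pairs
```

For example, list_labeled_images("my_dataset") (a hypothetical folder name) would return a list of paths and labels that can then be loaded and passed to a classifier. Datasets that store their labels in a CSV file instead of folders would need a different loader, for instance one based on pandas.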
For any questions or concerns regarding this tutorial, please contact amanda.boatswainj@gmail.com