A list of projects and datasets for practicing machine learning (ML).

bioinfoUQAM/datasets_for_ML

Datasets for ML

This repository contains code for an example Machine Learning (ML) project in plant health classification, more specifically, building and training a simple ML model to classify healthy and unhealthy apple plant leaves. You can use this project as an example to help you build your own ML model on a dataset related to plant or animal health of your choice.

Input

The data for this example was retrieved from Kaggle: https://www.kaggle.com/skeef79/plant-pathology-more-data-no-background

Configuration

To run this code, you need a working installation of Python 3. The code was tested with Python 3.8.3. The following Python libraries must be installed in your Python environment: matplotlib, scikit-learn, imutils, numpy, cv2 (OpenCV) and pandas.

Python version

Other libraries

You can install these libraries with pip install. Note that the cv2 module is provided by the opencv-python package on PyPI.

  • matplotlib 3.2.1
  • imutils 0.5.3
  • scikit-learn 0.22.1
  • numpy 1.18.1
  • cv2 (opencv) 4.0.1
  • pandas 1.0.4
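To compare your environment against the versions listed above, a small check like the following may help. This is a minimal sketch, not part of the repository: the function name `check_versions` is hypothetical, and it assumes the libraries were installed with pip under the PyPI names shown (in particular, that cv2 comes from `opencv-python`; a conda install of OpenCV may not be reported under that name).

```python
from importlib import metadata  # standard library since Python 3.8

# Versions listed in this README, keyed by PyPI distribution name.
# Assumption: cv2 is provided by the "opencv-python" distribution.
LISTED = {
    "matplotlib": "3.2.1",
    "imutils": "0.5.3",
    "scikit-learn": "0.22.1",
    "numpy": "1.18.1",
    "opencv-python": "4.0.1",
    "pandas": "1.0.4",
}


def check_versions(listed=LISTED):
    """Return {package: (installed_version_or_None, listed_version)}."""
    report = {}
    for name, wanted in listed.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None  # package is not installed in this environment
        report[name] = (installed, wanted)
    return report


if __name__ == "__main__":
    for name, (installed, wanted) in check_versions().items():
        status = installed if installed is not None else "not installed"
        print(f"{name}: installed={status}, listed in README={wanted}")
```

A newer installed version does not necessarily mean the notebook will fail; the script only reports differences so you can investigate them if a cell does not run.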

Running this Code

Download the files from this repository, either as a zip archive or by cloning it with git as shown below. Open a terminal and cd to the project folder (if you are using a Python environment, make sure to activate it first). You can then open the Jupyter notebook and run the cells as you follow the tutorial.

git clone https://github.com/bioinfoUQAM/datasets_for_ML/

cd datasets_for_ML

jupyter notebook

Then navigate to the jupyter notebook Plant_Pathology_ML_Tutorial.ipynb using the Notebook Dashboard and click on it to open.

Additional Datasets

You can use this code to create an ML classifier on other existing datasets. Some examples include:

  1. The Flowers-17 dataset, an open-source dataset containing 17 categories of flowers with 80 images for each class.
  2. The Plant Seedlings dataset, which contains images of seedlings from 12 different plant species.
  3. The BeeImage dataset, which contains 5,100+ bee images annotated with location, date, time, subspecies, health condition, caste, and pollen.
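When adapting the tutorial to one of these datasets, a first step is usually to pair each image with its class label. The sketch below assumes a hypothetical layout with one subfolder per class (for example `dataset/healthy/` and `dataset/unhealthy/`); not every dataset is organized this way, and some ship their labels in a separate CSV file instead. The function name `list_labeled_images` and the extension list are illustrative and do not come from the notebook.

```python
from pathlib import Path

# Assumption: images use one of these common file extensions.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png"}


def list_labeled_images(root):
    """Return sorted (image_path, label) pairs, using each subfolder name as the label."""
    pairs = []
    for class_dir in sorted(Path(root).iterdir()):
        if not class_dir.is_dir():
            continue  # skip stray files at the top level
        for image_path in sorted(class_dir.iterdir()):
            if image_path.suffix.lower() in IMAGE_EXTENSIONS:
                pairs.append((image_path, class_dir.name))
    return pairs
```

Calling `list_labeled_images("path/to/dataset")` returns a list you can split into image paths and labels before extracting features and training a classifier.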

Contact

For any questions or concerns regarding this tutorial, please contact amanda.boatswainj@gmail.com
