
JacobARose/plantclef-vision


PyTorch PlantCLEF: Multi-label Plant Species Classification with DINOv2


(Added Wednesday Apr 9th, 2025)

  • Jacob A Rose forked this repo from the starter repo originally created by @murilogustineli to build on that work for the PlantCLEF 2025 challenge.

PyTorch webinar on using the DINOv2 model and the Faiss library for multi-label plant species classification in the PlantCLEF @ LifeCLEF & CVPR-FGVC competition on Kaggle. This session will demonstrate how self-supervised Vision Transformers (ViTs) and similarity search techniques can classify plant species efficiently at scale.
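The similarity-search idea can be illustrated with a minimal, dependency-free sketch of nearest-neighbor classification over embeddings. It assumes the embeddings (for example, DINOv2 feature vectors) are already computed, and it does an exact brute-force inner-product search over L2-normalized vectors, which is what a Faiss `IndexFlatIP` computes after `faiss.normalize_L2`. The helper names, the majority vote over the top k, and the toy labels are illustrative assumptions, not this repository's code.

```python
import math
from collections import Counter


def l2_normalize(vec):
    """Scale a vector to unit length so that inner product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def knn_predict(query, index_vectors, index_labels, k=3):
    """Exact inner-product search over normalized vectors, then a majority vote over the top k."""
    q = l2_normalize(query)
    scored = []
    for vec, label in zip(index_vectors, index_labels):
        v = l2_normalize(vec)
        scored.append((sum(a * b for a, b in zip(q, v)), label))
    # Highest cosine similarity first, as a flat inner-product index would rank them.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    top_k = scored[:k]
    votes = Counter(label for _, label in top_k)
    return votes.most_common(1)[0][0], top_k


if __name__ == "__main__":
    # Toy 2-D "embeddings" with hypothetical species labels.
    index_vectors = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
    index_labels = ["species_a", "species_a", "species_b", "species_b"]
    label, neighbors = knn_predict([0.8, 0.2], index_vectors, index_labels, k=3)
    print(label)  # species_a
```

With Faiss, the Python loop would be replaced by adding the normalized index vectors to a `faiss.IndexFlatIP(dim)` and calling `index.search(queries, k)`, which returns arrays of scores and neighbor IDs; the brute-force version above only shows what that search computes.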


This webinar is made possible through the support of the PyTorch Foundation and Intel AI.

Watch the Webinar on YouTube

▶️ Click here to watch on YouTube

Discover how we combined DINOv2 and Faiss with PyTorch and PyTorch Lightning for large-scale multi-label plant species classification.

Table of Contents
  1. What You'll Learn
  2. Event Details
  3. Quickstart Guide
  4. Intel Tiber AI Cloud Setup

What You'll Learn

  • How to leverage DINOv2 embeddings for multi-label classification using transfer learning.
  • Efficient feature extraction from a subset of 1.4M+ images using PyTorch Lightning.
  • Using Faiss for fast nearest neighbor search on high-dimensional embeddings.
  • Image processing techniques: grid-based tiling and prediction aggregation to handle large, high-resolution images that contain multiple species.
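Grid-based tiling and prediction aggregation can be sketched as follows. The assumptions: each tile box would be cropped (for example with Pillow's `Image.crop`, which takes a `(left, upper, right, lower)` box) and classified independently, and the per-tile scores are then merged by keeping each species' maximum probability above a threshold. The max-pooling rule, `min_prob`, `top_k`, and the helper names are hypothetical choices, not necessarily what the webinar uses.

```python
def grid_tiles(width, height, grid_size):
    """Split a width x height image into grid_size x grid_size (left, top, right, bottom) boxes."""
    boxes = []
    for row in range(grid_size):
        for col in range(grid_size):
            # Adjacent boxes share edges and the last box ends exactly at width/height,
            # so the tiles cover the whole image even when it does not divide evenly.
            boxes.append((
                col * width // grid_size,
                row * height // grid_size,
                (col + 1) * width // grid_size,
                (row + 1) * height // grid_size,
            ))
    return boxes


def aggregate_tile_predictions(tile_probs, top_k=5, min_prob=0.1):
    """Merge per-tile {species: probability} dicts, keeping each species' highest probability."""
    best = {}
    for probs in tile_probs:
        for species, p in probs.items():
            best[species] = max(p, best.get(species, 0.0))
    # Drop low-confidence species, then rank the rest by their best tile score.
    kept = [(s, p) for s, p in best.items() if p >= min_prob]
    kept.sort(key=lambda sp: sp[1], reverse=True)
    return [s for s, _ in kept[:top_k]]


if __name__ == "__main__":
    print(grid_tiles(1000, 800, 2))
    # [(0, 0, 500, 400), (500, 0, 1000, 400), (0, 400, 500, 800), (500, 400, 1000, 800)]
    tile_probs = [  # hypothetical per-tile classifier outputs
        {"species_a": 0.7, "species_b": 0.05},
        {"species_b": 0.4},
        {"species_a": 0.2, "species_c": 0.15},
        {},
    ]
    print(aggregate_tile_predictions(tile_probs, min_prob=0.2))  # ['species_a', 'species_b']
```

Taking the maximum over tiles lets a species that is clearly visible in one small region still be reported for the whole image, which suits multi-label prediction; averaging would dilute it across empty tiles.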

Event Details

📅 Date: March 27th, 12 PM PT

🎤 Speaker: Murilo Gustineli

📍 Where: Online Webinar

👋 Register today: Registration Page

Quickstart Guide

1. Clone the repository

First, clone the pytorch-plantclef repo:

⚠️ Using HTTPS (Recommended for Intel Tiber AI Cloud):

git clone https://github.com/murilogustineli/pytorch-plantclef.git

Using SSH:

git clone git@github.com:murilogustineli/pytorch-plantclef.git

Navigate to the project directory:

cd pytorch-plantclef

2. Install uv (Fast Package Manager)

Install uv as the package manager for the project. Follow the uv installation instructions for macOS, Linux, and Windows.

If running on Intel Tiber AI Cloud, install uv as follows (this also works on macOS and Linux):

curl -LsSf https://astral.sh/uv/install.sh | sh

Add it to PATH:

source $HOME/.local/bin/env

Verify the uv installation:

uv --version

3. Create a Virtual Environment

Create the virtual environment:

uv venv venv

Activate the virtual environment:

source venv/bin/activate
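To confirm that `python` now resolves to the new environment, a quick check like the one below can help. It relies on the standard-library behavior that inside a virtual environment `sys.prefix` points at the environment directory while `sys.base_prefix` still points at the base installation; the function name is hypothetical.

```python
import sys


def in_virtualenv():
    """True when the running interpreter belongs to a virtual environment (e.g. one made by `uv venv venv`)."""
    # In a venv, sys.prefix is the venv directory; sys.base_prefix is the base Python installation.
    return sys.prefix != sys.base_prefix


if __name__ == "__main__":
    print("Inside a virtual environment:", in_virtualenv())
    print("Interpreter:", sys.executable)
```

After `source venv/bin/activate`, the interpreter path printed should sit under the project's `venv/` directory.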

4. Install Dependencies and Set Up the Project

Install all dependencies into the venv virtual environment and install the plantclef package in editable mode, which means changes to the Python files take effect immediately without reinstalling the package:

uv pip install -e .

This command does two things:

  1. Installs all dependencies listed in requirements.txt.
  2. Sets up plantclef as an editable package inside the virtual environment.
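Both effects can be checked from Python. This sketch assumes the distribution is named `plantclef` (matching the package name above) and that the installer records a PEP 610 `direct_url.json`, where editable installs are marked with `"dir_info": {"editable": true}`; the helper name is hypothetical.

```python
import importlib.util
import json
from importlib import metadata


def package_status(dist_name, module_name=None):
    """Report whether a distribution is importable and whether it was installed in editable mode."""
    module_name = module_name or dist_name
    importable = importlib.util.find_spec(module_name) is not None
    editable = False
    try:
        direct_url = metadata.distribution(dist_name).read_text("direct_url.json")
    except metadata.PackageNotFoundError:
        direct_url = None  # not installed as a distribution (or a stdlib module)
    if direct_url:
        # PEP 610: editable installs record {"dir_info": {"editable": true}}.
        editable = json.loads(direct_url).get("dir_info", {}).get("editable", False)
    return {"importable": importable, "editable": editable}


if __name__ == "__main__":
    print(package_status("plantclef"))
```

After `uv pip install -e .`, the expectation under these assumptions is that both fields are `True`.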

[OPTIONAL] Set Up Pre-Commit Hooks for Code Formatting:

To ensure code follows best practices, install the pre-commit hooks:

pre-commit install

This automatically formats and checks your code before every commit.

5. Download Dataset & Fine-Tuned ViT Model

Run the following script to download:

  • Dataset (data/parquet/dataset_name)
  • Fine-Tuned DINOv2 Model (model/pretrained_models/model_name)

bash scripts/download_data_model.sh

This script will:

  • Download the dataset & model from Google Drive.
  • Extract the .zip files into their respective directories.
  • Remove the original .zip files to save space.
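The extract-and-clean-up steps can be approximated in Python with the standard `zipfile` module. This is a hypothetical equivalent for illustration, not the contents of `scripts/download_data_model.sh`, and it leaves out the Google Drive download itself.

```python
import os
import zipfile


def extract_and_remove(zip_path, dest_dir):
    """Extract zip_path into dest_dir, then delete the archive to save space."""
    os.makedirs(dest_dir, exist_ok=True)
    with zipfile.ZipFile(zip_path) as archive:
        names = archive.namelist()
        archive.extractall(dest_dir)
    # Delete only after the archive is closed and extraction finished without raising.
    os.remove(zip_path)
    return names
```

Deleting the archive only after `extractall` returns means a corrupt download raises an error and leaves the `.zip` in place for inspection instead of losing it.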

6. Run tests to verify setup

After downloading the data and fine-tuned model, verify that everything works by running the following pytest test:

pytest -vv -s tests/test_model.py

This test ensures that:

  • The virtual environment is correctly set up.
  • The DINOv2 model is correctly loaded.
  • Image embeddings are generated without errors.
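The embedding checks listed above can be expressed as a small, framework-agnostic helper. This is a hypothetical sketch, not the contents of `tests/test_model.py`: it assumes embeddings arrive as plain lists of floats (a PyTorch tensor's `.tolist()` gives that form) and that you know the expected embedding dimension of the DINOv2 variant in use.

```python
import math


def check_embeddings(embeddings, expected_dim):
    """Return a list of problems found in a batch of embedding vectors (an empty list means OK)."""
    problems = []
    if not embeddings:
        problems.append("batch is empty")
    for i, vec in enumerate(embeddings):
        if len(vec) != expected_dim:
            problems.append(f"embedding {i} has dim {len(vec)}, expected {expected_dim}")
        if not all(math.isfinite(x) for x in vec):
            problems.append(f"embedding {i} contains NaN or inf")
        elif all(x == 0 for x in vec):
            # An all-zero vector usually means the forward pass silently failed.
            problems.append(f"embedding {i} is all zeros")
    return problems
```

Inside a pytest test, `assert check_embeddings(batch, expected_dim) == []` reports every problem in the batch at once rather than stopping at the first.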

If you're running locally, you should be good to go! If you're running on Intel Tiber AI Cloud, follow the setup below.

Intel Tiber AI Cloud Setup

⚠️ The Jupyter and terminal environments on Intel Tiber AI Cloud (ITAC) are NOT synced. Installing packages or setting environment variables in one does not automatically apply to the other.

To ensure proper Intel GPU (xpu) access, follow these steps:

  1. Open the notebook: Open notebooks/setup_itac.ipynb in Jupyter.

  2. Run cells sequentially: Go through the notebook step by step.

  3. Restart the kernel when required: Running the cell containing exit() restarts the Jupyter kernel so the installations take effect.

  4. Verify that the Intel GPU (xpu) is being used: At the end of the notebook execution, check that the PyTorch version and device are correct. The expected output if the Intel GPU is enabled is:

    PyTorch Version: 2.5.1+cxx11.abi
    Using device: xpu
    

    If you see Using device: cpu, the setup did not enable the Intel GPU; rerun the setup notebook.
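Scripts run outside the notebook can pick the device the same way and fall back gracefully when the Intel GPU is unavailable. A minimal sketch, assuming a PyTorch build that exposes `torch.xpu.is_available()` (recent releases do; the `hasattr` guard covers builds without it). The helper names are hypothetical, and the CUDA branch is an extra fallback not mentioned in this README.

```python
def pick_device(xpu_available, cuda_available):
    """Prefer an Intel GPU (xpu), then an NVIDIA GPU (cuda), and fall back to the CPU."""
    if xpu_available:
        return "xpu"
    if cuda_available:
        return "cuda"
    return "cpu"


def detect_device():
    """Query PyTorch for available accelerators; returns "cpu" if torch is not installed."""
    try:
        import torch
    except ImportError:
        return "cpu"
    # Guard with hasattr: builds without Intel GPU support may not expose torch.xpu.
    xpu = hasattr(torch, "xpu") and torch.xpu.is_available()
    return pick_device(xpu, torch.cuda.is_available())


if __name__ == "__main__":
    print("Using device:", detect_device())
```

Separating the decision (`pick_device`) from the hardware query (`detect_device`) keeps the preference order testable on machines without any GPU.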

About

PyTorch vision workflows using DINOv2 and Faiss for plant species classification
