(Added Wednesday Apr 9th, 2025)
- Jacob A Rose forked this repo from the starter repo originally created by @murilogustineli in order to build on that work and compete in the 2025 PlantCLEF dataset challenge.
PyTorch webinar on using the DINOv2 model and the Faiss library for multi-label plant species classification in the PlantCLEF @ LifeCLEF & CVPR-FGVC competition on Kaggle. This session will demonstrate how self-supervised Vision Transformers (ViTs) and similarity search techniques can classify plant species efficiently at scale.
This webinar is made possible through the support of the PyTorch Foundation and Intel AI.
Discover how we used DINOv2 and Faiss, together with PyTorch and PyTorch Lightning, for large-scale multi-label plant species classification.
Table of Contents
- How to leverage DINOv2 embeddings for multi-label classification using transfer learning.
- Efficient feature extraction from a subset of 1.4M+ images using PyTorch Lightning.
- Using Faiss for fast nearest neighbor search on high-dimensional embeddings.
- Image processing techniques: grid-based tiling and prediction aggregation to handle large datasets.
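To make the last item concrete, here is a minimal sketch of grid-based tiling and prediction aggregation in plain Python. It is not the repo's implementation: the function names `grid_tiles` and `aggregate_predictions`, the species labels, and the choice to keep each species' maximum score across tiles are all assumptions made for illustration.

```python
from collections import defaultdict


def grid_tiles(width, height, grid_size):
    """Split a width x height image into grid_size x grid_size crop boxes.

    Each box is (left, upper, right, lower), the same convention that
    Pillow's Image.crop accepts. Hypothetical helper, not from the repo.
    """
    boxes = []
    for row in range(grid_size):
        for col in range(grid_size):
            left = col * width // grid_size
            upper = row * height // grid_size
            right = (col + 1) * width // grid_size
            lower = (row + 1) * height // grid_size
            boxes.append((left, upper, right, lower))
    return boxes


def aggregate_predictions(tile_predictions, top_k):
    """Merge per-tile {species: score} dicts into one ranked species list.

    Assumption: a species keeps its best score over all tiles. Other
    rules (mean score, vote counts) would be equally plausible.
    """
    best = defaultdict(float)
    for predictions in tile_predictions:
        for species, score in predictions.items():
            best[species] = max(best[species], score)
    ranked = sorted(best.items(), key=lambda item: item[1], reverse=True)
    return [species for species, _ in ranked[:top_k]]


# A 6x4 image cut into a 2x2 grid gives four 3x2 tiles.
tiles = grid_tiles(6, 4, 2)

# Made-up scores for two tiles; "a", "b", "c" stand in for species IDs.
labels = aggregate_predictions(
    [{"a": 0.9, "b": 0.2}, {"b": 0.7, "c": 0.4}], top_k=2
)
```

Tiling lets a single-label classifier or nearest-neighbor lookup run on each region of a multi-species image, and the aggregation step turns those per-tile results into one multi-label prediction per image.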
📅 Date: March 27th, 12 PM PST
🎤 Speaker: Murilo Gustineli
📍 Where: Online Webinar
👋 Register today: Registration Page
First, clone the pytorch-plantclef repo:
git clone https://github.com/murilogustineli/pytorch-plantclef.git
Using SSH:
git clone git@github.com:murilogustineli/pytorch-plantclef.git
Navigate to the project directory:
cd pytorch-plantclef
Install uv as the package manager for the project. Follow the uv installation instructions for macOS, Linux, and Windows.
If running on Intel Tiber AI Cloud, install uv as follows (this also works on macOS and Linux):
curl -LsSf https://astral.sh/uv/install.sh | sh
Add it to PATH:
source $HOME/.local/bin/env
Check uv installation:
uv --version
Create the virtual environment:
uv venv venv
Activate the virtual environment:
source venv/bin/activate
Install the plantclef package in editable mode, which means changes to the Python files will be immediately available without needing to reinstall the package.
Install all dependencies from requirements.txt to the venv virtual environment:
uv pip install -e .
This command does two things:
- Installs all dependencies listed in requirements.txt.
- Sets up plantclef as an editable package inside the virtual environment.
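One way to confirm the editable install took effect is to ask Python where it resolves the package from. The sketch below uses only the standard library; the helper name `package_origin` is hypothetical, and it assumes the package is importable as `plantclef`, as the install step above suggests.

```python
import importlib.util


def package_origin(name):
    """Return the file a top-level package is loaded from, or None if absent."""
    spec = importlib.util.find_spec(name)
    return None if spec is None else spec.origin


if __name__ == "__main__":
    # With an editable install this would typically point into the cloned
    # repo rather than into the virtual environment's site-packages.
    print(package_origin("plantclef"))
```

If this prints `None`, the package is not visible to the active interpreter, which usually means the virtual environment is not activated or the install step did not complete.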
To ensure code follows best practices, install pre-commit:
pre-commit install
This automatically formats and checks your code before every commit.
Run the following script to download:
- Dataset (data/parquet/dataset_name)
- Fine-Tuned DINOv2 Model (model/pretrained_models/model_name)
bash scripts/download_data_model.sh
This script will:
- Download the dataset & model from Google Drive.
- Extract the .zip files into their respective directories.
- Remove the original .zip files to save space.
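After the script finishes, a quick layout check can catch an incomplete download before you run the tests. The sketch below is an illustration only: `missing_paths` is a hypothetical helper, and it assumes the two parent directories named above (`data/parquet` and `model/pretrained_models`) should each exist and contain at least one entry. The actual dataset and model folder names are not assumed.

```python
from pathlib import Path

# Parent directories taken from the download description above.
EXPECTED_DIRS = ("data/parquet", "model/pretrained_models")


def missing_paths(project_root, expected=EXPECTED_DIRS):
    """List expected directories that are absent or empty under project_root."""
    root = Path(project_root)
    missing = []
    for relative in expected:
        target = root / relative
        # A directory counts as present only if it holds at least one entry.
        if not target.is_dir() or not any(target.iterdir()):
            missing.append(relative)
    return missing


if __name__ == "__main__":
    # Run from the project root; an empty list means both directories are populated.
    print(missing_paths("."))
```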
After downloading the data and fine-tuned model, verify that everything works by running the following pytest command:
pytest -vv -s tests/test_model.py
This test ensures that:
- The virtual environment is correctly set up.
- The DINOv2 model is correctly loaded.
- Image embeddings are generated without errors.
If you're running locally, you should be good to go! If you're running on Intel Tiber AI Cloud, follow the setup below.
To ensure proper Intel GPU (xpu) access, follow these steps:
- Open the notebook: Open the Jupyter notebook notebooks/setup_itac.ipynb.
- Run cells sequentially: Go through the notebook step by step.
- Restart the kernel when required: Running the cell exit() will restart the Jupyter kernel to apply the installations.
- Verify that the Intel GPU (xpu) is being used: At the end of the notebook execution, check that the PyTorch version and device are correct. The expected output if the Intel GPU is enabled is:
  PyTorch Version: 2.5.1+cxx11.abi
  Using device: xpu
  If you see Using device: cpu, the setup did not correctly enable the Intel GPU; retry running the setup notebook.
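The same check can be run outside the notebook. The following is a minimal sketch, not the notebook's actual code: `select_device` and `report_device` are hypothetical names, and the output format simply mirrors the expected output quoted above. It relies on `torch.xpu.is_available()`, which recent PyTorch releases provide, and falls back gracefully when PyTorch or the `torch.xpu` module is missing.

```python
def select_device(xpu_available):
    """Pick the device string: Intel GPU if available, otherwise CPU."""
    return "xpu" if xpu_available else "cpu"


def report_device():
    """Print the PyTorch version and chosen device; return the device or None."""
    try:
        import torch
    except ImportError:
        print("PyTorch is not installed in this environment")
        return None
    # Older PyTorch builds have no torch.xpu module, so guard the lookup.
    xpu_available = hasattr(torch, "xpu") and torch.xpu.is_available()
    device = select_device(xpu_available)
    print(f"PyTorch Version: {torch.__version__}")
    print(f"Using device: {device}")
    return device


if __name__ == "__main__":
    report_device()
```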


