
Training (fine-tuning) and evaluating Vision Transformer (ViT) models on any image dataset


ViT-Image-Classification-with-Any-Images

Overview

This repository provides a pipeline for fine-tuning a Vision Transformer (ViT) model on custom image datasets using Hugging Face's Transformers library. The code is designed to offer flexibility in dataset management, model fine-tuning, and inference, making it easy to adapt the ViT model to various image classification tasks.

Setup

Clone the Repository

git clone https://github.com/semihdervis/ViT-Image-Classification-with-Any-Images.git
cd ViT-Image-Classification-with-Any-Images

Install Requirements

Ensure you have Python 3.8+ installed. Install the necessary packages using pip:

pip install -r requirements.txt
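For reference, a pipeline like this typically depends on the packages below; this is an assumed list, not the repository's actual requirements.txt, so always install from the file shipped with the repo:

```
transformers
datasets
torch
pillow
opencv-python
```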

Usage

Training the Model

  1. Set Dataset and Output Directory:

    • Replace DATASET_PATH in train.py with the path to your image dataset.
    • Set OUTPUT_DIR to your desired model output directory.
  2. Run Training:

    python train.py

Testing the Model with a Single Image

  1. Set Model and Image Paths:

    • In test_model_with_single_image.py, replace MODEL_PATH with the path to your trained model.
    • Replace IMAGE_PATH with the path to the image you want to classify.
  2. Run the Inference Script:

    python test_model_with_single_image.py
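A single-image inference script along these lines usually reduces to a few calls: load the processor and model from the fine-tuned directory, preprocess the image, and take the argmax over the logits. This is a hedged sketch (MODEL_PATH and IMAGE_PATH are placeholders, and the repository's actual script may differ):

```python
import os
import torch
from PIL import Image
from transformers import ViTImageProcessor, ViTForImageClassification

MODEL_PATH = "vit-finetuned"   # directory produced by training (placeholder)
IMAGE_PATH = "example.jpg"     # image to classify (placeholder)

def top_label(logits, id2label):
    """Pick the highest-scoring class name from a plain list of logits."""
    best = max(range(len(logits)), key=lambda i: logits[i])
    return id2label[best]

def classify(image, model, processor):
    """Return the predicted label for a single PIL image."""
    inputs = processor(image.convert("RGB"), return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits
    return top_label(logits[0].tolist(), model.config.id2label)

# Only run when a trained model directory is present.
if __name__ == "__main__" and os.path.isdir(MODEL_PATH):
    processor = ViTImageProcessor.from_pretrained(MODEL_PATH)
    model = ViTForImageClassification.from_pretrained(MODEL_PATH)
    print(classify(Image.open(IMAGE_PATH), model, processor))
```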

Testing the Model with Video Capture

  1. Set Model Path:

    • In test_model_with_video_capture.py, replace MODEL_PATH with the path to your trained model.
  2. Run the Video Capture Script:

    python test_model_with_video_capture.py

License

This project is licensed under the MIT License.
