This repository provides a pipeline for fine-tuning a Vision Transformer (ViT) model on custom image datasets using Hugging Face's Transformers library. The code is designed to offer flexibility in dataset management, model fine-tuning, and inference, making it easy to adapt the ViT model to various image classification tasks.
## Installation

```bash
git clone https://github.com/semihdervis/ViT-Image-Classification-with-Any-Images.git
cd ViT-Image-Classification-with-Any-Images
```
Ensure you have Python 3.8+ installed. Install the necessary packages using pip:

```bash
pip install -r requirements.txt
```
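Before running the pipeline, it can be useful to confirm the environment is set up. The snippet below is a small, hypothetical sanity check (not part of this repository); the package list is an assumption about what `requirements.txt` contains for a Transformers-based ViT pipeline:

```python
# Hypothetical sanity check: the package names below are assumed
# dependencies of this pipeline, not taken from requirements.txt.
import importlib.util

ASSUMED_DEPS = ["torch", "transformers", "datasets", "PIL"]

def missing_packages(names):
    """Return the subset of package names that cannot be imported."""
    return [n for n in names if importlib.util.find_spec(n) is None]

if __name__ == "__main__":
    missing = missing_packages(ASSUMED_DEPS)
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All assumed dependencies found.")
```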
## Training

1. **Set Dataset and Output Directory:**
   - Replace `DATASET_PATH` in `train.py` with the path to your image dataset.
   - Set `OUTPUT_DIR` to your desired model output directory.
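A common convention for custom image datasets (and an assumption here, since the repository's exact layout requirements are not shown) is one subdirectory per class under `DATASET_PATH`. This stdlib-only helper sketches how class names can be discovered from such a layout:

```python
# Hypothetical helper: discover class names from an assumed
# one-subfolder-per-class dataset layout under DATASET_PATH.
from pathlib import Path

def discover_classes(dataset_path):
    """Return sorted class names, treating each subdirectory as one class."""
    root = Path(dataset_path)
    return sorted(p.name for p in root.iterdir() if p.is_dir())
```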
2. **Run Training:**

   ```bash
   python train.py
   ```
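The repository's actual `train.py` is not reproduced here, but a script of this kind typically follows the standard Hugging Face fine-tuning recipe. The sketch below illustrates that recipe; the base checkpoint, hyperparameters, and `imagefolder` dataset format are all assumptions, not details confirmed by this repository:

```python
# Rough sketch of a ViT fine-tuning script; all paths, the checkpoint
# name, and the hyperparameters are illustrative assumptions.
DATASET_PATH = "path/to/your/dataset"  # placeholder: one subfolder per class
OUTPUT_DIR = "path/to/output"          # placeholder

def build_label_maps(class_names):
    """Build the id2label / label2id mappings a ViT config expects."""
    id2label = {i: name for i, name in enumerate(class_names)}
    label2id = {name: i for i, name in id2label.items()}
    return id2label, label2id

def main():
    # Heavy dependencies are imported lazily so the helper above stays light.
    import torch
    from datasets import load_dataset
    from transformers import (AutoImageProcessor, AutoModelForImageClassification,
                              Trainer, TrainingArguments)

    dataset = load_dataset("imagefolder", data_dir=DATASET_PATH)
    class_names = dataset["train"].features["label"].names
    id2label, label2id = build_label_maps(class_names)

    checkpoint = "google/vit-base-patch16-224-in21k"  # assumed base model
    processor = AutoImageProcessor.from_pretrained(checkpoint)
    model = AutoModelForImageClassification.from_pretrained(
        checkpoint,
        num_labels=len(class_names),
        id2label=id2label,
        label2id=label2id,
    )

    def transform(batch):
        # Preprocess PIL images into pixel_values tensors.
        inputs = processor([img.convert("RGB") for img in batch["image"]],
                           return_tensors="pt")
        inputs["labels"] = batch["label"]
        return inputs

    def collate(examples):
        return {
            "pixel_values": torch.stack([ex["pixel_values"] for ex in examples]),
            "labels": torch.tensor([ex["labels"] for ex in examples]),
        }

    trainer = Trainer(
        model=model,
        args=TrainingArguments(output_dir=OUTPUT_DIR,
                               per_device_train_batch_size=16,
                               num_train_epochs=3,
                               remove_unused_columns=False),
        train_dataset=dataset["train"].with_transform(transform),
        data_collator=collate,
    )
    trainer.train()
    trainer.save_model(OUTPUT_DIR)

if __name__ == "__main__":
    main()
```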
## Inference on a Single Image

1. **Set Model and Image Paths:**
   - In `test_model_with_single_image.py`, replace `MODEL_PATH` with the path to your trained model.
   - Replace `IMAGE_PATH` with the path to the image you want to classify.
2. **Run the Inference Script:**

   ```bash
   python test_model_with_single_image.py
   ```
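The repository's inference script is not shown here; as a rough sketch, single-image classification with a saved Transformers model usually looks like the following (the paths are placeholders, and the exact structure of `test_model_with_single_image.py` is an assumption):

```python
# Hypothetical single-image inference sketch; MODEL_PATH and IMAGE_PATH
# are placeholders, and the script structure is an assumption.
MODEL_PATH = "path/to/trained/model"  # placeholder
IMAGE_PATH = "path/to/image.jpg"      # placeholder

def top_prediction(logits_row, id2label):
    """Map the argmax logit index to its label name (pure-Python helper)."""
    best = max(range(len(logits_row)), key=lambda i: logits_row[i])
    return id2label[best]

def main():
    import torch
    from PIL import Image
    from transformers import AutoImageProcessor, AutoModelForImageClassification

    processor = AutoImageProcessor.from_pretrained(MODEL_PATH)
    model = AutoModelForImageClassification.from_pretrained(MODEL_PATH)
    model.eval()

    image = Image.open(IMAGE_PATH).convert("RGB")
    inputs = processor(image, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits[0]
    print("Predicted class:", top_prediction(logits.tolist(), model.config.id2label))

if __name__ == "__main__":
    main()
```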
## Real-Time Classification from Video Capture

1. **Set Model Path:**
   - In `test_model_with_video_capture.py`, replace `MODEL_PATH` with the path to your trained model.
2. **Run the Video Capture Script:**

   ```bash
   python test_model_with_video_capture.py
   ```
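The script name suggests webcam-based classification, most commonly done with OpenCV. The following sketch shows that pattern; OpenCV usage, the overlay formatting, and `MODEL_PATH` are all assumptions rather than details taken from the repository:

```python
# Hypothetical webcam classification loop; assumes OpenCV for capture
# and a Transformers model saved at the placeholder MODEL_PATH.
MODEL_PATH = "path/to/trained/model"  # placeholder

def label_text(name, score):
    """Format the overlay text drawn on each frame (pure-Python helper)."""
    return f"{name}: {score:.2f}"

def main():
    import cv2
    import torch
    from PIL import Image
    from transformers import AutoImageProcessor, AutoModelForImageClassification

    processor = AutoImageProcessor.from_pretrained(MODEL_PATH)
    model = AutoModelForImageClassification.from_pretrained(MODEL_PATH)
    model.eval()

    cap = cv2.VideoCapture(0)  # default webcam
    while cap.isOpened():
        ok, frame = cap.read()
        if not ok:
            break
        # OpenCV frames are BGR; convert to RGB before preprocessing.
        image = Image.fromarray(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
        inputs = processor(image, return_tensors="pt")
        with torch.no_grad():
            probs = model(**inputs).logits.softmax(-1)[0]
        score, idx = probs.max(0)
        text = label_text(model.config.id2label[int(idx)], float(score))
        cv2.putText(frame, text, (10, 30), cv2.FONT_HERSHEY_SIMPLEX,
                    1.0, (0, 255, 0), 2)
        cv2.imshow("ViT classification", frame)
        if cv2.waitKey(1) & 0xFF == ord("q"):  # press q to quit
            break
    cap.release()
    cv2.destroyAllWindows()

if __name__ == "__main__":
    main()
```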
This project is licensed under the MIT License.