A TrOCR-small-printed model fine-tuned on 419,880 CAPTCHAs from the YZU Course Selection System.

sunsun8170/YZU-CAPTCHA-TrOCR

YZU CAPTCHA TrOCR

Introduction

This project is part of the YZU Course Bot initiative. It fine-tunes a TrOCR-small-printed model on 419,880 CAPTCHAs collected from the YZU Course Selection System. Training was done on a desktop with an NVIDIA GeForce RTX 4090 GPU (24 GB of VRAM). The best-performing model was saved for use in the YZU Course Bot, where it recognizes CAPTCHAs automatically during system login.

Environment

Below are the steps to set up the environment.

conda create -n env_name -c conda-forge python=3.12
conda activate env_name
cd path/to/YZU-CAPTCHA-TrOCR-main
pip install -r requirements.txt

Below is the platform used in this study.

Desktop
GPU NVIDIA GeForce RTX 4090
CPU 12th Gen Intel(R) Core(TM) i9-12900K (24) @ 5.20 GHz
RAM 64 GB
OS Ubuntu noble 24.04 x86_64

Usage

This project contains all the code for data collection, preprocessing, training, and testing. To run each process sequentially or individually, follow the instructions below.

python main.py [-h] [-d] [-p] [-t] [-s] [-i]
Parameter          Description
(none)             Runs preprocessing, training, and testing sequentially.
-h, --help         Displays the help message and exits.
-d, --dataset      Collects CAPTCHA images from the YZU Course Selection System.
-p, --preprocess   Executes the preprocessing step.
-t, --train        Executes the training step.
-s, --test         Executes the testing step.
-i, --info         Displays the model architecture and the total and trainable parameter counts.
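
The flag behavior described above can be sketched with `argparse`. This is only an illustration of the documented interface, not the repository's actual `main.py`; the function names `build_parser` and `select_steps` are hypothetical.

```python
import argparse

def build_parser() -> argparse.ArgumentParser:
    """Builds a parser mirroring the documented flags (-h is added by argparse)."""
    parser = argparse.ArgumentParser(prog="main.py")
    parser.add_argument("-d", "--dataset", action="store_true",
                        help="Collect CAPTCHA images.")
    parser.add_argument("-p", "--preprocess", action="store_true",
                        help="Run the preprocessing step.")
    parser.add_argument("-t", "--train", action="store_true",
                        help="Run the training step.")
    parser.add_argument("-s", "--test", action="store_true",
                        help="Run the testing step.")
    parser.add_argument("-i", "--info", action="store_true",
                        help="Show model architecture and parameter counts.")
    return parser

def select_steps(args: argparse.Namespace) -> list[str]:
    """Returns the requested steps, or the default pipeline when no flag is set."""
    order = ("dataset", "preprocess", "train", "test", "info")
    chosen = [name for name in order if getattr(args, name)]
    # With no flags, the README says preprocessing, training, and testing run in sequence.
    return chosen or ["preprocess", "train", "test"]

# Example with an explicit argument list instead of sys.argv.
print(select_steps(build_parser().parse_args(["-t", "-s"])))
```

Running the training and testing steps only would then correspond to `python main.py -t -s`.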

Results of Each Process

Dataset

We collected 419,880 CAPTCHA images from the YZU Course Selection System to serve as the dataset for the steps that follow.

To obtain the dataset, download the captcha_imgs.zip file from the Releases page and place it in the same directory as main.py.

Preprocess

The dataset was split into train, evaluation, and test sets using a 7:1.5:1.5 ratio.

Dataset      Ratio   Images
train        0.70    293,916
evaluation   0.15    62,982
test         0.15    62,982
TOTAL        1.00    419,880
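
As a rough check of the numbers above, the following sketch shuffles image indices and cuts them at the 7:1.5:1.5 boundaries. The function name `split_indices`, the fixed seed, and the use of index lists are assumptions for illustration; the repository's own preprocessing code may split the data differently.

```python
import random

def split_indices(n_items: int, ratios=(0.7, 0.15, 0.15), seed: int = 42):
    """Shuffles indices 0..n_items-1 and splits them into train/evaluation/test."""
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)  # seeded so the split is reproducible

    n_train = round(n_items * ratios[0])
    n_eval = round(n_items * ratios[1])

    train = indices[:n_train]
    evaluation = indices[n_train:n_train + n_eval]
    test = indices[n_train + n_eval:]  # remainder goes to the test set
    return train, evaluation, test

train, evaluation, test = split_indices(419_880)
print(len(train), len(evaluation), len(test))
```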

Train & Evaluation

The TrOCR-small-printed model was fine-tuned on a training set of 293,916 CAPTCHA images and evaluated on an evaluation set of 62,982 CAPTCHA images.

The Character Error Rate (CER) metric is used to identify the best-performing model, which was then saved. A lower CER indicates better performance.
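
For reference, CER is commonly defined as the character-level edit distance between prediction and label, divided by the number of label characters. The sketch below implements that common definition in plain Python; the helper names are hypothetical, and the metric implementation actually used during training is not shown in this README.

```python
def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance: minimum insertions, deletions, and substitutions."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        curr = [i]
        for j, h in enumerate(hyp, 1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]

def cer(references: list[str], predictions: list[str]) -> float:
    """Total edit distance divided by total reference characters."""
    total_edits = sum(edit_distance(r, p) for r, p in zip(references, predictions))
    total_chars = sum(len(r) for r in references)
    return total_edits / total_chars

# One wrong character out of eight reference characters.
print(cer(["abcd", "wxyz"], ["abcd", "wxyq"]))
```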

[Figures: train_loss, train_learning_rate, eval_loss, eval_cer]

To access the full training results, download the results.zip file from the Releases page and place it in the same directory as main.py. Then you may start a TensorBoard session by running the following command in your terminal.

tensorboard --logdir=./results/train

Test

The top-performing model was saved and tested on the test set containing 62,982 CAPTCHA images.
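
A minimal sketch of running a TrOCR model on a single CAPTCHA image with the Hugging Face `transformers` API is shown below. It assumes `transformers`, `torch`, and `Pillow` are installed. The function names and the `model_dir` parameter are hypothetical; its default points at the base `microsoft/trocr-small-printed` checkpoint, so pass the directory of the saved fine-tuned model to reproduce this project's behavior. Stripping whitespace from the output is also an assumption, based on CAPTCHA text containing no spaces.

```python
def clean_prediction(text: str) -> str:
    """Removes all whitespace from the decoded text (assumed not to occur in CAPTCHAs)."""
    return "".join(text.split())

def recognize_captcha(image_path: str,
                      model_dir: str = "microsoft/trocr-small-printed") -> str:
    """Loads a TrOCR model and returns the predicted text for one image."""
    # Imported lazily so the helper above works without the heavy dependencies.
    from PIL import Image
    from transformers import TrOCRProcessor, VisionEncoderDecoderModel

    processor = TrOCRProcessor.from_pretrained(model_dir)
    model = VisionEncoderDecoderModel.from_pretrained(model_dir)

    image = Image.open(image_path).convert("RGB")
    pixel_values = processor(images=image, return_tensors="pt").pixel_values
    generated_ids = model.generate(pixel_values)
    text = processor.batch_decode(generated_ids, skip_special_tokens=True)[0]
    return clean_prediction(text)
```

For repeated calls, loading the processor and model once and reusing them would avoid the reload cost on every image.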

[Figure: acc_report]

Conclusion

The final results demonstrated that this fine-tuned TrOCR-small-printed model for recognizing CAPTCHA images on the YZU Course Selection System achieved an accuracy of 99.97%.
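
The README does not state how accuracy is computed. One plausible reading, sketched here as an assumption, is the exact-match rate: a prediction counts as correct only if the whole CAPTCHA string matches its label. The function name `exact_match_accuracy` is hypothetical.

```python
def exact_match_accuracy(labels: list[str], predictions: list[str]) -> float:
    """Fraction of predictions that match their label exactly."""
    if len(labels) != len(predictions):
        raise ValueError("labels and predictions must have the same length")
    correct = sum(label == pred for label, pred in zip(labels, predictions))
    return correct / len(labels)

# Three of four predictions match exactly.
print(exact_match_accuracy(["a1b2", "c3d4", "e5f6", "g7h8"],
                           ["a1b2", "c3d4", "e5f6", "g7h0"]))
```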

References

  1. TrOCR: Transformer-based Optical Character Recognition with Pre-trained Models
  2. TrOCR - Hugging Face
  3. TrOCR – Getting Started with Transformer Based OCR
  4. Fine Tuning TrOCR – Training TrOCR to Recognize Curved Text
  5. Google Python Style Guide

Contact me

Feel free to reach out to me at s1101613@mail.yzu.edu.tw
