YZU CAPTCHA TrOCR

Introduction

This project is part of the YZU Course Bot initiative. It fine-tunes a TrOCR-small-printed model using a dataset of 419,880 CAPTCHA images collected from the YZU Course Selection System. Training was run on a desktop with an NVIDIA GeForce RTX 4090 GPU (24 GB of VRAM). The top-performing model was saved for use in the YZU Course Bot, where it enables automatic CAPTCHA recognition during system login.

Environment

Below are the steps to set up the environment.

conda create -n env_name -c conda-forge python=3.12
conda activate env_name
cd path/to/YZU-CAPTCHA-TrOCR-main
pip install -r requirements.txt

Below is the platform used in this study.

	Desktop
GPU	NVIDIA GeForce RTX 4090
CPU	12th Gen Intel(R) Core(TM) i9-12900K (24) @ 5.20 GHz
RAM	64 GB
OS	Ubuntu 24.04 (Noble) x86_64

Usage

This project contains all the code for data collection, preprocessing, training, and testing. To run each process sequentially or individually, follow the instructions below.

python main.py [-h] [-d] [-p] [-t] [-s] [-i]

parameters	description
None	Runs preprocessing, training, and testing sequentially.
`-h, --help`	Displays this help message and exits.
`-d, --dataset`	Collects CAPTCHA images from the YZU Course Selection System.
`-p, --preprocess`	Executes the preprocessing step.
`-t, --train`	Executes the training step.
`-s, --test`	Executes the testing step.
`-i, --info`	Displays the model architecture and the total and trainable parameter counts.
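
As an illustration of the flag behaviour in the table above, here is a minimal sketch of how such a command line could be parsed with `argparse`. It is not the project's actual `main.py`; the helper names `build_parser` and `selected_steps` are hypothetical, and the only facts taken from the table are the flag names and the default of running preprocessing, training, and testing when no flag is given.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser with the flags listed in the usage table.

    argparse adds -h/--help automatically.
    """
    parser = argparse.ArgumentParser(prog="main.py")
    parser.add_argument("-d", "--dataset", action="store_true",
                        help="collect CAPTCHA images")
    parser.add_argument("-p", "--preprocess", action="store_true",
                        help="run the preprocessing step")
    parser.add_argument("-t", "--train", action="store_true",
                        help="run the training step")
    parser.add_argument("-s", "--test", action="store_true",
                        help="run the testing step")
    parser.add_argument("-i", "--info", action="store_true",
                        help="show model architecture and parameter counts")
    return parser


def selected_steps(args: argparse.Namespace) -> list[str]:
    """Return the requested steps in a fixed order.

    With no flags, fall back to preprocessing, training, and testing,
    matching the "None" row of the table.
    """
    order = ["dataset", "preprocess", "train", "test", "info"]
    chosen = [name for name in order if getattr(args, name)]
    return chosen or ["preprocess", "train", "test"]


if __name__ == "__main__":
    print(selected_steps(build_parser().parse_args()))
```

Under this sketch, `python main.py -t -s` would select only the training and testing steps.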

Results of Each Process

Dataset

We collected 419,880 CAPTCHA images from the YZU Course Selection System. These images form the dataset used in all later steps.

To obtain the dataset, download the captcha_imgs.zip file from the Releases page and place it in the same directory as main.py.

Preprocess

The dataset was split into train, evaluation, and test sets using a 7:1.5:1.5 ratio.

dataset	ratio	images
train	0.7	293,916
evaluation	0.15	62,982
test	0.15	62,982
TOTAL	1	419,880
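
The counts in the table follow directly from the ratios. The sketch below reproduces that arithmetic and shows one way a shuffled split could be done. The function names and the fixed seed are assumptions made for illustration; the project's own preprocessing code may shuffle and split differently.

```python
import random


def split_counts(total: int, train_pct: int = 70, eval_pct: int = 15) -> tuple[int, int, int]:
    """Return (train, evaluation, test) sizes using integer percentages.

    The test set takes whatever remains, so the three sizes always sum to total.
    """
    n_train = total * train_pct // 100
    n_eval = total * eval_pct // 100
    n_test = total - n_train - n_eval
    return n_train, n_eval, n_test


def split_items(items, seed: int = 0):
    """Shuffle a copy of items with a fixed seed, then slice it into three parts."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_train, n_eval, _ = split_counts(len(shuffled))
    train = shuffled[:n_train]
    evaluation = shuffled[n_train:n_train + n_eval]
    test = shuffled[n_train + n_eval:]
    return train, evaluation, test


print(split_counts(419_880))  # (293916, 62982, 62982)
```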

Train & Evaluation

The TrOCR-small-printed model was fine-tuned on a training set of 293,916 CAPTCHA images and evaluated on an evaluation set of 62,982 CAPTCHA images.

The Character Error Rate (CER) is the metric used to select the best model: a lower CER indicates better performance, and the checkpoint with the best CER was saved.
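
For readers unfamiliar with the metric, CER is commonly defined as the character-level edit distance between prediction and reference, divided by the number of reference characters. The pure-Python sketch below follows that common definition. It is not the project's evaluation code, which may rely on a library implementation, and the sample strings are made up.

```python
def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance: minimum insertions, deletions, and substitutions."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def cer(references: list[str], predictions: list[str]) -> float:
    """Total edit distance divided by total reference length."""
    total_edits = sum(edit_distance(r, p) for r, p in zip(references, predictions))
    total_chars = sum(len(r) for r in references)
    return total_edits / total_chars


# Hypothetical labels: one dropped character out of eight reference characters.
print(cer(["ab12", "cd34"], ["ab12", "cd3"]))  # 0.125
```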

To access the full training results, download the results.zip file from the Releases page and place it in the same directory as main.py. Then you may start a TensorBoard session by running the following command in your terminal.

tensorboard --logdir=./results/train

Test

The top-performing model was saved and tested on the test set containing 62,982 CAPTCHA images.
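
Running a TrOCR checkpoint on a single image with the Hugging Face `transformers` API might look like the sketch below. It assumes `transformers`, `torch`, and `Pillow` are installed. The function name `recognize_captcha` is hypothetical, and the default checkpoint is the public base model `microsoft/trocr-small-printed`; to use the fine-tuned model, pass the directory where it was saved, which this README does not specify.

```python
def recognize_captcha(image_path: str,
                      checkpoint: str = "microsoft/trocr-small-printed") -> str:
    """Return the text predicted for one CAPTCHA image.

    checkpoint may be a Hugging Face model id or a local directory
    containing a saved model (assumption: saved with save_pretrained).
    """
    # Imported lazily so the module can be loaded without the heavy dependencies.
    from PIL import Image
    from transformers import TrOCRProcessor, VisionEncoderDecoderModel

    processor = TrOCRProcessor.from_pretrained(checkpoint)
    model = VisionEncoderDecoderModel.from_pretrained(checkpoint)

    # TrOCR expects RGB input.
    image = Image.open(image_path).convert("RGB")
    pixel_values = processor(images=image, return_tensors="pt").pixel_values

    generated_ids = model.generate(pixel_values)
    return processor.batch_decode(generated_ids, skip_special_tokens=True)[0]
```

For repeated predictions, loading the processor and model once and reusing them would avoid the per-call loading cost.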

Conclusion

The fine-tuned TrOCR-small-printed model achieved an accuracy of 99.97% in recognizing CAPTCHA images from the YZU Course Selection System.
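
The README does not state how accuracy is computed. A common choice for CAPTCHA recognition is exact-match accuracy, where a prediction counts as correct only if the whole string matches the label. The sketch below shows that definition as an assumption; the function name and sample strings are hypothetical.

```python
def exact_match_accuracy(labels: list[str], predictions: list[str]) -> float:
    """Fraction of predictions that equal their label exactly."""
    if len(labels) != len(predictions):
        raise ValueError("labels and predictions must have the same length")
    correct = sum(label == pred for label, pred in zip(labels, predictions))
    return correct / len(labels)


# Hypothetical example: three of four strings match exactly.
print(exact_match_accuracy(["a1b2", "c3d4", "e5f6", "g7h8"],
                           ["a1b2", "c3d4", "e5f6", "g7h0"]))  # 0.75
```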

References

TrOCR: Transformer-based Optical Character Recognition with Pre-trained Models
TrOCR - Hugging Face
TrOCR – Getting Started with Transformer Based OCR
Fine Tuning TrOCR – Training TrOCR to Recognize Curved Text
Google Python Style Guide

Contact me

Feel free to reach out to me at s1101613@mail.yzu.edu.tw
