This project is part of the YZU Course Bot initiative. It fine-tunes a TrOCR-small-printed model using a dataset of 419,880 CAPTCHAs collected from the YZU Course Selection System. Training ran on a desktop with an NVIDIA GeForce RTX 4090 GPU (24 GB of VRAM). The top-performing model is kept for use in the YZU Course Bot, where it recognizes CAPTCHAs automatically during system login.
Below are the steps to set up the environment.
conda create -n env_name -c conda-forge python=3.12
conda activate env_name
cd path/to/YZU-CAPTCHA-TrOCR-main
pip install -r requirements.txt
Below is the platform used in this study.
Desktop | |
---|---|
GPU | NVIDIA GeForce RTX 4090 |
CPU | 12th Gen Intel(R) Core(TM) i9-12900K (24) @ 5.20 GHz |
RAM | 64 GB |
OS | Ubuntu noble 24.04 x86_64 |
This project contains all the code for data collection, preprocessing, training, and testing. To run each process sequentially or individually, follow the instructions below.
python main.py [-h] [-d] [-p] [-t] [-s] [-i]
parameters | description |
---|---|
None | Runs preprocessing, training, and testing sequentially. |
-h, --help | Displays this help message and exits. |
-d, --dataset | Collects CAPTCHA images from the YZU Course Selection System. |
-p, --preprocess | Executes the preprocessing step. |
-t, --train | Executes the training step. |
-s, --test | Executes the testing step. |
-i, --info | Displays the model architecture and the total and trainable parameter counts. |
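As an illustration of how flags like these can be wired up, here is a minimal sketch using `argparse`. It is not the actual contents of `main.py`: the helper names `build_parser` and `select_steps` are hypothetical, and the fallback to preprocessing, training, and testing when no flag is given simply mirrors the `None` row of the table.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser mirroring the documented flags (hypothetical sketch)."""
    parser = argparse.ArgumentParser(prog="main.py")
    # argparse adds -h/--help automatically.
    parser.add_argument("-d", "--dataset", action="store_true",
                        help="Collect CAPTCHA images.")
    parser.add_argument("-p", "--preprocess", action="store_true",
                        help="Execute the preprocessing step.")
    parser.add_argument("-t", "--train", action="store_true",
                        help="Execute the training step.")
    parser.add_argument("-s", "--test", action="store_true",
                        help="Execute the testing step.")
    parser.add_argument("-i", "--info", action="store_true",
                        help="Display model architecture and parameter counts.")
    return parser


def select_steps(args: argparse.Namespace) -> list[str]:
    """Return the requested steps in pipeline order."""
    order = ("dataset", "preprocess", "train", "test", "info")
    chosen = [name for name in order if getattr(args, name)]
    # Assumption: with no flags, run preprocessing, training, and testing.
    return chosen or ["preprocess", "train", "test"]
```

For example, `select_steps(build_parser().parse_args(["-t", "-s"]))` yields the training and testing steps only.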
We collected a total of 419,880 CAPTCHA images from the YZU Course Selection System to be used as the dataset for later processing.
To obtain the dataset, download the captcha_imgs.zip file from the Releases page and place it in the same directory as main.py.
The dataset was split into train, evaluation, and test sets using a 7:1.5:1.5 ratio.
dataset | ratio | images |
---|---|---|
train | 0.7 | 293,916 |
evaluation | 0.15 | 62,982 |
test | 0.15 | 62,982 |
TOTAL | 1 | 419,880 |
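The split sizes in the table follow directly from the ratios. The sketch below shows one way to produce such a split; the function name `split_dataset`, the shuffling, and the fixed seed are assumptions for illustration, since the project's own preprocessing code may do this differently.

```python
import random


def split_dataset(items, ratios=(0.7, 0.15, 0.15), seed=42):
    """Shuffle items and split them into train, evaluation, and test lists."""
    shuffled = list(items)
    # Assumption: a seeded shuffle, so the split is reproducible.
    random.Random(seed).shuffle(shuffled)
    n_train = round(len(shuffled) * ratios[0])
    n_eval = round(len(shuffled) * ratios[1])
    train = shuffled[:n_train]
    evaluation = shuffled[n_train:n_train + n_eval]
    # The test set takes whatever remains, so no item is dropped.
    test = shuffled[n_train + n_eval:]
    return train, evaluation, test


train, evaluation, test = split_dataset(range(419_880))
print(len(train), len(evaluation), len(test))  # 293916 62982 62982
```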
The TrOCR-small-printed model was fine-tuned on a training set of 293,916 CAPTCHA images and evaluated on an evaluation set of 62,982 CAPTCHA images.
The Character Error Rate (CER) was used to select the best-performing model, which was then saved. A lower CER indicates better model performance.
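CER is commonly defined as the number of character-level edits (insertions, deletions, substitutions) needed to turn the prediction into the reference, divided by the number of reference characters. The following is a minimal pure-Python sketch of that definition. It does not show how this project computes the metric, and the function names are illustrative.

```python
def edit_distance(reference: str, hypothesis: str) -> int:
    """Levenshtein distance between two strings, computed row by row."""
    previous = list(range(len(hypothesis) + 1))
    for i, ref_char in enumerate(reference, start=1):
        current = [i]
        for j, hyp_char in enumerate(hypothesis, start=1):
            cost = 0 if ref_char == hyp_char else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution or match
            ))
        previous = current
    return previous[-1]


def character_error_rate(references: list[str], hypotheses: list[str]) -> float:
    """Total edits divided by total reference characters over a batch."""
    total_edits = sum(edit_distance(r, h) for r, h in zip(references, hypotheses))
    total_chars = sum(len(r) for r in references)
    return total_edits / total_chars


# One missing character out of eight reference characters.
print(character_error_rate(["abcd", "wxyz"], ["abcd", "wxy"]))  # 0.125
```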
To access the full training results, download the results.zip file from the Releases page and place it in the same directory as main.py. Then you may start a TensorBoard session by running the following command in your terminal.
tensorboard --logdir=./results/train
The top-performing model was saved and tested on the test set containing 62,982 CAPTCHA images.
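For readers who want to try a saved model on a single image, the sketch below uses the Hugging Face `transformers` TrOCR classes. It assumes the model and its processor were both saved to one directory with `save_pretrained`; the function name `recognize_captcha` and the `model_dir` argument are hypothetical, and the project's actual save location is not stated here.

```python
def recognize_captcha(image_path: str, model_dir: str) -> str:
    """Return the text predicted for one CAPTCHA image (illustrative sketch)."""
    # Imported lazily so this module can be imported without the heavy dependencies.
    from PIL import Image
    from transformers import TrOCRProcessor, VisionEncoderDecoderModel

    # Assumption: model_dir holds both the processor and the model files.
    processor = TrOCRProcessor.from_pretrained(model_dir)
    model = VisionEncoderDecoderModel.from_pretrained(model_dir)

    # TrOCR expects RGB input.
    image = Image.open(image_path).convert("RGB")
    pixel_values = processor(images=image, return_tensors="pt").pixel_values

    generated_ids = model.generate(pixel_values)
    return processor.batch_decode(generated_ids, skip_special_tokens=True)[0]
```

If only the model weights were saved, the processor could instead be loaded from the base checkpoint the model was fine-tuned from.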
The final results show that the fine-tuned TrOCR-small-printed model achieved an accuracy of 99.97% on CAPTCHA images from the YZU Course Selection System.
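How accuracy is defined is not spelled out above. One plausible reading, shown here only as an assumption, is per-image exact match: a prediction counts as correct only if every character matches the label. The function name `exact_match_accuracy` is illustrative.

```python
def exact_match_accuracy(references: list[str], predictions: list[str]) -> float:
    """Fraction of predictions that equal their reference string exactly."""
    if len(references) != len(predictions):
        raise ValueError("references and predictions must have the same length")
    correct = sum(ref == pred for ref, pred in zip(references, predictions))
    return correct / len(references)


# Three of four made-up labels are reproduced exactly.
labels = ["a1b2", "c3d4", "e5f6", "g7h8"]
outputs = ["a1b2", "c3d4", "e5f6", "g7hB"]
print(exact_match_accuracy(labels, outputs))  # 0.75
```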
- TrOCR: Transformer-based Optical Character Recognition with Pre-trained Models
- TrOCR - Hugging Face
- TrOCR – Getting Started with Transformer Based OCR
- Fine Tuning TrOCR – Training TrOCR to Recognize Curved Text
- Google Python Style Guide
Feel free to reach out to me at s1101613@mail.yzu.edu.tw