A PyTorch-based system for recognizing Hei.Captcha GIF captchas, developed for security research purposes.
This project implements an efficient CNN model to recognize alphanumeric captchas (A-Z, 0-9) from GIF images. The system leverages the multi-frame nature of GIFs to improve accuracy through a voting mechanism.
- GIF Frame Extraction: Decomposes GIF captchas into individual frames
- Voting Mechanism: Aggregates predictions across multiple frames for higher accuracy
- Data Augmentation: Utilizes multiple GIF frames to expand training data
- Efficient CNN: Lightweight convolutional neural network architecture
- Complete Pipeline: Data generation, training, and evaluation scripts
├── generator/ # C# captcha generator
│ ├── GenerateCaptcha.csproj
│ └── Program.cs
├── src/ # Python source code
│ ├── model.py # CNN model definition
│ ├── dataset.py # Dataset loaders
│ ├── train.py # Training script
│ ├── predict.py # Prediction script
│ ├── evaluate.py # Evaluation script
│ └── utils.py # Utility functions
├── data/ # Dataset directory
│ ├── train/ # Training set
│ └── test/ # Test set
├── models/ # Saved models
├── logs/ # TensorBoard logs
└── requirements.txt # Python dependencies
Install the PyTorch build recommended for your system from pytorch.org, then install the other dependencies:
pip install Pillow numpy tqdm matplotlib
Ensure you have .NET 6.0 or later installed to run the C# generator.
Generate training and test datasets using the C# generator:
cd generator
dotnet run -- --count 10000 --output ../data/train
dotnet run -- --count 2000 --output ../data/test
Train the captcha recognition model:
cd src
python train.py --epochs 50 --batch-size 64
Optional arguments:
- --frame-mode: Use each frame as a separate sample (data augmentation)
- --lr: Learning rate (default: 0.001)
- --num-workers: Number of data loading workers (default: 4)
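The idea behind --frame-mode can be illustrated with a small sketch. How train.py actually implements the flag is not shown here, so the function name `expand_to_frame_samples` and the `(frames, label)` input shape are assumptions made for illustration only.

```python
def expand_to_frame_samples(gif_samples):
    """Turn (frames, label) pairs into one (frame, label) sample per frame.

    Hypothetical helper: each GIF contributes as many training samples
    as it has frames, all sharing the GIF's label.
    """
    return [
        (frame, label)
        for frames, label in gif_samples
        for frame in frames
    ]


# Placeholder strings stand in for decoded image frames.
gifs = [(["f0", "f1", "f2"], "A3K9"), (["g0", "g1"], "7QXZ")]
samples = expand_to_frame_samples(gifs)
print(len(samples))  # 5 samples from 2 GIFs
```

Under this assumption, a dataset of N GIFs with F frames each yields N × F training samples without generating any new captchas.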
Monitor training with TensorBoard:
tensorboard --logdir ../logs
Evaluate the trained model on the test set:
python evaluate.py --model ../models/best_model.pth --data-dir ../data/test
Predict a single captcha:
python predict.py --model ../models/best_model.pth --image path/to/captcha.gif
Evaluate on a directory:
python predict.py --model ../models/best_model.pth --dir ../data/test
Use --no-voting to disable the voting mechanism and only use the first frame.
The model recognizes 36 characters:
- Digits: 0-9
- Letters: A-Z (case-insensitive)
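One way to represent this character set in code is a string of 36 classes plus a lookup table. This is a minimal sketch: the ordering (digits first, then letters) and the names `CHARSET`, `encode_label`, and `decode_indices` are assumptions, not necessarily what model.py or utils.py use.

```python
import string

# Assumed class ordering: digits 0-9 first, then uppercase A-Z (36 classes).
CHARSET = string.digits + string.ascii_uppercase
CHAR_TO_INDEX = {ch: i for i, ch in enumerate(CHARSET)}


def encode_label(label):
    """Map a label such as 'a3k9' to class indices, ignoring case."""
    return [CHAR_TO_INDEX[ch] for ch in label.upper()]


def decode_indices(indices):
    """Map class indices back to an uppercase label string."""
    return "".join(CHARSET[i] for i in indices)


print(encode_label("a3k9"))            # [10, 3, 20, 9]
print(decode_indices([10, 3, 20, 9]))  # A3K9
```

Upper-casing before lookup is what makes the mapping case-insensitive: lowercase and uppercase inputs land on the same class index.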
For each GIF:
- Extract all frames
- Predict each frame independently
- For each character position, vote on the most common prediction
- Combine voted characters into final result
This approach improves accuracy by leveraging the temporal redundancy in GIF captchas: a character misread in one frame can be outvoted by correct reads in the other frames.
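The per-position vote described above can be sketched in a few lines. The function name `vote_per_position` is hypothetical, and the sketch assumes every frame prediction is a string of the same length; it is not necessarily how predict.py implements voting.

```python
from collections import Counter


def vote_per_position(frame_predictions):
    """Majority-vote each character position across per-frame predictions."""
    if not frame_predictions:
        raise ValueError("need at least one frame prediction")
    voted = []
    # zip(*...) groups the characters found at the same position in each frame.
    for chars_at_position in zip(*frame_predictions):
        most_common_char, _count = Counter(chars_at_position).most_common(1)[0]
        voted.append(most_common_char)
    return "".join(voted)


# Two of the four frames each misread a different character.
frames = ["A3K9", "A8K9", "A3X9", "A3K9"]
print(vote_per_position(frames))  # A3K9
```

When two characters tie at a position, `Counter.most_common` returns the one seen first, so in this sketch earlier frames win ties.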
Captcha files should be named: LABEL_UUID.gif
- Example: A3K9_123456789.gif
- Label: the first 4 characters before the underscore
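Extracting the label from a file name that follows this convention might look like the sketch below. The helper name `label_from_filename` is hypothetical, and the strict 4-character check and upper-casing are assumptions based on the naming rule and the case-insensitive character set.

```python
from pathlib import Path


def label_from_filename(path):
    """Return the label part of a LABEL_UUID.gif file name."""
    stem = Path(path).stem  # e.g. "A3K9_123456789"
    label, separator, _uuid = stem.partition("_")
    if not separator or len(label) != 4:
        raise ValueError(f"unexpected captcha file name: {path}")
    return label.upper()


print(label_from_filename("data/train/A3K9_123456789.gif"))  # A3K9
```

Rejecting names that do not match the pattern keeps mislabeled files out of the dataset instead of silently training on a wrong label.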
This project is for educational and security research purposes only.
This project utilizes the Hei.Captcha library for captcha generation and PyTorch as the deep learning framework, and was developed with the assistance of LLMs including GitHub Copilot, Claude, and DeepSeek.