High-performance Optical Character Recognition (OCR) API built on top of the SCB10X Typhoon OCR 7B model, powered by FastAPI and Transformers.
Designed for scalable deployment and optimized for GPU acceleration and low-latency inference.
- 🔍 General and custom prompt-based OCR
- 🧠 Powered by Typhoon OCR 7B (multimodal vision-language model)
- ⚡ GPU optimization: TF32, cuDNN tuning, memory handling
- 📷 Efficient image resizing and format conversion
- 📤 Upload image via REST API with prompt or message support
- 📦 Fully containerizable for deployment
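As an illustration of the image-resizing feature listed above, here is a minimal sketch of how an upload could be scaled down before inference while preserving its aspect ratio. The function name `fit_within` and the `max_side=1024` limit are assumptions made for this example; the actual limits used by the API are not stated here.

```python
def fit_within(width: int, height: int, max_side: int = 1024) -> tuple[int, int]:
    """Return (width, height) scaled so the longest side is at most max_side.

    max_side=1024 is a hypothetical default, not a value taken from the API.
    Images that already fit are returned unchanged (no upscaling).
    """
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")
    longest = max(width, height)
    if longest <= max_side:
        return width, height
    scale = max_side / longest
    # Clamp to 1 so extremely thin images never collapse to a zero dimension.
    return max(1, round(width * scale)), max(1, round(height * scale))


print(fit_within(4000, 3000, max_side=1000))  # (1000, 750)
print(fit_within(800, 600))                   # (800, 600)
```

With Pillow, the resulting size could then be passed to `Image.resize`, and `Image.convert("RGB")` would handle the format conversion, assuming Pillow is the imaging library in use.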
git clone https://github.com/your-org/typhoon-ocr-api.git
cd typhoon-ocr-api
Make sure Python 3.10+ is installed.
pip install -r requirements.txt
✅ For GPU support, ensure that you have CUDA and the proper torch version installed. Example:
pip install torch==2.1.0+cu118 -f https://download.pytorch.org/whl/torch_stable.html
uvicorn main:app --host 0.0.0.0 --port 8000
Then open your browser at: 📍 http://localhost:8000/docs
OCR with a general prompt (default or custom string)
- file: image file (jpg, png, etc.)
- prompt: optional, custom instruction (default: "Extract all text from this image")
curl -X POST http://localhost:8000/ocr/general \
-F "file=@./sample.jpg" \
-F "prompt=Extract the ID number and name from the ID card"
OCR with a structured prompt using role-based messages.
- file: image file
- messages: JSON list of chat-style messages
[
{
"role": "user",
"content": "Extract text in table format from the document"
}
]
Simple health check endpoint. Returns "ok" if the model and processor are loaded correctly.
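The upload requests described above can also be sent from Python. The following is a minimal sketch using only the standard library; the helper names `build_multipart` and `post_ocr` are invented for this example, and the response is returned as raw text because the response schema is not documented here. Only the `/ocr/general` path appears in this README, so the URL for the messages-based endpoint must be supplied by the caller.

```python
import json
import urllib.request
import uuid


def build_multipart(fields, file_name, file_bytes, content_type="image/jpeg"):
    """Encode text fields plus one part named "file" as multipart/form-data."""
    boundary = uuid.uuid4().hex
    parts = []
    for name, value in fields.items():
        parts.append(
            f'--{boundary}\r\n'
            f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
            f'{value}\r\n'.encode("utf-8")
        )
    parts.append(
        f'--{boundary}\r\n'
        f'Content-Disposition: form-data; name="file"; filename="{file_name}"\r\n'
        f'Content-Type: {content_type}\r\n\r\n'.encode("utf-8")
    )
    parts.append(file_bytes)
    parts.append(f'\r\n--{boundary}--\r\n'.encode("utf-8"))
    headers = {"Content-Type": f"multipart/form-data; boundary={boundary}"}
    return b"".join(parts), headers


def post_ocr(url, fields, image_path):
    """POST an image and form fields to the given URL; return the raw response body."""
    with open(image_path, "rb") as fh:
        body, headers = build_multipart(fields, image_path, fh.read())
    request = urllib.request.Request(url, data=body, headers=headers, method="POST")
    with urllib.request.urlopen(request) as response:
        return response.read().decode("utf-8")


# The messages field is sent as a JSON string inside the form.
messages = [
    {"role": "user", "content": "Extract text in table format from the document"}
]
messages_field = json.dumps(messages)

# Example calls (require a running server and a local image):
# post_ocr("http://localhost:8000/ocr/general",
#          {"prompt": "Extract the ID number and name from the ID card"},
#          "./sample.jpg")
# post_ocr(messages_endpoint_url, {"messages": messages_field}, "./sample.jpg")
```

Here `messages_endpoint_url` is a placeholder for the messages-based endpoint. If the `requests` library is available, its `files=` and `data=` arguments would replace the manual multipart encoding.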
- Python >= 3.10
- CUDA-enabled GPU (optional but recommended)
- PyTorch >= 2.1.0
- Transformers >= 4.40.0
Application logs are printed to stdout using Python logging.
Includes:
- Model load time
- GPU availability and memory
- OCR process time
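A minimal sketch of how timings such as the ones listed above could be logged to stdout with the standard `logging` module. The logger name `ocr_api` and the helper `log_duration` are assumptions for illustration, not names taken from the project.

```python
import logging
import sys
import time
from contextlib import contextmanager

# Send log records to stdout (basicConfig defaults to stderr otherwise).
logging.basicConfig(
    stream=sys.stdout,
    level=logging.INFO,
    format="%(asctime)s %(levelname)s %(name)s: %(message)s",
)
logger = logging.getLogger("ocr_api")  # hypothetical logger name
logger.setLevel(logging.INFO)


@contextmanager
def log_duration(label):
    """Log how long the wrapped block took, even if it raises."""
    start = time.perf_counter()
    try:
        yield
    finally:
        elapsed = time.perf_counter() - start
        logger.info("%s took %.3f s", label, elapsed)


with log_duration("OCR process"):
    time.sleep(0.01)  # stand-in for the actual inference call
```

The same context manager could wrap model loading at startup to report the model load time.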
- This API loads the `Typhoon OCR 7B` model and performs warm-up during startup.
- GPU optimization includes TF32 acceleration, cuDNN tuning, and `torch.compile` for inference.
- You can extend the API to support multiple models, batch processing, or file exports easily.
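The GPU settings mentioned in the notes above correspond to a few PyTorch backend flags. The sketch below shows one way they might be enabled; the function name `configure_gpu_backends` is an assumption, and the project's own startup code may differ. The import is guarded so the snippet also runs on machines without PyTorch or CUDA.

```python
def configure_gpu_backends() -> bool:
    """Enable TF32 and cuDNN autotuning when PyTorch with CUDA is available.

    Returns True if the flags were set, False otherwise.
    """
    try:
        import torch
    except ImportError:
        return False
    if not torch.cuda.is_available():
        return False
    # Allow TF32 on Ampere or newer GPUs: faster matmuls and convolutions
    # at slightly reduced precision.
    torch.backends.cuda.matmul.allow_tf32 = True
    torch.backends.cudnn.allow_tf32 = True
    # Let cuDNN benchmark convolution algorithms for the observed input shapes.
    torch.backends.cudnn.benchmark = True
    return True


enabled = configure_gpu_backends()
print("GPU optimizations enabled:", enabled)
```

After the model is loaded, `model = torch.compile(model)` would add the compilation step. The first call after compiling is slow, which is one reason a warm-up pass at startup is useful.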
Sitthichai S. (2025)
MIT License © 2025