⚡ Typhoon OCR API (Optimized)

A high-performance Optical Character Recognition (OCR) API built on the SCB10X Typhoon OCR 7B model and served with FastAPI and Transformers.

Designed for scalable deployment and optimized for GPU acceleration and low-latency inference.

🚀 Features

🔍 General and custom prompt-based OCR
🧠 Powered by Typhoon OCR 7B (Multimodal vision-language model)
⚡ GPU optimization: TF32, cuDNN tuning, memory handling
📷 Efficient image resizing and format conversion
📤 Upload image via REST API with prompt or message support
📦 Fully containerizable for deployment
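
The resizing step mentioned above can be pictured as capping the longest side of the image while keeping its aspect ratio. The sketch below only computes the target dimensions. The function name `fit_within` and the 1800-pixel default are illustrative assumptions, not values taken from `main.py`.

```python
def fit_within(width, height, max_side=1800):
    """Return (width, height) scaled so the longest side is at most max_side.

    max_side=1800 is a hypothetical default chosen for illustration.
    """
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")
    longest = max(width, height)
    if longest <= max_side:
        # Small images are left untouched; upscaling adds no information.
        return width, height
    scale = max_side / longest
    # Round to whole pixels and never collapse a side to zero.
    return max(1, round(width * scale)), max(1, round(height * scale))
```

If Pillow is used for image handling (an assumption), the result can be passed straight to `image.resize(fit_within(*image.size))`.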

📦 Installation

1. Clone Repository

git clone https://github.com/your-org/typhoon-ocr-api.git
cd typhoon-ocr-api

2. Install Dependencies

Make sure Python 3.10+ is installed.

pip install -r requirements.txt

✅ For GPU support, install CUDA and a PyTorch build that matches your CUDA version. Example:

pip install torch==2.1.0+cu118 -f https://download.pytorch.org/whl/torch_stable.html

🧪 Usage

Run the server

uvicorn main:app --host 0.0.0.0 --port 8000

Then open the interactive API docs in your browser: 📍 http://localhost:8000/docs

📤 API Endpoints

`POST /ocr/general`

OCR with a general prompt (default or custom string)

Request:

file: image file (jpg, png, etc.)
prompt: optional, custom instruction (default: "Extract all text from this image")

Example:

curl -X POST http://localhost:8000/ocr/general \
  -F "file=@./sample.jpg" \
  -F "prompt=Extract the ID number and name from the ID card"
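
The same request can be made from Python. The sketch below uses only the standard library and mirrors the curl example: an image in the `file` field and an optional `prompt` field. The helper name `build_general_request` is hypothetical, and the response format is not documented here, so the usage lines simply print the raw body.

```python
import mimetypes
import urllib.request
import uuid


def build_general_request(image_bytes, filename, prompt=None,
                          base_url="http://localhost:8000"):
    """Build (but do not send) a multipart POST for /ocr/general."""
    boundary = uuid.uuid4().hex
    content_type = mimetypes.guess_type(filename)[0] or "application/octet-stream"
    parts = []
    if prompt is not None:
        # Optional text field; leave it out to use the server's default prompt.
        prompt_part = (
            f"--{boundary}\r\n"
            'Content-Disposition: form-data; name="prompt"\r\n\r\n'
            f"{prompt}\r\n"
        )
        parts.append(prompt_part.encode("utf-8"))
    # The image goes in the "file" field, as in the curl example.
    file_header = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
        f"Content-Type: {content_type}\r\n\r\n"
    )
    parts.append(file_header.encode("utf-8") + image_bytes + b"\r\n")
    parts.append(f"--{boundary}--\r\n".encode("utf-8"))
    return urllib.request.Request(
        f"{base_url}/ocr/general",
        data=b"".join(parts),
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
        method="POST",
    )


# Usage (requires a running server):
# with open("sample.jpg", "rb") as fh:
#     request = build_general_request(fh.read(), "sample.jpg",
#                                     "Extract the ID number and name from the ID card")
# with urllib.request.urlopen(request) as response:
#     print(response.read().decode("utf-8"))
```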

`POST /ocr/custom`

OCR with a structured prompt using role-based messages.

Request:

file: image file
messages: JSON list of chat-style messages

Example Body (in Swagger or Postman):

[
  {
    "role": "user",
    "content": "Extract text in table format from the document"
  }
]
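
When calling this endpoint from code, the message list has to be serialized to a JSON string before it is sent. A minimal sketch follows. It assumes `messages` travels as a text field next to `file`, and the `encode_messages` helper and its validation rules are illustrative, not part of the API.

```python
import json


def encode_messages(messages):
    """Validate a chat-style message list and return it as a JSON string."""
    if not isinstance(messages, list) or not messages:
        raise ValueError("messages must be a non-empty list")
    for index, message in enumerate(messages):
        if not isinstance(message, dict):
            raise ValueError(f"message {index} must be an object")
        for key in ("role", "content"):
            if not isinstance(message.get(key), str):
                raise ValueError(f"message {index} needs a string '{key}'")
    # ensure_ascii=False keeps non-ASCII prompts (for example Thai) readable.
    return json.dumps(messages, ensure_ascii=False)
```

Checking the shape on the client side gives a clear error before the image is uploaded, which is cheaper than waiting for the server to reject the request.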

`GET /healthz`

Simple health check endpoint. Returns "ok" if the model and processor are loaded correctly.
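
Because the model is loaded at startup, a client or deployment script may want to poll this endpoint before sending OCR requests. The sketch below is one way to do that with the standard library. The helper names, retry counts, and the loose check for "ok" anywhere in the body are assumptions, since the exact response format is not specified here.

```python
import time
import urllib.error
import urllib.request


def fetch_health(base_url="http://localhost:8000", timeout=5.0):
    """Return the /healthz response body, or None if the call fails."""
    try:
        with urllib.request.urlopen(f"{base_url}/healthz", timeout=timeout) as resp:
            return resp.read().decode("utf-8")
    except (urllib.error.URLError, OSError):
        # Covers refused connections, timeouts, and non-2xx responses.
        return None


def wait_until_ready(fetch=fetch_health, attempts=30, delay=2.0, sleep=time.sleep):
    """Poll until the health body contains "ok"; return False if it never does."""
    for attempt in range(attempts):
        body = fetch()
        if body is not None and "ok" in body.lower():
            return True
        if attempt < attempts - 1:
            sleep(delay)
    return False
```

The `fetch` and `sleep` parameters exist so the loop can be exercised without a running server.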

⚙️ Environment Requirements

Python >= 3.10
CUDA-enabled GPU (optional but recommended)
PyTorch >= 2.1.0
Transformers >= 4.40.0

📊 Logging

Application logs are written to stdout through Python's logging module. They include:

Model load time
GPU availability and memory
OCR process time
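
Timings like these are often produced with a small context manager around the step being measured. The following is a sketch of that pattern, not the code in `main.py`. The logger name `typhoon_ocr`, the message format, and the `log_duration` helper are assumptions.

```python
import logging
import sys
import time
from contextlib import contextmanager

# Send log records to stdout rather than the default stderr.
logging.basicConfig(
    stream=sys.stdout,
    level=logging.INFO,
    format="%(asctime)s %(levelname)s %(name)s: %(message)s",
)
logger = logging.getLogger("typhoon_ocr")  # hypothetical logger name
logger.setLevel(logging.INFO)


@contextmanager
def log_duration(label, log=logger):
    """Log how long the wrapped block took, even if it raises."""
    start = time.perf_counter()
    try:
        yield
    finally:
        elapsed = time.perf_counter() - start
        log.info("%s took %.3f s", label, elapsed)


# Usage:
# with log_duration("OCR process"):
#     result = run_ocr(image)  # run_ocr is a placeholder for the real call
```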

📌 Notes

This API loads the Typhoon OCR 7B model and performs warm-up during startup.
GPU optimization includes TF32 acceleration, cuDNN tuning, and torch.compile for inference.
The API can be extended to support multiple models, batch processing, or file exports.
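
For readers unfamiliar with the GPU settings named above, this is how they are commonly switched on in PyTorch. It is a sketch under assumptions: the function name and the `compile_model` flag are hypothetical, and the actual startup code in `main.py` may order or configure these steps differently.

```python
try:
    import torch
except ImportError:  # allows the sketch to be imported without PyTorch
    torch = None


def enable_gpu_optimizations(model=None, compile_model=False):
    """Turn on TF32 and cuDNN autotuning, and optionally compile the model."""
    if torch is None:
        return model
    # TF32 for matrix multiplies and cuDNN convolutions (Ampere or newer GPUs).
    torch.backends.cuda.matmul.allow_tf32 = True
    torch.backends.cudnn.allow_tf32 = True
    # Let cuDNN pick the fastest kernels; helps most with stable input shapes.
    torch.backends.cudnn.benchmark = True
    if model is not None and compile_model and hasattr(torch, "compile"):
        # torch.compile is available from PyTorch 2.0 onward.
        model = torch.compile(model)
    return model
```

TF32 trades a small amount of floating-point precision for speed, and `torch.compile` adds compilation time to the first calls, which is one reason a warm-up pass at startup is useful.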

🧑‍💻 Author

Sitthichai S. (2025)
