Table OCR

Digitize table scans using the Gemini API.

Quick Start

Prerequisites

Install uv (a fast Python package manager):

# Linux/macOS
curl -LsSf https://astral.sh/uv/install.sh | sh

# Windows PowerShell
powershell -ExecutionPolicy ByPass -c "irm https://astral.sh/uv/install.ps1 | iex"

# Or with pip
pip install uv

Get a Gemini API key: https://aistudio.google.com/app/api-keys
- API usage is free as long as you stay within the free-tier rate limits.
- To go beyond these limits, you need to set up billing in Google Cloud (~$300 in free credits after initial setup).

Setup & Run (from the project root folder)

# 1. Create virtual environment
uv venv

# 2. Activate it
source .venv/bin/activate            # Linux/macOS
.venv\Scripts\activate               # Windows

# 3. Install dependencies
uv pip install -r requirements.txt

# 4. Set API key & start UI
export GEMINI_API_KEY='your-key'     # Linux/macOS
set GEMINI_API_KEY=your-key          # Windows (cmd)
$env:GEMINI_API_KEY='your-key'       # Windows (PowerShell)
cd ui && streamlit run app.py

Using the UI

Once the app is running at http://localhost:8501 (Streamlit's default port):

1. Create a Prompt - instructions and guidance for the LLM.
2. Create a Schema - define the output columns.
3. Create a Project - combine a prompt and a schema.
4. Upload PDFs - add your documents to the project. All files in a project use the same prompt and schema.
5. Process - extract the table data. Press a file's "View" button to inspect the data extracted from that file.

Programmatic Usage

To use the functionality directly from your own code instead of the UI:

from table_ocr import ocr_pdf, create_batch_ocr_job
from google import genai

# Define your schema
schema = genai.types.Schema(
    type=genai.types.Type.OBJECT,
    properties={
        "table": genai.types.Schema(
            type=genai.types.Type.ARRAY,
            items=genai.types.Schema(
                type=genai.types.Type.OBJECT,
                properties={
                    "name": genai.types.Schema(type=genai.types.Type.STRING),
                    "date": genai.types.Schema(type=genai.types.Type.STRING),
                }
            )
        )
    }
)

# Direct processing (fast, full cost)
results = ocr_pdf(
    pdf_path="document.pdf",
    prompt_template="Extract the table data",
    response_schema=schema
)

# Batch processing via the Gemini Batch API (50% discount; results can take up to ~24h)
job_name = create_batch_ocr_job(
    pdf_path="document.pdf",
    prompt="Extract the table data",
    response_schema=schema
)
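The batch call only returns a job name, and this README does not show how to wait for the results. Below is a minimal polling sketch, assuming the returned name is a Gemini Batch API job name that the google-genai SDK's `client.batches.get(name=...)` accepts. `wait_for_batch` is a hypothetical helper, and the state strings are the terminal states listed in the Gemini batch documentation.

```python
import time
from typing import Callable

# Terminal job states as listed in the Gemini Batch API documentation.
COMPLETED_STATES = {
    "JOB_STATE_SUCCEEDED",
    "JOB_STATE_FAILED",
    "JOB_STATE_CANCELLED",
    "JOB_STATE_EXPIRED",
}


def wait_for_batch(
    get_state: Callable[[], str],
    poll_seconds: float = 60.0,
    timeout_seconds: float = 48 * 3600,  # arbitrary margin above the ~24h target
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll get_state() until the job reaches a terminal state and return that state.

    get_state is injected so the loop can be exercised without network access.
    """
    waited = 0.0
    while True:
        state = get_state()
        if state in COMPLETED_STATES:
            return state
        if waited >= timeout_seconds:
            raise TimeoutError(f"batch job still {state} after {waited:.0f}s")
        sleep(poll_seconds)
        waited += poll_seconds
```

With the SDK, the callback could be `lambda: client.batches.get(name=job_name).state.name` with `client = genai.Client()`, which reads GEMINI_API_KEY from the environment. A final state of JOB_STATE_SUCCEEDED only means the job finished; how this project then downloads and parses the results is not covered here.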

Notes

The default model is Gemini-2.5-Flash-Lite; you can change it in config.py. Gemini-2.5-Flash likely gives better extraction quality at roughly 5x the cost.
Extraction can go wrong when a scan includes part of the previous or next page along its left or right edge. To fix this, try adjusting the prompt, setting IMAGE_PROCESSING_CONFIG in config.py to crop the sides automatically, or cropping the images manually.
The UI stores its data in the ocr_data/ directory at the repository root (created automatically).
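If you crop manually, the arithmetic is simple. The following is a minimal sketch, assuming the pages are already available as image files and that Pillow is installed. The function names and fractions are illustrative only and are not the keys or defaults of the project's IMAGE_PROCESSING_CONFIG.

```python
from typing import Tuple


def side_crop_box(width: int, height: int, left_frac: float, right_frac: float) -> Tuple[int, int, int, int]:
    """Return a Pillow-style (left, upper, right, lower) box trimming a fraction of the width from each side."""
    if not (0 <= left_frac < 1 and 0 <= right_frac < 1 and left_frac + right_frac < 1):
        raise ValueError("fractions must be in [0, 1) and sum to less than 1")
    left = round(width * left_frac)
    right = width - round(width * right_frac)
    # Top and bottom stay untouched: stray page remains sit on the left/right edges.
    return (left, 0, right, height)


def crop_sides(in_path: str, out_path: str, left_frac: float, right_frac: float) -> None:
    """Crop one page image on disk. Requires Pillow (pip install pillow), imported lazily."""
    from PIL import Image

    with Image.open(in_path) as img:
        box = side_crop_box(img.width, img.height, left_frac, right_frac)
        img.crop(box).save(out_path)
```

For example, `crop_sides("page_001.png", "page_001_cropped.png", 0.04, 0.04)` (hypothetical file names) trims 4% of the width from each side. Suitable fractions depend on your scans.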

Troubleshooting

"streamlit: command not found"

Make sure you've activated your virtual environment:

source .venv/bin/activate  # Linux/macOS
.venv\Scripts\activate     # Windows
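To confirm from Python which environment is active, you can use a quick standard-library check. `in_virtualenv` is just an illustrative name. Inside a venv (including one created by `uv venv`), `sys.prefix` points at the venv while `sys.base_prefix` points at the base interpreter.

```python
import sys


def in_virtualenv() -> bool:
    """True when the running interpreter belongs to a virtual environment."""
    # A venv changes sys.prefix but leaves sys.base_prefix pointing at the base install.
    return sys.prefix != sys.base_prefix


if __name__ == "__main__":
    print("virtual environment active:", in_virtualenv(), "| prefix:", sys.prefix)
```

If this reports False, or the prefix is not the project's .venv folder, activate the environment again before running `streamlit`.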

"ModuleNotFoundError: No module named 'google'"

Install dependencies:

uv pip install -r requirements.txt
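To check which modules the active environment can see without importing them fully, you can use the sketch below. `missing_modules` is a hypothetical helper, and the module names in the example are assumptions taken from the imports and commands shown in this README.

```python
import importlib.util


def missing_modules(names):
    """Return the module names that cannot be found in the current environment."""
    missing = []
    for name in names:
        try:
            found = importlib.util.find_spec(name) is not None
        except ModuleNotFoundError:
            # find_spec("a.b") imports the parent "a" first and raises if it is absent.
            found = False
        if not found:
            missing.append(name)
    return missing


if __name__ == "__main__":
    print(missing_modules(["google.genai", "streamlit"]) or "all found")
```

If anything is listed, make sure the virtual environment is activated and rerun the install command above.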

"GEMINI_API_KEY not set"

Set your API key:

export GEMINI_API_KEY='your-key'  # Linux/macOS
set GEMINI_API_KEY=your-key       # Windows (cmd)
$env:GEMINI_API_KEY='your-key'    # Windows (PowerShell)
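A variable set this way only applies to the terminal session it was set in, so set it in the same terminal that launches Streamlit. If you want a clearer error in your own scripts, here is a small standard-library sketch. `get_gemini_api_key` is a hypothetical helper, not part of this project.

```python
import os


def get_gemini_api_key(env=None) -> str:
    """Return GEMINI_API_KEY from the environment, or raise a clear error if unset or blank."""
    env = os.environ if env is None else env
    key = env.get("GEMINI_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "GEMINI_API_KEY is not set in this shell. Set it in the same terminal "
            "session that starts the app, then restart it."
        )
    return key
```

Passing `env` explicitly is only there to make the function easy to test. In normal use, call `get_gemini_api_key()` with no arguments.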

Future Improvements

Choose which results file is used for each input file in the final export.
Majority voting: run the extraction several times and keep the most common value per cell. This could fix many sporadic OCR errors.
Set the processing config via the UI
Allow changing the prompt of an existing project
Enable non-tabular structured data extraction!
Make the OCR model interchangeable (other API providers/LiteLLM, or local models such as Marker)
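Majority voting is a planned feature, not something the project does today. The sketch below shows the idea under two assumptions. First, each run yields rows shaped like the schema example above (a list of dicts, column to value). Second, all runs agree on row count and order. `majority_vote` is a hypothetical name.

```python
from collections import Counter
from typing import List, Sequence


def majority_vote(runs: Sequence[Sequence[dict]]) -> List[dict]:
    """Merge several extractions of the same table by per-cell majority vote.

    Assumes every run has the same rows in the same order; real output may
    need row alignment first. Missing cells count as a None vote.
    """
    if not runs:
        return []
    n_rows = len(runs[0])
    if any(len(run) != n_rows for run in runs):
        raise ValueError("runs disagree on row count; align rows before voting")

    merged = []
    for i in range(n_rows):
        # Union of column names across runs, in first-seen order.
        columns = dict.fromkeys(col for run in runs for col in run[i])
        row = {}
        for col in columns:
            votes = Counter(run[i].get(col) for run in runs)
            # most_common orders equal counts by first occurrence, so ties go to the earliest run.
            row[col] = votes.most_common(1)[0][0]
        merged.append(row)
    return merged
```

With three runs, a single misread cell (e.g. "Smlth" once vs. "Smith" twice) is outvoted. With only two disagreeing runs, the vote cannot decide and simply keeps the first run's value.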
