
Turn messy scans and invoice PDFs into structured JSON. This Flask API + drag-and-drop UI uses Tesseract to extract invoice number, date, amount, and supplier — even from smartphone photos.


Talabov/Invoice-OCR-Parser-API-Web-UI


📄 Invoice OCR Parser API & Web UI


AI-powered REST API + beautiful frontend to extract invoice data (number, date, supplier, amount, and more) from PDFs, scans, or smartphone photos.

Includes: Dockerfile, full project structure, HTML5/CSS frontend, Postman-ready endpoints, and setup guides. All in one ZIP.

👉 Buy it on Gumroad


A modern Flask REST API that turns ugly business paperwork into structured, ready-to-use JSON. Perfect for SaaS, internal tools, freelancers, accountants, integrators, and anyone who’s tired of manual data entry.


✅ Key Features

  • 🖼 Works with PDF, JPG, PNG (scanned, camera, digital, whatever)
  • 🔎 Extracts: invoice number, date, supplier, total (auto-detects in most layouts)
  • 🌍 Multi-language OCR (Tesseract)
  • 🧠 Smart text parsing (handles most weird invoice templates, even messy scans)
  • ⚡ Lightning-fast: avg 1–3 seconds per invoice
  • 🖥 Built-in beautiful HTML/CSS UI (drag & drop, mobile ready)
  • 🚦 API and web frontend served from a single server: no CORS setup, no extra configs
  • 🐳 Docker-ready & classic Python scripts
  • 🔒 No cloud/3rd party: runs 100% locally, your docs never leave your PC
  • 🧑‍💻 Easy to customize/extend for your business logic

🚀 API Endpoint

Parse Invoice (OCR)

POST /parse-invoice

Request:

  • multipart/form-data with a file (file=...) — PDF/JPG/PNG
  • (optional) lang — OCR language code (default: "eng")

Example using curl:

curl -X POST -F "file=@invoice.pdf" http://localhost:5000/parse-invoice
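
The same request can be sent from Python. The sketch below uses only the standard library and is an illustration, not code from this project: `build_multipart` and `parse_invoice` are hypothetical helper names, and it assumes `lang` is sent as a plain form field next to `file`.

```python
import json
import mimetypes
import urllib.request
import uuid
from pathlib import Path


def build_multipart(filename: str, content: bytes, lang: str = "eng"):
    """Encode one file plus the optional `lang` field as multipart/form-data.

    Returns (body_bytes, content_type_header_value).
    """
    boundary = uuid.uuid4().hex
    file_type = mimetypes.guess_type(filename)[0] or "application/octet-stream"
    lang_part = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="lang"\r\n\r\n'
        f"{lang}\r\n"
    ).encode()
    file_header = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
        f"Content-Type: {file_type}\r\n\r\n"
    ).encode()
    closing = f"\r\n--{boundary}--\r\n".encode()
    body = lang_part + file_header + content + closing
    return body, f"multipart/form-data; boundary={boundary}"


def parse_invoice(path, lang="eng", url="http://localhost:5000/parse-invoice"):
    """POST a local file to /parse-invoice and return the decoded JSON reply."""
    file_path = Path(path)
    body, content_type = build_multipart(
        file_path.name, file_path.read_bytes(), lang
    )
    request = urllib.request.Request(
        url, data=body, headers={"Content-Type": content_type}, method="POST"
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

With the server running locally, `parse_invoice("invoice.pdf", lang="eng")` should return the same JSON structure as the curl call.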

Response (200):

{
  "parsed_fields": {
    "invoice_date": "April 15, 2024",
    "invoice_number": "INV-2024-117",
    "supplier_name": "Widget Solutions",
    "total_amount": "$750.00"
  },
  "raw_text": "INVOICE Invoice #\n\nINV-2024-117\nSupplier: Date: April 15, 2024\nWidget Solutions\n123 Industrial Park\nSpringfield, IL 62701\n..."
}
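
The values in `parsed_fields` are strings as they appear on the invoice. If you need typed values downstream, a small post-processing step like the sketch below can help. It is not part of the API: `to_date`, `to_amount`, and `normalize` are hypothetical helpers, the list of date formats is a guess, and amounts are assumed to use a comma as the thousands separator and a dot as the decimal separator.

```python
import re
from datetime import date, datetime
from decimal import Decimal, InvalidOperation

# Assumed formats; extend this tuple to match the invoices you actually receive.
DATE_FORMATS = ("%B %d, %Y", "%Y-%m-%d", "%d.%m.%Y", "%d/%m/%Y")


def to_date(text: str):
    """Return a date for the first matching format, or None if none match."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text.strip(), fmt).date()
        except ValueError:
            continue
    return None


def to_amount(text: str):
    """Strip currency symbols and thousands commas; return Decimal or None."""
    cleaned = re.sub(r"[^\d.,-]", "", text).replace(",", "")
    try:
        return Decimal(cleaned)
    except InvalidOperation:
        return None


def normalize(response: dict) -> dict:
    """Convert the string fields of a /parse-invoice response to typed values."""
    fields = response.get("parsed_fields", {})
    return {
        "invoice_number": fields.get("invoice_number"),
        "supplier_name": fields.get("supplier_name"),
        "invoice_date": to_date(fields.get("invoice_date") or ""),
        "total_amount": to_amount(fields.get("total_amount") or ""),
    }
```

Fields that cannot be converted come back as `None`, so the caller can fall back to `raw_text` or manual review.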

⛔ Error Handling

On failure, the API returns a JSON object with a single error field. Possible messages include:

{"error": "No file part in the request"}
{"error": "Unsupported file type"}
{"error": "Text extraction failed: ..."}
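
On the client side, one way to handle these replies is to check for the `error` key before reading any fields. The sketch below assumes nothing about HTTP status codes, since they are not listed here; `InvoiceParseError` and `unwrap` are hypothetical names used for illustration.

```python
class InvoiceParseError(Exception):
    """Raised when the API reply carries an "error" field."""


def unwrap(payload: dict) -> dict:
    """Return parsed_fields from a reply, or raise if the reply is an error."""
    if "error" in payload:
        raise InvoiceParseError(payload["error"])
    return payload["parsed_fields"]
```

Calling `unwrap({"error": "Unsupported file type"})` raises `InvoiceParseError` with the server's message, while a successful reply yields the `parsed_fields` dictionary.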

🖥 Frontend Demo

Open http://localhost:5000/ — drag & drop your invoice, get structured results and raw OCR instantly.

  • JSON response — formatted for devs
  • Raw Text — human readable
  • "Copy" and "Download" buttons for instant reuse
  • Works on desktop/mobile, looks clean as hell 😎

⚙️ Requirements

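Install the Python packages:

pip install -r requirements.txt

  • Flask
  • pytesseract
  • pdf2image
  • Pillow
  • flask-cors
  • Flask-Limiter
  • flasgger (optional, for API docs)
  • python-magic-bin (Windows) / python-magic (Linux)

System dependencies, installed separately from pip (for example through your OS package manager):

  • tesseract-ocr
  • Poppler (required by pdf2image for PDF input)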
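
Because Tesseract is a system binary rather than a pip package, it is easy to miss. A quick pre-flight check along these lines can confirm the environment before starting the server. This is a sketch, not part of the project: `missing_dependencies` is a hypothetical helper, and the module list covers only the core imports.

```python
import importlib.util
import shutil

# Import names, not pip names: Pillow is imported as "PIL".
PYTHON_MODULES = ("flask", "pytesseract", "pdf2image", "PIL")
# Executables that must be on PATH.
SYSTEM_BINARIES = ("tesseract",)


def missing_dependencies(modules=PYTHON_MODULES, binaries=SYSTEM_BINARIES):
    """Return a list describing each module or binary that cannot be found."""
    missing = [
        f"python module: {name}"
        for name in modules
        if importlib.util.find_spec(name) is None
    ]
    missing += [
        f"system binary: {name}"
        for name in binaries
        if shutil.which(name) is None
    ]
    return missing
```

An empty list from `missing_dependencies()` means every listed module and binary was found.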

🐳 Run with Docker

docker build -t invoice-ocr-api .
docker run -p 5000:5000 invoice-ocr-api

🧑‍💻 Manual Run (dev mode)

python app.py

🧪 Screenshots

  • ✅ API result
  • ✅ Web frontend demo
  • ✅ Error handling
  • ✅ Real OCR with tricky invoices

See /screens/ for live examples and raw data.


💼 Buy & Support

Get the full ZIP: project structure, Dockerfile, API + UI, and all the love:

👉 Buy it on Gumroad


📬 Contacts


Need this in Node.js, Go, or another stack? Custom integration? DM me — I'm ready for business.
