This repository contains a curated collection of notebooks for implementing state-of-the-art multimodal Vision-Language Models (VLMs).
Updated Dec 4, 2025 - Jupyter Notebook
Dedicated Colab notebooks for experimenting with OCR models (Nanonets OCR, Monkey OCR, OCRFlux 3B, Typhoon OCR 3B, and more) on the free-tier T4 GPU.
This repository contains a notebook demonstrating the Document Text Recognition (DocTR) library.
Integrates object detection using YOLO11 with Optical Character Recognition (OCR) using Tesseract.
Official repository for the paper “A Supervised Framework for Document Processing at Scale with Large Language Models in Credit-Risk Research” (ICMIE 2025). Includes Colab notebooks, JSON evaluation data, and reproducibility materials.