High-accuracy PDF-to-Markdown OCR API using LLMs with vision capabilities. Features parallel processing, batching, and auto-retry logic for scalable extraction.
Updated Nov 29, 2025 - Python
The ultimate sketch-to-code app, built with GPT-4o and serving 30k+ users. Choose your desired framework (React, Next, React Native, Flutter) for your app. It instantly generates code and a sandbox preview from a simple hand-drawn sketch on paper captured from a webcam.
Early Alpha Release: Chat with Your Image - Leveraging GPT-4 Vision and Function Calls for AI-Powered Image Analysis and Description
ParkingGPT is a cross-platform app that helps you decide whether to park, using multimodal, multilingual vision AI and LLMs.
Generate LEGO-like images with gpt4-vision and DALL·E 3.
Vision-Assisted Camera Orientation
🤯 Lobe Chat - an open-source, modern-design AI chat framework. Supports multiple AI providers (OpenAI / Claude 3 / Gemini / Ollama / Qwen / DeepSeek), Knowledge Base (file upload / knowledge management / RAG), Multi-Modals (Vision / TTS / Plugins / Artifacts). One-click FREE deployment of your private ChatGPT / Claude application.
DRIFT.AI V2 - AI-Powered Contract Reconciliation Platform for Healthcare | Next.js 15 + TypeScript + GPT-4 Vision
AI-powered tool to blend 3–5 images into a single composite using GPT-4 Vision and DALL·E 3. Modular, scalable, and built with Flask.