Monitor the performance of OpenAI's GPT-4V model over time.
Updated Nov 15, 2024 · HTML
This repository offers a Python framework for a retrieval-augmented generation (RAG) pipeline using text and images from MHTML documents, leveraging Azure AI and OpenAI services. It includes ingestion and enrichment flows, a RAG with Vision pipeline, and evaluation tools.
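The retrieve-then-generate flow such a pipeline describes can be sketched in miniature. This is a toy illustration only, not code from the repository: the bag-of-words "embedding" stands in for a real embedding service (e.g. Azure OpenAI embeddings), and the document fields (`text`, `image`) are hypothetical.

```python
from collections import Counter
from math import sqrt

def embed(text):
    # Toy bag-of-words vector; a real pipeline would call an embedding model.
    return Counter(text.lower().split())

def cosine(a, b):
    # Cosine similarity between two sparse word-count vectors.
    num = sum(a[t] * b[t] for t in a)
    den = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return num / den if den else 0.0

def retrieve(query, docs, k=2):
    # Rank documents by similarity to the query and keep the top k.
    q = embed(query)
    return sorted(docs, key=lambda d: cosine(q, embed(d["text"])), reverse=True)[:k]

def build_prompt(query, hits):
    # Assemble retrieved text (with image references) into a grounding prompt
    # that would be sent to a vision-capable chat model.
    context = "\n".join(f"- {h['text']} (image: {h['image']})" for h in hits)
    return f"Answer using the context below.\nContext:\n{context}\nQuestion: {query}"

docs = [
    {"text": "quarterly revenue chart shows steady growth", "image": "chart1.png"},
    {"text": "org structure diagram for the sales team", "image": "org.png"},
]
hits = retrieve("revenue growth", docs, k=1)
print(build_prompt("revenue growth", hits))
```

In a full system the retrieved images would be attached to the chat request alongside the text context, and the evaluation tools would score the generated answers against references.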
Vision utilities for web interaction agents 👀
Control Any Computer Using LLMs
Implementation of MambaByte from "MambaByte: Token-free Selective State Space Model" in PyTorch and Zeta
Video Voiceover with gpt-4o-mini
Mobile-Agent: The Powerful Mobile Device Operation Assistant Family
Mark web pages for use with vision-language models
[READ-ONLY] Describe images and generate alt tags for visually impaired users.
GPT-4V in Wonderland: LMMs as Smartphone Agents
[ECCV 2024] ShareGPT4V: Improving Large Multi-modal Models with Better Captions
Multi-Modal Multi-Embodied Hivemind-like Iteration of RTX-2
Vision-Assisted Camera Orientation
Discover the GPT-4o multimodal model at Microsoft Build 2024, now with text and image capabilities. My prototype enhances chats with real-time camera snapshots, powered by Flask, OpenCV, and Azure’s OpenAI Services. It’s interactive, visual, and simple to use. Give it a try!
Chinese medical multimodal large model: Large Chinese Language-and-Vision Assistant for BioMedicine
The ultimate sketch-to-code app, built with GPT-4o and serving 25k+ users. Choose your desired framework (React, Next.js, React Native, Flutter) and it instantly generates code and a live preview (sandbox) from a simple hand-drawn sketch on paper, captured via webcam.