Easily fine-tune, evaluate and deploy Qwen3, DeepSeek-R1, Llama 4 or any open source LLM / VLM!
Updated May 20, 2025 - Python
An on-premises, OCR-free unstructured data extraction and benchmarking toolkit. (https://idp-leaderboard.org/)
Official repository for VisionZip (CVPR 2025)
[CVPR'24] HallusionBench: You See What You Think? Or You Think What You See? An Image-Context Reasoning Benchmark Challenging for GPT-4V(ision), LLaVA-1.5, and Other Multi-modality Models
Scala client for OpenAI API and other major LLM providers
[CVPR 2024] Official implementation of "ViTamin: Designing Scalable Vision Models in the Vision-language Era"
[NeurIPS 2024] AWT: Transferring Vision-Language Models via Augmentation, Weighting, and Transportation
[NeurIPS'24] Official PyTorch Implementation of Seeing the Image: Prioritizing Visual Correlation by Contrastive Alignment
[CVPR2025] SegAgent: Exploring Pixel Understanding Capabilities in MLLMs by Imitating Human Annotator Trajectories
[ACL 2025 🔥] A Comprehensive Multi-Domain Benchmark for Arabic OCR and Document Understanding
[ICASSP 2024] The official repo for Harnessing the Power of Large Vision Language Models for Synthetic Image Detection
A collection of VLM papers, blogs, and projects, with a focus on VLMs in autonomous driving and related reasoning techniques.
JailDAM: Jailbreak Detection with Adaptive Memory for Vision-Language Model
This repository collects research papers on large foundation models for scenario generation and analysis in autonomous driving. It is continuously updated to track the latest work.
BobVLM – A 1.5B multimodal model built from scratch and pre-trained on a single P100 GPU, capable of image description and moderate question answering. 🤗🎉
Official code for CVPR2025 "Seeing What Matters: Empowering CLIP with Patch Generation-to-Selection"
Code for VLM4Bio, a benchmark dataset of scientific question-answer pairs used to evaluate pretrained VLMs for trait discovery from biological images.