[NeurIPS 2025] JarvisArt: Liberating Human Artistic Creativity via an Intelligent Photo Retouching Agent
Updated Jan 23, 2026 (Python)
[NeurIPS 2025] 4KAgent: Agentic Any Image to 4K Super-Resolution. An intelligent computer vision agent that can magically restore any image to perfect-4K!
[ICLR 2026] ReCogDrive: A Reinforced Cognitive Framework for End-to-End Autonomous Driving
EVE Series: Encoder-Free Vision-Language Models from BAAI
[ICCV 2025] Are VLMs Ready for Autonomous Driving? An Empirical Study from the Reliability, Data, and Metric Perspectives
Official Repository for PosterGen
Official Implementation for "MyVLM: Personalizing VLMs for User-Specific Queries" (ECCV 2024)
Video Content Customization Using First Frame
DenseFusion-1M: Merging Vision Experts for Comprehensive Multimodal Perception
Reinforcement Learning of Vision Language Models with Self Visual Perception Reward
GeoPixel: A Pixel Grounding Large Multimodal Model for Remote Sensing. Developed specifically for high-resolution remote sensing image analysis, it offers advanced multi-target pixel grounding capabilities.
[ECCV 2024] API: Attention Prompting on Image for Large Vision-Language Models
[NeurIPS 2024 Spotlight ⭐️ & TPAMI 2025] Parameter-Inverted Image Pyramid Networks (PIIP)
[AAAI-2026] Patho-R1: A Multimodal Reinforcement Learning-Based Pathology Expert Reasoner
[ICLR 2024 Spotlight 🔥 ] - [ Best Paper Award SoCal NLP 2023 🏆] - Jailbreak in pieces: Compositional Adversarial Attacks on Multi-Modal Language Models
This is an official repository for "Harnessing Vision Models for Time Series Analysis: A Survey".
This is the code repo for the paper VERA: Explainable Video Anomaly Detection via Verbalized Learning of Vision-Language Models (CVPR 2025).
[ICASSP 2025] Open-source code for the paper "Enhancing Remote Sensing Vision-Language Models for Zero-Shot Scene Classification"
🍑 relsim: Relational Visual Similarity | pip install relsim 🌍
[NeurIPS'24] SpatialEval: a benchmark to evaluate spatial reasoning abilities of MLLMs and LLMs