[NeurIPS' 2025] JarvisArt: Liberating Human Artistic Creativity via an Intelligent Photo Retouching Agent
Updated Apr 4, 2026 - Python
[NeurIPS 2025] 4KAgent: Agentic Any Image to 4K Super-Resolution. An intelligent computer vision agent that can restore any image to 4K resolution.
[ICLR 2026] ReCogDrive: A Reinforced Cognitive Framework for End-to-End Autonomous Driving
ParseBench - A Document Parsing Benchmark for AI Agents
EVE Series: Encoder-Free Vision-Language Models from BAAI
[ICCV 2025] Are VLMs Ready for Autonomous Driving? An Empirical Study from the Reliability, Data, and Metric Perspectives
Official repository for PosterGen (CVPR Findings 2026)
Official Implementation for "MyVLM: Personalizing VLMs for User-Specific Queries" (ECCV 2024)
Video Content Customization Using First Frame
Reinforcement Learning of Vision Language Models with Self Visual Perception Reward
DenseFusion-1M: Merging Vision Experts for Comprehensive Multimodal Perception
GeoPixel: A Pixel Grounding Large Multimodal Model for Remote Sensing. It is built specifically for high-resolution remote sensing image analysis and supports multi-target pixel grounding.
[NeurIPS 2024 Spotlight ⭐️ & TPAMI 2025] Parameter-Inverted Image Pyramid Networks (PIIP)
[ECCV 2024] API: Attention Prompting on Image for Large Vision-Language Models
[AAAI-2026] Patho-R1: A Multimodal Reinforcement Learning-Based Pathology Expert Reasoner
This repository is the official implementation of our paper "From reactive to cognitive: brain-inspired spatial intelligence for embodied agents".
This is an official repository for "Harnessing Vision Models for Time Series Analysis: A Survey".
[ICLR 2024 Spotlight 🔥] [Best Paper Award, SoCal NLP 2023] Jailbreak in pieces: Compositional Adversarial Attacks on Multi-Modal Language Models
This is the code repo for the paper VERA: Explainable Video Anomaly Detection via Verbalized Learning of Vision-Language Models (CVPR 2025).
relsim: Relational Visual Similarity | pip install relsim (CVPR 2026)