Janus-Series: Unified Multimodal Understanding and Generation Models (Python, updated Nov 13, 2024)
VTC: Improving Video-Text Retrieval with User Comments
LAVIS - A One-stop Library for Language-Vision Intelligence
Recognize Any Regions
[ACL 2024 🔥] Video-ChatGPT is a video conversation model capable of generating meaningful conversations about videos. It combines the capabilities of LLMs with a pretrained visual encoder adapted for spatiotemporal video representation. We also introduce a rigorous Quantitative Evaluation Benchmarking framework for video-based conversational models.
Multi-Aspect Vision Language Pretraining - CVPR2024
Official Repository of paper VideoGPT+: Integrating Image and Video Encoders for Enhanced Video Understanding
MICCAI 2024 Oral: Vision-Language Open-Set Detectors for Bone Fenestration and Dehiscence Detection from Intraoral Images
[KDD 2024] Improving the Consistency in Cross-Lingual Cross-Modal Retrieval with 1-to-K Contrastive Learning
Easy wrapper for inserting LoRA layers in CLIP.
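The idea behind such a wrapper is to freeze the pretrained CLIP weights and add small trainable low-rank adapters alongside each linear projection. A minimal sketch of that pattern in plain PyTorch is below; the `LoRALinear` and `inject_lora` names are illustrative, not the repo's actual API:

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Wraps a frozen nn.Linear with a trainable low-rank (LoRA) update."""
    def __init__(self, base: nn.Linear, rank: int = 4, alpha: float = 1.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False  # keep the pretrained weight frozen
        self.lora_a = nn.Linear(base.in_features, rank, bias=False)
        self.lora_b = nn.Linear(rank, base.out_features, bias=False)
        nn.init.zeros_(self.lora_b.weight)  # zero init: no change at start
        self.scale = alpha / rank

    def forward(self, x):
        # Pretrained path plus scaled low-rank residual update.
        return self.base(x) + self.scale * self.lora_b(self.lora_a(x))

def inject_lora(module: nn.Module, rank: int = 4) -> None:
    """Recursively replace every nn.Linear with a LoRA-wrapped version."""
    for name, child in module.named_children():
        if isinstance(child, nn.Linear):
            setattr(module, name, LoRALinear(child, rank=rank))
        else:
            inject_lora(child, rank=rank)
```

Because `lora_b` is zero-initialized, the wrapped model's outputs match the original model exactly until the adapters are trained, so injection is safe to apply to a loaded CLIP checkpoint before fine-tuning.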
[EMNLP 2023 Demo] Video-LLaMA: An Instruction-tuned Audio-Visual Language Model for Video Understanding
Codes and Models for VALOR: Vision-Audio-Language Omni-Perception Pretraining Model and Dataset
[MedIA'24] FLAIR: A Foundation LAnguage-Image model of the Retina for fundus image understanding.
DeepSeek-VL: Towards Real-World Vision-Language Understanding
Demographic Bias of Vision-Language Foundation Models in Medical Imaging
SVL-Adapter: Self-Supervised Adapter for Vision-Language Pretrained Models
Code for ACL 2023 Oral Paper: ManagerTower: Aggregating the Insights of Uni-Modal Experts for Vision-Language Representation Learning
[NeurIPS 2023] Bootstrapping Vision-Language Learning with Decoupled Language Pre-training
📍 Official PyTorch implementation of the paper "ProtoCLIP: Prototypical Contrastive Language Image Pretraining" (IEEE TNNLS)