Janus-Series: Unified Multimodal Understanding and Generation Models (Python, updated Nov 13, 2024)
VTC: Improving Video-Text Retrieval with User Comments
LAVIS - A One-stop Library for Language-Vision Intelligence
Recognize Any Regions
[ACL 2024 🔥] Video-ChatGPT is a video conversation model capable of generating meaningful conversations about videos. It combines the capabilities of LLMs with a pretrained visual encoder adapted for spatiotemporal video representation. We also introduce a rigorous Quantitative Evaluation Benchmarking framework for video-based conversational models.
Multi-Aspect Vision Language Pretraining - CVPR2024
Official Repository of paper VideoGPT+: Integrating Image and Video Encoders for Enhanced Video Understanding
MICCAI 2024 Oral: Vision-Language Open-Set Detectors for Bone Fenestration and Dehiscence Detection from Intraoral Images
[KDD 2024] Improving the Consistency in Cross-Lingual Cross-Modal Retrieval with 1-to-K Contrastive Learning
Easy wrapper for inserting LoRA layers in CLIP.
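The idea behind such a wrapper is to freeze the pretrained CLIP weights and add small trainable low-rank adapters alongside each linear projection. A minimal sketch of that pattern in plain PyTorch is below; the `LoRALinear` and `inject_lora` names are illustrative, not the repo's actual API:

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Wraps a frozen nn.Linear with a trainable low-rank (LoRA) update."""
    def __init__(self, base: nn.Linear, rank: int = 4, alpha: float = 1.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False  # keep the pretrained weight frozen
        self.lora_a = nn.Linear(base.in_features, rank, bias=False)
        self.lora_b = nn.Linear(rank, base.out_features, bias=False)
        nn.init.zeros_(self.lora_b.weight)  # zero init: no change at start
        self.scale = alpha / rank

    def forward(self, x):
        # Pretrained path plus scaled low-rank residual update.
        return self.base(x) + self.scale * self.lora_b(self.lora_a(x))

def inject_lora(module: nn.Module, rank: int = 4) -> None:
    """Recursively replace every nn.Linear with a LoRA-wrapped version."""
    for name, child in module.named_children():
        if isinstance(child, nn.Linear):
            setattr(module, name, LoRALinear(child, rank=rank))
        else:
            inject_lora(child, rank=rank)
```

Because `lora_b` is zero-initialized, the wrapped model's outputs match the original model exactly until the adapters are trained, so injection is safe to apply to a loaded CLIP checkpoint before fine-tuning.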
[EMNLP 2023 Demo] Video-LLaMA: An Instruction-tuned Audio-Visual Language Model for Video Understanding
Codes and Models for VALOR: Vision-Audio-Language Omni-Perception Pretraining Model and Dataset
[MedIA'24] FLAIR: A Foundation LAnguage-Image model of the Retina for fundus image understanding.
DeepSeek-VL: Towards Real-World Vision-Language Understanding
Demographic Bias of Vision-Language Foundation Models in Medical Imaging
SVL-Adapter: Self-Supervised Adapter for Vision-Language Pretrained Models
Code for ACL 2023 Oral Paper: ManagerTower: Aggregating the Insights of Uni-Modal Experts for Vision-Language Representation Learning
[NeurIPS 2023] Bootstrapping Vision-Language Learning with Decoupled Language Pre-training
📍 Official PyTorch implementation of the paper "ProtoCLIP: Prototypical Contrastive Language Image Pretraining" (IEEE TNNLS)