A novel Multimodal Large Language Model (MLLM) architecture, designed to structurally align visual and textual embeddings.
[CVPR 2024 Highlight] OPERA: Alleviating Hallucination in Multi-Modal Large Language Models via Over-Trust Penalty and Retrospection-Allocation
RLAIF-V: Aligning MLLMs through Open-Source AI Feedback for Super GPT-4V Trustworthiness
The official code of the paper "Deciphering Cross-Modal Alignment in Large Vision-Language Models with Modality Integration Rate".
[CVPR 2024] Situational Awareness Matters in 3D Vision Language Reasoning
Code for ACL 2023 Oral Paper: ManagerTower: Aggregating the Insights of Uni-Modal Experts for Vision-Language Representation Learning
The official implementation for the ICCV 2023 paper "Grounded Image Text Matching with Mismatched Relation Reasoning".
Cross-aware Early Fusion with Stage-divided Vision and Language Transformer Encoders for Referring Image Segmentation (Published in IEEE TMM 2023)
Code for ECIR 2023 paper "Dialogue-to-Video Retrieval"
Explore the rich flavors of Indian desserts with TunedLlavaDelights. Utilizing LLaVA fine-tuning, our project unveils detailed nutritional profiles, taste notes, and optimal consumption times for beloved sweets. Dive into a fusion of AI innovation and culinary tradition.
Socratic models for multimodal reasoning & image captioning