# multimodality

Here are 171 public repositories matching this topic...

lucidrains / big-sleep

A simple command-line tool for text-to-image generation, using OpenAI's CLIP and a BigGAN. The technique was originally created by https://twitter.com/advadnoun

deep-learning artificial-intelligence multimodality generative-adversarial-networks text-to-image

Updated Feb 6, 2022
Python

BAAI-Agents / Cradle

The Cradle framework is a first attempt at General Computer Control (GCC). Cradle enables agents to ace any computer task through strong reasoning abilities, self-improvement, and skill curation, in a standardized general environment with minimal requirements.

ai gcc multimodality vlm cradle computer-control lmm grounding ai-agent large-language-models llm generative-ai vision-language-model ai-agents-framework general-computer-control personoid foundation-agent

Updated Nov 7, 2024
Python

hymie122 / RAG-Survey

A collection of awesome papers on RAG for AIGC. We propose a taxonomy of RAG foundations, enhancements, and applications in the paper "Retrieval-Augmented Generation for AI-Generated Content: A Survey".

survey multimodality rag diffusion-models aigc llm

Updated Aug 20, 2024

AIDC-AI / Ovis

A novel Multimodal Large Language Model (MLLM) architecture, designed to structurally align visual and textual embeddings.

chatbot multimodality multimodal vision-language-model multimodal-large-language-models vision-language-learning qwen llama3

Updated Aug 21, 2025
Python

PreferredAI / cornac

A Comparative Framework for Multimodal Recommender Systems

collaborative-filtering matrix-factorization recommendation-system recommendation-engine recommender-system recommendation-algorithms multimodality multimodal-learning

Updated Apr 26, 2025
Python

ArrowLuo / CLIP4Clip

An official implementation for "CLIP4Clip: An Empirical Study of CLIP for End to End Video Clip Retrieval"

search retrieval ranking clip multimodality multimodal-learning multimodal activitynet retrieval-model msvd msrvtt video-text-retrieval lsmdc didemo video-clip-retrieval

Updated Apr 12, 2024
Python

fnzhan / Generative-AI

[TPAMI 2023] Multimodal Image Synthesis and Editing: The Generative AI Era

gans multimodality diffusion-model nerfs aigc

Updated Nov 21, 2023
TeX

aimclub / FEDOT

Automated modeling and machine learning framework FEDOT

machine-learning automation genetic-programming hyperparameter-optimization evolutionary-algorithms multimodality automl automated-machine-learning parameter-tuning structural-learning fedot

Updated Aug 11, 2025
Python

VITA-MLLM / Woodpecker

✨✨Woodpecker: Hallucination Correction for Multimodal Large Language Models

multimodality hallucination hallucinations large-language-models llm mllm multimodal-large-language-models

Updated Dec 23, 2024
Python

jshilong / GPT4RoI

GPT4RoI: Instruction Tuning Large Language Model on Region-of-Interest

computer-vision gpt roi multimodality llm

Updated Jun 3, 2025
Python

microsoft / LLM2CLIP

LLM2CLIP makes the SOTA pretrained CLIP model even more SOTA than ever.

clip multimodality fundation-models

Updated Jul 1, 2025
Python

YingqingHe / Awesome-LLMs-meet-Multimodal-Generation

🔥🔥🔥 A curated list of papers on LLMs-based multimodal generation (image, video, 3D and audio).

text-to-speech multimodality text-to-image text-to-audio text-to-video text-to-music multimodal-models aigc large-language-models llm text-to-3d multimodal-generation mllm text-to-sound large-vision-language-models multimodal-large-language-models lvlm

Updated Apr 4, 2025
HTML

MMMU-Benchmark / MMMU

This repo contains evaluation code for the paper "MMMU: A Massive Multi-discipline Multimodal Understanding and Reasoning Benchmark for Expert AGI"

machine-learning natural-language-processing deep-neural-networks computer-vision deep-learning evaluation question-answering stem multimodality multimodal-learning visual-question-answering multimodal multimodal-deep-learning foundation-models large-language-models llm llms large-multimodal-models

Updated May 19, 2025
Python

zengyan-97 / X-VLM

X-VLM: Multi-Grained Vision Language Pre-Training (ICML 2022)

multimodality vision-and-language x-vlm

Updated Nov 25, 2022
Python

afiaka87 / clip-guided-diffusion

A CLI tool and Python module for generating images from text using guided diffusion and CLIP from OpenAI.

deep-learning artificial-intelligence openai image-generation multimodality text-to-image diffusion multimodal text-to-image-synthesis openai-clip

Updated Feb 8, 2022
Python

kyegomez / Med-PaLM

Towards Generalist Biomedical AI

opensource deep-learning multimodality biomedical multimodal multimodal-deep-learning gpt4

Updated Feb 17, 2024
Python

HazyResearch / fonduer

A knowledge base construction engine for richly formatted data

machine-learning multimodality knowledge-base-construction

Updated Jun 23, 2021
Python

lium-lst / nmtpytorch

Sequence-to-Sequence Framework in PyTorch

deep-learning cnn pytorch speech-recognition seq2seq neural-machine-translation nmt multimodality asr

Updated Jan 5, 2023
Jupyter Notebook

OmicsML / dance

DANCE: a deep learning library and benchmark platform for single-cell analysis

python data-science benchmark machine-learning bioinformatics deep-learning computational-biology dance single-cell multimodality single-cell-rna-seq graph-neural-networks spatial-transcriptomics single-cell-rna-sequencing

Updated Aug 20, 2025
Python

UCSC-VLAA / MedTrinity-25M

[ICLR 2025] This is the official repository of our paper "MedTrinity-25M: A Large-scale Multimodal Dataset with Multigranular Annotations for Medicine"

dataset multimodality mllms

Updated Jul 11, 2025
Python