[ICLR 2024] Official implementation of "🦙 Time-LLM: Time Series Forecasting by Reprogramming Large Language Models"
Implementation of "BitNet: Scaling 1-bit Transformers for Large Language Models" in pytorch
A flexible package for multimodal-deep-learning to combine tabular data with text and images using Wide and Deep models in Pytorch
This repo contains evaluation code for the paper "MMMU: A Massive Multi-discipline Multimodal Understanding and Reasoning Benchmark for Expert AGI"
Compose multimodal datasets 🎹
Towards Generalist Biomedical AI
CVPR'24, Official Codebase of our Paper: "Let's Think Outside the Box: Exploring Leap-of-Thought in Large Language Models with Creative Humor Generation".
Implementation of CVPR 2023 paper "Prompting Large Language Models with Answer Heuristics for Knowledge-based Visual Question Answering".
Code and Pretrained Models for ICLR 2023 Paper "Contrastive Audio-Visual Masked Autoencoder".
My implementation of "Patch n’ Pack: NaViT, a Vision Transformer for any Aspect Ratio and Resolution"
[CVPR'22 Best Paper Finalist] Official PyTorch implementation of the method presented in "Learning Multi-View Aggregation In the Wild for Large-Scale 3D Semantic Segmentation"
[CVPR 2024] Official code for "Text-Driven Image Editing via Learnable Regions"
Code for ChatRex: Taming Multimodal LLM for Joint Perception and Understanding
CapDec: SOTA Zero-Shot Image Captioning Using CLIP and GPT2, EMNLP 2022 (Findings)
A Python package housing a collection of deep-learning multi-modal data fusion method pipelines! From data loading, to training, to evaluation - fusilli's got you covered 🌸
Official implementation of the paper "Improving Multimodal Fusion with Hierarchical Mutual Information Maximization for Multimodal Sentiment Analysis", accepted at EMNLP 2021.
A tool for extracting multimodal features from videos.
Code for a video captioning system inspired by "Sequence to Sequence -- Video to Text"; it takes a video as input and generates an English caption describing it.
End-to-end Training for Multimodal Recommendation Systems
[CVPR 2022] Pseudo-Q: Generating Pseudo Language Queries for Visual Grounding