A simple command line tool for text to image generation, using OpenAI's CLIP and a BigGAN. Technique was originally created by https://twitter.com/advadnoun
Updated Feb 6, 2022 - Python
The Cradle framework is a first attempt at General Computer Control (GCC). Cradle enables agents to ace any computer task through strong reasoning abilities, self-improvement, and skill curation, in a standardized general environment with minimal requirements.
A collection of awesome papers on RAG for AIGC. We propose a taxonomy of RAG foundations, enhancements, and applications in the paper "Retrieval-Augmented Generation for AI-Generated Content: A Survey".
A novel Multimodal Large Language Model (MLLM) architecture, designed to structurally align visual and textual embeddings.
A Comparative Framework for Multimodal Recommender Systems
An official implementation for "CLIP4Clip: An Empirical Study of CLIP for End to End Video Clip Retrieval"
[TPAMI 2023] Multimodal Image Synthesis and Editing: The Generative AI Era
FEDOT: an automated modeling and machine learning framework
✨✨Woodpecker: Hallucination Correction for Multimodal Large Language Models
GPT4RoI: Instruction Tuning Large Language Model on Region-of-Interest
LLM2CLIP makes the SOTA pretrained CLIP model even more SOTA than ever.
🔥🔥🔥 A curated list of papers on LLMs-based multimodal generation (image, video, 3D and audio).
This repo contains evaluation code for the paper "MMMU: A Massive Multi-discipline Multimodal Understanding and Reasoning Benchmark for Expert AGI"
X-VLM: Multi-Grained Vision Language Pre-Training (ICML 2022)
A CLI tool/python module for generating images from text using guided diffusion and CLIP from OpenAI.
Towards Generalist Biomedical AI
A knowledge base construction engine for richly formatted data
Sequence-to-Sequence Framework in PyTorch
DANCE: a deep learning library and benchmark platform for single-cell analysis
[ICLR 2025] This is the official repository of our paper "MedTrinity-25M: A Large-scale Multimodal Dataset with Multigranular Annotations for Medicine"