Stars
EDM2 and Autoguidance -- Official PyTorch implementation
[Survey] Masked Modeling for Self-supervised Representation Learning on Vision and Beyond (https://arxiv.org/abs/2401.00897)
Collection of common code shared among different research projects in the FAIR computer vision team.
Official Implementation for EMNLP 2024 (main) "AgentReview: Exploring Academic Peer Review with LLM Agent."
A paper list of recent works on token compression for ViT and VLM
A suite of image and video neural tokenizers
Official PyTorch Implementation of Representation Alignment for Generation: Training Diffusion Transformers Is Easier Than You Think (ICLR 2025)
XQ-GAN🚀: An Open-source Image Tokenization Framework for Autoregressive Generation
Freeing data processing from scripting madness by providing a set of platform-agnostic customizable pipeline processing blocks.
GaLore: Memory-Efficient LLM Training by Gradient Low-Rank Projection
[ICLR 2025] Repository for Show-o, One Single Transformer to Unify Multimodal Understanding and Generation.
[Official Implementation] Acoustic Autoregressive Modeling 🔥
Text- and image-to-video generation: CogVideoX (2024) and CogVideo (ICLR 2023)
Official Implementation of "Lumina-mGPT: Illuminate Flexible Photorealistic Text-to-Image Generation with Multimodal Generative Pretraining"
A framework for few-shot evaluation of language models.
PyTorch implementation of MAR+DiffLoss https://arxiv.org/abs/2406.11838
This is the official implementation for ControlVAR.
Autoregressive Model Beats Diffusion: 🦙 Llama for Scalable Image Generation
An unofficial implementation of both ViT-VQGAN and RQ-VAE in PyTorch
Anole: An Open, Autoregressive and Native Multimodal Model for Interleaved Image-Text Generation
This is a repo to track the latest autoregressive visual generation papers.
Repository for Meta Chameleon, a mixed-modal early-fusion foundation model from FAIR.
Official repo for the paper "Scaling Synthetic Data Creation with 1,000,000,000 Personas"