Stars
Variational Auto-Encoder based on a RoBERTa encoder.
A robust and flexible open-source User & Entity Behavior Analytics (UEBA) framework used for Security Analytics. Developed with luv by Data Scientists & Security Analysts from the Cyber Security I…
Memory optimization and training recipes to extrapolate language models' context length to 1 million tokens, with minimal hardware.
[ICCV 2021- Oral] Official PyTorch implementation for Generic Attention-model Explainability for Interpreting Bi-Modal and Encoder-Decoder Transformers, a novel method to visualize any Transformer-…
Open Images is a dataset of ~9 million images that have been annotated with image-level labels and bounding boxes spanning thousands of classes.
Easily turn large sets of image URLs into an image dataset. Can download, resize, and package 100M URLs in 20h on one machine.
Holds code for our CVPR'23 tutorial: All Things ViTs: Understanding and Interpreting Attention in Vision.
ELLA: Equip Diffusion Models with LLM for Enhanced Semantic Alignment
The image prompt adapter is designed to enable a pretrained text-to-image diffusion model to generate images with an image prompt.
Diffusion Model-Based Image Editing: A Survey (TPAMI 2025)
This project is a downstream fork of chatgpt-on-wechat that additionally integrates the LLMOps platform Dify and also supports gewechat, which is more stable than itchat.
A high-throughput and memory-efficient inference and serving engine for LLMs
[AAAI 2025] MV-VTON: Multi-View Virtual Try-On with Diffusion Models
A generic triplet data loader for image classification problems, and a triplet loss net demo.
The most powerful and modular diffusion model GUI, API, and backend with a graph/nodes interface.
[CVPR 2024 Highlight] GLEE: General Object Foundation Model for Images and Videos at Scale
[CVPR 2023] 3D Cinemagraphy from a Single Image
Code release for the paper "Simulating Fluids in Real-World Still Images"
[ECCV 2024, Oral] DynamiCrafter: Animating Open-domain Images with Video Diffusion Priors
Text2Cinemagraph: Text-Guided Synthesis of Eulerian Cinemagraphs [SIGGRAPH ASIA 2023]
Official repo for VGen: a holistic video generation ecosystem built on diffusion models
[ICLR 2025] HD-Painter: High-Resolution and Prompt-Faithful Text-Guided Image Inpainting with Diffusion Models
[ECCV 2024] The official implementation of the paper "BrushNet: A Plug-and-Play Image Inpainting Model with Decomposed Dual-Branch Diffusion"
PyTorch implementation of "TryOnDiffusion: A Tale of Two UNets", a virtual try-on diffusion-based network by Google
Official PyTorch implementation of "RMGN: A Regional Mask Guided Network for Parser-free Virtual Try-on" (IJCAI-ECAI 2022)
Official code for "Parser-Free Virtual Try-on via Distilling Appearance Flows", CVPR 2021.
Audiocraft is a library for audio processing and generation with deep learning. It features the state-of-the-art EnCodec audio compressor / tokenizer, along with MusicGen, a simple and controllable…