Stars
Qwen2.5-VL is the multimodal large language model series developed by the Qwen team, Alibaba Cloud.
Implementation of a single layer of the MMDiT, proposed in Stable Diffusion 3, in PyTorch
Nightly release of ControlNet 1.1
Firefly: a large-model training tool that supports training Qwen2.5, Qwen2, Yi1.5, Phi-3, Llama3, Gemma, MiniCPM, Yi, Deepseek, Orion, Xverse, Mixtral-8x7B, Zephyr, Mistral, Baichuan2, Llama2, Llama, Qwen, Baichuan, ChatGLM2, InternLM, Ziya2, Vicuna, Bloom, and other large models
Official implementation of ViTEraser: Harnessing the Power of Vision Transformers for Scene Text Removal with SegMIM Pretraining (AAAI 2024)
Large-scale text-video dataset. 10 million captioned short videos.
Generative Models by Stability AI
Official PyTorch implementation of the CVPR 2022 paper: "Look Closer to Supervise Better: One-Shot Font Generation via Component-Based Discriminator"
A paper collection of recent diffusion models for text-image generation tasks, e.g., visual text generation, font generation, text removal, text image super resolution, text editing, handwritten ge…
Official implementation code of the paper "AnyText: Multilingual Visual Text Generation And Editing"
[AAAI2024] FontDiffuser: One-Shot Font Generation via Denoising Diffusion with Multi-Scale Content Aggregation and Style Contrastive Learning
This repository is the implementation of "Don't Forget Me: Accurate Background Recovery for Text Removal via Modeling Local-Global Context".
Search image collections by multiple color palettes or by image color similarity.
Implementation of DragDiffusion: Harnessing Diffusion Models for Interactive Point-based Image Editing
Official implementation of "Composer: Creative and Controllable Image Synthesis with Composable Conditions"
[ICLR 2019] ProxylessNAS: Direct Neural Architecture Search on Target Task and Hardware
A collection of original, innovative ideas and algorithms towards Advanced Literate Machinery. This project is maintained by the OCR Team in the Language Technology Lab, Tongyi Lab, Alibaba Group.
Large-scale Self-supervised Pre-training Across Tasks, Languages, and Modalities
mPLUG-Owl: The Powerful Multi-modal Large Language Model Family
Text-To-Image Generation with Chinese Characters
Open-source code for MiniGPT-4 and MiniGPT-v2 (https://minigpt-4.github.io, https://minigpt-v2.github.io/)
An open platform for training, serving, and evaluating large language models. Release repo for Vicuna and Chatbot Arena.
🤗 Diffusers: State-of-the-art diffusion models for image, video, and audio generation in PyTorch and Flax.
Code for generating synthetic text images as described in "Synthetic Data for Text Localisation in Natural Images", Ankush Gupta, Andrea Vedaldi, Andrew Zisserman, CVPR 2016.
A latent text-to-image diffusion model