TextDiffuser-2: Unleashing the Power of Language Models for Text Rendering #704
Labels
AI-Agents
Autonomous AI agents using LLMs
ai-leaderboards
leaderboards for LLMs and other ML models
data-validation
Validating data structures and formats
dataset
public datasets and embeddings
Models
LLM and ML model repos and links
Papers
Research papers
TITLE: unilm/textdiffuser-2/README.md at master · microsoft/unilm
DESCRIPTION:
"# TextDiffuser-2: Unleashing the Power of Language Models for Text Rendering
TextDiffuser-2 exhibits enhanced capability powered by language models. In addition to generating text with remarkable accuracy, TextDiffuser-2 provides plausible text layouts and demonstrates a diverse range of text styles.
🌟 Highlights
We propose TextDiffuser-2, which uses two language models for layout planning and layout encoding, increasing the flexibility and diversity of text rendering.
TextDiffuser-2 alleviates several drawbacks of previous methods, such as (1) limited flexibility and automation, (2) constrained capability of layout prediction, and (3) restricted style diversity.
TextDiffuser-2 can handle text-to-image, text-to-image with template, and text inpainting tasks. It also introduces an additional feature: generated layouts can be edited in a conversational manner.
✨ We release the demo at link. Feel free to try it and share feedback.
⏱️ News
[2023.12.26]: Code, model, and demo for the text inpainting task are all released. Feel free to try it at link.
[2023.12.12]: The training and inference code for text-to-image is released. We provide the code for full-parameter training and LoRA training.
[2023.12.10]: The demo is released at link.
[2023.11.20]: The paper is available at link.
🛠️ Installation
Clone this repo:
Build up a new environment and install packages as follows:
Meanwhile, please install versions of torch, torchvision, and xformers that match your system and CUDA version (refer to this link). Please also install flash-attention if you want to train the layout planner using FastChat. We provide the list of packages used in the experiments at link for your reference.
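Before training, it can help to confirm which versions of these packages are actually installed. The following is a minimal sketch using only the Python standard library; the distribution names in `PACKAGES` (in particular `flash-attn`) are assumptions about how the packages are published on PyPI, and the helper `installed_versions` is hypothetical rather than part of this repo.

```python
from importlib import metadata

# Distribution names are assumptions; flash-attn is only needed when
# training the layout planner, so a missing entry is not an error here.
PACKAGES = ["torch", "torchvision", "xformers", "flash-attn"]


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    for name, version in installed_versions(PACKAGES).items():
        print(f"{name:12s} {version or 'not installed'}")
```

The printed versions can then be compared against the package list referenced above.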
For training the text inpainting task, please install the diffusers package using the command
pip install git+https://github.com/JingyeChen/diffusers_td2.git
Note that the U-Net architecture has been modified to receive more input features. If you encounter the error RuntimeError: expected scalar type float Float but found Half, triggered by diffusers/models/attention_processor.py, please use attention_processor.py to replace the corresponding file in the installed diffusers library.
💾 Checkpoint
We upload the checkpoints to HuggingFace🤗.
Note that we provide the checkpoint with context length 77, as it yields better results when rendering general objects.
📚 Dataset
The data for training the layout planner is at link.
We employ the MARIO-10M dataset for training TextDiffuser-2. Please follow the Dataset section at TextDiffuser to download the dataset, including the train_dataset_index_file.
The train_dataset_index_file should be a .txt file, and each line should indicate an index of a training sample.
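As an illustration of the index file format described above, here is a minimal sketch that reads such a .txt file into a list of sample indices. The function name `load_index_file`, the tolerance for blank lines, and the rejection of duplicate indices are assumptions made for this example, not behavior taken from the training code.

```python
from pathlib import Path


def load_index_file(path):
    """Return the sample indices listed in the file, one per non-empty line."""
    indices = []
    seen = set()
    lines = Path(path).read_text(encoding="utf-8").splitlines()
    for line_no, raw in enumerate(lines, start=1):
        index = raw.strip()
        if not index:
            continue  # tolerate blank lines, e.g. a trailing newline
        if index in seen:
            # Assumption: each training sample should be listed only once.
            raise ValueError(f"duplicate index {index!r} on line {line_no}")
        seen.add(index)
        indices.append(index)
    return indices
```

Calling `load_index_file("train_dataset_index.txt")` (a hypothetical file name) returns the indices in file order, which makes it easy to check the number of training samples before launching a run.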
🚂 Train
Train layout planner
It is normal for the loss curve to look like a staircase.
Train diffusion model
For full-parameter training:
For LoRA training:
If you encounter an "out-of-memory" error, please consider reducing the batch size appropriately.
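Reducing the batch size also reduces the effective batch size per optimizer step. One common way to compensate is gradient accumulation; whether the training scripts here expose such an option is an assumption, so treat the following as a sketch of the arithmetic only. The helper `accumulation_steps` and all numbers are hypothetical.

```python
def accumulation_steps(target_batch, per_device_batch, num_devices=1):
    """Accumulation steps needed to keep the effective batch size at target_batch."""
    if per_device_batch <= 0 or num_devices <= 0:
        raise ValueError("per_device_batch and num_devices must be positive")
    per_step = per_device_batch * num_devices  # samples per forward/backward pass
    if target_batch % per_step != 0:
        raise ValueError(f"{target_batch} is not divisible by {per_step}")
    return target_batch // per_step


# Example: an effective batch of 64 on 2 GPUs that each fit only 8 samples
# would need 64 // (8 * 2) = 4 accumulation steps.
steps = accumulation_steps(64, 8, num_devices=2)
```

If memory is still exhausted at a per-device batch size of 1, other measures such as LoRA training, mentioned above, may be a better fit.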
🧨 Inference
For full-parameter inference:
For LoRA inference:
🕹️ Demo
TextDiffuser-2 has been deployed on Hugging Face. Welcome to play with it! You can also run
python gradio_demo.py
to use the demo locally.
💌 Acknowledgement
We sincerely thank AK and hysts for helping set up the demo. We are also grateful for the available code, APIs, and demos of SDXL, PixArt, Ideogram, DALLE-3, and GlyphControl.
❗ Disclaimer
Please note that the code is intended for academic and research purposes ONLY. Any use of the code for generating inappropriate content is strictly prohibited. The responsibility for any misuse or inappropriate use of the code lies solely with the users who generated such content, and this code shall not be held liable for any such use.
✉️ Contact
For help or issues using TextDiffuser-2, please email Jingye Chen (qwerty.chen@connect.ust.hk), Yupan Huang (huangyp28@mail2.sysu.edu.cn) or submit a GitHub issue.
For other communications related to TextDiffuser-2, please contact Lei Cui (lecu@microsoft.com) or Furu Wei (fuwei@microsoft.com).
🌿 Citation
If you find TextDiffuser-2 useful in your research, please consider citing: