Rui Zhao · Weijia Mao · Mike Zheng Shou

Show Lab, National University of Singapore
This repository provides the official training and inference code for DoraCycle.
- Base: Show-o (512×512, MAGVITv2, Phi-1.5).
- Install: `pip install -r requirements.txt`; training uses `accelerate` and DeepSpeed.
- Optional: `wandb login` for logging.
- MAGVITv2: showlab/magvitv2
- Show-o 512×512: showlab/show-o-512x512
- Phi-1.5: microsoft/phi-1_5
Set these (or local paths) in your config.
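For orientation, a hedged sketch of how these three checkpoints might be wired into the config. The key names below follow the Show-o config layout and are assumptions; verify them against `configs/doracycle.yaml`:

```yaml
model:
  vq_model:
    vq_model_name: "showlab/magvitv2"        # or a local path
  showo:
    llm_model_path: "microsoft/phi-1_5"      # or a local path
    pretrained_model_path: "showlab/show-o-512x512"
```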
Single training entrypoint: `training/train_doracycle.py` with `configs/doracycle.yaml`.
- Edit `configs/doracycle.yaml`: set `experiment.cus_data_path` to your unpaired image + text data path (see `training/cus_data_parque.py` / `CusDataset` for the format). Set `experiment.characters_names` and, if needed, `dataset.params.train_lm_shards_path_or_url`.
- Run:
```shell
accelerate launch --config_file accelerate_configs/8_gpu_deepspeed_zero2.yaml --main_process_port=8888 \
    training/train_doracycle.py config=configs/doracycle.yaml
```

Checkpoints are saved under `experiment.output_dir`. Use `resume_from_checkpoint: 'latest'` to resume.
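As a rough illustration of how a `'latest'` setting is typically resolved, a minimal sketch; the actual resolution logic lives in `training/train_doracycle.py` and may differ, and the helper name here is hypothetical:

```python
from pathlib import Path
from typing import Optional


def latest_checkpoint(output_dir: str) -> Optional[Path]:
    """Return the checkpoint-XXXX directory with the highest step, or None.

    Hypothetical helper sketching how 'latest' is commonly resolved in
    accelerate-based training scripts.
    """
    runs = [
        p for p in Path(output_dir).glob("checkpoint-*")
        if p.is_dir() and p.name.rsplit("-", 1)[-1].isdigit()
    ]
    # Sort numerically: a plain string sort would rank
    # checkpoint-999 above checkpoint-2000.
    return max(runs, key=lambda p: int(p.name.rsplit("-", 1)[-1]), default=None)
```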
DoraCycle checkpoints are saved under `output_dir/checkpoint-XXXX/` with `unwrapped_model/` and `unwrapped_model_ema/`. Set `model.showo.pretrained_model_path` in the config to the checkpoint root (e.g. `.../checkpoint-XXXX`).
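To choose between the EMA and non-EMA weights programmatically, a hedged sketch; the directory names come from the layout above, but the helper itself is hypothetical and not part of this repo:

```python
from pathlib import Path


def resolve_model_dir(checkpoint_root: str, use_ema: bool = True) -> Path:
    """Pick the weights subfolder inside a checkpoint-XXXX directory.

    Hypothetical helper: prefers unwrapped_model_ema/ when present,
    otherwise falls back to unwrapped_model/.
    """
    root = Path(checkpoint_root)
    if use_ema and (root / "unwrapped_model_ema").is_dir():
        return root / "unwrapped_model_ema"
    if (root / "unwrapped_model").is_dir():
        return root / "unwrapped_model"
    raise FileNotFoundError(f"no unwrapped model found under {root}")
```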
Text-to-image: `inference_t2i_lora.py`

```shell
python3 inference_t2i_lora.py config=configs/doracycle.yaml \
    validation_prompts_file=validation_prompts/validation_prompts.txt
```

Multimodal understanding: `inference_mmu_lora.py`
```shell
python3 inference_mmu_lora.py config=configs/doracycle.yaml \
    mmu_image_root=./mmu_validation question='Please describe this image in details'
```

Alternative: load the full checkpoint with `inference_t2i.py` / `inference_mmu.py`, setting `pretrained_model_path` to the model subfolder (e.g. `.../checkpoint-XXXX/unwrapped_model`).
```bibtex
@article{zhao2025doracycle,
  title={DoraCycle: Domain-Oriented Adaptation of Unified Generative Model in Multimodal Cycles},
  author={Zhao, Rui and Mao, Weijia and Shou, Mike Zheng},
  journal={arXiv preprint arXiv:2503.03651},
  year={2025}
}

@article{xie2024showo,
  title={Show-o: One Single Transformer to Unify Multimodal Understanding and Generation},
  author={Xie, Jinheng and Mao, Weijia and Bai, Zechen and Zhang, David Junhao and others},
  journal={arXiv preprint arXiv:2408.12528},
  year={2024}
}
```