Yuming Gu1,2
·
Phong Tran2
·
Yujian Zheng2
·
Hongyi Xu3
·
Heyuan Li4
·
Adilbek Karmanov2
·
Hao Li2,5
1University of Southern California 2MBZUAI 3ByteDance Inc.
4The Chinese University of Hong Kong, Shenzhen 5Pinscreen Inc.
[Teaser video: Teaser_video.mp4]
- Project Page
- Paper
- Inference code
- Checkpoints of Diffportrait360
- Checkpoints of Back-View Generation Module
- Training code
- Internet-collected inference data (self-collected from Pexels plus 1,000 additional real portrait images)
- Gradio Demo
This is the official PyTorch implementation of DiffPortrait360, which generates 360-degree human head views from a single portrait image.
Generating high-quality 360-degree views of human heads from single-view images is essential for enabling accessible immersive telepresence applications and scalable personalized content creation. While cutting-edge methods for full head generation are limited to modeling realistic human heads, the latest diffusion-based approaches for style-omniscient head synthesis can produce only frontal views and struggle with view consistency, preventing their conversion into true 3D models for rendering from arbitrary angles. We introduce a novel approach that generates fully consistent 360-degree head views, accommodating human, stylized, and anthropomorphic forms, including accessories like glasses and hats. Our method builds on the DiffPortrait3D framework, incorporating a custom ControlNet for back-of-head detail generation and a dual appearance module to ensure global front-back consistency. By training on continuous view sequences and integrating a back reference image, our approach achieves robust, locally continuous view synthesis. Our model can be used to produce high-quality neural radiance fields (NeRFs) for real-time, free-viewpoint rendering, outperforming state-of-the-art methods in object synthesis and 360-degree head generation for very challenging input portraits.
We employ a frozen pre-trained Latent Diffusion Model (LDM) as the rendering backbone and incorporate three auxiliary trainable modules for disentangled control: a dual appearance module R, a camera control module C, and a view-consistency module V. Specifically, R extracts appearance information from the front reference image and the back-view image, while C derives the camera pose from a conditioning render produced by an off-the-shelf 3D GAN. During training, we use a continuous sampling strategy that follows the camera trajectory, and we strengthen cross-frame attention so that appearance stays stable as the viewing angle changes. At inference, our tailored back-view generation network F first synthesizes a back-view image, which lets us render a full 360-degree camera trajectory from a single portrait.
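A minimal, schematic sketch of this design is shown below. The module names, signatures, and tensor layout are illustrative assumptions, not the repository's actual API; it only shows how the trainable modules R, C, and V condition the frozen denoising U-Net.

```python
import torch.nn as nn


class DiffPortrait360Sketch(nn.Module):
    """Schematic only: how the trainable modules wrap the frozen LDM U-Net.
    Names, signatures, and shapes are illustrative, not the repo's real API."""

    def __init__(self, frozen_unet: nn.Module, appearance_R: nn.Module,
                 camera_C: nn.Module, consistency_V: nn.Module):
        super().__init__()
        self.unet = frozen_unet.eval().requires_grad_(False)  # frozen LDM backbone
        self.R = appearance_R   # dual appearance: front reference + back-view image
        self.C = camera_C       # camera control, fed with 3D-GAN pose renders
        self.V = consistency_V  # cross-frame attention over the sampled view sequence

    def denoise_step(self, noisy_latents, timestep, ref_img, back_img, pose_render):
        app_feats = self.R(ref_img, back_img)  # global front/back appearance features
        cam_feats = self.C(pose_render)        # per-view camera conditioning
        eps = self.unet(noisy_latents, timestep,
                        appearance=app_feats, camera=cam_feats)
        return self.V(eps)                     # enforce continuity across neighboring views
```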
To evaluate the 360-degree novel view synthesis performance of DiffPortrait360, we compare its generation results with PanoHead, SphereHead, and Unique3D.
[Comparison video: Comparsion1.mp4]
[Ablation videos: dual.mp4, Ablation_seq.mp4]
- An NVIDIA GPU with CUDA support is required.
- We have tested on a single A6000 GPU.
- In our experiment, we used CUDA 12.2.
- Minimum: 30GB of GPU memory to generate a single 32-frame NVS video.
- Recommended: a GPU with 80GB of memory (a quick GPU check is sketched right after this list).
- Operating system: Linux
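The optional snippet below checks the requirements above using standard PyTorch calls; the thresholds simply mirror the numbers listed.

```python
import torch

# Optional sanity check of the hardware requirements listed above.
assert torch.cuda.is_available(), "An NVIDIA GPU with CUDA support is required."
props = torch.cuda.get_device_properties(0)
mem_gb = props.total_memory / 1024 ** 3
print(f"GPU: {props.name}, memory: {mem_gb:.1f} GB, CUDA: {torch.version.cuda}")
if mem_gb < 30:
    print("Warning: below the 30GB minimum for a 32-frame NVS video; expect out-of-memory errors.")
elif mem_gb < 80:
    print("Note: below the recommended 80GB.")
```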
Clone the repository:
git clone https://github.com/FreedomGu/Diffportrait360
cd diffportrait360_release
We provide an env.yml file for setting up the environment. Alternatively, create the environment manually by running the following commands in your terminal:
conda create -n diffportrait360 python=3.9
conda activate diffportrait360
pip install -r requirements.txt
Download the models through this HF link.
Change the following three model paths (PANO_HEAD_MODEL, Head_Back_MODEL, Diff360_MODEL) in inference.sh to your own download paths.
We provide some examples of preprocessed portrait images through this script. If you would like to try your own data, place it under the /input_image folder and prepare a dataset.json file (camera information in the PanoHead coordinate system) following the layout of this folder.
cd diffportrait360_release/code
bash inference.sh
Make sure you have the dataset.json file and the cropped images under the /input_image folder.
Run bash inference.sh
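As an optional sanity check before running inference on your own data, the snippet below shows one way to inspect dataset.json. It assumes the PanoHead/EG3D label convention (each entry pairs an image filename with 25 camera values: a flattened 4x4 cam2world matrix followed by a flattened 3x3 intrinsics matrix); verify the exact schema against the provided example files.

```python
import json
import numpy as np

# Hypothetical check for a custom dataset.json, assuming the PanoHead/EG3D
# convention: {"labels": [[filename, [25 camera values]], ...]} with
# 16 extrinsic values (flattened 4x4 cam2world) + 9 intrinsic values.
with open("input_image/dataset.json") as f:
    labels = json.load(f)["labels"]

for name, cam in labels:
    cam = np.asarray(cam, dtype=np.float32)
    assert cam.size == 25, f"{name}: expected 25 camera values, got {cam.size}"
    cam2world = cam[:16].reshape(4, 4)
    intrinsics = cam[16:].reshape(3, 3)
    print(f"{name}: camera position {cam2world[:3, 3]}")
```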
If you find DiffPortrait360 useful for your research and applications, please cite it using this BibTeX:
@article{gu2025diffportrait360,
title={DiffPortrait360: Consistent Portrait Diffusion for 360 View Synthesis},
author={Gu, Yuming and Tran, Phong and Zheng, Yujian and Xu, Hongyi and Li, Heyuan and Karmanov, Adilbek and Li, Hao},
journal={arXiv preprint arXiv:2503.15667},
year={2025}
}
Our code is distributed under the Apache-2.0 license.
This work is supported by the Metaverse Center Grant from the MBZUAI Research Office. We appreciate the open-sourced research of DiffPortrait3D, PanoHead, SphereHead, and ControlNet. We thank Egor Zakharov, Zhenhui Lin, Maksat Kengeskanov, and Yiming Chen for early discussions, helpful suggestions, and feedback.
Please contact yuminggu@usc.edu if there has been any misuse of images, and we will promptly remove them.