Peng Wang* · Xiang Liu* · Peidong Liu
(* Equal Contribution)
Styl3R predicts stylized 3D Gaussians in under a second with a feed-forward network, given unposed sparse-view images and an arbitrary style image.
Our method could potentially improve the efficiency of domain randomization in sim2real transfer, as in Deep Drone Racing.
- 2025.10.31: The training and inference code has been updated. Please let us know if you encounter any issues.
Our code requires Python 3.10+ and was developed with PyTorch 2.2.0 and CUDA 12.1, but it should work with newer PyTorch/CUDA versions as well.
- Clone Styl3R.
```bash
git clone https://github.com/WU-CVGL/Styl3R
cd Styl3R
```
- Create the environment; here we show an example using conda.
```bash
conda create -y -n styl3r python=3.10
conda activate styl3r
pip install torch==2.2.0 torchvision==0.17.2 --index-url https://download.pytorch.org/whl/cu121
pip install -r requirements.txt
```
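To verify that the environment matches the versions the code was developed with, here is a minimal sanity check (our illustration, not a repo script):

```python
# Minimal environment sanity check (illustrative, not a repo script):
# confirm the installed PyTorch/CUDA versions match what Styl3R was developed with.
import torch

print("PyTorch:", torch.__version__)           # developed with 2.2.0
print("CUDA:", torch.version.cuda)             # developed with 12.1
print("GPU available:", torch.cuda.is_available())
```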
- Optional: compile the CUDA kernels for RoPE (as in CroCo v2).
```bash
# Styl3R relies on RoPE positional embeddings, for which you can compile CUDA kernels for a faster runtime.
cd src/model/encoder/backbone/croco/curope/
python setup.py build_ext --inplace
cd ../../../../../..
```
Our models are hosted on Hugging Face 🤗
| Model name | Training resolution | Training data | Input views |
|---|---|---|---|
| re10k_2v.ckpt | 256x256 | re10k | 2 |
| re10k_4v.ckpt | 256x256 | re10k | 4 |
| re10k_dl3dv_4v.ckpt | 256x256 | re10k, dl3dv | 4 |
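To fetch a checkpoint programmatically, `huggingface_hub` can be used; the repo id below is a placeholder assumption, so substitute the actual Hugging Face repo that hosts these models:

```python
# Sketch: download a pretrained checkpoint with huggingface_hub.
# NOTE: the repo_id is an assumed placeholder -- replace it with the
# actual Hugging Face repo hosting the Styl3R checkpoints.
from huggingface_hub import hf_hub_download

ckpt_path = hf_hub_download(
    repo_id="WU-CVGL/Styl3R",        # placeholder repo id (assumption)
    filename="re10k_dl3dv_4v.ckpt",  # one of the checkpoints in the table above
)
print("Downloaded to:", ckpt_path)
```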
Please refer to DATASETS.md for detailed dataset preparation.
After setting up the datasets (RE10K, DL3DV, and WikiArt):
- Modify the dataset paths marked with `# TODO` in `generate_scene_style_correspondences.py`, then run the script to generate the scene-style correspondence `.json` files (a quick sanity check follows this list).
- Update the `roots` paths in the dataset configuration files.
- Update the `style_root` paths in the dataset configuration files.
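After running the script, you can quickly confirm that a generated correspondence file parses; the filename below is illustrative, so point it at whichever `.json` the script actually produced:

```python
# Illustrative sanity check: confirm a generated scene-style correspondence
# file parses as JSON. The filename is an assumption -- use the actual output
# of generate_scene_style_correspondences.py.
import json

with open("scene_style_correspondence.json") as f:
    mapping = json.load(f)
print(f"Loaded {len(mapping)} scene-style entries")
```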
First, download the pretrained MASt3R model and place it in the `./ckpts` directory.
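For reference, the MASt3R weights can be fetched directly from the official release; whether this ViT-Large variant is the exact one Styl3R expects is an assumption, so check the project page if unsure:

```python
# Sketch: fetch the MASt3R checkpoint into ./ckpts. The URL is the official
# ViT-Large MASt3R release; whether Styl3R expects this exact variant is an
# assumption -- verify against the project page.
import os
import urllib.request

os.makedirs("ckpts", exist_ok=True)
url = ("https://download.europe.naverlabs.com/ComputerVision/MASt3R/"
       "MASt3R_ViTLarge_BaseDecoder_512_catmlpdpt_metric.pth")
urllib.request.urlretrieve(url, os.path.join("ckpts", os.path.basename(url)))
```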
```bash
CUDA_VISIBLE_DEVICES=0,1,2,3,4,5 python -m src.main_style +experiment=re10k_3view_style_8x8 \
wandb.mode=online \
wandb.project=noposplat_xiang_token_style_pretrain \
wandb.name=re10k_multi-view_tok-sty-NVS-pretrain \
data_loader.train.batch_size=10 \
dataset.re10k_style.view_sampler.num_context_views=2 \
trainer.val_check_interval=500 \
checkpointing.every_n_train_steps=3125 \
model.encoder.stylized=False \
model.encoder.gaussian_adapter.sh_degree=0
```
Initialize from the pretrained 2-view model by setting `checkpointing.load` to the corresponding checkpoint path.
```bash
CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 python -m src.main_style +experiment=re10k_3view_style_8x8 \
wandb.mode=online \
wandb.project=noposplat_xiang_token_style_pretrain \
wandb.name=re10k_4multi-view_tok-sty-NVS-pretrain \
data_loader.train.batch_size=6 \
dataset.re10k_style.view_sampler.num_context_views=4 \
dataset.re10k_style.view_sampler.num_target_views=6 \
trainer.val_check_interval=500 \
checkpointing.every_n_train_steps=3125 \
model.encoder.stylized=False \
model.encoder.gaussian_adapter.sh_degree=0 \
checkpointing.load='outputs/exp_re10k_multi-view_tok-sty-NVS-pretrain/2025-04-29_19-47-17/checkpoints/epoch_0-step_15000.ckpt'
```
Initialize from the pretrained 4-view RE10K model.
```bash
CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 python -m src.main_style +experiment=re10k_dl3dv_3view_style_8x8 \
wandb.mode=online \
wandb.project=noposplat_xiang_token_style_pretrain \
wandb.name=re10k_dl3dv_4multi-view_tok-sty-NVS-pretrain \
data_loader.train.batch_size=3 \
dataset.re10k_style.view_sampler.num_context_views=4 \
dataset.re10k_style.view_sampler.num_target_views=6 \
dataset.dl3dv_style.view_sampler.num_context_views=4 \
dataset.dl3dv_style.view_sampler.num_target_views=6 \
trainer.val_check_interval=500 \
checkpointing.every_n_train_steps=3125 \
model.encoder.stylized=False \
model.encoder.gaussian_adapter.sh_degree=0 \
checkpointing.load='outputs/exp_re10k_4multi-view_tok-sty-NVS-pretrain/2025-05-03_14-44-57/checkpoints/epoch_0-step_18750.ckpt'
```
Initialize from the 2-view NVS-pretrained checkpoint.
```bash
CUDA_VISIBLE_DEVICES=0,1,2,3,4,5 python -m src.main_style +experiment=re10k_3view_style_8x8 \
wandb.mode=online \
wandb.project=noposplat_xiang_token_style_debug \
wandb.name=re10k_multi-view_tok-sty-stylization_content-h3-h4 \
data_loader.train.batch_size=14 \
dataset.re10k_style.view_sampler.num_context_views=2 \
trainer.max_steps=35001 \
trainer.val_check_interval=500 \
checkpointing.every_n_train_steps=5000 \
model.encoder.stylized=True \
model.encoder.gaussian_adapter.sh_degree=0 \
loss=style \
train.identity_loss=true \
checkpointing.load='outputs/exp_re10k_multi-view_tok-sty-NVS-pretrain/2025-04-29_19-47-17/checkpoints/epoch_0-step_15000.ckpt'
```
Initialize from the 4-view NVS-pretrained checkpoint.
```bash
CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 python -m src.main_style +experiment=re10k_3view_style_8x8 \
wandb.mode=online \
wandb.project=noposplat_xiang_token_style_debug \
wandb.name=re10k_4multi-view_tok-sty-stylization \
data_loader.train.batch_size=8 \
dataset.re10k_style.view_sampler.num_context_views=4 \
dataset.re10k_style.view_sampler.num_target_views=6 \
trainer.max_steps=35001 \
trainer.val_check_interval=500 \
checkpointing.every_n_train_steps=5000 \
model.encoder.stylized=True \
model.encoder.gaussian_adapter.sh_degree=0 \
loss=style \
train.identity_loss=true \
checkpointing.load='outputs/exp_re10k_4multi-view_tok-sty-NVS-pretrain/2025-05-03_14-44-57/checkpoints/epoch_0-step_18750.ckpt'
```
Initialize from the 4-view NVS-pretrained checkpoint on RE10K + DL3DV.
```bash
CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 python -m src.main_style +experiment=re10k_dl3dv_3view_style_8x8 \
wandb.mode=online \
wandb.project=noposplat_xiang_token_style_debug \
wandb.name=re10k_dl3dv_4multi-view_tok-sty-stylization \
data_loader.train.batch_size=3 \
dataset.re10k_style.view_sampler.num_context_views=4 \
dataset.re10k_style.view_sampler.num_target_views=6 \
dataset.dl3dv_style.view_sampler.num_context_views=4 \
dataset.dl3dv_style.view_sampler.num_target_views=6 \
trainer.max_steps=35001 \
trainer.val_check_interval=500 \
checkpointing.every_n_train_steps=5000 \
model.encoder.stylized=True \
model.encoder.gaussian_adapter.sh_degree=0 \
loss=style \
train.identity_loss=true \
checkpointing.load='outputs/exp_re10k_dl3dv_4multi-view_tok-sty-NVS-pretrain/2025-05-05_21-11-11/checkpoints/epoch_0-step_15625.ckpt'
```
Note: Set `checkpointing.load` to a checkpoint obtained after stylization fine-tuning.
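If you are unsure which checkpoints exist locally, here is a small sketch to list candidates for `checkpointing.load` (the `outputs/` layout follows the paths shown in the training commands above):

```python
# Sketch: list checkpoints under outputs/ to pick one for checkpointing.load.
# The directory layout follows the paths used in the training commands above.
from pathlib import Path

for ckpt in sorted(Path("outputs").rglob("checkpoints/*.ckpt")):
    print(ckpt)
```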
```bash
CUDA_VISIBLE_DEVICES=0 python -m infer_model_re10k \
+experiment=re10k_3view_style_1x.yaml \
wandb.name=re10k_tok-sty_inference \
model.encoder.stylized=True \
model.encoder.gaussian_adapter.sh_degree=0 \
test.pose_align_steps=50 \
checkpointing.load='outputs/exp_re10k_dl3dv_4multi-view_tok-sty-stylization_b2x3/2025-05-06_19-31-34/checkpoints/epoch_0-step_35000.ckpt'
```
(Demo video: infer_re10k.mp4)
```bash
CUDA_VISIBLE_DEVICES=0 python -m infer_model_colmap \
+experiment=re10k_3view_style_1x.yaml \
wandb.name=video_colmap_tok-sty_inference \
model.encoder.stylized=True \
model.encoder.gaussian_adapter.sh_degree=0 \
test.pose_align_steps=50 \
checkpointing.load='outputs/exp_re10k_dl3dv_4multi-view_tok-sty-stylization_b2x3/2025-05-06_19-31-34/checkpoints/epoch_0-step_35000.ckpt'
```
(Demo video: infer_tnt.mp4)
A `launch.json` configuration is provided for debugging all training and inference commands in VSCode.
- Update the `python` field to your Python interpreter path.
- Update `checkpointing.load` to your corresponding checkpoint path.
- Release inference code and pretrained models
- Release gradio demo
- Release training code
This project is based on NoPoSplat. We sincerely thank the authors for open-sourcing their excellent work.
If you find our work useful, please consider citing our paper:
```bibtex
@article{wang2025styl3r,
  title={Styl3R: Instant 3D Stylized Reconstruction for Arbitrary Scenes and Styles},
  author={Wang, Peng and Liu, Xiang and Liu, Peidong},
  journal={arXiv preprint arXiv:2505.21060},
  year={2025}
}
```