PyTorch implementation of "Customize your NeRF: Adaptive Source Driven 3D Scene Editing via Local-Global Iterative Training"
Runze He*, Shaofei Huang*, Xuecheng Nie, Tianrui Hui, Luoqi Liu, Jiao Dai, Jizhong Han, Guanbin Li, Si Liu
- [2024/3/12] Code released.
- [2023/12/4] Paper is available here.
In this paper, we target the adaptive source-driven 3D scene editing task by proposing a CustomNeRF model that unifies a text description or a reference image as the editing prompt. However, obtaining editing results that conform to the editing prompt is nontrivial, since there are two significant challenges: accurately editing only the foreground regions, and maintaining multi-view consistency given a single-view reference image.
To tackle the first challenge, we propose a Local-Global Iterative Editing (LGIE) training scheme that alternates between foreground-region editing and full-image editing, aiming at foreground-only manipulation while preserving the background.
For the second challenge, we also design a class-guided regularization that exploits class priors within the generation model to alleviate the inconsistency among different views in image-driven editing. Extensive experiments show that our CustomNeRF produces precise editing results on various real scenes in both text-driven and image-driven settings.
pip install -r requirements.txt
# install the tcnn backbone
pip install git+https://github.com/NVlabs/tiny-cuda-nn/#subdirectory=bindings/torch
# Build extension
bash scripts/install_ext.sh
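As a quick sanity check (not part of the original setup instructions), you can confirm that the tcnn bindings import and that CUDA is visible:

```bash
# should print without errors if the tiny-cuda-nn PyTorch bindings built correctly
python -c "import torch, tinycudann; print('tcnn OK, CUDA available:', torch.cuda.is_available())"
```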
You can use a variety of popular NeRF datasets, as well as a set of photos taken with your own camera. We recommend using NeRFstudio for extracting camera poses.
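For example, a typical Nerfstudio pose-extraction call on your own photos looks like this (paths are illustrative; see the Nerfstudio documentation for details):

```bash
# run COLMAP through Nerfstudio to estimate camera poses from raw photos
ns-process-data images --data ./my_photos --output-dir ./data/my_scene
```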
For convenience, we provide the data used in our paper. You can download the dataset with extracted masks from Google Drive here and use it directly.
- data1.zip: data in non-LLFF format
- data2_llff.zip: data in LLFF format
You can process the downloaded data like this:
mkdir data
mv data1.zip data
mv data2_llff.zip data
cd data
unzip data1.zip
unzip data2_llff.zip
cd ..
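After unzipping, the scene folders should sit directly under data/. A quick check (exact folder names depend on the zip contents; bear is the scene used in the examples below):

```bash
ls data
# expected: scene folders such as bear/ from data1.zip, plus the LLFF scenes from data2_llff.zip
```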
data_path="./data/bear"
python main.py -O2 \
--workspace "./outputs/bear/base" --iters 3000 \
--backbone grid --bound 2 --train_resolution_level 7 --eval_resolution_level 4 \
--data_type "nerfstudio" --data_path $data_path \
--keyword 'bear' --train_conf 0.01 --soft_mask
- --data_type: if you use data from data1.zip, i.e. not in LLFF format, set this to 'nerfstudio'; otherwise set it to 'llff' (an LLFF example is sketched below).
- --bound: we found that setting it to 2 works best in most cases; if your NeRF reconstruction quality is poor, try changing it.
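For an LLFF-format scene from data2_llff.zip, only the data type changes. A sketch with placeholder scene and keyword names:

```bash
data_path="./data/<llff_scene>"   # placeholder: an LLFF-format scene from data2_llff.zip
python main.py -O2 \
--workspace "./outputs/<llff_scene>/base" --iters 3000 \
--backbone grid --bound 2 --train_resolution_level 7 --eval_resolution_level 4 \
--data_type "llff" --data_path $data_path \
--keyword '<foreground object>' --train_conf 0.01 --soft_mask
```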
If you want to render a NeRF that has already been reconstructed, append the following flags to the command above to start test mode:
--test --eval_resolution_level 3
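For example, rendering the bear scene reconstructed in Step 1 amounts to the training command from above with the test flags appended (with argparse, the later --eval_resolution_level overrides the earlier one):

```bash
data_path="./data/bear"
python main.py -O2 \
--workspace "./outputs/bear/base" --iters 3000 \
--backbone grid --bound 2 --train_resolution_level 7 --eval_resolution_level 4 \
--data_type "nerfstudio" --data_path $data_path \
--keyword 'bear' --train_conf 0.01 --soft_mask \
--test --eval_resolution_level 3
```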
(Optional) This step is only needed for image-driven NeRF editing. If you want to try text-driven NeRF editing, skip this step and go directly to Step 3 below.
We've integrated Custom Diffusion into this repository, so you can check the custom_diffusion/tuning.sh file, replace the necessary information, and perform fine-tuning by:
bash custom_diffusion/tuning.sh
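The concrete settings live in custom_diffusion/tuning.sh. As a rough illustration only (the variable names below are hypothetical; the values match the dog example used in Step 3), you typically point the script at your reference image(s) and choose the output directory that is later passed to --use_cd:

```bash
# hypothetical variable names -- check custom_diffusion/tuning.sh for the real ones
INSTANCE_DIR="data_cd/dog2"               # directory containing the reference image(s)
OUTPUT_DIR="data_cd/dog2_cd"              # fine-tuned weights; passed via --use_cd in Step 3
INSTANCE_PROMPT="photo of a <new1> dog"   # <new1> is the learned modifier token
```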
data_path="./data/bear"
python main.py -O2 \
--workspace "./outputs/bear/base" --iters 3000 \
--backbone grid --bound 2 --train_resolution_level 7 --eval_resolution_level 4 \
--data_type "nerfstudio" --data_path $data_path \
--keyword 'bear' --train_conf 0.01 --soft_mask \
\
--workspace "./outputs/bear/text_corgi" --iters 10000 \
--train_resolution_level 7 --eval_resolution_level 7 \
--editing_from './outputs/bear/base/checkpoints/df_ep0030.pth' --pretrained \
--text 'a corgi in a forest' \
--text_fg 'a corgi' \
--lambda_sd 0.01 --keep_bg 1000 \
--stage_time --detach_bg --random_bg_c --clip_view
data_path="./data/bear"
python main.py -O2 \
--workspace "./outputs/bear/base" --iters 3000 \
--backbone grid --bound 2 --train_resolution_level 4 --eval_resolution_level 4 \
--data_type "nerfstudio" --data_path $data_path \
--keyword 'bear' --train_conf 0.01 --soft_mask \
\
--workspace "./outputs/bear/img_dog2" --iters 5000 \
--train_resolution_level 7 --eval_resolution_level 7 \
--editing_from './outputs/bear/base/checkpoints/df_ep0030.pth' --pretrained \
--text 'a <new1> dog in a forest' \
--text_fg 'a dog' \
--lambda_sd 0.01 --keep_bg 1000 \
--stage_time --detach_bg --random_bg_c --clip_view \
--use_cd 'data_cd/dog2_cd' \
--refer_path 'data_cd/dog2/dog2.png'
- --refer_path: the path to the reference image
- --use_cd: the path to the model fine-tuned with Custom Diffusion
Similarly to the above, if you want to render the edited NeRF, append the same test flags:
--test --eval_resolution_level 3
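For example, to render the image-driven editing result from above (the full editing command with the test flags appended):

```bash
data_path="./data/bear"
python main.py -O2 \
--workspace "./outputs/bear/base" --iters 3000 \
--backbone grid --bound 2 --train_resolution_level 4 --eval_resolution_level 4 \
--data_type "nerfstudio" --data_path $data_path \
--keyword 'bear' --train_conf 0.01 --soft_mask \
\
--workspace "./outputs/bear/img_dog2" --iters 5000 \
--train_resolution_level 7 --eval_resolution_level 7 \
--editing_from './outputs/bear/base/checkpoints/df_ep0030.pth' --pretrained \
--text 'a <new1> dog in a forest' \
--text_fg 'a dog' \
--lambda_sd 0.01 --keep_bg 1000 \
--stage_time --detach_bg --random_bg_c --clip_view \
--use_cd 'data_cd/dog2_cd' \
--refer_path 'data_cd/dog2/dog2.png' \
--test --eval_resolution_level 3
```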
We've released dynamic videos of both text-driven and image-driven NeRF editing on the project page.
We thank the awesome research works Custom Diffusion, torch-ngp, and Grounded-SAM.
@article{he2023customize,
  title={Customize your NeRF: Adaptive Source Driven 3D Scene Editing via Local-Global Iterative Training},
  author={He, Runze and Huang, Shaofei and Nie, Xuecheng and Hui, Tianrui and Liu, Luoqi and Dai, Jiao and Han, Jizhong and Li, Guanbin and Liu, Si},
  journal={arXiv preprint arXiv:2312.01663},
  year={2023}
}
If you have any comments or questions, please open a new issue or feel free to contact Runze He.