Kaihua Chen*, Tarasha Khurana*, Deva Ramanan
This repository contains the official implementation of CogNVS.
- Release CogNVS inference pipeline and checkpoints
- Release self-supervised data generation code
- Release CogNVS test-time finetuning code
- Release evaluation code on Kubric-4D, ParallelDomain-4D, and Dycheck
- Train a better CogNVS inpainting checkpoint with more data, once more compute is available
Clone the repository and set up the environment:
git clone https://github.com/Kaihua-Chen/cog-nvs
cd cog-nvs
conda create --name cognvs python=3.11
conda activate cognvs
pip install -r cognvs_requirements.txt
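A quick, optional sanity check that PyTorch is installed and sees your GPU:
python -c "import torch; print(torch.__version__, torch.cuda.is_available())"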
- CogVideoX base model
Download the original CogVideoX-5b-I2V checkpoints from: https://huggingface.co/zai-org/CogVideoX-5b-I2V
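One way to fetch it, mirroring the git-lfs steps used for the CogNVS checkpoint below, so that it lands at checkpoints/CogVideoX-5b-I2V as expected by demo.py:
mkdir -p checkpoints
cd checkpoints
git lfs install
git clone https://huggingface.co/zai-org/CogVideoX-5b-I2V
cd ..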
- CogNVS inpainting checkpoint
We provide CogNVS inpainting checkpoints, which can be used for further test-time finetuning on your target sequences:
mkdir -p checkpoints
cd checkpoints
git lfs install
git clone https://huggingface.co/kaihuac/cognvs_ckpt_inpaint
cd ..
- (Optional) Test-time finetuned checkpoints
Please refer to Step 3 "Self-supervised Data Pair Generation" to generate training pairs and then follow Step 4 "Test-time Finetuning" to finetune our inpainting checkpoints on your target sequence.
We also provide checkpoints already finetuned on our demo_data. If you want to skip test-time finetuning, download them (~20GB each) from: Link
You can run inference in three ways:
- Use the CogNVS inpainting checkpoint directly (not recommended; only for a quick test, as quality is usually lower)
- Download and use our provided test-time finetuned checkpoints
- Perform your own test-time finetuning (following instructions in later sections) and run inference afterward
Example using a test-time finetuned checkpoint:
python demo.py \
--model_path "checkpoints/CogVideoX-5b-I2V" \
--cognvs_ckpt_path "checkpoints/cognvs_ckpt_finetuned_davis_bear/my_checkpoint-200_transformer" \
--data_path "demo_data/davis_bear" \
--mp4_name "example_eval_render.mp4"
Here, mp4_name is the name of the input video; it can also be a glob pattern such as eval_render*.mp4.
The output will be saved to:
demo_data/davis_bear/outputs/
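If you instead take the quick-test route with the base inpainting checkpoint, the call is the same except for --cognvs_ckpt_path. The exact weight folder inside the cloned cognvs_ckpt_inpaint repo is an assumption here, so adjust it to the actual layout:
python demo.py \
--model_path "checkpoints/CogVideoX-5b-I2V" \
--cognvs_ckpt_path "checkpoints/cognvs_ckpt_inpaint" \
--data_path "demo_data/davis_bear" \
--mp4_name "example_eval_render.mp4"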
- Sequence folder structure
sequence_name/
├─ gt_rgb.mp4
└─ cam_info/
   ├─ megasam_depth.npy
   ├─ megasam_intrinsics.npy (optional)
   └─ megasam_c2ws.npy (optional)
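A quick way to verify that a sequence folder matches this layout (a sketch using the demo_data/davis_bear paths; the printed depth shape depends on your depth estimator):
ls demo_data/davis_bear demo_data/davis_bear/cam_info
python -c "import numpy as np; print(np.load('demo_data/davis_bear/cam_info/megasam_depth.npy').shape)"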
- Generate training pairs
python data_gen.py \
--device "cuda:0" \
--data_path "demo_data/davis_bear" \
--mode "train" \
--intrinsics_file "cam_info/megasam_intrinsics.npy" \
--extrinsics_file "cam_info/megasam_c2ws.npy"
(intrinsics_file and extrinsics_file are optional; the pipeline still works if you only provide the depth file from MegaSAM, DepthCrafter, etc.)
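Following the note above, a minimal depth-only run would simply drop the two camera flags:
python data_gen.py \
--device "cuda:0" \
--data_path "demo_data/davis_bear" \
--mode "train"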
- Generate evaluation pairs
python data_gen.py \
--device "cuda:0" \
--data_path "demo_data/davis_bear" \
--mode "eval"
Evaluation renders will be created from predefined trajectories in the trajs/ folder. You can customize trajectories by editing those .txt files.
After generating training pairs, edit the config files and run test-time finetuning:
- Edit finetune/finetune_cognvs.sh (example values are sketched after this list):
  - model_path: path to the CogVideoX-5b-I2V checkpoint
  - transformer_id: path to our CogNVS inpainting checkpoint
  - output_dir: path to save the finetuned checkpoint
  - base_dir_input: sequence folder with the training pairs
  Optional parameters:
  - train_epochs: number of training epochs
  - checkpointing_steps: interval (in steps) at which checkpoints are saved
  - checkpointing_limit: maximum number of checkpoints to keep
  - do_validation: set to True to enable validation (slower)
  - validation_steps: interval (in steps) at which validation runs
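For example, the edited values might look like the following (the exact assignment syntax inside finetune_cognvs.sh and the inpainting-checkpoint path are assumptions; adjust them to your setup):
model_path="checkpoints/CogVideoX-5b-I2V"              # base CogVideoX-5b-I2V weights
transformer_id="checkpoints/cognvs_ckpt_inpaint"       # CogNVS inpainting checkpoint
output_dir="checkpoints/cognvs_ckpt_finetuned_bear"    # where finetuned checkpoints are written
base_dir_input="demo_data/davis_bear"                  # sequence folder with generated training pairs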
- Edit finetune/accelerate_config.yaml:
  - gpu_ids: GPU ids to use for training
  - num_processes: must match the number of GPU ids
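For instance, to train on two GPUs the relevant fields might read (an illustrative excerpt, not the full config):
gpu_ids: "0,1"
num_processes: 2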
- Start finetuning:
cd finetune
sh finetune_cognvs.sh
- Process finetuned checkpoints
Place the following files from the toolbox/ folder into the checkpoints/ directory:
- config.json
- diffusion_pytorch_model.safetensors.index.json
- process_ckpts.sh
The structure should be:
checkpoints/
├── config.json
├── diffusion_pytorch_model.safetensors.index.json
├── process_ckpts.sh
└── cognvs_ckpt_finetuned_bear/
    └── checkpoint-200/
Edit process_ckpts.sh to match your checkpoint step:
CHECKPOINT_DIR="checkpoint-200"
Then run:
cd checkpoints
sh process_ckpts.sh
This processing step can take ~20 min or longer, depending on your system performance.
- Go back to Section 2 (Inference) and run on evaluation renders
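For example, inference on the evaluation renders with the processed checkpoint might look like the following (the finetuned-checkpoint path and render pattern are assumptions based on the examples above):
cd ..  # back to the repository root
python demo.py \
--model_path "checkpoints/CogVideoX-5b-I2V" \
--cognvs_ckpt_path "checkpoints/cognvs_ckpt_finetuned_bear/my_checkpoint-200_transformer" \
--data_path "demo_data/davis_bear" \
--mp4_name "eval_render*.mp4"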
Our work builds on CogVideoX and uses DeepSpeed ZeRO-2 for memory-efficient finetuning. Video depth estimation adopts MegaSAM or DepthCrafter. Concurrent research includes ViewCrafter, GEN3C, CAT4D, TrajectoryCrafter, ReCamMaster, etc. We thank the authors for their contributions.
If you find this work helpful, please cite:
@inproceedings{chen2025cognvs,
title = {Reconstruct, Inpaint, Test-Time Finetune: Dynamic Novel-view Synthesis from Monocular Videos},
author = {Chen, Kaihua and Khurana, Tarasha and Ramanan, Deva},
booktitle = {Advances in Neural Information Processing Systems (NeurIPS)},
year = {2025}
}