
Implemented methods

Object-centric

| Methods | Official / Un-official | Get started | Notes, major differences from paper, etc. |
| --- | --- | --- | --- |
| NeuS in minutes | Un-official | readme | - supports object-centric datasets as well as indoor datasets<br>- fast and stable convergence without needing masks<br>- supports NGP / LoTD or MLPs as cr&dv representations<br>- large pixel batch size (4096) & pixel error maps |
| NGP | Un-official | | - supports NGP / LoTD or MLPs as representations |

Street-view

| Methods | Official / Un-official | Get started | Notes, major differences from paper, etc. |
| --- | --- | --- | --- |
| StreetSurf | Official | readme | - LiDAR loss improved (using L1 and discarding outliers) |
| NGP with LiDAR | Un-official | readme | - using Urban-NeRF's LiDAR loss |

General usage

Dataset preparation


NOTE:

- 🏃 You can combine multiple subtasks listed below and automatically execute them one by one with run.py. For example:

      python code_single/tools/run.py train,eval,eval_lidar,extract_mesh \
          --config code_single/configs/xxx.yaml \
          --eval.downscale=2 --eval_lidar.lidar_id=lidar_TOP \
          --extract_mesh.to_world --extract_mesh.res=0.1

  --config or --resume_dir are common args shared across all subtasks.

- 📌 All the instructions below assume you have already cd'ed into /path/to/neuralsim.

Training

> Start a fresh new training

python code_single/tools/train.py --config code_single/configs/xxx.yaml

⚙️ You can specify temporary config overrides via command-line args like --aaa.bbb=ccc, which will temporarily modify the aaa:bbb field of xxx.yaml for this run only. For more details on how the command-line and yaml configuration work together, please refer to this doc.
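For intuition, a dotted override like --aaa.bbb=ccc can be thought of as a nested-dict merge over the loaded yaml. Below is a minimal sketch of that idea (illustrative only; the repo's actual config parser may differ):

```python
# Minimal sketch: map a dotted CLI override onto a nested yaml-style dict.
# Illustrative only -- not the repo's actual config code.
def apply_override(cfg: dict, dotted_key: str, value) -> None:
    keys = dotted_key.split(".")
    node = cfg
    for k in keys[:-1]:
        node = node.setdefault(k, {})
    node[keys[-1]] = value

cfg = {"aaa": {"bbb": "original"}}      # stands in for the loaded xxx.yaml
apply_override(cfg, "aaa.bbb", "ccc")   # the effect of --aaa.bbb=ccc
print(cfg)                              # {'aaa': {'bbb': 'ccc'}}
```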

> Resume a previous experiment

python code_single/tools/train.py --resume_dir /path/to/logs/xxx

You can also resume an experiment by directly specifying --config code_single/configs/xxx.yaml, as long as nothing in the original config yaml has changed.

> Rich training logs

We provide rich logging information with tensorboard.

Check them out by

tensorboard --logdir /path/to/logs/xxx

The logging frequency of scalars is controlled by the training:i_log field (i.e., how many iterations per log entry).

The logging frequency of images (visualizations or renderings) is controlled by the training:i_val field.
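For intuition, these fields typically act as iteration-modulo gates in the training loop. A hypothetical sketch (the values and logging calls are placeholders, not the repo's code):

```python
from torch.utils.tensorboard import SummaryWriter

# Hypothetical values standing in for training:i_log and training:i_val.
i_log, i_val = 100, 1000

writer = SummaryWriter("logs/xxx")
for it in range(10000):
    if it % i_log == 0:   # scalar logging frequency (training:i_log)
        writer.add_scalar("loss/total", 0.0, it)
    if it % i_val == 0:   # image/visualization frequency (training:i_val)
        pass              # renderings would be written via writer.add_image(...)
writer.close()
```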

> Optional DDP training

➡️ Single node multi GPUs

Taking a single machine with 4 GPUs as an example:

📌 NOTE: You only need to add the --ddp option to the command line arguments of train.py.

python -m torch.distributed.launch --nproc_per_node=4 \
code_single/tools/train.py \
--config code_single/configs/waymo/streetsurf/withmask_withlidar_joint.240219.yaml \
--ddp

In the above example, if everything works properly, you will see the following message printed four times with different ranks in the logs:

=> Enter init_process_group(): 
	=> rank=0
	=> world_size=4
	=> local_rank=0
	=> master_addr=127.0.0.1
	=> master_port=29500
...
=> Done init Env @ DDP: 
	=> device_ids set to [0]
	=> rank=0
	=> world_size=4
	=> local_rank=0
	=> master_addr=127.0.0.1
	=> master_port=29500
...

➡️ Multi nodes multi GPUs

Taking 2 nodes with 4 GPUs each (i.e., 8 GPUs in total) as an example:

📌 NOTE: You only need to add the --ddp option to the command line arguments of train.py.

python -m torch.distributed.launch --nnodes=$WORLD_SIZE --nproc_per_node=4 \
--master_addr=$MASTER_ADDR --master_port=$MASTER_PORT --node_rank=$RANK \
code_single/tools/train.py \
--config code_single/configs/waymo/streetsurf/withmask_withlidar_joint.240219.yaml \
--ddp 

In the above example, if everything works properly, you will see the following message printed four times with different ranks in the logs of the master node:

=> Enter init_process_group(): 
	=> rank=0
	=> world_size=8
	=> local_rank=0
	=> master_addr=dlcfevx8ltikuljg-master-0
	=> master_port=23456
...
=> Done init Env @ DDP: 
	=> device_ids set to [0]
	=> rank=0
	=> world_size=8
	=> local_rank=0
	=> master_addr=dlcfevx8ltikuljg-master-0
	=> master_port=23456
...

And in the worker node's logs:

=> Enter init_process_group(): 
	=> rank=4
	=> world_size=8
	=> local_rank=0
	=> master_addr=dlcfevx8ltikuljg-master-0
	=> master_port=23456
...
=> Done init Env @ DDP: 
	=> device_ids set to [0]
	=> rank=4
	=> world_size=8
	=> local_rank=0
	=> master_addr=dlcfevx8ltikuljg-master-0
	=> master_port=23456
...

> Debug training errors

We also provide a primitive debugging tool for checking gradients. You can try it out by setting self.debug_grad=True in the Trainer class. Note that this will significantly slow down training and should be used along with debugpy or other debugging tools.
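If you only need a quick non-finite-gradient check, a generic PyTorch-level alternative (a sketch of the general technique, not the Trainer's debug_grad implementation) is anomaly detection plus per-parameter gradient hooks:

```python
import torch

# Generic gradient-debugging sketch; not the repo's debug_grad internals.
torch.autograd.set_detect_anomaly(True)  # locates NaN-producing backward ops (slow)

model = torch.nn.Linear(4, 1)  # placeholder for the actual networks

def check_grad(name: str):
    def hook(grad: torch.Tensor):
        if not torch.isfinite(grad).all():
            raise RuntimeError(f"non-finite gradient in {name}")
    return hook

for name, p in model.named_parameters():
    p.register_hook(check_grad(name))

loss = model(torch.randn(8, 4)).mean()
loss.backward()  # any non-finite gradient now fails loudly at its source
```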

Rendering

tools/render.py works in two modes: replay and NVS (novel view synthesis). Both modes support additional LiDAR simulation or mesh visualization along with RGB, depth, and surface normal rendering.

> Replay

By default, tools/render.py runs in replay mode, which renders frames between the optionally given --start_frame and --stop_frame parameters with everything else untouched.

python code_single/tools/render.py --resume_dir /path/to/logs/xxx --downscale=1 \
--assetbank_cfg.Main.model_params.ray_query_cfg.query_param.num_coarse=0

NOTE:

  • For street-view, rendering full size videos often consumes a lot of time. It is recommended to specify --downscale=2 or larger values.

  • Usually, ignoring num_coarse samples will not significantly affect the results and will speed up rendering.

    • For StreetSurf, simply add --assetbank_cfg.Street.model_params.ray_query_cfg.query_param.num_coarse=0
    • For other single object datasets, simply add --assetbank_cfg.Main.model_params.ray_query_cfg.query_param.num_coarse=0
  • Many other options can be specified while rendering, including --no_sky, --only_cr, --fps, --rayschunk etc. Check out tools/render.py for more details.

> NVS

tools/render.py runs in NVS mode when --nvs_path=... and related options are given to specify the type of the novel camera trajectory and other configs.

--nvs_node_id is used to specify the scene graph node whose trajectory you wish to manipulate. Typically, for single-object datasets, this node is camera. For street-view datasets, it's ego_car.

➡️ Example for single-object NVS

python code_single/tools/render.py --resume_dir logs/bmvs/5c0d13 \
--nvs_path=spherical_spiral --nvs_node_id=camera --nvs_param=48,29,54 \
--nvs_num_frames=120 --downscale=1 \
--assetbank_cfg.Main.model_params.ray_query_cfg.query_param.num_coarse=0

➡️ Example for street-view NVS

NOTE: --start_frame and --stop_frame in this case specify the reference frames for the camera path creation method. The actual length of the NVS path is specified by --nvs_num_frames.

python code_single/tools/render.py --resume_dir logs/streetsurf/seg100613 \
--nvs_path=street_view --nvs_node_id=ego_car --nvs_param=2.0,1.0,3.0,0.0,2.0,-2.0 \
--nvs_num_frames=120 --start_frame=80 --stop_frame=160 --downscale=4 \
--assetbank_cfg.Street.model_params.ray_query_cfg.query_param.num_coarse=0

> With mesh visualization

An example: seg100613_ds=4_withmesh

To visualize a specific mesh from the perspective of the original cameras when rendering, additionally specify --render_mesh=xxx.ply :

python code_single/tools/render.py --resume_dir /path/to/logs/xxx \
--downscale=4 --render_mesh /path/to/logs/xxx/meshes/xxx.ply \
--render_mesh_transform=identity \
--assetbank_cfg.Main.model_params.ray_query_cfg.query_param.num_coarse=0

NOTE: If the input mesh is already in world coordinates (e.g. --to_world is specified when extracting mesh), --render_mesh_transform should just be identity. If the input mesh is in object coordinates, --render_mesh_transform should be to_world.
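For intuition, to_world simply bakes a 4x4 object-to-world matrix into the mesh vertices. A hedged sketch using trimesh (an assumption here, not necessarily what render.py uses internally):

```python
import numpy as np
import trimesh

# Illustrative only: move an object-space mesh into world space with a
# hypothetical 4x4 object-to-world matrix.
mesh = trimesh.load("xxx.ply")

object_to_world = np.eye(4)               # placeholder; the real matrix comes from the scene
object_to_world[:3, 3] = [1.0, 0.0, 0.0]  # e.g. a pure translation

mesh.apply_transform(object_to_world)  # what --render_mesh_transform=to_world amounts to
mesh.export("xxx_world.ply")           # already-in-world meshes need only 'identity'
```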

Appearance evaluation

This is similar to the replay mode in tools/render.py, but with additional calculations and limitations for evaluation.

python code_single/tools/eval.py --resume_dir /path/to/logs/xxx

LiDAR simulation

> Simulate a single LiDAR

For example, to simulate the original LiDAR model:

python code_single/tools/render.py --resume_dir /path/to/logs/xxx \
--no_cam --render_lidar --lidar_model=original_reren --lidar_id=lidar_TOP

➡️ A visualization window can be popped up by additionally specifying --lidar_vis_verbose.

You can also try this out when rendering in NVS mode.

> Simulate a demo of a list of LiDAR models

In addition to the original LiDAR, numerous other real-world LiDAR models can be simulated.

➡️ Below is a script that sequentially simulates a list of LiDAR models:

bash code_single/tools/demo_lidar_sim.sh /path/to/logs/xxx --lidar_vis_width=1200

➡️ A visualization window can be popped up by additionally specifying --lidar_vis_verbose.

LiDAR evaluation

python code_single/tools/eval_lidar.py --resume_dir /path/to/logs/xxx \
--lidar_id=lidar_TOP --dirname=eval_lidar_TOP

➡️ A visualization video like the one on the StreetSurf website can be produced by additionally specifying --video_backend=vedo:

python code_single/tools/eval_lidar.py --resume_dir /path/to/logs/xxx \
--lidar_id=lidar_TOP --dirname=eval_lidar_TOP --video_backend=vedo

➡️ A visualization window can be popped up by additionally specifying --video_verbose.

Mesh extraction

To extract the mesh of a specific experiment:

➡️ For SDF networks:

python code_single/tools/extract_mesh.py --resume_dir /path/to/logs/xxx \
--to_world --res=0.1

➡️ For NeRF networks (you can specify a different sigma threshold with --levelset=):

python code_single/tools/extract_mesh.py --resume_dir /path/to/logs/xxx \
--to_world --res=0.1 --network_type=nerf --levelset=1.0
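In both cases the idea is the same: sample the network's scalar field (SDF value or NeRF density) on a grid and run marching cubes at the chosen levelset (0 for an SDF; a sigma threshold for NeRF). A generic sketch of that technique, with a placeholder analytic field standing in for a trained network (not extract_mesh.py's internals):

```python
import numpy as np
from skimage.measure import marching_cubes

# Generic levelset extraction sketch; a sphere SDF stands in for a trained network.
res = 64
xs = np.linspace(-1.0, 1.0, res)
grid = np.stack(np.meshgrid(xs, xs, xs, indexing="ij"), axis=-1)  # [res, res, res, 3]
field = np.linalg.norm(grid, axis=-1) - 0.5                       # placeholder SDF

# level=0.0 for SDF networks; a sigma threshold (e.g. --levelset=1.0) for NeRF.
verts, faces, normals, _ = marching_cubes(field, level=0.0)
```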

Occupancy grid extraction

python code_single/tools/extract_occgrid.py --resume_dir /path/to/logs/xxx \
--occ_res=0.1

A visualization window can be popped up when the extraction is finished by additionally specifying --verbose (⚠️ this might run out of CPU memory if occ_res is small and the resulting resolution is large).

> Format of the extracted occupancy grid

We opt to store the actual occupied integer coordinates rather than a full-resolution 3D boolean grid to save space.

The output file is in .npz format, containing occupied vertex coordinates and meta information.

Below is a description and example of how to read the file:

import numpy as np
datadict = np.load("xxx.npz", allow_pickle=True)
datadict['occ_corners'] # [N, 3], int16, integer coordinates of the actually occupied grid cells, where N is the number of occupied cells
datadict['sidelength'] # [res_x, res_y, res_z], int, integer side lengths allocated in the x, y, z directions respectively
datadict['occ_res'] # float, default 0.1, resolution used when extracting the occupancy grid, i.e. the side length of each cubic cell
datadict['coord_min'] # [3,], float, world coordinates of the front-left-bottom corner (the vertex with the smallest x, y, z values) of the cell at integer coordinate [0,0,0]
datadict['coord_offset'] # [3,], float, offset between the current world coordinate system and that of the original data sequence (original_world = current_world + coord_offset)
datadict['meta'] # dict, meta information of the scene
datadict['meta']['scene_id'] # str, full name id of the current sequence, e.g. "segment-xxxxx-with_camera_labels"
datadict['meta']['start_frame'] # int, the start frame defined during training of the current sequence
datadict['meta']['num_frames'] # int, the total number of frames defined during training of the current sequence (end_frame = start_frame + num_frames)

# To read:
voxel_coords_in_world = datadict['occ_corners'].astype(float) * datadict['occ_res'] + datadict['coord_min']
voxel_coords_in_data_world = voxel_coords_in_world + datadict['coord_offset']
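If you do need a dense boolean volume (at the memory cost the sparse format avoids), the coordinate list expands back directly; a minimal sketch:

```python
import numpy as np

datadict = np.load("xxx.npz", allow_pickle=True)

# Expand the sparse [N, 3] integer coordinates into a dense boolean volume.
dense = np.zeros(tuple(datadict["sidelength"]), dtype=bool)
ijk = datadict["occ_corners"].astype(np.int64)
dense[ijk[:, 0], ijk[:, 1], ijk[:, 2]] = True

occupancy_ratio = dense.mean()  # fraction of occupied cells
```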