Error training on custom data #23

Open
briancantwe opened this issue Jan 27, 2024 · 19 comments

@briancantwe

Hello! I've managed to get my test data to a point where it trains most of the way, but I'm getting this error.

2024-01-27 09:43:10.564616 easyvolcap.utils.net_utils -> save_npz: Saved model data/trained_model/l3mhet_test/latest.npz at epoch 59 net_utils.py:449
l3mhet_test
0:00:02 59 29959 0.029935 13.920692 0.020401 0.050336 0.0011 0.0598 0.002006 3361
0:00:01 59 29969 0.029938 13.879657 0.020609 0.050547 0.0010 0.0600 0.002005 3361
0:00:01 59 29979 0.029935 13.817179 0.020912 0.050847 0.0008 0.0614 0.002003 3361
0:00:00 59 29989 0.029906 13.771573 0.021099 0.051005 0.0010 0.0579 0.002002 3361
0:00:00 59 29999 0.029902 13.829483 0.020795 0.050696 0.0011 0.0494 0.002000 3361
eta epoch iter prop_loss psnr img_loss loss data batch lr max_mem
2024-01-27 09:43:12.650359 easyvolcap.runners.evaluators.volumetric_video_evaluator -> evaluate: camera: 0 frame: 0 volumetric_video_evaluator.py:46
{'psnr': 10.659576416015625, 'ssim': 0.08238379, 'lpips': 0.6307356953620911}
2024-01-27 09:43:13.742178 easyvolcap.runners.volumetric_video_runner -> 100% ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 1/1 0:00:03 < 0:00:00 ? it/s v…
test_generator:
2024-01-27 09:43:13.744583 easyvolcap.runners.evaluators.volumetric_video_evaluator -> summarize: volumetric_video_evaluator.py:72
{
'psnr_mean': 10.659576416015625,
'psnr_std': 0.0,
'ssim_mean': 0.08238379657268524,
'ssim_std': 7.450580596923828e-09,
'lpips_mean': 0.6307356953620911,
'lpips_std': 0.0
}
2024-01-27 09:43:13.748727 easyvolcap.runners.volumetric_video_runner -> train: Error in validation pass, ignored and volumetric_video_runner.py:308
continuing
╭─────────────────────────────────────────────────── Traceback (most recent call last) ────────────────────────────────────────────────────╮
│ /mnt/c/Users/User/Documents/Github/EasyVolcap/easyvolcap/runners/volumetric_video_runner.py:306 in train │
│ │
│ ❱ 306 │ │ │ │ │ self.test_epoch(epoch + 1) # will this provoke a live display? │
│ │
│ /mnt/c/Users/Use/Documents/Github/EasyVolcap/easyvolcap/runners/volumetric_video_runner.py:405 in test_epoch │
│ │
│ ❱ 405 │ │ for _ in test_generator: pass # the actual calling │
│ │
│ /mnt/c/Users/User/Documents/Github/EasyVolcap/easyvolcap/runners/volumetric_video_runner.py:432 in test_generator │
│ │
│ ❱ 432 │ │ scalar_stats = self.evaluator.summarize() │
│ │
│ /mnt/c/Users/User/Documents/Github/EasyVolcap/easyvolcap/runners/evaluators/volumetric_video_evaluator.py:80 in summarize │
│ │
│ ❱ 80 │ │ │ │ json.dump(metric, f, indent=4) │
╰──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
TypeError: 0.08238379 is not JSON serializable
(easyvolcap) User@home:/mnt/c/Users/User/Documents/Github/EasyVolcap$

Any ideas?

Thanks!

@dendenxu
Member

Hi, I've wrapped a try block around the JSON export call (it shouldn't have thrown the TypeError, since we should already be converting numpy scalars to Python scalars before writing the JSON). Now, if metrics.json can't be saved, training and evaluation will simply continue.
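
For reference, here's a minimal sketch of the kind of conversion involved; the helper name and structure are illustrative only, not the actual EasyVolcap code:

import json
import numpy as np

def to_python_scalars(obj):
    # Recursively convert numpy scalars/arrays into plain Python types so that
    # json.dump() does not raise "is not JSON serializable".
    if isinstance(obj, dict):
        return {k: to_python_scalars(v) for k, v in obj.items()}
    if isinstance(obj, (list, tuple)):
        return [to_python_scalars(v) for v in obj]
    if isinstance(obj, np.generic):  # covers np.float32, np.int64, ...
        return obj.item()
    if isinstance(obj, np.ndarray):
        return obj.tolist()
    return obj

metric = {'psnr': 10.6596, 'ssim': np.float32(0.08238379), 'lpips': 0.6307}
try:
    with open('metrics.json', 'w') as f:
        json.dump(to_python_scalars(metric), f, indent=4)
except TypeError as e:
    # Mirror the new behavior: log the problem and keep training/evaluating.
    print(f'Failed to save metrics.json: {e}')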

By the way, the PSNR looks too low; this seems similar to the bounding box issue discussed in this issue, so maybe take a look there as well.
If the camera poses and bounding box are set up correctly, running the generalizable ENeRFi model should already give you a reasonable result.

@briancantwe
Author

briancantwe commented Jan 28, 2024 via email

@dendenxu
Member

The POINT folder will be created after running this command (see the "Running 3DGS" section of the guide):

# Extract geometry (point cloud) for initialization from the l3mhet model
# Tune image sample rate and resizing ratio for a denser or sparser estimation
python scripts/tools/volume_fusion.py -- -c configs/exps/l3mhet/l3mhet_${expname}.yaml val_dataloader_cfg.dataset_cfg.ratio=0.15

This command essentially renders depth maps and fuses them to initialize 3DGS.
Note that you might need to tune the image resolution (via ratio) or other parameters to get a reasonably sized result.
There's also a --skip_geometry_consistency switch that disables the "fusion" step, which can otherwise prune out too many points.
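
For example, a possible invocation combining those knobs (a sketch only; whether the switch goes before or after the extra "--" separator is my assumption, so adjust as needed):

# Denser estimation via a higher ratio, with the geometry-consistency fusion disabled;
# flag placement is an assumption -- adapt it to how the script parses its arguments.
python scripts/tools/volume_fusion.py --skip_geometry_consistency -- -c configs/exps/l3mhet/l3mhet_${expname}.yaml val_dataloader_cfg.dataset_cfg.ratio=0.25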

@briancantwe
Author

Of course, you're right. Sorry about that. There are a lot of steps. :-)

I'm facing another problem now though in that my results are all black. I'm sure it's with my custom data setup, but I'm not sure where to start debugging.

Thanks!

@briancantwe
Author

Actually, I've found my output in the /results dir and am getting a very sparse pointcloud after conversion. So, I'm getting closer! Stay tuned.

@dendenxu
Member

The rendering being all black might be a bug. Would it be possible to send me a one-frame sample of your custom data so I can try to reproduce it? Maybe through email?

@briancantwe
Author

I'm now past the black rendering issue, but getting poor results from NGP-T. Is there possibly a way to control or output test RENDER/DEPTH/ALPHA images from more cameras? That would be useful in debugging my data.

Sorry, I unfortunately can't share the data I'm using.

@dendenxu
Member

The ngpt models can also be viewed from the GUI, which I also often use for debugging.

Simply adding -t gui to the training command should do the trick. Note that ngpt might be very slow to render, so it can help to run it in fp16 mode by appending configs/specs/fp16.yaml to the command and setting viewer_cfg.render_ratio=0.1 for faster visualization.
You could also append configs/specs/superm.yaml to skip the image loading process, since we only want to visualize the model.
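
Putting it together, something along these lines might work (a sketch only; the ngpt config path and the comma-chained -c list are my assumptions based on the other commands in this thread, so adapt them to your actual experiment):

# Launch the GUI viewer in fp16 mode at a reduced render ratio, skipping image loading;
# the experiment config path is an assumption -- point it at your own ngpt config.
evc -t gui -c configs/exps/ngpt/ngpt_${expname}.yaml,configs/specs/fp16.yaml,configs/specs/superm.yaml viewer_cfg.render_ratio=0.1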

@briancantwe
Author

OK, after a bit of debugging on my data, I'm now getting a reasonable RENDER. However my DEPTH images are pretty much garbage. Any suggestions? I tried messing with the near and far settings, but it didn't seem to do much. Unfortunately I'm getting a Segmentation fault running the gui, which doesn't happen on the examples, so that can't be good. Investigation continues!

@rexainn

rexainn commented Jan 31, 2024

Hi, when you ran l3mhet to optimize the calibration, did it converge?

@dendenxu
Member

OK, after a bit of debugging on my data, I'm now getting a reasonable RENDER. However my DEPTH images are pretty much garbage. Any suggestions? I tried messing with the near and far settings, but it didn't seem to do much. Unfortunately I'm getting a Segmentation fault running the gui, which doesn't happen on the examples, so that can't be good. Investigation continues!

Looks like a near-far problem. Could you try setting near a little bigger?
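
If it helps, one way to experiment with that from the command line (a sketch only; the exact near/far config keys and the value are my assumptions, so check your dataset yaml for the real names and a scale that matches your scene):

# Bump the near plane via command-line overrides; the key names and value are assumptions.
evc -c configs/exps/ngpt/ngpt_${expname}.yaml dataloader_cfg.dataset_cfg.near=0.5 val_dataloader_cfg.dataset_cfg.near=0.5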

@dendenxu
Member

dendenxu commented Feb 4, 2024

Hi @briancantwe, I updated the metric logging mechanism to make errors more verbose. Could you try the same training command again so we can locate the root cause of the TypeError, or check whether the error has gone away?

@briancantwe
Author

briancantwe commented Feb 4, 2024

Hi @dendenxu! It does appear that the TypeError went away with the update.

I played a lot with the box and clipping sizes, but still no luck with the depth maps. I've used the input COLMAP data with other research projects, but I could have hit a snag with the EasyVolcap conversion/requirements.

I was considering trying my own dense point clouds, or perhaps trying Im4D (since it appears to take the same input format), just to see whether there's something specific about NGP-T that doesn't like my scene.

@briancantwe
Author

briancantwe commented Feb 4, 2024

Actually, there's no requirement in EasyVolcap for the cameras to all have the same focal length, or for the images to all have the same aspect ratio, is there? I have a mix of formats.

@dendenxu
Member

dendenxu commented Feb 5, 2024

Actually, there's no requirement in EasyVolcap for the cameras to all have the same focal length, or for the images to all have the same aspect ratio, is there? I have a mix of formats.

Yes, we took special care in the data loading process to support differently sized images (or images with different intrinsics).

@dendenxu
Member

dendenxu commented Feb 5, 2024

Hi @dendenxu! It does appear that the TypeError went away with the update.

I played a lot with the box and clipping sizes, but still no luck with the depth maps. I've used the input COLMAP data with other research projects, but I could have hit a snag with the EasyVolcap conversion/requirements.

I was considering trying my own dense point clouds, or perhaps trying Im4D (since it appears to take the same input format), just to see whether there's something specific about NGP-T that doesn't like my scene.

There's a visualize_cameras script that outputs a PLY file of your converted camera parameters.
You could check whether that visualization, the COLMAP visualization, and your actual setup all match up.
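
Something along these lines (a sketch only; the script location and argument style are my assumptions, so check scripts/tools/ in your checkout for the exact entry point):

# Dump a PLY of the converted cameras to compare against the COLMAP reconstruction;
# the path and argument style are assumptions mirroring the other commands in this thread.
python scripts/tools/visualize_cameras.py -- -c configs/exps/l3mhet/l3mhet_${expname}.yaml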

@dendenxu
Member

dendenxu commented Feb 5, 2024

I was considering trying to use my own dense pointclouds or perhaps trying im4D (since it appears to take the same input format) just to see if perhaps something specific about NGP-T that didn't like my scene.

Aside from Im4D, you could also try visualizing the dataset with the ENeRFi inference model, as mentioned here.
It's also a good way to check whether the camera poses are reasonable (aside from visualizing the cameras).
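
For instance, something along these lines (a sketch only; the pretrained ENeRFi config path is my assumption -- follow the linked guide for the exact command and checkpoint):

# View your scene through the pretrained generalizable ENeRFi model as a pose sanity check;
# the config path is an assumption, see the guide referenced above.
evc -t gui -c configs/exps/enerfi/enerfi_${expname}.yaml,configs/specs/superm.yaml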

@briancantwe
Author

briancantwe commented Feb 7, 2024

OK, I tried out the visualize_cameras script. The cameras all appear to be in the right spot. I'm not sure the link for ENeRFi usage above is correct? I get an all-grey screen if I use evc -t gui, though.
