Error training on custom data #23

Open
briancantwe opened this issue Jan 27, 2024 · 19 comments

@briancantwe

Hello! I've managed to get my test data to a point where it trains most of the way, but I'm getting this error.

2024-01-27 09:43:10.564616 easyvolcap.utils.net_utils -> save_npz: Saved model data/trained_model/l3mhet_test/latest.npz at epoch 59 net_utils.py:449
l3mhet_test
0:00:02 59 29959 0.029935 13.920692 0.020401 0.050336 0.0011 0.0598 0.002006 3361
0:00:01 59 29969 0.029938 13.879657 0.020609 0.050547 0.0010 0.0600 0.002005 3361
0:00:01 59 29979 0.029935 13.817179 0.020912 0.050847 0.0008 0.0614 0.002003 3361
0:00:00 59 29989 0.029906 13.771573 0.021099 0.051005 0.0010 0.0579 0.002002 3361
0:00:00 59 29999 0.029902 13.829483 0.020795 0.050696 0.0011 0.0494 0.002000 3361
eta epoch iter prop_loss psnr img_loss loss data batch lr max_mem
2024-01-27 09:43:12.650359 easyvolcap.runners.evaluators.volumetric_video_evaluator -> evaluate: camera: 0 frame: 0 volumetric_video_evaluator.py:46
{'psnr': 10.659576416015625, 'ssim': 0.08238379, 'lpips': 0.6307356953620911}
2024-01-27 09:43:13.742178 easyvolcap.runners.volumetric_video_runner -> 100% ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 1/1 0:00:03 < 0:00:00 ? it/s v…
test_generator:
2024-01-27 09:43:13.744583 easyvolcap.runners.evaluators.volumetric_video_evaluator -> summarize: volumetric_video_evaluator.py:72
{
'psnr_mean': 10.659576416015625,
'psnr_std': 0.0,
'ssim_mean': 0.08238379657268524,
'ssim_std': 7.450580596923828e-09,
'lpips_mean': 0.6307356953620911,
'lpips_std': 0.0
}
2024-01-27 09:43:13.748727 easyvolcap.runners.volumetric_video_runner -> train: Error in validation pass, ignored and volumetric_video_runner.py:308
continuing
╭─────────────────────────────────────────────────── Traceback (most recent call last) ────────────────────────────────────────────────────╮
│ /mnt/c/Users/User/Documents/Github/EasyVolcap/easyvolcap/runners/volumetric_video_runner.py:306 in train │
│ │
│ ❱ 306 │ │ │ │ │ self.test_epoch(epoch + 1) # will this provoke a live display? │
│ │
│ /mnt/c/Users/Use/Documents/Github/EasyVolcap/easyvolcap/runners/volumetric_video_runner.py:405 in test_epoch │
│ │
│ ❱ 405 │ │ for _ in test_generator: pass # the actual calling │
│ │
│ /mnt/c/Users/User/Documents/Github/EasyVolcap/easyvolcap/runners/volumetric_video_runner.py:432 in test_generator │
│ │
│ ❱ 432 │ │ scalar_stats = self.evaluator.summarize() │
│ │
│ /mnt/c/Users/User/Documents/Github/EasyVolcap/easyvolcap/runners/evaluators/volumetric_video_evaluator.py:80 in summarize │
│ │
│ ❱ 80 │ │ │ │ json.dump(metric, f, indent=4) │
╰──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
TypeError: 0.08238379 is not JSON serializable
(easyvolcap) User@home:/mnt/c/Users/User/Documents/Github/EasyVolcap$

Any ideas?

Thanks!

@dendenxu
Member

Hi, I've wrapped a try block around the JSON export call (it shouldn't have thrown the TypeError, since we should already be converting numpy scalars to Python scalars before writing the JSON). Now, if metrics.json can't be saved, training and evaluation will simply continue.
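
For reference, here's a minimal sketch of the kind of conversion involved; the helper name and structure are illustrative only, not the actual EasyVolcap code:

import json
import numpy as np

def to_python_scalars(obj):
    # Recursively convert numpy scalars/arrays into plain Python types so that
    # json.dump() does not raise "is not JSON serializable".
    if isinstance(obj, dict):
        return {k: to_python_scalars(v) for k, v in obj.items()}
    if isinstance(obj, (list, tuple)):
        return [to_python_scalars(v) for v in obj]
    if isinstance(obj, np.generic):  # covers np.float32, np.int64, ...
        return obj.item()
    if isinstance(obj, np.ndarray):
        return obj.tolist()
    return obj

metric = {'psnr': 10.6596, 'ssim': np.float32(0.08238379), 'lpips': 0.6307}
try:
    with open('metrics.json', 'w') as f:
        json.dump(to_python_scalars(metric), f, indent=4)
except TypeError as e:
    # Mirror the new behavior: log the problem and keep training/evaluating.
    print(f'Failed to save metrics.json: {e}')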

By the way, the PSNR looks too low; this seems similar to the bounding box issue discussed in this issue, so maybe take a look there as well.
If the camera poses and bounding box are set up correctly, running the generalizable ENeRFi model should already give you a reasonable result.

@briancantwe
Author

briancantwe commented Jan 28, 2024 via email

@dendenxu
Member

The POINT folder will be created after running this command (see the "Running 3DGS" section of the guide):

# Extract geometry (point cloud) for initialization from the l3mhet model
# Tune image sample rate and resizing ratio for a denser or sparser estimation
python scripts/tools/volume_fusion.py -- -c configs/exps/l3mhet/l3mhet_${expname}.yaml val_dataloader_cfg.dataset_cfg.ratio=0.15

This command essentially renders depth maps and fuses them to initialize 3DGS.
Note that you might need to tune the image resolution (via ratio) or other parameters to get a reasonably sized result.
There's also a --skip_geometry_consistency switch that disables the "fusion" step, which can otherwise prune out too many points.
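
For example, a possible invocation combining those knobs (a sketch only; whether the switch goes before or after the extra "--" separator is my assumption, so adjust as needed):

# Denser estimation via a higher ratio, with the geometry-consistency fusion disabled;
# flag placement is an assumption -- adapt it to how the script parses its arguments.
python scripts/tools/volume_fusion.py --skip_geometry_consistency -- -c configs/exps/l3mhet/l3mhet_${expname}.yaml val_dataloader_cfg.dataset_cfg.ratio=0.25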

@briancantwe
Author

Of course, you're right. Sorry about that. There are a lot of steps. :-)

I'm facing another problem now though in that my results are all black. I'm sure it's with my custom data setup, but I'm not sure where to start debugging.

Thanks!

@briancantwe
Author

Actually, I've found my output in the /results dir and am getting a very sparse pointcloud after conversion. So, I'm getting closer! Stay tuned.

@dendenxu
Member

The rendering being all black might be a bug. Would it be possible to send me a one-frame sample of your custom data so I can try to reproduce it? Maybe through email?

@briancantwe
Author

I'm now past the black rendering issue, but getting poor results from NGP-T. Is there possibly a way to control or output test RENDER/DEPTH/ALPHA images from more cameras? That would be useful in debugging my data.

Sorry, I unfortunately can't share the data I'm using.

@dendenxu
Member

The ngpt models can also be viewed from the GUI, which I also often use for debugging.

Simply adding -t gui to the training command should do the trick. Note that ngpt might be very slow to render, so it can help to run it in fp16 mode by appending configs/specs/fp16.yaml to the command and setting viewer_cfg.render_ratio=0.1 for faster visualization.
You could also append configs/specs/superm.yaml to skip the image loading process, since we only want to visualize the model.
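
Putting it together, something along these lines might work (a sketch only; the ngpt config path and the comma-chained -c list are my assumptions based on the other commands in this thread, so adapt them to your actual experiment):

# Launch the GUI viewer in fp16 mode at a reduced render ratio, skipping image loading;
# the experiment config path is an assumption -- point it at your own ngpt config.
evc -t gui -c configs/exps/ngpt/ngpt_${expname}.yaml,configs/specs/fp16.yaml,configs/specs/superm.yaml viewer_cfg.render_ratio=0.1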

@briancantwe
Author

OK, after a bit of debugging on my data, I'm now getting a reasonable RENDER. However my DEPTH images are pretty much garbage. Any suggestions? I tried messing with the near and far settings, but it didn't seem to do much. Unfortunately I'm getting a Segmentation fault running the gui, which doesn't happen on the examples, so that can't be good. Investigation continues!

@rexainn

rexainn commented Jan 31, 2024

Hi, when you ran l3mhet to optimize the calibration, did it converge?

@dendenxu
Member

OK, after a bit of debugging on my data, I'm now getting a reasonable RENDER. However my DEPTH images are pretty much garbage. Any suggestions? I tried messing with the near and far settings, but it didn't seem to do much. Unfortunately I'm getting a Segmentation fault running the gui, which doesn't happen on the examples, so that can't be good. Investigation continues!

Looks like a near-far problem. Could you try setting near a little bigger?
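
If it helps, one way to experiment with that from the command line (a sketch only; the exact near/far config keys and the value are my assumptions, so check your dataset yaml for the real names and a scale that matches your scene):

# Bump the near plane via command-line overrides; the key names and value are assumptions.
evc -c configs/exps/ngpt/ngpt_${expname}.yaml dataloader_cfg.dataset_cfg.near=0.5 val_dataloader_cfg.dataset_cfg.near=0.5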

@dendenxu
Member

dendenxu commented Feb 4, 2024

Hi @briancantwe, I updated the metric logging mechanism to make errors more verbose. Could you try the same training command again so we can locate the root cause of the TypeError, or check whether the error has gone away?

@briancantwe
Author

briancantwe commented Feb 4, 2024

Hi @dendenxu! It does appear that the TypeError went away with the update.

I played a lot with the box and clipping sizes, but still no luck with the depth maps. I've used the input COLMAP data with other research projects, but I could have hit a snag with the EasyVolcap conversion/requirements.

I was considering trying my own dense point clouds, or perhaps trying Im4D (since it appears to take the same input format), just to see whether there's something specific about NGP-T that doesn't like my scene.

@briancantwe
Author

briancantwe commented Feb 4, 2024

Actually, there's no requirement in EasyVolcap for the cameras to all have the same focal length, or for the images to all have the same aspect ratio, is there? I have a mix of formats.

@dendenxu
Member

dendenxu commented Feb 5, 2024

Actually, there's no requirement in EasyVolcap for the cameras to all have the same focal length, or for the images to all have the same aspect ratio, is there? I have a mix of formats.

Yes, we took special care in the data loading process to support differently sized images (or images with different intrinsics).

@dendenxu
Member

dendenxu commented Feb 5, 2024

Hi @dendenxu! It does appear that the TypeError went away with the update.

I played a lot with the box and clipping sizes, but still no luck with the depth maps. I've used the input COLMAP data with other research projects, but I could have hit a snag with the EasyVolcap conversion/requirements.

I was considering trying my own dense point clouds, or perhaps trying Im4D (since it appears to take the same input format), just to see whether there's something specific about NGP-T that doesn't like my scene.

There's a visualize_cameras script that outputs a PLY file of your converted camera parameters.
You could check whether that visualization, the COLMAP visualization, and your actual setup all match up.
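
Something along these lines (a sketch only; the script location and argument style are my assumptions, so check scripts/tools/ in your checkout for the exact entry point):

# Dump a PLY of the converted cameras to compare against the COLMAP reconstruction;
# the path and argument style are assumptions mirroring the other commands in this thread.
python scripts/tools/visualize_cameras.py -- -c configs/exps/l3mhet/l3mhet_${expname}.yaml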

@dendenxu
Member

dendenxu commented Feb 5, 2024

I was considering trying to use my own dense pointclouds or perhaps trying im4D (since it appears to take the same input format) just to see if perhaps something specific about NGP-T that didn't like my scene.

Aside from Im4D, you could also try visualizing the dataset with the ENeRFi inference model, as mentioned here.
It's also a good way to check whether the camera poses are reasonable (aside from visualizing the cameras).
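
For instance, something along these lines (a sketch only; the pretrained ENeRFi config path is my assumption -- follow the linked guide for the exact command and checkpoint):

# View your scene through the pretrained generalizable ENeRFi model as a pose sanity check;
# the config path is an assumption, see the guide referenced above.
evc -t gui -c configs/exps/enerfi/enerfi_${expname}.yaml,configs/specs/superm.yaml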

@briancantwe
Author

briancantwe commented Feb 7, 2024

OK, I tried out the visualize_cameras script. The cameras all appear to be in the right spot. I'm not sure the link for ENeRFi usage above is correct? I get an all-grey screen if I use evc -t gui, though.
