KPConv on tensorflow produces large loss when using deformable layers #534

biophase · 2022-05-17T18:01:52Z

Checklist

I have searched for similar issues.
I have tested with the latest development wheel.
I have checked the release documentation and the latest documentation (for master branch).

Describe the issue

I'm trying to train KPConv on the Paris-Lille3D dataset with the default config kpconv_parislille3d.yml, but the loss is extremely high (in the millions) from the first epoch and shows no downward trend after ~100 epochs, so training is effectively impossible. When I switch to an architecture without any deformable layers, training works as expected, which leads me to believe the problem is in the implementation of the deformable layers.

| Architecture that works | Problematic architecture |
| --- | --- |
| simple | simple |
| resnetb | resnetb |
| resnetb_strided | resnetb_strided |
| resnetb | resnetb |
| resnetb | resnetb_strided |
| resnetb_strided | resnetb_deformable |
| resnetb | resnetb_deformable_strided |
| resnetb | resnetb_deformable |
| resnetb_strided | resnetb_deformable_strided |
| resnetb | resnetb_deformable |
| resnetb | nearest_upsample |
| resnetb_strided | unary |
| resnetb | nearest_upsample |
| nearest_upsample | unary |
| unary | nearest_upsample |
| nearest_upsample | unary |
| unary | nearest_upsample |
| nearest_upsample | unary |
| unary | |
| nearest_upsample | |
| unary | |
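
For anyone who wants to try a rigid variant quickly, here is a minimal sketch that strips the deformable blocks from an architecture list. The helper name `make_rigid` is hypothetical and not part of Open3D-ML, and the result is not identical to the working architecture listed above (which also contains extra `resnetb` blocks). It only swaps each deformable block for its rigid counterpart.

```python
def make_rigid(architecture):
    """Return a copy of the block list with deformable blocks made rigid.

    Hypothetical helper for illustration: it removes the "_deformable"
    part of each block name and leaves all other names untouched.
    """
    return [name.replace("_deformable", "") for name in architecture]


# The problematic architecture as listed in this issue.
default_arch = [
    "simple", "resnetb", "resnetb_strided",
    "resnetb", "resnetb_strided",
    "resnetb_deformable", "resnetb_deformable_strided",
    "resnetb_deformable", "resnetb_deformable_strided",
    "resnetb_deformable",
    "nearest_upsample", "unary", "nearest_upsample", "unary",
    "nearest_upsample", "unary", "nearest_upsample", "unary",
]

rigid_arch = make_rigid(default_arch)
print(rigid_arch)
```

Assuming the loaded config exposes the block list as `cfg.model['architecture']`, the result could be assigned there before constructing the model.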

Steps to reproduce the bug

1. Download the dataset from https://npm3d.fr/paris-lille-3d
2. On Colab run the following cell:

!pip uninstall -y -qqq tensorflow
!pip install -qqq tensorflow==2.5.3
!pip install -qqq open3d
!pip uninstall -y -qqq tensorflow-probability
!pip install -qqq tensorflow-probability==0.13.0
!pip uninstall -y numpy
!pip install numpy==1.19.5

3. Finally, run the training:

import open3d.ml as _ml3d
import open3d.ml.tf as ml3d

# config_path (path to kpconv_parislille3d.yml) and dataset_path (path to the
# downloaded Paris-Lille3D data) must be defined before this cell runs.
cfg_file = config_path
cfg = _ml3d.utils.Config.load_from_file(cfg_file)

model = ml3d.models.KPFCNN(**cfg.model)
cfg.dataset['dataset_path'] = dataset_path
dataset = ml3d.datasets.ParisLille3D(cfg.dataset.pop('dataset_path', None), **cfg.dataset)
pipeline = ml3d.pipelines.SemanticSegmentation(model, dataset=dataset, device="gpu", **cfg.pipeline)

# Prints training progress in the console.
pipeline.run_train()

### Error message

INFO - 2022-05-17 17:06:02,791 - semantic_segmentation - <open3d._ml3d.tf.models.kpconv.KPFCNN object at 0x7f849d2d3150>
INFO - 2022-05-17 17:06:02,792 - semantic_segmentation - Logging in file : ./logs/KPFCNN_ParisLille3D_tf/log_train_2022-05-17_17:06:02.txt
INFO - 2022-05-17 17:06:02,801 - parislille3d - Found 3 pointclouds for training
INFO - 2022-05-17 17:06:05,246 - parislille3d - Found 1 pointclouds for validation
INFO - 2022-05-17 17:06:07,220 - semantic_segmentation - Writing summary in train_log/00010_KPFCNN_ParisLille3D_tf.
INFO - 2022-05-17 17:06:07,231 - semantic_segmentation - Initializing from scratch.
INFO - 2022-05-17 17:06:07,235 - semantic_segmentation - === EPOCH 0/100 ===
training: 41it [01:51,  2.73s/it]
validation: 11it [00:13,  1.19s/it]
INFO - 2022-05-17 17:08:12,172 - semantic_segmentation - loss train: 6508379.000  eval: 528964255165186048.000
INFO - 2022-05-17 17:08:12,177 - semantic_segmentation - acc train: 0.107  eval: 0.105
INFO - 2022-05-17 17:08:12,182 - semantic_segmentation - iou train: 0.052  eval: 0.042
INFO - 2022-05-17 17:08:13,295 - semantic_segmentation - Saved checkpoint at: ./logs/KPFCNN_ParisLille3D_tf/checkpoint/ckpt-1
INFO - 2022-05-17 17:08:13,297 - semantic_segmentation - === EPOCH 1/100 ===
training: 100%|██████████| 40/40 [01:55<00:00,  1.58s/it]

### Expected behavior

I would expect a loss in the range 0 < L < 10 and training that converges.
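
As a rough sanity check on that range, the cross-entropy of a classifier that predicts every class with equal probability is ln(C) for C classes. The sketch below computes that baseline and compares it with the train loss from the epoch 0 log line. The value `num_classes = 9` is an assumption for illustration (use the value from your config), and the total KPConv loss may include additional regularization terms, so the baseline only covers the classification term.

```python
import math


def uniform_cross_entropy(num_classes: int) -> float:
    """Cross-entropy of a model that assigns equal probability to each class."""
    return math.log(num_classes)


# num_classes = 9 is an assumed value for illustration only.
baseline = uniform_cross_entropy(9)
print(f"chance-level loss: {baseline:.3f}")

# Train loss reported in the epoch 0 log line of this issue.
reported_train_loss = 6508379.0
print(f"reported / baseline: {reported_train_loss / baseline:.0f}x")
```

A loss that is many orders of magnitude above this baseline, while accuracy stays near chance level, suggests that something other than the classification term dominates or that the logits are exploding.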

### Open3D, Python and System information

```markdown
- Operating system: Ubuntu 18.04.5 LTS
- Python version: 3.7.13
- Open3D version: 0.15.2
- System type: x84 
- Is this remote workstation?: yes - I'm using google colab
- How did you install Open3D?: pip
```

Additional information

No response


biophase added the bug Something isn't working label May 17, 2022
