KPConv on tensorflow produces large loss when using deformable layers #534

Open
3 tasks done
biophase opened this issue May 17, 2022 · 0 comments
Labels
bug Something isn't working


Checklist

Describe the issue

I'm trying to train KPConv on the Paris-Lille3D dataset, but the loss is extremely high (in the millions) from the very first epoch, which makes training impossible: after ~100 epochs the loss still shows no downward trend. I am using the default config kpconv_parislille3d.yml. After switching to an architecture without any deformable layers, training works as expected, which leads me to believe there is a problem with the implementation of the deformable layer.

| Working architecture | Problematic architecture |
| --- | --- |
| simple | simple |
| resnetb | resnetb |
| resnetb_strided | resnetb_strided |
| resnetb | resnetb |
| resnetb | resnetb_strided |
| resnetb_strided | resnetb_deformable |
| resnetb | resnetb_deformable_strided |
| resnetb | resnetb_deformable |
| resnetb_strided | resnetb_deformable_strided |
| resnetb | resnetb_deformable |
| resnetb | nearest_upsample |
| resnetb_strided | unary |
| resnetb | nearest_upsample |
| nearest_upsample | unary |
| unary | nearest_upsample |
| nearest_upsample | unary |
| unary | nearest_upsample |
| nearest_upsample | unary |
| unary | |
| nearest_upsample | |
| unary | |
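
One way to isolate the deformable blocks further would be a control run that keeps the failing layout but swaps each deformable block for its rigid counterpart. The sketch below is only an illustration of that idea: the helper name `strip_deformable` is hypothetical, the resulting list is not the working architecture listed above (which has a different depth), and it assumes the block list reaches the model through an `architecture` entry in the config.

```python
# Hypothetical helper: replace deformable KPConv blocks with their rigid
# counterparts and leave every other block untouched.
RIGID_EQUIVALENT = {
    "resnetb_deformable": "resnetb",
    "resnetb_deformable_strided": "resnetb_strided",
}


def strip_deformable(architecture):
    """Return a copy of the block list that contains no deformable blocks."""
    return [RIGID_EQUIVALENT.get(block, block) for block in architecture]


# The problematic architecture as listed in this report.
problematic = [
    "simple", "resnetb", "resnetb_strided", "resnetb", "resnetb_strided",
    "resnetb_deformable", "resnetb_deformable_strided",
    "resnetb_deformable", "resnetb_deformable_strided",
    "resnetb_deformable",
    "nearest_upsample", "unary", "nearest_upsample", "unary",
    "nearest_upsample", "unary", "nearest_upsample", "unary",
]

control = strip_deformable(problematic)
print(control)
```

If the control list trains normally while the original list does not, the difference is confined to the deformable blocks rather than to the shallower layout.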

Steps to reproduce the bug

1. Download the dataset from https://npm3d.fr/paris-lille-3d
2. On Colab run the following cell:

!pip uninstall -y -qqq tensorflow 
!pip install  -qqq tensorflow==2.5.3
!pip install  -qqq open3d
!pip uninstall -y -qqq tensorflow-probability
!pip install  -qqq tensorflow-probability==0.13.0
!pip uninstall -y numpy
!pip install numpy==1.19.5
3. Finally, run the training:
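import open3d.ml as _ml3d
import open3d.ml.tf as ml3d

# config_path and dataset_path must be defined beforehand:
# config_path points to kpconv_parislille3d.yml, dataset_path to the downloaded dataset.
cfg_file = config_path
cfg = _ml3d.utils.Config.load_from_file(cfg_file)

# Build the model from the "model" section of the config.
model = ml3d.models.KPFCNN(**cfg.model)

# Override the dataset location, then pass it as the first positional argument.
cfg.dataset['dataset_path'] = dataset_path
dataset = ml3d.datasets.ParisLille3D(cfg.dataset.pop('dataset_path', None), **cfg.dataset)
pipeline = ml3d.pipelines.SemanticSegmentation(model, dataset=dataset, device="gpu", **cfg.pipeline)

# Prints training progress in the console.
pipeline.run_train()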


### Error message

INFO - 2022-05-17 17:06:02,791 - semantic_segmentation - <open3d._ml3d.tf.models.kpconv.KPFCNN object at 0x7f849d2d3150>
INFO - 2022-05-17 17:06:02,792 - semantic_segmentation - Logging in file : ./logs/KPFCNN_ParisLille3D_tf/log_train_2022-05-17_17:06:02.txt
INFO - 2022-05-17 17:06:02,801 - parislille3d - Found 3 pointclouds for training
INFO - 2022-05-17 17:06:05,246 - parislille3d - Found 1 pointclouds for validation
INFO - 2022-05-17 17:06:07,220 - semantic_segmentation - Writing summary in train_log/00010_KPFCNN_ParisLille3D_tf.
INFO - 2022-05-17 17:06:07,231 - semantic_segmentation - Initializing from scratch.
INFO - 2022-05-17 17:06:07,235 - semantic_segmentation - === EPOCH 0/100 ===
training: 41it [01:51,  2.73s/it]
validation: 11it [00:13,  1.19s/it]
INFO - 2022-05-17 17:08:12,172 - semantic_segmentation - loss train: 6508379.000  eval: 528964255165186048.000
INFO - 2022-05-17 17:08:12,177 - semantic_segmentation - acc train: 0.107  eval: 0.105
INFO - 2022-05-17 17:08:12,182 - semantic_segmentation - iou train: 0.052  eval: 0.042
INFO - 2022-05-17 17:08:13,295 - semantic_segmentation - Saved checkpoint at: ./logs/KPFCNN_ParisLille3D_tf/checkpoint/ckpt-1
INFO - 2022-05-17 17:08:13,297 - semantic_segmentation - === EPOCH 1/100 ===
training: 100%|██████████| 40/40 [01:55<00:00,  1.58s/it]

### Expected behavior

I would expect a loss in the range 0 < L < 10 and training that converges.
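
The expected range can be checked against the logged summary lines automatically. The following is a minimal sketch that assumes the `loss train: ... eval: ...` line format shown in the log above; the names `parse_losses` and `loss_is_plausible` are hypothetical and not part of Open3D-ML.

```python
import math
import re

# Matches the pipeline's summary line, e.g.
# "... - loss train: 6508379.000  eval: 528964255165186048.000"
LOSS_LINE = re.compile(r"loss train:\s*(\S+)\s+eval:\s*(\S+)")


def parse_losses(line):
    """Return (train_loss, eval_loss) as floats, or None if the line has no loss summary."""
    match = LOSS_LINE.search(line)
    if match is None:
        return None
    return float(match.group(1)), float(match.group(2))


def loss_is_plausible(value, upper=10.0):
    """True if the loss is finite and inside the expected range 0 < L < upper."""
    return math.isfinite(value) and 0.0 < value < upper


line = ("INFO - 2022-05-17 17:08:12,172 - semantic_segmentation - "
        "loss train: 6508379.000  eval: 528964255165186048.000")
train_loss, eval_loss = parse_losses(line)
print(loss_is_plausible(train_loss), loss_is_plausible(eval_loss))
```

Both values from epoch 0 fall outside the expected range, so a check like this would flag the run after the first epoch instead of after 100.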

### Open3D, Python and System information

```markdown
- Operating system: Ubuntu 18.04.5 LTS
- Python version: 3.7.13
- Open3D version: 0.15.2
- System type: x86
- Is this a remote workstation?: yes, I'm using Google Colab
- How did you install Open3D?: pip
```

Additional information

No response

biophase added the bug label on May 17, 2022