
Add Modulated Deformable Convolution layer #2196

Closed
wants to merge 15 commits

Conversation


@Licht-T Licht-T commented Oct 8, 2020

Description

This is an implementation of Modulated Deformable Convolution. It fixes #179. There is another PR, #1129, but it is stale and contaminated by code of unknown license, so I re-implemented the layer based on Torchvision's implementation.

I knew this would be too big to review all at once, so I first wrote a CPU-only kernel with Eigen. Once the CPU kernel is in good shape, I will move on to the next step: the GPU kernel implementation.
UPDATE: the GPU kernel is now available!

Type of change

Checklist:

  • I've properly formatted my code according to the guidelines
    • By running Black + Flake8
    • By running pre-commit hooks
  • This PR addresses an already submitted issue for TensorFlow Addons
  • I have made corresponding changes to the documentation
  • I have added tests that prove my fix is effective or that my feature works
  • This PR contains modifications to C++ custom-ops

How Has This Been Tested?

pytest unit-testing


google-cla bot commented Oct 8, 2020

Thanks for your pull request. It looks like this may be your first contribution to a Google open source project (if not, look below for help). Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

📝 Please visit https://cla.developers.google.com/ to sign.

Once you've signed (or fixed any issues), please reply here with @googlebot I signed it! and we'll verify it.


What to do if you already signed the CLA

Individual signers
Corporate signers

ℹ️ Googlers: Go here for more info.

@google-cla google-cla bot added the cla: no label Oct 8, 2020

Licht-T commented Oct 8, 2020

I signed the CLA.

@google-cla google-cla bot added cla: yes and removed cla: no labels Oct 8, 2020

Licht-T commented Oct 8, 2020

The compliance test fails due to conv_utils. It seems there is no alternative in the public TensorFlow Python API.


Licht-T commented Oct 8, 2020

I found that Addons has tensorflow_addons.utils.keras_utils and moved some private APIs into it.


Licht-T commented Oct 8, 2020

All green! Ready for review!


Licht-T commented Oct 30, 2020

I couldn't wait for the CPU kernel to be reviewed before making the GPU one!


bhack commented Oct 30, 2020

/cc @tanzhenyu @dynamicwebpaige for ecosystem check.


bhack commented Oct 30, 2020

I put this under ecosystem review. If no project in the TF ecosystem is interested in this, or has it on its own internal roadmap, we will try to review it here.

@cheng-yu-td

Thanks for the ecosystem review!
What should I do next? Any requests are welcome, since I know this PR is quite massive!

Can you split your contribution into more than one PR? Or is this just a single component PR?

This is one single feature, the deformable convolution layer. I don't see how it can be split into multiple pull requests.

@cheng-yu-td

Thanks @Licht-T, I look forward to this feature! I can't wait to use it for experiments.


bhack commented Jan 25, 2021

This PR is now nearly 3k lines.

Have we already evaluated the performance gap of a compositional (even if v1) implementation like https://tensorlayer.readthedocs.io/en/latest/_modules/tensorlayer/layers/convolution/deformable_conv.html ?

Comment on lines +182 to +183
if self.data_format != "channels_first":
raise ValueError("`channels_last` data format is not supported.")


"channels_last" is the default data_format in tf.keras.layers.Conv2D. If the user selects "channels_last", the output should be transposed instead of raising an error.

Licht-T (author):

swapaxes incurs a significant computational cost. IMO, it should not be applied implicitly.
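For illustration, here is a minimal NumPy sketch of the explicit-transpose wrapper being discussed (`channels_first_layer` is a hypothetical stand-in, not the actual op); keeping the layout conversion explicit makes its cost visible to the user:

```python
import numpy as np

def channels_first_layer(x_nchw):
    # Hypothetical stand-in for a layer that only accepts NCHW input.
    return x_nchw * 2.0

x_nhwc = np.ones((2, 8, 8, 3))               # batch, H, W, C (channels_last)
x_nchw = np.transpose(x_nhwc, (0, 3, 1, 2))  # NHWC -> NCHW (tf.transpose would materialize a copy here)
y_nchw = channels_first_layer(x_nchw)
y_nhwc = np.transpose(y_nchw, (0, 2, 3, 1))  # NCHW -> back to NHWC

print(y_nhwc.shape)  # (2, 8, 8, 3)
```

In TensorFlow each of those transposes is a real data movement over the whole activation tensor, which is the overhead the comment above objects to applying implicitly.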


Cospel commented May 25, 2021

The code looks great; we are using it successfully in one of our projects. Is anybody still working on it? It's been more than a year of waiting on deformable conv :) ...


Licht-T commented Jun 6, 2021

Thanks for all the comments. I'm back!

I now have time to focus on this PR and will resolve the conflicts this week.


Licht-T commented Jun 6, 2021

Can you split your contribution into more than one PR? Or is this just a single component PR?

That's difficult, as I told cheng-yu-td, but I should do so if I can. Any ideas are welcome.


Licht-T commented Jun 6, 2021

Have we already evaluated the performance gap of a compositional (even if v1) implementation like https://tensorlayer.readthedocs.io/en/latest/_modules/tensorlayer/layers/convolution/deformable_conv.html ?

@bhack Not tested yet. I'll try.

According to tensorlayer/TensorLayer#641 (comment), the TensorLayer implementation is 100 times slower than the original MXNet implementation. That's why we need optimized C++ code for Deformable Convolution.


bhack commented Jun 7, 2021

According to tensorlayer/TensorLayer#641 (comment)

It is 3 years old. We could check what happens now with the current compile stack.

As a general side note:

/cc @yarri-oss @tomerk As custom kernels create quite a large code and maintainership overhead in the repo, can we have a contact point on the XLA/HLO team, just to understand if and when we could achieve enough performance with a compositional/compiling path (e.g. by filing feature requests against the compiler stack)?

We really want to reduce the number of maintained custom ops in the library.

See also https://discuss.tensorflow.org/t/tfr-compositional-ops/


Licht-T commented Jun 12, 2021

@bhack Thanks for your comments! I did some tests. In the best case, this PR is approx. 1000 times faster than the TensorLayer one.

  • Input height/width vs. runtime plot with fixed batch size: perf_hw.py

    [plot: perf_hw]

  • Batch size vs. runtime plot with fixed input height/width: perf_batch.py

    [plot: perf_batch]


Licht-T commented Jun 12, 2021

I know this PR is quite large. I don't want to annoy the maintainers with a big custom-ops codebase, but I want to bring Deformable Convolution, one of the great achievements of CNN research, to TensorFlow users.

If we can achieve good performance another way, such as XLA/HLO, I should, and will, re-implement this PR that way.


bhack commented Jun 12, 2021

@Licht-T Yes, thank you for the effort.
The main issue is that code owners, on average, disappear quite fast (see the last two graphs here and in #2024).

In this specific case we will have nearly 3k lines to maintain.
That is why I've asked @seanpmorgan @tomerk @yarri-oss whether we could get some feedback from the compiler team about compositional alternatives in cases like this.

Also, I think you need to test the TensorLayer implementation with jit_compile on its core functions to exercise it under XLA (or duplicate the class yourself and put the core parts under XLA).

You can inspect the compiled function with:

https://www.tensorflow.org/xla#inspect_compiled_programs

If you can profile it, it would also be interesting to see whether any specific ops dominate.

https://www.tensorflow.org/tensorboard/tensorboard_profiling_keras

You can use the profiler's start and stop if you don't want to use callbacks:
https://www.tensorflow.org/api_docs/python/tf/profiler/experimental/start?hl=en
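As a concrete sketch of the suggestion above (assuming TensorFlow 2.x; `core_fn` is a hypothetical stand-in for the layer's core computation, not the actual TensorLayer code): wrap the core function in a jit-compiled tf.function and use the programmatic profiler start/stop instead of Keras callbacks:

```python
import tempfile

import tensorflow as tf

@tf.function(jit_compile=True)  # request XLA compilation of this function
def core_fn(x, w):
    # Hypothetical stand-in for the deformable-conv core computation.
    return tf.nn.relu(tf.matmul(x, w))

x = tf.ones((8, 16))
w = tf.ones((16, 4))

# Programmatic profiling (no Keras callbacks needed).
logdir = tempfile.mkdtemp()
tf.profiler.experimental.start(logdir)
y = core_fn(x, w)  # first call triggers XLA compilation, then runs
tf.profiler.experimental.stop()

print(y.shape)  # (8, 4)
```

The trace written to `logdir` can then be opened in TensorBoard's profiler to check for dominant ops, per the links above.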

@peter-kettmann

@bhack Thanks for your comments! I did some tests. In the best case, this PR is approx. 1000 times faster than the TensorLayer one. […]

That looks really great! We are currently using deformable convolutions for a project, via the TensorLayer version, but it's too slow for our use case. So I tried your implementation but ran into problems. How do I correctly build the code, including the deformable conv layer?

So far I cloned your repo, switched to the add-deformable-conv branch (tried add-deformable-conv-internal as well), and followed the instructions to install tensorflow_addons from source. The build process runs fine (just some bazel warnings). However, after installing the wheel, running this simple script

import tensorflow as tf
import tensorflow_addons

input_layer = tf.keras.layers.Input(shape=(512,512,3))
channels_first = tf.transpose(input_layer, (0,3,1,2))

offset_1 = tf.keras.layers.Conv2D(2, (3,3), padding='same', data_format='channels_first', kernel_initializer='zeros')(channels_first)
deformable_1 = tensorflow_addons.layers.DeformableConv2D(3, (1,1), padding='same')([channels_first, offset_1])

channels_last = tf.transpose(deformable_1, (0,2,3,1))
model = tf.keras.Model(inputs=input_layer, outputs=channels_last)

test_data = tf.random.uniform((32,512,512,3), dtype=tf.float32)

out = model(test_data)

throws this error:

NotFoundError: /net/io-truenas/mnt/pool0/workspaces/peter/code/addons/venv/lib/python3.6/site-packages/tensorflow_addons/custom_ops/layers/_deformable_conv2d_ops.so: undefined symbol: _ZN10tensorflow7strings8internal9CatPiecesB5cxx11ESt16initializer_listIN4absl14lts_2020_09_2311string_viewEE

What am I doing wrong? Some help would be much appreciated :)

Some additional info: I'm using

Ubuntu 18.04
Python 3.6.9
tensorflow 2.5.0
bazel 3.7.2
CUDA 11.2
CUDNN 8.2

and the warnings are several instances of

/usr/local/cuda-11.2/bin/../targets/x86_64-linux/include/thrust/detail/config/cpp_dialect.h:118:13: warning: Thrust requires C++14. Please pass -std=c++14 to your compiler. Define THRUST_IGNORE_DEPRECATED_CPP_DIALECT to suppress this message. THRUST_COMPILER_DEPRECATION(C++14, pass -std=c++14 to your compiler);

and a

WARNING: home/peter/.cache/bazel/_bazel_peter/c6b879c04b9dc802e9234eb6fb8e9fc9/external/local_config_tf/BUILD:12633:8: target 'libtensorflow_framework.so.2' is both a rule and a file; please choose another name for the rule

@axeldavy

Hey, I've tried this deformable convolution implementation in my project, and it seems to work as expected.

However, the performance hit compared to a normal convolution is huge.
The implementation doesn't support NHWC ordering, which means I have to transpose before and after the convolution, and in addition it doesn't support float16 or mixed_float16, so it doesn't use fp16 tensor cores.

Would it be possible to support NHWC ordering and mixed_float16/float16?

An additional thought: unless I'm mistaken, it seems the modulated convolution could be implemented as, first, a layer that gathers the values at the predicted positions (with bilinear interpolation), applies the mask, and concatenates the results (for a 3x3 convolution, the number of channels would be multiplied by 9), followed by a standard 1x1 convolution.
It might be easier to implement and maintain that way (only the first layer needs a custom implementation).
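For illustration, here is a small NumPy sketch of that two-stage decomposition (hypothetical names, not the PR's kernel): stage 1 bilinearly samples the input at each of the K offset tap positions, applies the modulation mask, and concatenates the samples into K*C channels; stage 2 is an ordinary 1x1 convolution, i.e. a per-pixel matmul:

```python
import numpy as np

def bilinear_sample(img, y, x):
    """Sample img (H, W, C) at float coords (y, x); zero padding outside."""
    H, W, C = img.shape
    y0, x0 = int(np.floor(y)), int(np.floor(x))
    out = np.zeros(C)
    for dy in (0, 1):
        for dx in (0, 1):
            yy, xx = y0 + dy, x0 + dx
            if 0 <= yy < H and 0 <= xx < W:
                out += (1 - abs(y - yy)) * (1 - abs(x - xx)) * img[yy, xx]
    return out

def deform_gather_then_1x1(img, offsets, mask, w1x1, k=3):
    """Stage 1: gather modulated samples -> (H, W, K*C).
    Stage 2: 1x1 convolution == per-pixel matmul with w1x1 of shape (K*C, C_out)."""
    H, W, C = img.shape
    taps = [(dy, dx) for dy in range(-(k // 2), k // 2 + 1)
                     for dx in range(-(k // 2), k // 2 + 1)]
    gathered = np.zeros((H, W, len(taps) * C))
    for i in range(H):
        for j in range(W):
            for t, (dy, dx) in enumerate(taps):
                y = i + dy + offsets[i, j, t, 0]  # learned vertical offset
                x = j + dx + offsets[i, j, t, 1]  # learned horizontal offset
                gathered[i, j, t * C:(t + 1) * C] = (
                    mask[i, j, t] * bilinear_sample(img, y, x))
    return gathered @ w1x1
```

With `offsets` all zero and `mask` all ones, the result matches a direct KxK convolution whose weights are reshaped into `w1x1`, which makes a convenient correctness check for the decomposition.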

@anshkumar

@Licht-T I tried compiling it with CUDA 11.6 and cuDNN 8.3 but get the following error on usage:

2022-02-23 16:37:03.133656: F tensorflow_addons/custom_ops/layers/cc/kernels/deformable_conv2d_op_gpu.cu.cc:292] Non-OK-status: GpuLaunchKernel(DeformableIm2ColKernel<float>, config.block_count, config.thread_per_block, 0, device.stream(), b, num_kernels, p, input_eigen_tensor, offset_eigen_tensor, mask_eigen_tensor, column_buffer_eigen_tensor) status: Internal: too many resources requested for launch

@anshkumar

@Licht-T I tried compiling it using cuda 11.6, cudnn 8.3 but getting following error on usage:

2022-02-23 16:37:03.133656: F tensorflow_addons/custom_ops/layers/cc/kernels/deformable_conv2d_op_gpu.cu.cc:292] Non-OK-status: GpuLaunchKernel(DeformableIm2ColKernel<float>, config.block_count, config.thread_per_block, 0, device.stream(), b, num_kernels, p, input_eigen_tensor, offset_eigen_tensor, mask_eigen_tensor, column_buffer_eigen_tensor) status: Internal: too many resources requested for launch

Changing every occurrence of config.thread_per_block to config.thread_per_block / 2 fixed this.


bhack commented May 10, 2022

Thanks for the contribution, but I don't think we have the bandwidth here to add another custom op to the repo.

Please try to contribute a Python-only version to Keras-CV: https://github.com/keras-team/keras-cv

@bhack bhack closed this May 10, 2022

SimonBiggs commented Dec 14, 2022

Hi @bhack,

If I were to give this PR another try, but split it into smaller, more bite-sized PRs, would you or others be willing to review and merge?

Another caveat is that I would want my initial "hello world" implementation to be a 3D deformable convolution, as opposed to 2D, within this PR. I would intend to leave a 2D implementation to someone else who wants to modify/extend my work (I would be happy to support and review that endeavour, but would leave someone else to spearhead it).

Although I can't prove that I'll stick around after the PR, I do maintain a package in my field that pulls in outside contributions, and having felt some of the pain of code owners disappearing, I hope my word carries enough weight when I say I want to stick around and help with future work, issues, and PRs around the code I contribute.

Cheers,
Simon

@SimonBiggs

An additional thought: unless I'm mistaken, it seems that the modulated convolution could be implemented as first a layer that grabs the values at the predicted positions (with bilinear interpolation), and applies the mask, and concatenates the values (for example for a 3x3 convolution the number of channels of the output would be multiplied by 9), then a standard 1x1 convolution.
Possibly it would be easier to implement and maintain that way (only the first layer needs to be implemented).

@axeldavy, if I was to take this on, might you want to help me?


bhack commented Dec 14, 2022

@SimonBiggs Can you open a ticket proposing this in Keras-CV (https://github.com/keras-team/keras-cv/)?

Many of the computer-vision-related contributions are converging there.

@SimonBiggs

I notice the following comment re custom ops:

https://github.com/keras-team/keras-cv/blob/master/.github/CONTRIBUTING.md#contributing-custom-ops

Would that make this contribution the first custom op in Keras-CV that pulls in CUDA code?

I would aim to use the approach that @axeldavy mentioned, so the custom-op section would hopefully have a reasonably small surface area.


bhack commented Dec 14, 2022

We could start with a ticket there to discuss the topic and how to cover this convolution before creating a PR.

@SimonBiggs

Thanks @bhack,

New issue now over at keras-team/keras-cv#1140.

Cheers :),
Simon


Successfully merging this pull request may close these issues.

Deformable Convolutional Op Support
10 participants