
PR proposal: 3D deformable convolution #1140

Open
SimonBiggs opened this issue Dec 14, 2022 · 30 comments

@SimonBiggs

SimonBiggs commented Dec 14, 2022

I am proposing that we include a 3D deformable convolution within keras-cv.

Background

A recent paper used deformable convolutions as a key building block to achieve state-of-the-art CNN results:

They have also previously been shown to make standard convolutional networks more robust to multi-scale problems:
https://github.com/kastnerkyle/deform-conv

There was an implementation PR for these over at TensorFlow Addons:
tensorflow/addons#2196 (comment)

And there was an idea for how to make that particular PR smaller:
tensorflow/addons#2196 (comment)

There is also an efficient Apache-2.0 TensorRT implementation of a 2D variant for PyTorch over at:
https://github.com/open-mmlab/mmcv/blob/46eb9ec5d07ea344ed43056d007a7eb71dc3ee98/mmcv/ops/csrc/tensorrt/plugins/trt_modulated_deform_conv.cpp
https://github.com/open-mmlab/mmcv/blob/46eb9ec5d07ea344ed43056d007a7eb71dc3ee98/mmcv/ops/csrc/tensorrt/plugins/trt_modulated_deform_conv_kernel.cu

In my particular task I am segmenting 3D medical images (CT and MRI scans) to help support radiotherapy cancer treatments. These scans are often "multi-scale", with differing voxel (a 3D pixel) sizes and slice thicknesses. A standard approach is to resample all data to an isotropic voxel size (see the sketch below). I hypothesise that instead utilising a 3D deformable convolution, combined with a few other components, would be part of the story leading to an efficient new state-of-the-art approach.
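
For context, that standard resampling step looks roughly like this (a minimal sketch only; the array shape and spacings here are illustrative):

import numpy as np
from scipy import ndimage

# Illustrative CT volume: 3 mm slice thickness, ~0.98 mm in-plane spacing.
volume = np.random.rand(128, 512, 512).astype(np.float32)  # (slices, rows, cols)
spacing = np.array([3.0, 0.9765625, 0.9765625])             # mm per voxel
target = np.array([1.0, 1.0, 1.0])                          # isotropic 1 mm

# The zoom factor per axis is the current spacing divided by the target spacing.
isotropic = ndimage.zoom(volume, zoom=spacing / target, order=1)  # trilinear
print(isotropic.shape)  # -> roughly (384, 500, 500)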

Proposal

That we undertake an initial implementation of a 3D deformable convolution within keras-cv. After our work, someone else can extend it to also cover the 2D case. I am willing to spearhead the task, but first, let's discuss an appropriate and acceptable approach here.

@SimonBiggs SimonBiggs changed the title PR discussion: 3D deformable convolution PR proposal: 3D deformable convolution Dec 14, 2022
@bhack
Contributor

bhack commented Dec 14, 2022

/cc @innat

@SimonBiggs
Author

@innat and @bhack, what do you believe our next steps should be?

@bhack
Copy link
Contributor

bhack commented Dec 15, 2022

/cc @ianstenbit @tanzhenyu

@axeldavy

As a follow-up note to my comment on
tensorflow/addons#2196 (comment)

I've written my own TensorFlow addon that does exactly the proposed method. A lot of work remains because I didn't write test code and only supported the subset of parameters I needed. It works for me and gives a speed boost over the original because I don't need the NHWC/NCHW transposition, but it's not as fast as it should be. There likely remains some investigation to be sure everything is as optimized as it should be (maybe there is some flag to add somewhere or something). I'm very confident in my OpenCL writing, but I don't usually do CUDA. I'm open to sharing my code if it helps.

@bhack
Contributor

bhack commented Dec 15, 2022

There was an old compositional implementation of a 2d deformable conv:
https://tensorlayer.readthedocs.io/en/latest/_modules/tensorlayer/layers/convolution/deformable_conv.html

But I have not benchmarked it under XLA or the new bridge to understand how efficiently it is lowered.

@SimonBiggs
Author

SimonBiggs commented Dec 15, 2022

I'm open to sharing my code if it helps.

@axeldavy, that would be amazing if you could. Would you be open to putting it up on a repo under an Apache-2.0 license?

There was an old compositional implementation of a 2d deformable conv

I believe that implementation is a copy of the following:

https://github.com/kastnerkyle/deform-conv

@bhack
Contributor

bhack commented Dec 15, 2022

I believe that implementation is a copy of the following:

Yes, it is probably derived from that, with some refactoring.

It could be interesting to try to jit_compile a pure TF version of that one in a Colab, just to understand the performance and the emitted code.
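
For concreteness, that experiment is just a matter of wrapping the op in tf.function with jit_compile=True (a sketch; the deformable_conv2d body here is only a placeholder for the tensorlayer-style pure-TF implementation):

import tensorflow as tf

@tf.function(jit_compile=True)
def deformable_conv2d(x, offsets, kernel):
    # Placeholder body; in the real benchmark this would be the
    # tensorlayer-style gather/interpolate/conv composition.
    return tf.nn.conv2d(x, kernel, strides=1, padding="SAME")

x = tf.random.normal([1, 64, 64, 16])
offsets = tf.zeros([1, 64, 64, 18])  # 2 * 3 * 3 offset channels for a 3x3 kernel
kernel = tf.random.normal([3, 3, 16, 32])

y = deformable_conv2d(x, offsets, kernel)  # first call traces and XLA-compiles

# The emitted HLO can then be inspected with:
print(deformable_conv2d.experimental_get_compiler_ir(x, offsets, kernel)(stage="hlo"))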

@SimonBiggs
Author

Also, just a heads up, in case it's an issue, I can't actually interact with the following discussion thread:

openxla/xla#17 (comment)

@SimonBiggs
Author

SimonBiggs commented Dec 15, 2022

It could be interesting to try to jit_compile a pure TF version of that one in a Colab, just to understand the performance and the emitted code.

Might you be interested @axeldavy in comparing the difference between your implementation and a tf.function(jit_compile=True) version of the following code?

https://github.com/kastnerkyle/deform-conv --> https://tensorlayer.readthedocs.io/en/latest/_modules/tensorlayer/layers/convolution/deformable_conv.html

Edit: see #1140 (comment)

@bhack
Contributor

bhack commented Dec 15, 2022

I still see NumPy code there that was removed in the tensorlayer flavour.

@bhack
Contributor

bhack commented Dec 15, 2022

We had a V2 version at https://github.com/smallsunsun1/deformable_conv_v2 but I've not verified the impl.

@SimonBiggs
Author

We had a V2 version at https://github.com/smallsunsun1/deformable_conv_v2 but I've not verified the impl.

Unfortunately no license file...

@bhack
Contributor

bhack commented Dec 16, 2022

Yes, but it is just for benchmarking, so I am not worried about the license as long as it is not used in a PR:

https://github.com/RuaHU/keras_DCNv2

@ianstenbit
Contributor

I am open to including a deformable conv layer in KerasCV, probably as an experimental API for now. We don't have bandwidth to build this in-house right now but are open to a contribution @SimonBiggs.

I do wonder if deformable convolution in general is something that should eventually be upstreamed to core Keras, given that it may also have value in time-domain convolutions (at least I would guess it might -- haven't seen any papers about this).

@tanzhenyu wdyt?

@SimonBiggs
Author

given that it may also have value in time-domain convolutions (at least I would guess it might -- haven't seen any papers about this).

Here's an example paper for time-domain with 3D deformable convolutions:
https://arxiv.org/abs/2004.02803

@tanzhenyu
Contributor

@axeldavy @SimonBiggs @bhack Yeah, I'm interested in knowing whether this can be expressed with native TF ops. If it works and it's really just a matter of kernel performance, maybe that's something XLA can help with.
Otherwise, hosting a CUDA kernel seems to contradict our core value, which is that it should run on GPU/TPU without code change.

@axeldavy

Here is the code I'm using:
https://github.com/axeldavy/tensorflow_addons/tree/add-deformable-interpolation

The CUDA code can be seen here:
https://github.com/axeldavy/tensorflow_addons/blob/add-deformable-interpolation/tensorflow_addons/custom_ops/layers/cc/kernels/deformable_interpolation_op_gpu.cu.cc

I don't know enough of XLA to know if it can be efficiently expressed with it.

I haven't benchmarked the code either, or checked that the generated asm is OK. Maybe changing the border handling would help performance. For example, in OpenCL it is typically much better to always sample (using address clamping for when we are outside) and then replace the read content if we were outside, rather than reading conditionally. This is because the generated asm is of better quality (the memory access is issued earlier, which reduces latency). Similarly, it would be better to know the number of channels at compilation time, to enable loop unrolling. I don't know if CUDA recompiles internally to optimize for that.

The difference between my approach and the usual DCNv2 implementations is that the number of sampled positions is arbitrary (and there are no built-in initial offsets: if you want to initialize to a 3x3 grid instead of all positions at 0, you need to encode that in the input sampling offsets). The cost is proportional to the number of positions.

Here is an example of how to use it:

from tensorflow.keras.layers import Conv2D
from tensorflow.keras.regularizers import l2
# DeformableInterpolation is the custom layer from the fork linked above
# (not part of upstream tensorflow_addons).
from tensorflow_addons.layers import DeformableInterpolation

# x is the incoming feature map; num_filters is the desired output channels.
# Standard DCNv2 would be 9 positions and 1 group (all features share the same offsets).
num_positions = 4
num_groups = 1

# Compute offsets for the positions - initialized with zeros (unlike standard DCN).
offsets = Conv2D(2 * num_positions * num_groups, 1,
                 padding='same', use_bias=True, kernel_initializer='zeros',
                 kernel_regularizer=l2(5e-3), bias_regularizer=l2(5e-3),
                 name='conv_offsets_1')(x)

# Compute DCNv2 masks.
mask = Conv2D(num_positions * num_groups, 1,
              padding='same', use_bias=True, kernel_initializer='zeros',
              kernel_regularizer=l2(5e-3), bias_regularizer=l2(5e-3),
              name='conv_mask_1', activation='sigmoid')(x)

# Retrieve features at the different positions and stack them.
x = DeformableInterpolation(offset_groups=num_groups, num_positions=num_positions,
                            use_mask=True, name="dcn_interp_1")([x, offsets, mask])

# Apply the convolution.
x = Conv2D(num_filters, 1, padding='same', use_bias=True,
           kernel_initializer='he_normal', kernel_regularizer=l2(5e-3),
           name="dcn_conv_1")(x)

One other advantage I see to this approach is that it seems easy to implement DCNv3 with it: change the sigmoid for a softmax, use more than one group, and replace the final Conv2D call (with two convolutions, if I'm not mistaken: one that uses groups and reduces the number of features, and one normal one). The code also works with float16.
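
For illustration, the sigmoid-to-softmax change could look like this (a sketch only, assuming the mask logits have num_positions * num_groups channels as in the snippet above):

import tensorflow as tf

def dcnv3_style_mask(mask_logits, num_positions, num_groups):
    # Softmax over the positions within each group (DCNv3 style),
    # instead of an independent sigmoid per position (DCNv2 style).
    b, h, w, _ = tf.unstack(tf.shape(mask_logits))
    logits = tf.reshape(mask_logits, [b, h, w, num_groups, num_positions])
    mask = tf.nn.softmax(logits, axis=-1)  # weights within a group sum to 1
    return tf.reshape(mask, [b, h, w, num_groups * num_positions])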

@SimonBiggs
Author

Thanks @axeldavy, it's Friday afternoon here in Aus, so I won't get back to this until next week. But thank you. Thank you very much.

@bhack
Contributor

bhack commented Dec 16, 2022

maybe that's something XLA can help with.

If XLA cannot speed up the operator enough, or it cannot compile it, we could always check with upstream why it cannot be efficiently lowered:

openxla/xla#17 (comment)

@axeldavy

It's not just about training with TensorFlow, though. One issue with deformable convolution is getting it to work with inference solutions (TensorRT, etc.). Ideally there would be an ONNX operation for it, with corresponding implementations in TensorFlow, TensorRT, etc.

@bhack
Contributor

bhack commented Dec 16, 2022

XLA has custom calls for when the high-level op is not lowered efficiently or the specific backend is not supported:

https://www.tensorflow.org/xla/custom_call

@SimonBiggs
Author

SimonBiggs commented Dec 20, 2022

So, I might have messed something up with the benchmark. I am just using timeit plus running the function once for preheating. And, gee whiz, it looks like XLA absolutely wins the day:

CPU timings:

  • mmcv (pytorch + cpp): 8 s
  • tensorlayer (tensorflow + jit): 1 s

GPU timings:

  • mmcv (pytorch + cuda): 16 ms
  • tensorlayer (tensorflow + jit): 6 ms

Here is the colab:
https://colab.research.google.com/drive/1TNIUJN4W95V81VkWq0n2TksKisnfbZTR?usp=sharing
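
The measurement pattern is roughly the following (a sketch; fn stands for whichever implementation is being timed):

import timeit

def benchmark(fn, *args, repeats=100):
    fn(*args)  # preheating run, so tracing/XLA compilation is excluded
    return timeit.timeit(lambda: fn(*args), number=repeats) / repeats

Note that for the GPU numbers, forcing the result back to host (e.g. calling .numpy() on the output) would make sure the asynchronously dispatched work is actually included in the timing.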

Keen for feedback if there's anything I have done wrong. Otherwise, if it's okay with the team, I'll get to work writing a 3D modulated deformable convolution in pure Python using native TF ops, based off of tensorlayer's implementation, and hopefully be able to provide it as a PR.

Thank you @bhack for helping me through this!

@tanzhenyu
Contributor


Sounds good. Feel free to add a deformable conv3d in this library.

@SimonBiggs
Author

SimonBiggs commented Dec 20, 2022

Also, @axeldavy, I tried to build and install your fork:
https://colab.research.google.com/drive/1TNIUJN4W95V81VkWq0n2TksKisnfbZTR?usp=sharing#scrollTo=EdjOCYULwXv1

But I didn't have success. Keen to know what I did wrong there. It would be nice to be able to also benchmark your implementation.

@axeldavy

Your compilation failure doesn't seem related to my changes. I would suggest just applying my last commit onto a branch that works for you.

@SimonBiggs
Author

I'm going on leave for the next few weeks so won't be in a position to look at this again until the 3rd quarter of Jan. If someone else is keen to do some work on it I'm completely happy for that, otherwise I'll get back to it next year.

@axeldavy

axeldavy commented Dec 29, 2022

I hadn't noticed, but ONNX has recently received a new operator, GridSample, that is relevant to the topic:
https://github.com/onnx/onnx/blob/main/docs/Operators.md

I think it is relevant because it is better if the proposed implementation can be exported to good enough ONNX.
Using an operator like GridSample (available in PyTorch but not yet in TensorFlow) removes the need to do the bilinear interpolation manually like in
https://tensorlayer.readthedocs.io/en/latest/_modules/tensorlayer/layers/convolution/deformable_conv.html

EDIT: If I'm not clear enough in my message, I suggest adding TensorFlow and XLA support for a function similar to GridSample, which could be used to implement deformable convolution but is also useful for other applications, such as optical flow, registration, etc.
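
For illustration, a GridSample-like bilinear lookup can be composed from native TF ops along these lines (a minimal, unoptimized sketch for NHWC images and absolute pixel coordinates, using the clamp-then-gather border handling mentioned earlier instead of conditional reads):

import tensorflow as tf

def grid_sample_bilinear(image, coords):
    # image: [B, H, W, C]; coords: [B, N, 2] absolute (y, x) pixel coordinates.
    # Returns [B, N, C] bilinearly interpolated feature values.
    shape = tf.shape(image)
    h = tf.cast(shape[1], coords.dtype)
    w = tf.cast(shape[2], coords.dtype)

    y, x = coords[..., 0], coords[..., 1]
    y0, x0 = tf.floor(y), tf.floor(x)
    wy, wx = y - y0, x - x0

    def gather(yi, xi):
        # Clamp to the border and always sample, rather than branching.
        yi = tf.clip_by_value(yi, 0.0, h - 1.0)
        xi = tf.clip_by_value(xi, 0.0, w - 1.0)
        idx = tf.cast(tf.stack([yi, xi], axis=-1), tf.int32)
        return tf.gather_nd(image, idx, batch_dims=1)  # [B, N, C]

    wx_, wy_ = wx[..., None], wy[..., None]
    top = gather(y0, x0) * (1 - wx_) + gather(y0, x0 + 1) * wx_
    bottom = gather(y0 + 1, x0) * (1 - wx_) + gather(y0 + 1, x0 + 1) * wx_
    return top * (1 - wy_) + bottom * wy_

Clamping before the gather follows the border-handling point above: every read is unconditional, and out-of-range samples simply repeat the edge pixel.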

@bhack
Contributor

bhack commented Dec 29, 2022

@SimonBiggs
Author

So after taking three weeks of leave, it seems my plate is now exceptionally full. I'm going to have to put this in my backlog for the time being.


This issue is stale because it has been open for 180 days with no activity. It will be closed if no further activity occurs. Thank you.

@github-actions github-actions bot added the stale label Jan 30, 2024