The documentation of torchvision.ops.deform_conv2d is not clear #3673

Open
Zhaoyi-Yan opened this issue Apr 15, 2021 · 12 comments

@Zhaoyi-Yan

📚 Documentation

From the documentation, I cannot work out the exact meaning of the 18 (i.e., 2*3*3) offset channels in a deformable convolution.

I want to visualize the offsets of a deformable convolution with kernel size 3*3, so it's essential for me to know the exact meaning of these channels.

I've written down a possible interpretation here:

upper-left: ul
upper-right: ur
bottom-left: bl
bottom-right: br
up: u
bottom: b
right: r
left: l
center: c

possible offset layout (maybe not correct):
delta_ul_x, delta_ul_y,   delta_u_x, delta_u_y,   delta_ur_x, delta_ur_y;
delta_l_x,  delta_l_y,    delta_c_x, delta_c_y,   delta_r_x,  delta_r_y;
delta_bl_x, delta_bl_y,   delta_b_x, delta_b_y,   delta_br_x, delta_br_y;
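
A minimal indexing sketch of this guessed layout, assuming the pairs are stored in row-major kernel order with the components in the order written above (this is only an assumption at this point in the thread, and offset_channels is a made-up helper name):

def offset_channels(i, j, kw=3):
    # Hypothetical helper: channel indices of the (delta_*_x, delta_*_y) pair
    # for kernel cell (i, j), assuming pairs are laid out in row-major kernel order.
    base = 2 * (i * kw + j)
    return base, base + 1

print(offset_channels(0, 0))  # (0, 1)   -> upper-left cell (ul)
print(offset_channels(1, 1))  # (8, 9)   -> center cell (c)
print(offset_channels(2, 2))  # (16, 17) -> bottom-right cell (br)
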
@voldemortX
Contributor

voldemortX commented Apr 21, 2021

@Zhaoyi-Yan Perhaps the CPU source code can lend you some insights. I'd say your guess seems reasonable, but I don't remember many details of DCN, so I can't be sure.

@Zhaoyi-Yan
Author

After reading the source code you referred to, my guess does seem reasonable. However, it would be better to have a detailed note about the offset layout for users.

@voldemortX
Contributor

voldemortX commented Apr 27, 2021

Maybe some comments should be added for this. What do you think? @NicolasHug

@NicolasHug
Member

Sure, any PR to improve the docs would be very welcome!

@voldemortX
Contributor

I'm not an expert on DCN right now... So maybe you'd like to send a PR for this? @Zhaoyi-Yan

@Zhaoyi-Yan
Author

I'm not either...

@voldemortX
Contributor

voldemortX commented Apr 27, 2021

@NicolasHug @Zhaoyi-Yan I'll try to send a PR to clarify the doc after re-reading the paper, if no one more familiar with DCN turns up.

@voldemortX
Contributor

@Zhaoyi-Yan I've sent a PR for this. I now believe your initial guess is correct, if you consider the height direction as x and the width direction as y.
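
For readers who want to check that convention themselves, here is a minimal sketch (not taken from the PR, just an illustration of the claim): with a 1x1 kernel and a single offset pair set to (+1, 0), the first channel acting as the height offset means every output pixel samples one row below its own position, so the result should equal the input shifted up by one row with a zero-filled bottom row.

import torch
import torch.nn.functional as F
from torchvision.ops import deform_conv2d

x = torch.arange(16, dtype=torch.float32).reshape(1, 1, 4, 4)
weight = torch.ones(1, 1, 1, 1)  # 1x1 kernel that just copies the sampled value

offset = torch.zeros(1, 2, 4, 4)  # 2 * offset_groups * kh * kw = 2 channels
offset[:, 0] = 1.0                # first channel of the pair: height offset (+1 row)

out = deform_conv2d(x, offset=offset, weight=weight)
expected = F.pad(x[:, :, 1:, :], (0, 0, 0, 1))  # input shifted up, zero bottom row
print(torch.allclose(out, expected))  # True if the height offset really comes first
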

@dariofuoli

dariofuoli commented Jun 2, 2021

It would also be very important to know the order of the elements in (from the docs):

offset (Tensor[batch_size, 2 * offset_groups * kernel_height * kernel_width, out_height, out_width]) – offsets to be applied for each position in the convolution kernel.

I.e., what is the arrangement of 2 * offset_groups * kernel_height * kernel_width, and is it in this particular order? Considering the comments here, the following seems more likely:
offset_groups * kernel_height * kernel_width * 2, with the kernel dimensions running left to right, top to bottom.

I think it could be made a lot clearer by passing a structured tensor instead of a flattened array: (offset_groups x kernel_height x kernel_width x 2)
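
If the ordering really is offset_groups, then kernel rows top to bottom, then kernel columns left to right, then the two offset components (which is how I read the comments above, not something the docs currently state), such a structured tensor maps onto the flattened layout with a single reshape:

import torch

n, g, kh, kw, out_h, out_w = 1, 2, 3, 3, 5, 5

# structured view: one (offset_h, offset_w) pair per offset group and kernel cell
structured = torch.randn(n, g, kh, kw, 2, out_h, out_w)

# the channel count matches the documented 2 * offset_groups * kernel_height * kernel_width
flat = structured.reshape(n, g * kh * kw * 2, out_h, out_w)
print(flat.shape)  # torch.Size([1, 36, 5, 5])
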

@voldemortX
Contributor

It would also be very important to know the order of the elements in (from the docs):

offset (Tensor[batch_size, 2 * offset_groups * kernel_height * kernel_width, out_height, out_width]) – offsets to be applied for each position in the convolution kernel.

I.e., what is the arrangement of 2 * offset_groups * kernel_height * kernel_width, and is it in this particular order? Considering the comments here, the following seems more likely:
offset_groups * kernel_height * kernel_width * 2, with the kernel dimensions running left to right, top to bottom.

It is very confusing indeed. You could check out this ongoing PR for some clarification (I tried, but the explanation there is still not very clear...).

I think it could be made a lot clearer by passing a structured tensor instead of a flattened array: (offset_groups x kernel_height x kernel_width x 2)

I think that could introduce a BC break of some sort? Personally, I think that if deformable conv could be implemented as a PyTorch layer, things would be much easier...

@dariofuoli

I think that could introduce a BC break of some sort? Personally, I think that if deformable conv could be implemented as a PyTorch layer, things would be much easier...

I am not a developer, but I think this could be handled with a fixed internal flatten operation that accepts both input formats?

Personally, I think stating the exact order of the elements encoded in the dimension "2 * offset_groups * kernel_height * kernel_width" in the docs would be sufficient; I like the functional approach of the current version.

Assuming the order T in offset_groups x kernel_height x kernel_width x [offset_h, offset_w], the docs could then state that the "flattened tensor" to pass to the function is: [T[0,0,0,0], T[0,0,0,1], T[0,0,1,0], T[0,0,1,1], ...]

If this assumption is correct, for clarity the docs should state:
offset (Tensor[batch_size, offset_groups * kernel_height * kernel_width * 2, out_height, out_width]) – offsets to be applied for each position in the convolution kernel.
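
A sketch of that "fixed internal flatten" idea as a thin wrapper (deform_conv2d_structured is a hypothetical name, and the ordering it assumes is the one described above):

import torch
from torchvision.ops import deform_conv2d

def deform_conv2d_structured(x, offset, weight, **kwargs):
    # Hypothetical wrapper: takes an offset of shape
    # (batch, offset_groups, kh, kw, 2, out_h, out_w), with the last pair being
    # (offset_h, offset_w), flattens it, and forwards the call to deform_conv2d.
    n, g, kh, kw, two, out_h, out_w = offset.shape
    assert two == 2 and (kh, kw) == tuple(weight.shape[-2:])
    flat = offset.reshape(n, g * kh * kw * 2, out_h, out_w)
    return deform_conv2d(x, offset=flat, weight=weight, **kwargs)

x = torch.randn(1, 4, 8, 8)
weight = torch.randn(6, 4, 3, 3)
offset = torch.zeros(1, 1, 3, 3, 2, 6, 6)  # all-zero offsets: behaves like a plain conv
print(deform_conv2d_structured(x, offset, weight).shape)  # torch.Size([1, 6, 6, 6])
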

@lartpang

Maybe this demo will help us understand the role of offset.

import torch
from torchvision.ops import deform_conv2d

h = w = 3

# input tensor: batch_size, in_channels, in_height, in_width (equal to out_height, out_width for a 1x1 kernel)
x = torch.arange(h * w * 3, dtype=torch.float32).reshape(1, 3, h, w)

# to show the effect of offset more intuitively, only the case of kh=kw=1 is considered here
offset = torch.FloatTensor(
    [  # our predefined offsets, with offset_groups = 3: one (dh, dw) pair per group
        0, -1,  # group 0: sample the pixel to the left of the current pixel
        0, 1,   # group 1: sample the pixel to the right of the current pixel
        -1, 0,  # group 2: sample the pixel above the current pixel
    ]  # the input channels are divided into offset_groups groups, each with its own offset
).reshape(1, 2 * 3 * 1 * 1, 1, 1)
# here we use the same offset at every spatial location
# so we repeat it over the whole grid: batch_size, 2 * offset_groups * kh * kw, out_height, out_width
offset = offset.repeat(1, 1, h, w)

weight = torch.FloatTensor(
    [
        [1, 0, 0],  # only extract the first channel of the input tensor
        [0, 1, 0],  # only extract the second channel of the input tensor
        [1, 1, 0],  # add the first and the second channels of the input tensor
        [0, 0, 1],  # only extract the third channel of the input tensor
        [0, 1, 0],  # only extract the second channel of the input tensor
    ]
).reshape(5, 3, 1, 1)
deconv_shift = deform_conv2d(x, offset=offset, weight=weight)
print(deconv_shift)

"""
tensor([[[[ 0.,  0.,  1.],  # offset=(0, -1) the first channel of the input tensor
          [ 0.,  3.,  4.],  # output hw indices (1, 2) => (1, 2-1) => input indices (1, 1)
          [ 0.,  6.,  7.]], # output hw indices (2, 1) => (2, 1-1) => input indices (2, 0)

         [[10., 11.,  0.],  # offset=(0, 1) the second channel of the input tensor
          [13., 14.,  0.],  # output hw indices (1, 1) => (1, 1+1) => input indices (1, 2)
          [16., 17.,  0.]], # output hw indices (2, 0) => (2, 0+1) => input indices (2, 1)

         [[10., 11.,  1.],  # offset=[(0, -1), (0, 1)], accumulate the first and second channels after being sampled with an offset.
          [13., 17.,  4.],
          [16., 23.,  7.]],

         [[ 0.,  0.,  0.],  # offset=(-1, 0) the third channel of the input tensor
          [18., 19., 20.],  # output hw indices (1, 1) => (1-1, 1) => input indices (0, 1)
          [21., 22., 23.]], # output hw indices (2, 2) => (2-1, 2) => input indices (1, 2)

         [[10., 11.,  0.],  # offset=(0, 1) the second channel of the input tensor
          [13., 14.,  0.],  # output hw indices (1, 1) => (1, 1+1) => input indices (1, 2)
          [16., 17.,  0.]]]])  # output hw indices (2, 0) => (2, 0+1) => input indices (2, 1)
"""
