pad_mode and pad_value for GaussianBlur #6829

Open
wants to merge 7 commits into main

Conversation

themurtazanazir

Fix for #6683.

@facebook-github-bot

Hi @themurtazanazir!

Thank you for your pull request and welcome to our community.

Action Required

In order to merge any pull request (code, docs, etc.), we require contributors to sign our Contributor License Agreement, and we don't seem to have one on file for you.

Process

In order for us to review and merge your suggested changes, please sign at https://code.facebook.com/cla. If you are contributing on behalf of someone else (e.g. your employer), the individual CLA may not be sufficient and your employer may need to sign the corporate CLA.

Once the CLA is signed, our tooling will perform checks and validations. Afterwards, the pull request will be tagged with CLA signed. The tagging process may take up to 1 hour after signing. Please give it that time before contacting us about it.

If you have received this in error or have any questions, please contact us at cla@meta.com. Thanks!

@datumbox
Contributor

@themurtazanazir Thanks for the contribution! We are currently revamping the Transforms API and trying to reduce the number of new features/params introduced. Let me get back to you on whether we can merge right now or should wait until the V2 API is finalized.

Pinging @pmeier and @vfdev-5 to get their thoughts.

@facebook-github-bot

Thank you for signing our Contributor License Agreement. We can now accept your code for this (and any) Meta Open Source project. Thanks!


@themurtazanazir
Author

@datumbox Sure. Is there any tracking for this revamp, such as the timeline or the planned changes? I'd like to stay updated. Thanks.

@datumbox
Contributor

@themurtazanazir We track the work on the project on this dashboard. Moreover, we are about to publish a blog post about the new API with all the details and examples. Some high-level examples can be seen in #6753.

@themurtazanazir
Author

@datumbox Thanks.

@datumbox
Contributor

@themurtazanazir thanks for the updates.

This leaves V2 of the API disconnected from the V1 implementation. We would need to add the same parameters to V2 (code pointers here and here).

@vfdev-5 Any concerns/feedback about the nature of this extension?

@datumbox requested a review from vfdev-5 on November 18, 2022
@vfdev-5
Collaborator

vfdev-5 commented Nov 18, 2022

@themurtazanazir thanks for working on this PR. I was trying to better understand the context of the linked issue, and I'm not sure that we need to update the API for cases when the input image is smaller than the Gaussian blur kernel size. Can we just fix the gaussian blur implementation for such cases by explicitly using constant mode, if that makes sense?

@themurtazanazir
Author

@vfdev-5 Yes, just to fix this issue we could dynamically choose which pad_mode to use. The pro is that there would be no change to the API parameters. However, the approach in this PR adds the flexibility of using other pad modes as well, e.g. "replicate" and "circular", and it is backward compatible (it throws errors only where errors were already thrown). Furthermore, it would be complicated to explain dynamic padding modes in the docs.
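
For illustration, a usage sketch of what the proposed parameters could look like. The exact names pad_mode/pad_value follow this PR's title; the signature shown is an assumption, not the final API:

from PIL import Image
from torchvision import transforms

im = Image.new("L", (78, 3), 255)

# Hypothetical usage of the parameters proposed in this PR: pad with a
# constant value instead of the default "reflect" mode, so the kernel can
# be larger than the image height. Argument names/defaults are assumed.
blur = transforms.GaussianBlur(7, sigma=(0.1, 4.0), pad_mode="constant", pad_value=0)
out = blur(im)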

@vfdev-5
Collaborator

vfdev-5 commented Nov 18, 2022

@themurtazanazir I think such flexibility can be achieved in a composed way instead of adding new args to an existing transformation. We can explicitly pad the image with Pad before calling GaussianBlur. On the other hand, the use cases where the image is smaller than the kernel size seem rather rare, IMO.

Furthermore, it would be complicated to explain dynamic padding modes in the docs.

In this case we can just raise an error with a more comprehensible message and suggest how to overcome it.
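
A minimal sketch of what such a check could look like (the helper name, the error wording, and its placement inside the functional gaussian_blur are assumptions):

import torch

def _check_kernel_fits(img: torch.Tensor, kernel_size):
    # Hypothetical validation helper. Reflect padding pads each side by
    # kernel_size // 2 and requires that pad to be smaller than the
    # corresponding spatial dimension.
    kx, ky = kernel_size  # (width, height), as in torchvision's gaussian_blur
    h, w = img.shape[-2:]
    if w <= kx // 2 or h <= ky // 2:
        raise ValueError(
            f"kernel_size {kernel_size} is too large for input of size {(h, w)}; "
            "consider padding the image with transforms.Pad before GaussianBlur."
        )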

@themurtazanazir
Author

themurtazanazir commented Nov 18, 2022

... such flexibility can be achieved in a composed way instead of adding new args to an existing transformation. We can explicitly pad the image with Pad before calling GaussianBlur.

I don't think it would be exactly the same. GaussianBlur will pad in "reflect" (or "constant", if we go dynamic!) mode regardless.

On the other hand, the use cases where the image is smaller than the kernel size seem rather rare, IMO.

I cannot speak to that. But for context, this error will occur even when just one dimension of the image is smaller than the kernel.

e.g:

from PIL import Image
import numpy as np
from torchvision import transforms 
im  = Image.new("L", (78, 3), 255)
transforms.GaussianBlur(7, sigma=(0.1, 4.0))(im)

This will also throw an error.

A common use case would be generating synthetic images for OCR-related tasks.

In this case we can just raise an error with a more comprehensible message and suggest how to overcome it.

Here the suggestion would be to use Pad before GaussianBlur?

@vfdev-5
Collaborator

vfdev-5 commented Nov 18, 2022

A common use case would be generating synthetic images for OCR-related tasks.

So, in an OCR task you can apply a blur larger than an image dimension?

The main question here is what would be the right way to do Gaussian blur in those cases. What do you think?
In the general case, I'd assume that padding the input image in reflect mode to compensate for boundary effects is acceptable for Gaussian blur. If the kernel size is larger than one of the input image dimensions, I think it is fair to let the user pad this dimension before the Gaussian blur:

from PIL import Image
import numpy as np
from torchvision import transforms 

im  = Image.new("L", (78, 3), 255)

out = transforms.Compose([
    transforms.Pad([0, 2], fill=127),
    transforms.GaussianBlur(7, sigma=(0.1, 4.0))
])(im)

@themurtazanazir
Author

So, in an OCR task you can apply a blur larger than an image dimension?

Yes, it can happen. Imagine a synthetic image generation pipeline using a random font and a random font size (from a range). It is possible for the image height to be smaller than the kernel size. Although not so common, it is definitely possible.

The main question here is what would be the right way to do Gaussian blur in those cases. What do you think?

To answer this, I'm going to go on a bit of an over-explanation, so bear with me.

As far as I understand, GaussianBlur is a wrapper around functional_tensor.gaussian_blur (hereafter just referred to as gaussian_blur) with a random sigma chosen. gaussian_blur accepts multiple padding types because it pads the image to keep the image size the same.
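
For reference, a minimal sketch of calling the public functional API directly with a fixed sigma; the GaussianBlur transform samples a sigma from its range and then delegates to this function:

import torch
from torchvision.transforms import functional as F

img = torch.rand(1, 3, 32, 32)  # dummy batch with one RGB image
# Same operation as the GaussianBlur transform, but with sigma fixed
# instead of sampled from a range.
out = F.gaussian_blur(img, kernel_size=[7, 7], sigma=[2.0, 2.0])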

Now, if we instead have the user pad the image first using Pad and apply GaussianBlur on top, it will in effect change the image size.

e.g., in the code:

from PIL import Image
import numpy as np
from torchvision import transforms 

im  = Image.new("L", (78, 3), 255)

out = transforms.Compose([
    transforms.Pad([0, 2], fill=127),
    transforms.GaussianBlur(7, sigma=(0.1, 4.0))
])(im)

print(im.size, out.size)

The output is (78, 3) (78, 7), and keeping the size the same might be important for some use cases (which is presumably why padding was added inside gaussian_blur in the first place).

Secondly, with this solution the user still has to dynamically choose the padding based on the difference between the image size and the kernel size.

e.g:

from PIL import Image
import numpy as np
from torchvision import transforms 

im  = Image.new("L", (78, 3), 255)

out = transforms.Compose([
    transforms.Pad([0, 2], fill=127),
    transforms.GaussianBlur(21, sigma=(0.1, 4.0))
])(im)

will again throw an error. Here the user needs to dynamically calculate the padding.
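
For instance, a sketch of the calculation the user would need, assuming reflect padding requires each image dimension to be strictly larger than kernel_size // 2 (this helper code is illustrative, not part of the PR):

import math
from PIL import Image
from torchvision import transforms

kernel_size = 21
im = Image.new("L", (78, 3), 255)
w, h = im.size

# Smallest dimension that reflect padding with this kernel can handle.
min_dim = kernel_size // 2 + 1
pad_w = max(0, math.ceil((min_dim - w) / 2))  # extra left/right padding
pad_h = max(0, math.ceil((min_dim - h) / 2))  # extra top/bottom padding

out = transforms.Compose([
    transforms.Pad([pad_w, pad_h], fill=127),
    transforms.GaussianBlur(kernel_size, sigma=(0.1, 4.0)),
])(im)

print(im.size, out.size)  # the output size differs from the input size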

Third, why are we restricting the function parameters of gaussian_blur in GaussianBlur? "reflect" padding may be considered "better", but it should be kept as the default for GaussianBlur, not the only option, IMO.

For the first two problems, we can either let the user decide by adding the parameters pad_mode and pad_value (as is done in this PR), or, as you suggested, dynamically check whether any of the image dimensions is smaller and then switch to a mode other than "reflect".

If we consider all three issues, then giving the user the choice through parameters seems the only way at the moment.

Sorry for the long note 😇 .

@vfdev-5
Collaborator

vfdev-5 commented Nov 21, 2022

@themurtazanazir thanks for laying out your thoughts; I'm still not convinced about the new args.

I checked the OpenCV docs and they expose a borderType arg for an equivalent purpose: https://docs.opencv.org/4.6.0/d4/d86/group__imgproc__filter.html#gaabe8c836e97159a9193fb0b11ac52cf1
However, their reflect mode seems to use the max possible kernel size instead of the one provided by the user:

import numpy as np
import cv2

i = np.arange(4 * 4, dtype="uint8").reshape(4, 4)

cv2.GaussianBlur(i, ksize=(7, 0), sigmaX=2.0, borderType=cv2.BORDER_REFLECT)
# array([[ 5,  6,  6,  6],
#        [ 6,  7,  7,  7],
#        [ 8,  8,  8,  9],
#        [ 9,  9,  9, 10]], dtype=uint8)

cv2.GaussianBlur(i, ksize=(9, 0), sigmaX=2.0, borderType=cv2.BORDER_REFLECT)
# array([[ 5,  6,  6,  6],
#        [ 6,  7,  7,  7],
#        [ 8,  8,  8,  9],
#        [ 9,  9,  9, 10]], dtype=uint8)

cv2.GaussianBlur(i, ksize=(11, 0), sigmaX=2.0, borderType=cv2.BORDER_REFLECT)
# array([[ 5,  6,  6,  6],
#        [ 6,  7,  7,  7],
#        [ 8,  8,  8,  9],
#        [ 9,  9,  9, 10]], dtype=uint8)

Maybe it's worth checking their code to see what exactly they are doing...

On the other hand, PIL's GaussianBlur does not have such an arg: https://pillow.readthedocs.io/en/stable/reference/ImageFilter.html#PIL.ImageFilter.GaussianBlur
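
For comparison, a minimal example of PIL's blur, which is configured only by a radius (standard Pillow usage, unrelated to this PR):

from PIL import Image, ImageFilter

im = Image.new("L", (78, 3), 255)
# PIL exposes only a radius; there is no border/padding mode to choose.
out = im.filter(ImageFilter.GaussianBlur(radius=2))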

So, we have several options without introducing new args:

  1. do a similar thing as OpenCV (maybe clip the kernel size to the max possible value)
  2. raise an error if kernel size > image size and suggest to pad and crop:
from PIL import Image
import numpy as np
from torchvision import transforms 

im  = Image.new("L", (78, 3), 255)

out = transforms.Compose([
    transforms.Pad([0, 2], fill=127),
    transforms.GaussianBlur(7, sigma=(0.1, 4.0)),
    transforms.Pad([0, -2], fill=127),
])(im)

print(im.size, out.size)
  3. replace padding mode with constant if kernel size > image size (see the sketch below)
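
A minimal sketch of the third option, i.e. falling back to constant padding inside the blur when reflect padding would be invalid (the helper below and its placement are assumptions, not the actual torchvision code):

import torch
import torch.nn.functional as F

def _pad_for_blur(img: torch.Tensor, kernel_size):
    # kernel_size is assumed to be (kx, ky); gaussian_blur pads each side by
    # kernel // 2 so the output size stays equal to the input size.
    kx, ky = kernel_size
    pad = [kx // 2, kx // 2, ky // 2, ky // 2]  # left, right, top, bottom
    h, w = img.shape[-2:]
    # Reflect padding requires pad < dim; otherwise fall back to constant.
    mode = "reflect" if (w > kx // 2 and h > ky // 2) else "constant"
    return F.pad(img, pad, mode=mode)

img = torch.rand(1, 1, 3, 78)  # height 3, width 78
print(_pad_for_blur(img, (7, 7)).shape)  # constant fallback: torch.Size([1, 1, 9, 84])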
