pad_mode and pad_value for GaussianBlur #6829

Open
wants to merge 7 commits into main

Conversation

themurtazanazir

Fix for #6683.

@facebook-github-bot

Hi @themurtazanazir!

Thank you for your pull request and welcome to our community.

Action Required

In order to merge any pull request (code, docs, etc.), we require contributors to sign our Contributor License Agreement, and we don't seem to have one on file for you.

Process

In order for us to review and merge your suggested changes, please sign at https://code.facebook.com/cla. If you are contributing on behalf of someone else (e.g. your employer), the individual CLA may not be sufficient and your employer may need to sign the corporate CLA.

Once the CLA is signed, our tooling will perform checks and validations. Afterwards, the pull request will be tagged with CLA signed. The tagging process may take up to 1 hour after signing. Please give it that time before contacting us about it.

If you have received this in error or have any questions, please contact us at cla@meta.com. Thanks!

@datumbox
Contributor

@themurtazanazir Thanks for the contribution! We are currently revamping the Transforms API and trying to reduce the number of new features/params introduced. Let me get back to you on whether we can merge right now or should wait until the V2 API is finalized.

Pinging @pmeier and @vfdev-5 to get their thoughts.

@facebook-github-bot

Thank you for signing our Contributor License Agreement. We can now accept your code for this (and any) Meta Open Source project. Thanks!


@themurtazanazir
Author

@datumbox Sure. Is there any tracking for this revamp, such as the timeline or the planned changes? I'd like to stay updated. Thanks.

@datumbox
Contributor

@themurtazanazir We track the work on the project on this dashboard. Moreover, we are about to publish a blog post about the new API with all the details and examples. Some high-level examples can be seen in #6753.

@themurtazanazir
Author

@datumbox Thanks.

@datumbox
Contributor

@themurtazanazir thanks for the updates.

This leaves V2 of the API disconnected from the V1 implementation. We would need to add the same parameters to V2 (code pointers here and here).

@vfdev-5 Any concerns/feedback about the nature of this extension?

@datumbox requested a review from vfdev-5 on November 18, 2022
@vfdev-5
Collaborator

vfdev-5 commented Nov 18, 2022

@themurtazanazir thanks for working on this PR. I was trying to better understand the context of the linked issue, and I'm not sure that we need to update the API for cases when the input image is smaller than the Gaussian blur kernel size. Can we just fix the gaussian blur implementation for such cases by explicitly using constant mode, if that makes sense?

@themurtazanazir
Author

@vfdev-5 Yes, just to fix this issue we could dynamically choose which pad_mode to use. The pro is that there would be no change to the API parameters. However, the approach in this PR adds the flexibility of using other pad modes as well, e.g. "replicate" and "circular", and it is backward compatible (it throws errors only where errors were already thrown). Furthermore, it would be complicated to explain dynamic padding modes in the docs.
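
For illustration, a usage sketch of what the proposed parameters could look like. The exact names pad_mode/pad_value follow this PR's title; the signature shown is an assumption, not the final API:

from PIL import Image
from torchvision import transforms

im = Image.new("L", (78, 3), 255)

# Hypothetical usage of the parameters proposed in this PR: pad with a
# constant value instead of the default "reflect" mode, so the kernel can
# be larger than the image height. Argument names/defaults are assumed.
blur = transforms.GaussianBlur(7, sigma=(0.1, 4.0), pad_mode="constant", pad_value=0)
out = blur(im)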

@vfdev-5
Collaborator

vfdev-5 commented Nov 18, 2022

@themurtazanazir I think such flexibility can be achieved in a composed way instead of adding new args to an existing transformation. We can explicitly pad the image with Pad before calling GaussianBlur. On the other hand, the use cases where the image is smaller than the kernel size seem rather rare, IMO.

Furthermore, it would be complicated to explain dynamic padding modes in the docs.

In this case we can just raise an error with a more comprehensible message and suggest how to overcome it.
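
A minimal sketch of what such a check could look like (the helper name, the error wording, and its placement inside the functional gaussian_blur are assumptions):

import torch

def _check_kernel_fits(img: torch.Tensor, kernel_size):
    # Hypothetical validation helper. Reflect padding pads each side by
    # kernel_size // 2 and requires that pad to be smaller than the
    # corresponding spatial dimension.
    kx, ky = kernel_size  # (width, height), as in torchvision's gaussian_blur
    h, w = img.shape[-2:]
    if w <= kx // 2 or h <= ky // 2:
        raise ValueError(
            f"kernel_size {kernel_size} is too large for input of size {(h, w)}; "
            "consider padding the image with transforms.Pad before GaussianBlur."
        )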

@themurtazanazir
Author

themurtazanazir commented Nov 18, 2022

... such flexibility can be achieved in a composed way instead of adding new args to an existing transformation. We can explicitly pad the image with Pad before calling GaussianBlur.

I don't think it would be exactly the same. GaussianBlur will pad in "reflect" (or "constant", if we go dynamic!) mode regardless.

On the other hand, the use cases where the image is smaller than the kernel size seem rather rare, IMO.

I cannot speak to that. But for context, this error will occur even when just one dimension of the image is smaller than the kernel.

e.g:

from PIL import Image
import numpy as np
from torchvision import transforms 
im  = Image.new("L", (78, 3), 255)
transforms.GaussianBlur(7, sigma=(0.1, 4.0))(im)

This will also throw an error.

A common use case would be generating synthetic images for OCR-related tasks.

In this case we can just raise an error with a more comprehensible message and suggest how to overcome it.

Here the suggestion would be to use Pad before GaussianBlur?

@vfdev-5
Collaborator

vfdev-5 commented Nov 18, 2022

A common use case would be generating synthetic images for OCR-related tasks.

So, in an OCR task you can apply a blur larger than an image dimension?

The main question here is what would be the right way to do Gaussian blur in those cases. What do you think?
In the general case, I'd assume that padding the input image in reflect mode to compensate for boundary effects is acceptable for Gaussian blur. If the kernel size is larger than one of the input image dimensions, I think it is fair to let the user pad this dimension before the Gaussian blur:

from PIL import Image
import numpy as np
from torchvision import transforms 

im  = Image.new("L", (78, 3), 255)

out = transforms.Compose([
    transforms.Pad([0, 2], fill=127),
    transforms.GaussianBlur(7, sigma=(0.1, 4.0))
])(im)

@themurtazanazir
Author

So, in an OCR task you can apply a blur larger than an image dimension?

Yes, it can happen. Imagine a synthetic image generation pipeline using a random font and a random font size (from a range). It is possible for the image height to be smaller than the kernel size. Although not so common, it is definitely possible.

The main question here is what would be the right way to do Gaussian blur in those cases. What do you think?

To answer this, I'm going to go on a bit of an over-explanation, so bear with me.

As far as I understand, GaussianBlur is a wrapper around functional_tensor.gaussian_blur (hereafter just referred to as gaussian_blur) with a random sigma chosen. gaussian_blur accepts multiple padding types because it pads the image to keep the image size the same.
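
For reference, a minimal sketch of calling the public functional API directly with a fixed sigma; the GaussianBlur transform samples a sigma from its range and then delegates to this function:

import torch
from torchvision.transforms import functional as F

img = torch.rand(1, 3, 32, 32)  # dummy batch with one RGB image
# Same operation as the GaussianBlur transform, but with sigma fixed
# instead of sampled from a range.
out = F.gaussian_blur(img, kernel_size=[7, 7], sigma=[2.0, 2.0])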

Now, if we instead have the user pad the image first using Pad and apply GaussianBlur on top, it will in effect change the image size.

e.g., in the code:

from PIL import Image
import numpy as np
from torchvision import transforms 

im  = Image.new("L", (78, 3), 255)

out = transforms.Compose([
    transforms.Pad([0, 2], fill=127),
    transforms.GaussianBlur(7, sigma=(0.1, 4.0))
])(im)

print(im.size, out.size)

The output is (78, 3) (78, 7), and keeping the size the same might be important for some use cases (which is presumably why padding was added inside gaussian_blur in the first place).

Secondly, with this solution the user still has to dynamically choose the padding based on the difference between the image size and the kernel size.

e.g:

from PIL import Image
import numpy as np
from torchvision import transforms 

im  = Image.new("L", (78, 3), 255)

out = transforms.Compose([
    transforms.Pad([0, 2], fill=127),
    transforms.GaussianBlur(21, sigma=(0.1, 4.0))
])(im)

will again throw an error. Here the user needs to dynamically calculate the padding.
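
For instance, a sketch of the calculation the user would need, assuming reflect padding requires each image dimension to be strictly larger than kernel_size // 2 (this helper code is illustrative, not part of the PR):

import math
from PIL import Image
from torchvision import transforms

kernel_size = 21
im = Image.new("L", (78, 3), 255)
w, h = im.size

# Smallest dimension that reflect padding with this kernel can handle.
min_dim = kernel_size // 2 + 1
pad_w = max(0, math.ceil((min_dim - w) / 2))  # extra left/right padding
pad_h = max(0, math.ceil((min_dim - h) / 2))  # extra top/bottom padding

out = transforms.Compose([
    transforms.Pad([pad_w, pad_h], fill=127),
    transforms.GaussianBlur(kernel_size, sigma=(0.1, 4.0)),
])(im)

print(im.size, out.size)  # the output size differs from the input size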

Third, why are we restricting the function parameters of gaussian_blur in GaussianBlur? "reflect" padding may be considered "better", but it should be kept as the default for GaussianBlur, not the only option, IMO.

For the first two problems, we can either let the user decide by adding the parameters pad_mode and pad_value (as is done in this PR), or, as you suggested, dynamically check whether any of the image dimensions is smaller and then switch to a mode other than "reflect".

If we consider all three issues, then giving the user the choice through parameters seems the only way at the moment.

Sorry for the long note 😇 .

@vfdev-5
Collaborator

vfdev-5 commented Nov 21, 2022

@themurtazanazir thanks for laying out your thoughts; I'm still not convinced about the new args.

I checked the OpenCV docs and they expose a borderType arg for an equivalent purpose: https://docs.opencv.org/4.6.0/d4/d86/group__imgproc__filter.html#gaabe8c836e97159a9193fb0b11ac52cf1
However, their reflect mode seems to use the max possible kernel size instead of the one provided by the user:

import numpy as np
import cv2

i = np.arange(4 * 4, dtype="uint8").reshape(4, 4)

cv2.GaussianBlur(i, ksize=(7, 0), sigmaX=2.0, borderType=cv2.BORDER_REFLECT)
# array([[ 5,  6,  6,  6],
#        [ 6,  7,  7,  7],
#        [ 8,  8,  8,  9],
#        [ 9,  9,  9, 10]], dtype=uint8)

cv2.GaussianBlur(i, ksize=(9, 0), sigmaX=2.0, borderType=cv2.BORDER_REFLECT)
# array([[ 5,  6,  6,  6],
#        [ 6,  7,  7,  7],
#        [ 8,  8,  8,  9],
#        [ 9,  9,  9, 10]], dtype=uint8)

cv2.GaussianBlur(i, ksize=(11, 0), sigmaX=2.0, borderType=cv2.BORDER_REFLECT)
# array([[ 5,  6,  6,  6],
#        [ 6,  7,  7,  7],
#        [ 8,  8,  8,  9],
#        [ 9,  9,  9, 10]], dtype=uint8)

Maybe it's worth checking their code to see what exactly they are doing...

On the other hand, PIL's GaussianBlur does not have such an arg: https://pillow.readthedocs.io/en/stable/reference/ImageFilter.html#PIL.ImageFilter.GaussianBlur
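
For comparison, a minimal example of PIL's blur, which is configured only by a radius (standard Pillow usage, unrelated to this PR):

from PIL import Image, ImageFilter

im = Image.new("L", (78, 3), 255)
# PIL exposes only a radius; there is no border/padding mode to choose.
out = im.filter(ImageFilter.GaussianBlur(radius=2))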

So, we have several options without introducing new args:

  1. do a similar thing as OpenCV (maybe clip the kernel size to the max possible value)
  2. raise an error if kernel size > image size and suggest to pad and crop:
from PIL import Image
import numpy as np
from torchvision import transforms 

im  = Image.new("L", (78, 3), 255)

out = transforms.Compose([
    transforms.Pad([0, 2], fill=127),
    transforms.GaussianBlur(7, sigma=(0.1, 4.0)),
    transforms.Pad([0, -2], fill=127),
])(im)

print(im.size, out.size)
  3. replace padding mode with constant if kernel size > image size (see the sketch below)
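
A minimal sketch of the third option, i.e. falling back to constant padding inside the blur when reflect padding would be invalid (the helper below and its placement are assumptions, not the actual torchvision code):

import torch
import torch.nn.functional as F

def _pad_for_blur(img: torch.Tensor, kernel_size):
    # kernel_size is assumed to be (kx, ky); gaussian_blur pads each side by
    # kernel // 2 so the output size stays equal to the input size.
    kx, ky = kernel_size
    pad = [kx // 2, kx // 2, ky // 2, ky // 2]  # left, right, top, bottom
    h, w = img.shape[-2:]
    # Reflect padding requires pad < dim; otherwise fall back to constant.
    mode = "reflect" if (w > kx // 2 and h > ky // 2) else "constant"
    return F.pad(img, pad, mode=mode)

img = torch.rand(1, 1, 3, 78)  # height 3, width 78
print(_pad_for_blur(img, (7, 7)).shape)  # constant fallback: torch.Size([1, 1, 9, 84])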
