-
Notifications
You must be signed in to change notification settings - Fork 7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
pad_mode and pad_value for GaussianBlur #6829
base: main
Are you sure you want to change the base?
Conversation
Hi @themurtazanazir! Thank you for your pull request and welcome to our community. Action RequiredIn order to merge any pull request (code, docs, etc.), we require contributors to sign our Contributor License Agreement, and we don't seem to have one on file for you. ProcessIn order for us to review and merge your suggested changes, please sign at https://code.facebook.com/cla. If you are contributing on behalf of someone else (eg your employer), the individual CLA may not be sufficient and your employer may need to sign the corporate CLA. Once the CLA is signed, our tooling will perform checks and validations. Afterwards, the pull request will be tagged with If you have received this in error or have any questions, please contact us at cla@meta.com. Thanks! |
@themurtazanazir Thanks for the contribution! We are currently revamping the Transforms API and try to reduce the number of new features/params introduced. Let me get back to you concerning whether we can merge right now or wait for after the API of V2 is finalized. |
Thank you for signing our Contributor License Agreement. We can now accept your code for this (and any) Meta Open Source project. Thanks! |
Thank you for signing our Contributor License Agreement. We can now accept your code for this (and any) Meta Open Source project. Thanks! |
@datumbox Sure. Is there any tracking on this revamping, the timeline or the changes? I'd like to be updated. Thanks. |
@themurtazanazir We track the work on the project on this dashboard. Moreover we are about to publish a blogpost about the new API with all the details and examples. Some high-level examples can be seen on #6753. |
@datumbox Thanks. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@themurtazanazir thanks for the updates.
This leaves the V2 of the API disconnected from the implementation on V1. We would need to add the same parameters on the V2 (code pointers here and here).
@vfdev-5 Any concerns/feedback about the nature of this extension?
@themurtazanazir thanks for working on this PR. I was trying to understand better the context of the linked issue and I'm not sure that we need to update the API for cases when input image is smaller than gaussian blur kernel size. Can we just fix gaussian blur implementation for such cases by explicitly using constant mode if it makes sense ? |
@vfdev-5 yes, just to fix this issue, we can dynamically choose what pad_mode to use. The pro is that there is no change in API parameters. However, this new approach adds the flexibility of using even other types of pad_modes, i.e "replicate", "circular" and this is backward compatible (throw errors where it was throwing errors earlier). Furthermore, it would be complicated to explain the dynamic padding modes in in docs. |
@themurtazanazir I think such flexibility can be achived in a composed way instead of adding new args to existing transformation. We can explicitly pad the image with
In this case we can just raise an error with more comprehensible message and suggest how to overcome it. |
I don't think it would be exactly the same.
I cannot speak for that. But for the context, this error will occur even when just one dimension of the image is smaller than the kernel. e.g: from PIL import Image
import numpy as np
from torchvision import transforms
im = Image.new("L", (78, 3), 255)
transforms.GaussianBlur(7, sigma=(0.1, 4.0))(im) This will also throw an error. A common usecase would be generating synthetic images for OCR related tasks.
Here the suggestion would be to use |
So, in OCR task you can apply a blur larger than an image dimension ? The main question here is what would be a right way to do gaussian blur in those cases. What do you think ? from PIL import Image
import numpy as np
from torchvision import transforms
im = Image.new("L", (78, 3), 255)
out = transforms.Compose([
transforms.Pad([0, 2], fill=127),
transforms.GaussianBlur(7, sigma=(0.1, 4.0))
])(im) |
Yes it can be. Imagine a synthetic image generation pipeline using a random font and a random font size (from a range). It is possible an image height can be smaller than kernel size. Although not so common, but defintely likely.
To answer this, I'm going to go on a bit of an over-explanation, so bear with me. As far as I understand, the Now if we go on to have user pad the image first using e.g, in the code: from PIL import Image
import numpy as np
from torchvision import transforms
im = Image.new("L", (78, 3), 255)
out = transforms.Compose([
transforms.Pad([0, 2], fill=127),
transforms.GaussianBlur(7, sigma=(0.1, 4.0))
])(im)
print(im.size, out.size) The output is Secondly in this solution, the user again has to dynamically choose the padding given the difference in image size and kernel size. e.g: from PIL import Image
import numpy as np
from torchvision import transforms
im = Image.new("L", (78, 3), 255)
out = transforms.Compose([
transforms.Pad([0, 2], fill=127),
transforms.GaussianBlur(21, sigma=(0.1, 4.0))
])(im) will again throw the error. Here the user needs to dynamically calculate the padding. Third is, why are we restricting the function paramters of For the first two problems, we can either let the user decide by adding the parameters If we consider all three issues, then giving user the choice through parameters seems the only way at the moment. Sorry for the long note 😇 . |
@themurtazanazir thanks for exposing your thoughts, I'm still not convinced about new args. I checked opencv docs and they expose import numpy as np
import cv2
i = np.arange(4 * 4, dtype="uint8").reshape(4, 4)
cv2.GaussianBlur(i, ksize=(7, 0), sigmaX=2.0, borderType=cv2.BORDER_REFLECT)
# array([[ 5, 6, 6, 6],
[ 6, 7, 7, 7],
[ 8, 8, 8, 9],
[ 9, 9, 9, 10]], dtype=uint8)
cv2.GaussianBlur(i, ksize=(9, 0), sigmaX=2.0, borderType=cv2.BORDER_REFLECT)
# array([[ 5, 6, 6, 6],
[ 6, 7, 7, 7],
[ 8, 8, 8, 9],
[ 9, 9, 9, 10]], dtype=uint8)
cv2.GaussianBlur(i, ksize=(11, 0), sigmaX=2.0, borderType=cv2.BORDER_REFLECT)
# array([[ 5, 6, 6, 6],
[ 6, 7, 7, 7],
[ 8, 8, 8, 9],
[ 9, 9, 9, 10]], dtype=uint8) Maybe, it worth to check their code to see what they exactly doing... On the other hand, PIL GaussianBlur does not have such arg: https://pillow.readthedocs.io/en/stable/reference/ImageFilter.html#PIL.ImageFilter.GaussianBlur So, we have the several options without introducing new args:
|
Fix for #6683.