
[BC-breaking] Aligning fill = None with fill = 0 on Transforms #6623

Open
datumbox opened this issue Sep 21, 2022 · 7 comments

Comments

@datumbox
Contributor

datumbox commented Sep 21, 2022

🚀 The feature

As discussed at #6517, passing None for fill on some transforms isn't the same as passing 0. The difference shows up in some cases but not others; it was potentially done to avoid JIT issues (or it may be a bug) and has unforeseen effects on the behaviour of the Transforms.

Currently on the prototype area:

  • All functionals have fill=None as default value.
  • All Transforms have fill=0 as default.
  • All AutoAugment Transforms have fill=None.

The above inconsistent behaviour maintains BC with stable but leads to a weird API.

Motivation, pitch

Break BC at the functional level and make fill=None behave the same as fill=0.

This will unfortunately affect the accuracy of models trained with the Tensor API. It is worth noting that such a change would align more closely with PIL, which already converts fill=None to 0:

if fill is None:
    fill = 0

This proposal could be adopted if we decide to break BC and align the Tensor and PIL backends on other aspects (such as antialias etc).

Alternatives

There are 2 other potential alternatives:

  • Keep the Transforms and functionals as-is and make no breaking changes.
  • Make a breaking change at the Transforms level (AutoAugment) but keep the functionals as-is. This minimizes the effect of the change on inference for pre-trained models but maintains the discrepancy between the two backends.

Additional context

No response

cc @vfdev-5

@NicolasHug
Member

Trying to understand this more, is the problem that:

  • fill has different defaults across APIs?
  • fill=0 is different from fill=None?
  • both?

If we were to have fill with the same default across APIs, would we still have a problem?

@vfdev-5
Collaborator

vfdev-5 commented Sep 21, 2022

Trying to understand this more, is the problem that:

-> answer is both

A 3rd alternative is to make fill=0 the default in the functional ops and make fill=0 behave like fill=None.

Checking the codebase history, my impression is that the default fill of 0 or None has its roots in the PIL API:

The fill argument exists for pad, affine, rotate, and other affine-like functional ops.

  1. For PIL input, pad invokes ImageOps.expand with fill, which can't take a None value -> thus fill defaults to 0 in torchvision for pad

  2. affine invokes PIL's Image.transform with fill, which can take None or 0 and produces the same result (0 pixels). The PR adding affine to torchvision had fill=None by default

  3. rotate invokes PIL's Image.rotate with fill, which can likewise take None or 0 with the same result (0 pixels). rotate used fill=0 for some time and was then changed to None (Fix fill in rotate #1760)

So, making the output identical for fill=None and fill=0 across all ops would be a move forward.
Then we can pick None or 0 as the default value and make the API uniform on that...

@datumbox
Contributor Author

@vfdev-5 Thanks for the comprehensive reply. Do you think I should add the alternative as a 3rd option? Or perhaps I could just edit the "proposal" to suggest that we should be using 0 as defaults and just handle None similarly? My understanding is that the 2 proposals are similar and it's a matter of which default value we choose to expose (I agree that 0 is probably better).

@NicolasHug
Member

NicolasHug commented Sep 22, 2022

Thanks for the details @vfdev-5 .

passing None on fill for some transforms isn't the same as passing 0

I'm looking at #6517 but I can't pin-point exactly what the difference is. Would you have more details? Is it that the "filling" simply doesn't happen when fill=None, or is it something more subtle than that?

Also instead of a BC breaking change, is deprecation an option here?

@vfdev-5
Collaborator

vfdev-5 commented Sep 22, 2022

I'm looking at #6517 but I can't pin-point exactly what the difference is. Would you have more details? Is it that the "filling" simply doesn't happen when fill=None, or is it something more subtle than that?

Yeah, the issue description may be unclear. The problem arises when we call _apply_grid_transform with mode="bilinear" and 1) fill=None vs 2) fill=0: the two outputs won't match. I'd expect them to match because, to me, fill=None and fill=0 are the same configuration. By the way, the outputs do match with mode="nearest" and the same fill configs.

Why don't the results match for bilinear?

Because when fill is None we only do grid sampling (which uses padding_mode="zeros"). When fill=0, we create an additional mask channel, concat it to the input image, grid sample, extract the transformed mask, and apply it with the fill values (which are 0 again) to the output image. The way we blend the mask, the fill value, and the image is probably incorrect for bilinear mode (see the images in the issue: there is a black shadow around the rotated image in current torchvision, which is absent for PIL). So the problem comes from blending zero fill values into an image that was already transformed with a zero fill.

Before fixing the blending, we may ask why we bother with all this fancy mask machinery for fill=0 when grid sample already does it for us. I would expect fill=0 to be a no-op like fill=None in terms of computing a mask and manually applying the fill value.

Thus, point 1 in the issue is about making the fill=0 option behave like fill=None, without computing any fill mask.
Point 2 is about fixing the fill-value mask blending for mode="bilinear".
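The double application of the zero fill can be reproduced with a minimal 1-D model (a hypothetical pure-Python sketch, not torchvision's implementation; bilinear_sample_1d stands in for grid_sample with padding_mode="zeros"):

```python
import math

def bilinear_sample_1d(values, x):
    # bilinearly sample a 1-D signal at fractional position x,
    # padding out-of-range positions with zeros (like padding_mode="zeros")
    x0 = math.floor(x)
    x1 = x0 + 1
    w = x - x0
    v0 = values[x0] if 0 <= x0 < len(values) else 0.0
    v1 = values[x1] if 0 <= x1 < len(values) else 0.0
    return v0 * (1 - w) + v1 * w

img = [1.0, 1.0, 1.0, 1.0]   # constant "white" signal
mask = [1.0] * len(img)      # valid-pixel mask, all ones

# shift everything by half a pixel, like a sub-pixel affine transform
shifted_img = [bilinear_sample_1d(img, i - 0.5) for i in range(5)]
shifted_mask = [bilinear_sample_1d(mask, i - 0.5) for i in range(5)]

# blend the (zero) fill into the already zero-padded image via the mask
fill = 0.0
out = [v * m + fill * (1 - m) for v, m in zip(shifted_img, shifted_mask)]

print(shifted_img)  # [0.5, 1.0, 1.0, 1.0, 0.5] -- border already darkened by zero padding
print(out)          # [0.25, 1.0, 1.0, 1.0, 0.25] -- border darkened again: the "shadow"
```

The border pixel is halved once by the implicit zero padding in the sampling, and halved a second time by the interpolated mask, producing the shadow artifact described above.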

@NicolasHug
Member

NicolasHug commented Feb 17, 2023

Just had a chat with @vfdev-5 and @pmeier , I'll try to briefly summarize the situation on V1 transforms for my own understanding (possibly re-hashing what has already been said above). These are general statements that do not necessarily apply to all transforms in all cases (sometimes, only a subset of them):

  • fill has inconsistent default values across transforms (None for AA, 0 for the rest). Functionals mostly have None as default, except for Pad (0).
  • fill may have a different default value between the transforms and their corresponding functionals. E.g. Perspective (0 in transforms, None in functional).
  • passing the same fill value to a transform and to a functional may lead to different results (because some transforms convert None to 0, which is undocumented behaviour).
  • on the functionals, passing fill=0 or fill=any_color may produce different results between PIL and Tensors (passing fill=None should produce consistent results). Whether those differences are within the range of acceptable differences is unclear (to me).

All of these are issues we'll need to fix eventually. We don't know how yet.

As discussed in #7258, the V2 transforms introduce a minor BC-breaking change by no longer doing this:

some transforms convert None to 0, which is an undocumented behaviour

I.e. if users pass None to a transform, the functional will now receive None as well in V2. This lets users who explicitly pass None get consistent results between the PIL and Tensor backends. It's slightly BC-breaking because it may lead to different results (compared to V1) for users who a) were passing None explicitly, which was never documented, and b) were using the Tensor backend. The PIL backend is unchanged because the PIL functionals always do the None->0 conversion anyway.
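The V1-vs-V2 difference can be sketched as follows (hypothetical function names, not torchvision's actual call path):

```python
def functional_op(fill):
    # stand-in for a tensor functional: just report the fill it received
    return fill

def v1_transform_call(fill):
    # some V1 transforms silently converted None to 0 before dispatching,
    # which was never documented
    if fill is None:
        fill = 0
    return functional_op(fill)

def v2_transform_call(fill):
    # V2 forwards the user's fill unchanged, so an explicit None
    # reaches the functional (and both backends treat it consistently)
    return functional_op(fill)

print(v1_transform_call(None))  # -> 0
print(v2_transform_call(None))  # -> None
```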

@pmeier
Collaborator

pmeier commented Feb 20, 2023

  • on the functionals, passing fill=0 or fill=any_color may result in different results between PIL and Tensors (passing fill=None should result in consistent results). Whether those differences are within the range of acceptable differences or not is unclear (to me).

To narrow it down a little, this only happens for InterpolationMode.BILINEAR. In general it applies to all interpolation modes that can create intermediate values, e.g. bicubic, but the affine functions are limited to nearest and bilinear interpolation:

_assert_grid_transform_inputs(image, matrix, interpolation.value, fill, ["nearest", "bilinear"])

#6517 goes into details, but here is the TL;DR: PyTorch's grid_sample does not support filling with an arbitrary color and is equivalent to fill=0. If the user requests a fill color, we apply the same affine transformation to a boolean mask and use that to apply the filling later. However, for bilinear interpolation this creates values between 0 and 1, and we have to decide how to deal with them. The current implementation accepts these intermediate values and uses them to blend between the fill color and the original image. However, this creates shadow artifacts.
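The trade-off for a single border pixel can be sketched like this (a hypothetical 1-pixel model, not torchvision code; binarizing the mask is shown only as one conceivable alternative to blending, not as what the library does):

```python
# one border pixel after bilinear grid sampling with zero padding:
sampled_pixel = 0.5  # true value 1.0, halved by the implicit zero padding
sampled_mask = 0.5   # the boolean valid-pixel mask gets interpolated the same way
fill = 0.0

# current approach: blend with the fractional mask -> the fill is applied twice
blended = sampled_pixel * sampled_mask + fill * (1 - sampled_mask)

# conceivable alternative: binarize the mask first, keeping the pixel
# exactly as grid_sample produced it (threshold choice here is arbitrary)
hard_mask = 1.0 if sampled_mask >= 0.5 else 0.0
thresholded = sampled_pixel * hard_mask + fill * (1 - hard_mask)

print(blended)      # 0.25 -- darker than 0.5: the visible shadow
print(thresholded)  # 0.5  -- border pixel left as grid_sample padded it
```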
