RandomResizedCrop API Change #676

Closed · LukeWood opened this issue Aug 8, 2022 · 16 comments · Fixed by #738
Comments

@LukeWood (Contributor) commented Aug 8, 2022:

Currently, RandomResizedCrop is configured with a crop_area_factor and an aspect_ratio_factor.

The API should be updated to take a target_size, a zoom_factor, and an aspect_ratio_factor. At augmentation time, a value is drawn randomly from each of the zoom_factor and aspect_ratio_factor distributions. A crop size is computed by multiplying each dimension of target_size by the value drawn from zoom_factor. Next, the width and height of the crop size are distorted by the value drawn from aspect_ratio_factor. A crop of this size is taken from the image and finally resized to target_size (a hypothetical usage sketch follows the edge cases below).

Some edge cases:

  • when the crop size is larger than the image, we want to still respect aspect ratio (random crop uses smart resize)
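
For illustration, a hypothetical usage of the proposed API. The argument names are the ones proposed in this issue; the idea that each factor is given as a (min, max) range, and the specific ranges shown, are assumptions for the sketch, not a confirmed signature.

import keras_cv

# Hypothetical usage of the proposed API (names from this issue, ranges illustrative).
# images: a batch of images, e.g. a tf.Tensor of shape (batch, H, W, 3).
augmenter = keras_cv.layers.RandomResizedCrop(
    target_size=(224, 224),
    zoom_factor=(0.8, 1.2),              # 1.0 would mean no zoom
    aspect_ratio_factor=(3 / 4, 4 / 3),  # 1.0 would mean no distortion
)
augmented_images = augmenter(images)
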
@AdityaKane2001 (Contributor) commented Aug 10, 2022:

@LukeWood

My understanding is as follows:

  • Assume an input image of 300x400 (height x width).
  • First we sample a zoom factor (would be <1 in a normal case?) and take a crop of that size. Assuming zoom factor is 0.5, we will take a crop of 150x200.
  • Then a width and height are calculated and the image is resized to this new distorted size. Assuming the aspect ratio to be 0.5 (height : width), the new image will be 150x300 (or 100x200?).
  • Finally resize to a target size, say 224x224.

Is this procedure correct? If this is incorrect, could you please jot down the procedure in the same way as above to avoid any confusion?

/cc @sayakpaul

@LukeWood (Contributor, Author):

> First we sample a zoom factor (would be <1 in a normal case?) and take a crop of that size. Assuming zoom factor is 0.5, we will take a crop of 150x200.

It can be either less than or greater than one. A 150x200 crop zoomed by 2 should crop a 75x100 region, so the result is zoomed to double the size.

> Then a width and height are calculated and the image is resized to this new distorted size. Assuming the aspect ratio to be 0.5 (height : width), the new image will be 150x300 (or 100x200?).

I guess to get the desired effect, zoom_factor should be inverted before multiplying, so a zoom factor of 2 becomes 0.5 and vice versa. This means a zoom factor of 0.5 zooms out and 2.0 zooms in.

> Finally resize to a target size, say 224x224.

Yep! zoom_factor 1.0 and aspect_ratio 1.0 should basically be the same as a plain random crop.

Sound good?

@AdityaKane2001 (Contributor) commented Aug 11, 2022:

Okay. Zoom factor is almost clear to me now.

Assuming we use a zoom factor of 0.5 (which will zoom out the image), we get a 600x800 image, right? What would the image look like? Will the image repeat itself, or is it just a normal resize?

@LukeWood (Contributor, Author):

You will get a 600x800 crop, but then it will be resized BACK to 224x224. So zoom is computed based on crop size.

Imagine a 1000x1000 image with a crop size of 200x200. If you sample a zoom factor of 0.5, your crop size will be 400x400, and the crop will then be resized back to 200x200.

@LukeWood (Contributor, Author) commented Aug 12, 2022:

An exact process:

The layer takes target_size, zoom_factor, and aspect_ratio_factor.

The process starts by sampling a value from zoom_factor and aspect_ratio_factor:

target_size = (50, 50)
zoom = zoom_factor()  # let's use 0.5
aspect_ratio = aspect_ratio_factor()  # let's use 9/10

Next, the crop size is calculated using target_size and zoom:

crop_size = target_size / zoom # (100, 100)

Next, aspect_ratio is applied:

crop_size = distort_aspect_ratio(crop_size, aspect_ratio)
#  (100 / sqrt(9/10), 100 * sqrt(9/10))
#  (105, 95)  # rounded

Finally, take a crop of crop_size and resize it back to target_size:

# we compute the crop location (y, x) however needed
crop = tf.image.crop_to_bounding_box(image, y, x, crop_size[0], crop_size[1])
result = tf.image.resize(crop, target_size)
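
Putting the steps above together, a minimal runnable sketch using standard TensorFlow ops. The helper name and the random-offset logic are assumptions for illustration, not the layer's actual internals, and it assumes the crop fits inside the image (the larger-than-image edge case is discussed further down this thread).

import math
import tensorflow as tf

def random_resized_crop(image, target_size=(50, 50), zoom=0.5, aspect_ratio=9 / 10):
    # image: a single HWC tensor. zoom < 1 zooms out (larger crop), zoom > 1 zooms in.
    target_h, target_w = target_size
    crop_h = int(round(target_h / zoom / math.sqrt(aspect_ratio)))
    crop_w = int(round(target_w / zoom * math.sqrt(aspect_ratio)))
    img_h = tf.shape(image)[0]
    img_w = tf.shape(image)[1]
    # Random crop location; assumes crop_h <= img_h and crop_w <= img_w.
    offset_h = tf.random.uniform((), 0, img_h - crop_h + 1, dtype=tf.int32)
    offset_w = tf.random.uniform((), 0, img_w - crop_w + 1, dtype=tf.int32)
    crop = tf.image.crop_to_bounding_box(image, offset_h, offset_w, crop_h, crop_w)
    return tf.image.resize(crop, target_size)
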

@AdityaKane2001 (Contributor):

@LukeWood

Thanks for this, it really helps!

I have two concerns regarding this approach:

  1. I understand that this is quite close to the current implementation in terms of the end result. The gist is the same: we crop a random area and resize it to the given target_size. Given this, I am not sure why we need to include target_size in the calculation of crop_size. target_size should ideally have no influence over crop_size.
  2. The zoom_factor makes it a bit unintuitive IMO. zoom_factor being a positive float does not signify something interpretable. In the current API, crop_area_factor signifies the part of the total area that is going to be cropped.

@LukeWood (Contributor, Author):

> Thanks for this, it really helps!
>
> I have two concerns regarding this approach:
>
> I understand that this is quite close to the current implementation in terms of the end result. The gist is the same: we crop a random area and resize it to the given target_size. Given this, I am not sure why we need to include target_size in the calculation of crop_size. target_size should ideally have no influence over crop_size.

Any reason why not? If the goal of the layer is to take zoomed crops with a level of distortion, it should be easy to tune the distortions relative to the result.

> The zoom_factor makes it a bit unintuitive IMO. zoom_factor being a positive float does not signify something interpretable. In the current API, crop_area_factor signifies the part of the total area that is going to be cropped.

The nice thing about zoom_factor is that you can pass something like 1.0 and reasonably conclude that it applies NO zoom, and you can tune it up or down in incremental amounts. With crop_area_factor there is NO way to ensure that you will have no zoom, which makes it incredibly hard to reason about the effect the layer will have on your preprocessing pipeline.

@LukeWood (Contributor, Author):

@martin-gorner has more thoughts on this too.

@martin-gorner:

Thoughts:

  • Please make sure the min and max zoom values have the same meaning as in Model Garden's implementation, where they are called aug_scale_min and aug_scale_max.
  • An edge case not yet considered is what happens when the computed crop_size ends up being larger, in at least one dimension, than the image. For example, what if the image is 1024x512 pixels and the computed crop_size is 700x700? One possibility is to add black borders; another is to use the layers.Resizing(crop_to_aspect_ratio=True) algorithm, i.e. cut the biggest part of the image that has the same aspect ratio as crop_size (see the sketch at the end of this comment). This would not quite respect the zoom factor, in exchange for not having black bars. (Care must be taken to retain location randomness though: even if there is only one way to squeeze a maximally fitting 700x700 square along the 512px dimension, there are multiple possible locations along the 1024px dimension of the original image!)

I think I prefer the solution without black bars for two reasons:

  • black bars can introduce unwanted training effects: if a dataset has one class represented more often by elongated images, which are more likely to produce black bars when RandomResizeCropped, the model can start detecting black pixels as a proxy for this class.
  • RandomResizedCrop should become RandomCrop with no zooming and no distortions, and RandomCrop never produces black bars (but please check whether RandomCrop respects location randomness in this edge case?)
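
A minimal sketch of the no-black-bars fallback described above, assuming the crop is shrunk to the largest size with the same aspect ratio that still fits inside the image (the helper name is illustrative, not the actual layer code):

def fit_crop_to_image(crop_h, crop_w, img_h, img_w):
    # Largest scale at which the crop still fits; never grow the crop.
    scale = min(img_h / crop_h, img_w / crop_w, 1.0)
    return int(crop_h * scale), int(crop_w * scale)

# Example from above: fit_crop_to_image(700, 700, 512, 1024) -> (512, 512).
# The 512x512 crop can then still be placed at a random offset along the 1024px axis.
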

@sayakpaul (Contributor):

> black bars can introduce unwanted training effects: if a dataset has one class represented more often by elongated images, which are more likely to produce black bars when RandomResizeCropped, the model can start detecting black pixels as a proxy for this class.

I like this thought experiment. Tremendous one.

@LukeWood (Contributor, Author):

@AdityaKane2001 is this clear now?

@AdityaKane2001 (Contributor):

@LukeWood Yup, got it. I'll create a PR for this over the weekend.

@AdityaKane2001 (Contributor):

@martin-gorner @LukeWood @sayakpaul

> An edge case not yet considered is what happens when the computed crop_size ends up being larger, in at least one dimension, than the image.

To avoid this case, I'll just clip the crop_size values to the image dimensions. It seems to have the same effect as layers.Resizing(crop_to_aspect_ratio=True). Moreover, it will intuitively handle the case where both dimensions of crop_size are larger than the image dimensions. WDYT?
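
A minimal sketch of the clipping described above (illustrative only; the helper name is an assumption):

import tensorflow as tf

def clip_crop_size(crop_h, crop_w, img_h, img_w):
    # Cap each crop dimension at the corresponding image dimension.
    return tf.minimum(crop_h, img_h), tf.minimum(crop_w, img_w)

Note that, unlike the aspect-ratio-preserving fit sketched earlier, clipping each dimension independently changes the crop's aspect ratio when only one dimension overflows.
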

@martin-gorner commented Aug 23, 2022 via email

@AdityaKane2001 (Contributor):

@martin-gorner

Understood.

@LukeWood @sayakpaul

I guess I will rewrite the implementation and drop tf.image.crop_and_resize, as it does not have this functionality.

@martin-gorner:

I think tf.image.crop_and_resize should work fine. You just have to pass it the correct crop boxes.
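
For illustration, a hedged sketch of how the crop boxes could be computed for tf.image.crop_and_resize; the helper name and the per-image tensor layout are assumptions, not the final implementation:

import tensorflow as tf

def crop_and_resize(images, offset_h, offset_w, crop_h, crop_w, target_size):
    # images: [batch, H, W, C]; offsets and crop sizes: float tensors of shape
    # [batch], in pixels. crop_and_resize expects normalized [y1, x1, y2, x2] boxes.
    img_h = tf.cast(tf.shape(images)[1], tf.float32)
    img_w = tf.cast(tf.shape(images)[2], tf.float32)
    boxes = tf.stack(
        [
            offset_h / img_h,
            offset_w / img_w,
            (offset_h + crop_h) / img_h,
            (offset_w + crop_w) / img_w,
        ],
        axis=-1,
    )
    box_indices = tf.range(tf.shape(images)[0])
    return tf.image.crop_and_resize(images, boxes, box_indices, target_size)

Note that boxes extending past the image edges are filled with extrapolation_value (0.0 by default), which would produce exactly the black borders discussed earlier in this thread.
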
