
The Unreasonable Effectiveness of Deep Features as a Perceptual Metric (Apr 2018 CVPR) #5

@andrewjong

0. Article Information and Links
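
  • Paper: https://arxiv.org/abs/1801.03924
  • Project page and code (the released metric is the `lpips` package): https://github.com/richzhang/PerceptualSimilarity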

1. What do the authors try to accomplish?

Create a metric that judges image similarity more in line with human perception than traditional math-based approaches such as SSIM.

2. What's great compared to previous research?

The proposed method agrees more often with human perception.

3. What are the key elements of the technology and method?

Berkeley-Adobe Perceptual Patch Similarity Dataset

  • Dataset of many distortions

    • Traditional distortions (e.g. blurring, noise) and CNN-based outputs (e.g. autoencoders, colorization, denoising, super-resolution)
  • Uses 64×64 patches to focus on low-level similarity aspects; 161K patches total from 5K images

  • Collect human judgments via Amazon Mechanical Turk (AMT) on the tasks below

Tasks

  • Two Alternative Forced Choice (2AFC): present two distorted versions of a patch and ask which is more similar to the original. Used for training (see the sketch after this list).
  • Just Noticeable Differences (JND): used as a validation metric; ask whether two images (an original and a distorted version) look the same or different, then score a metric by how well its distances separate the "same" pairs from the "different" ones.
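
To train on the 2AFC judgments, the paper fits a small network on top of the two perceptual distances to predict the human judgment with a cross-entropy loss. Below is a minimal PyTorch sketch of that objective; the layer sizes follow my reading of the paper (two 32-channel FC layers), and names like `TwoAFCHead` are my own, not the authors'.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TwoAFCHead(nn.Module):
    """Maps distances d0 = d(x, x0), d1 = d(x, x1) to P(x1 judged more similar)."""
    def __init__(self, hidden=32):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(2, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, d0, d1):
        # Stack the two distances into a (batch, 2) input
        logits = self.net(torch.stack([d0, d1], dim=-1))
        return torch.sigmoid(logits).squeeze(-1)

def two_afc_loss(p_x1, h):
    # h in [0, 1]: fraction of human raters who judged x1 more similar to x
    return F.binary_cross_entropy(p_x1, h)
```

At evaluation time, a metric roughly earns agreement credit h on an example when it prefers the patch that fraction h of humans preferred.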

NN Similarity Task and Prediction

  • x is the reference (source) image; x0 and x1 are distorted versions. The two distortions are compared to judge which is more similar to the reference (see the distance equation and sketch below).
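
For reference, the paper's distance (its Eq. 1): activations from several layers are unit-normalized along the channel dimension, scaled per channel by learned weights $w_l$, and the squared ℓ2 differences are averaged spatially and summed over layers:

$$d(x, x_0) = \sum_l \frac{1}{H_l W_l} \sum_{h,w} \left\lVert\, w_l \odot \left( \hat{y}^l_{hw} - \hat{y}^l_{0hw} \right) \right\rVert_2^2$$

A minimal PyTorch sketch of this deep-feature distance, assuming torchvision's pretrained VGG16, uniform channel weights in place of the learned $w_l$, and ImageNet-normalized inputs; the tap indices are my assumption based on the standard VGG16 layout:

```python
import torch
import torch.nn as nn
import torchvision.models as models

def normalize_channels(feat, eps=1e-10):
    # Unit-normalize activations along the channel dimension
    return feat / (feat.pow(2).sum(dim=1, keepdim=True).sqrt() + eps)

class DeepFeatureDistance(nn.Module):
    def __init__(self):
        super().__init__()
        vgg = models.vgg16(weights=models.VGG16_Weights.DEFAULT).features.eval()
        for p in vgg.parameters():
            p.requires_grad_(False)
        self.vgg = vgg
        # ReLU outputs of conv1_2, conv2_2, conv3_3, conv4_3, conv5_3
        self.taps = {3, 8, 15, 22, 29}

    @torch.no_grad()
    def forward(self, x, x0):
        d = torch.zeros(x.shape[0])
        f, f0 = x, x0
        for i, layer in enumerate(self.vgg):
            f, f0 = layer(f), layer(f0)
            if i in self.taps:
                diff = normalize_channels(f) - normalize_channels(f0)
                # Squared L2 over channels, mean over spatial positions,
                # summed over layers (uniform weights here; LPIPS learns w_l)
                d = d + diff.pow(2).sum(dim=1).mean(dim=(1, 2))
        return d

# Usage on a batch of 64x64 patches (ImageNet-normalized):
# dist = DeepFeatureDistance()(x, x0)  # shape: (batch,)
```

The authors' released implementation is the `lpips` package, e.g. `lpips.LPIPS(net='vgg')(x, x0)` returns the learned distance.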

4. How did you verify that it works?

TODO

5. Things to discuss? (e.g. weaknesses, potential for future work, relation to other work)

Weaknesses

  • The evaluator can ONLY make a binary prediction of whether image 1 or image 2 is more similar to a reference image; there is no meaningful quantitative scale. Even though NNs output a float in [0, 1], it is well known that this number does not reflect calibrated statistical confidence.
  • Training an evaluator net requires collecting a human-judgment dataset of perceptual differences, then fine-tuning the classification net on it. In theory this could be done for any evaluation task, but it is potentially expensive.

Questions

  • Would traditional classification ConvNet evaluators work on 3D invariance? Probably not.

Potential for future work

  • Ideally, we could provide a trained evaluator net that performs like humans at judging virtual try-on quality. This could be done for different perception tasks:

    • Given two possible try-ons, which is more accurate to the original cloth image?
    • Given two possible try-ons, which is more accurate to the original person image?
    • Given a boundary between two clothing items, which is more realistic?
  • We still have to figure out how to adapt this into a continuous quantitative metric.

6. Are there any papers to read next?

7. References
