Learning Face Representation from Scratch
Dong Yi, Zhen Lei, Shengcai Liao, Stan Z. Li
arXiv
Sometimes the data matters more than the algorithm.
- Propose a semi-automatic way to collect face images.
- Use an 11-layer CNN to learn a discriminative representation.
IMDb is well structured, which the collection pipeline relies on:
- Extract the feature template of each face with a pretrained face recognition engine.
- Use the “main photo” of each celebrity as its seed.
- Use images that contain exactly one face to augment each celebrity’s seed images.
- For the remaining images in the “photo gallery”, find the correspondence between faces and celebrities, constrained by similarity and name tags.
- Crop the faces and save them into a separate folder for each celebrity, then manually check the dataset and delete wrongly grouped face images.
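The matching step for gallery photos (faces to celebrities, constrained by similarity and name tags) could look roughly like the sketch below. This is not the authors' implementation: the cosine similarity measure, the threshold value, the greedy one-to-one assignment, and the names `assign_faces`, `seeds`, and `tagged_names` are all assumptions made for illustration.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two feature vectors (assumed similarity measure)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def assign_faces(face_features, tagged_names, seeds, threshold=0.5):
    """Assign faces in one photo to the celebrities tagged on that photo.

    face_features: list of feature vectors, one per detected face.
    tagged_names:  names attached to the photo (the name-tag constraint).
    seeds:         dict mapping name -> list of seed feature vectors.
    threshold:     hypothetical minimum similarity for accepting a match.
    Returns a dict mapping face index -> celebrity name.
    """
    candidates = []
    for i, feat in enumerate(face_features):
        for name in tagged_names:
            # Best similarity between this face and any seed of this celebrity.
            score = max(
                (cosine_similarity(feat, s) for s in seeds.get(name, [])),
                default=-1.0,
            )
            if score >= threshold:
                candidates.append((score, i, name))

    # Greedy one-to-one assignment, highest similarity first.
    candidates.sort(reverse=True)
    assignment, used_names = {}, set()
    for score, i, name in candidates:
        if i in assignment or name in used_names:
            continue
        assignment[i] = name
        used_names.add(name)
    return assignment


# Toy example with made-up 2-D features.
seeds = {"Alice": [[1.0, 0.0]], "Bob": [[0.0, 1.0]]}
faces = [[0.9, 0.1], [0.1, 0.9], [-1.0, -1.0]]
print(assign_faces(faces, ["Alice", "Bob"], seeds))
```

Faces that match no tagged celebrity above the threshold stay unassigned, which leaves fewer false groupings for the manual check to remove.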
The network is an 11-layer CNN that operates on gray-scale images.
CASIA-WebFace has 10,575 subjects and 494,414 face images.
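As a quick sanity check on the dataset scale, the average number of images per subject follows directly from the two counts above. The variable names are only illustrative, and the real per-subject distribution is not given in these notes and is likely uneven.

```python
num_subjects = 10_575
num_images = 494_414

# Mean number of face images per subject (roughly 47).
avg_images_per_subject = num_images / num_subjects
print(f"{avg_images_per_subject:.1f} images per subject on average")
```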