PyTorch implementations of selected VGG models, useful for feature extraction from face images and speech waveforms.
| Model | Modality | Checkpoint | Checkpoint notes | Other comments |
|---|---|---|---|---|
| VGG-M [1] | Face | Download | Face recognition model trained on VGG-Face [1]. | The model's code is adapted from the official implementation. I have added some functions to facilitate pre-processing and feature extraction. |
| VGG-M [2] | Voice | Download | Speaker recognition model trained on VoxCeleb1 [2]. | The original code is in Matlab; I re-implemented it here in PyTorch. |
| VGGVoxResNet [3] | Voice | Download | Speaker verification model trained on VoxCeleb2 [3]. | The original code is in Matlab; I re-implemented it here in PyTorch. |
| SVHF [4] | Face + Voice | Download | Static binary voice-to-face matching model trained on VoxCeleb2 [3] [4] | The original code, for static binary voice-to-face matching, is in Matlab. I re-implemented it here in PyTorch and added functionality for face-to-voice matching, as well as dynamic images. |
Prerequisites:
- NumPy https://numpy.org/
- PyTorch https://pytorch.org/
- OpenCV https://pypi.org/project/opencv-python/
- SciPy https://scipy.org/
- Librosa https://librosa.org/doc/latest/index.html
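The prerequisites can be installed with pip. A minimal sketch, with no version pins; the package names follow the project pages linked above:

```bash
pip install numpy torch opencv-python scipy librosa
```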
To install from source, run the following:

```bash
pip install -e .
```
Face feature extraction:

```python
import torch

from vgg_pytorch.preprocessing import preprocess_image
from vgg_pytorch.models import VGG_M_face_bn_dag

device = "cuda:0"
path_to_checkpoint = "<insert path to pre-trained model checkpoint>"
path_to_image = "<insert path to image>"

# Load model
model = VGG_M_face_bn_dag()
model.load_state_dict(torch.load(path_to_checkpoint, map_location=device))
model.to(device)
model.eval()

# Pre-process input image
x = preprocess_image(path_to_image)

# Extract features
with torch.no_grad():
    z = model.extract_features(x.unsqueeze(0).to(device))
```
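The embeddings returned by `extract_features` can be compared directly, e.g. for face verification. Below is a minimal sketch, assuming the setup above; the cosine-similarity scoring is an illustration, not part of this repository's API:

```python
import torch
import torch.nn.functional as F

# Hypothetical example: score two face images by cosine similarity of their
# embeddings. `model`, `preprocess_image`, and `device` are assumed to be set
# up as in the snippet above.
x1 = preprocess_image("<insert path to first image>")
x2 = preprocess_image("<insert path to second image>")

with torch.no_grad():
    z1 = model.extract_features(x1.unsqueeze(0).to(device))
    z2 = model.extract_features(x2.unsqueeze(0).to(device))

# Flatten in case the features are spatial maps rather than vectors.
score = F.cosine_similarity(z1.flatten(1), z2.flatten(1)).item()
print(f"cosine similarity: {score:.3f}")  # higher suggests the same identity
```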
Voice feature extraction:

```python
import torch

from vgg_pytorch.preprocessing import preprocess_audio
from vgg_pytorch.models import VGGMVox, VGGVoxResNet

device = "cuda:0"
path_to_checkpoint = "<insert path to pre-trained model checkpoint>"
path_to_audio = "<insert path to an audio file>"

# Load model (VGGMVox is loaded the same way)
model = VGGVoxResNet()
model.load_state_dict(torch.load(path_to_checkpoint, map_location=device))
model.to(device)
model.eval()

# Pre-process input audio
x = preprocess_audio(path_to_audio)

# Extract features
with torch.no_grad():
    z = model.extract_features(x.unsqueeze(0).to(device))
```
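The same pattern extends to batches of utterances. A minimal sketch, assuming a flat directory of `.wav` files; the directory layout and the dictionary of embeddings are illustrative assumptions, not part of the repository:

```python
import glob
import torch

# Hypothetical example: embed every .wav file in a directory.
# `model`, `preprocess_audio`, and `device` are assumed to be set up as above.
embeddings = {}
for path in sorted(glob.glob("<insert path to audio directory>/*.wav")):
    x = preprocess_audio(path)
    with torch.no_grad():
        embeddings[path] = model.extract_features(x.unsqueeze(0).to(device)).cpu()
```

Pairs of speaker embeddings can then be compared with the same cosine-similarity scoring sketched in the face example above.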
Cross-modal face and voice feature extraction:

```python
import torch

from vgg_pytorch.preprocessing import preprocess_image, preprocess_audio
from vgg_pytorch.models import SVHF

device = "cuda:0"
path_to_checkpoint = "<insert path to pre-trained model checkpoint>"
path_to_face = "<insert path to image>"
path_to_audio = "<insert path to an audio file>"

# Load model
model = SVHF()
model.load_state_dict(torch.load(path_to_checkpoint, map_location=device))
model.to(device)
model.eval()

# Pre-process inputs
x_f = preprocess_image(path_to_face)
x_a = preprocess_audio(path_to_audio)

# Extract features with the modality-specific sub-networks
with torch.no_grad():
    z_f = model.face_net(x_f.unsqueeze(0).to(device))
    z_a = model.voice_net(x_a.unsqueeze(0).to(device))
```
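SVHF was trained for static binary matching: given one voice and two candidate faces, decide which face belongs to the speaker. The sketch below only batches the sub-network embeddings shown above over several candidate faces; the candidate paths are placeholders, and how the embeddings are combined into a final decision depends on the model's matching head, which is not reproduced here:

```python
import torch

# Hypothetical example: embed one voice and several candidate faces.
# `model`, `preprocess_image`, `preprocess_audio`, `path_to_audio`, and
# `device` are assumed to be set up as in the snippet above.
candidate_faces = ["<insert path to image 1>", "<insert path to image 2>"]

with torch.no_grad():
    z_a = model.voice_net(preprocess_audio(path_to_audio).unsqueeze(0).to(device))
    z_f = torch.cat([
        model.face_net(preprocess_image(p).unsqueeze(0).to(device))
        for p in candidate_faces
    ])
# z_f now holds one embedding per candidate face, ready for whatever
# matching or ranking step is applied downstream.
```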
If you use this code, please cite the original publications of the authors:

```bibtex
@InProceedings{Parkhi15,
  author    = "Omkar M. Parkhi and Andrea Vedaldi and Andrew Zisserman",
  title     = "Deep Face Recognition",
  booktitle = "British Machine Vision Conference",
  year      = "2015",
}

@InProceedings{Nagrani17,
  author    = "Nagrani, A. and Chung, J.~S. and Zisserman, A.",
  title     = "VoxCeleb: a large-scale speaker identification dataset",
  booktitle = "INTERSPEECH",
  year      = "2017",
}

@InProceedings{Chung18,
  author    = "Chung, J.~S. and Nagrani, A. and Zisserman, A.",
  title     = "VoxCeleb2: Deep Speaker Recognition",
  booktitle = "INTERSPEECH",
  year      = "2018",
}

@InProceedings{Nagrani18a,
  author    = "Nagrani, A. and Albanie, S. and Zisserman, A.",
  title     = "Seeing Voices and Hearing Faces: Cross-modal biometric matching",
  booktitle = "IEEE Conference on Computer Vision and Pattern Recognition",
  year      = "2018",
}
```

Please also refer to the following official implementations:
- VGG-M face model: http://www.robots.ox.ac.uk/~albanie/models/pytorch-mcn/vgg_m_face_bn_dag.py
- VGGVox models in Matlab: https://github.com/a-nagrani/VGGVox/tree/master
- SVHF model in Matlab: https://github.com/a-nagrani/SVHF-Net
- Converting VGG Matlab models to PyTorch: https://github.com/albanie/pytorch-mcn
References:

[1] Omkar M. Parkhi, Andrea Vedaldi, Andrew Zisserman, "Deep Face Recognition", British Machine Vision Conference, 2015.
[2] Arsha Nagrani, Joon S. Chung, Andrew Zisserman, "VoxCeleb: a large-scale speaker identification dataset", Interspeech, 2017.
[3] Joon S. Chung, Arsha Nagrani, Andrew Zisserman, "VoxCeleb2: Deep Speaker Recognition", Interspeech, 2018.
[4] Arsha Nagrani, Samuel Albanie, Andrew Zisserman, "Seeing Voices and Hearing Faces: Cross-modal biometric matching", IEEE Conference on Computer Vision and Pattern Recognition, 2018.