L-PRNet

Project Description

L-PRNet (Lightweight Position Regression Network) is a lightweight adaptation and re-implementation of PRNet, a deep learning model designed to perform 3D face reconstruction given only a single RGB image of the face.

You can check out the original PRNet authors' repo here and their very well-written paper here.

I was impressed by the authors' work a while back when it first came out. Their solution for 3D face reconstruction is straightforward and elegant: they created a novel 2D representation of the 3D face structure, called the UV position map, with which one can directly regress a 2D facial image into its 3D structure at low computational cost.

This project is concerned with making 3D dense face reconstruction run light on CPU, specifically on the web and mobile devices. It'd probably be a bit of a ramble to specify everything I actually want to achieve with L-PRNet, but in short, I intend to make 3D face reconstruction accessible on mobile devices for AR/VR/MR and creative purposes.

To do that, I think improving PRNet's efficiency by reducing the complexity of the input-output data and of the network architecture might be the answer.

In terms of speed, the original PRNet is already capable of real-time inference on a GPU (100 fps on a GTX 1080).
However, PRNet inference on a CPU isn't fast enough to reach real time, which is problematic if we want to run it on the web and mobile devices.

In this repo, I provide my early iteration of L-PRNet (a pre-trained model and some testing code).
Feel free to experiment with it yourself, or reach out to me if you're interested in improving (or are currently working on) the project.

How to use the model

The following dependencies are needed:

tensorflow==2.2.0 # (tensorflow-cpu is also fine)
cv2
dlib
imutils
open3d # optional: to visualize 3D pointcloud
moviepy # optional: to test on video

The Python file inference.py contains the LPRNet class, which you can use as follows:

from inference import LPRNet
from skimage.io import imread

# Initialize the model
model = LPRNet()

# Detect face in the image
image = imread('.../example.jpg')
cropped_face = model.detect_face(image)

# Predict UV Map
uv_map = model.predict_uv(cropped_face)

# Visualize 3D pointcloud
frame_contain_pcl, pcl = model.visualize_pcl(uv_map)

I also provide a testing notebook that uses the model to predict a 3D face in a video (you need the moviepy module for this).

Project Details

Differences between the L-PRNet and PRNet architectures

  • L-PRNet's input and output dimensions are 128x128x3.
  • L-PRNet's encoder uses a MobileNet-like architecture, utilizing depthwise separable convolutions to make the network more efficient.
  • L-PRNet's decoder uses a simpler (shallower) stack of transposed convolutions.

Dataset Preparation and Preprocessing

Data preparation and preprocessing are similar to the original PRNet's. The dataset used is 300W-LP; specifically, I used all of its subsets: HELEN, IBUG, AFW, and LFPW. The dataset contains around 100k face images in various poses along with their 3DMM (3D Morphable Model) parameters, which are used to reconstruct the ground-truth 3D structure/model of each face.

For the network's input, 2D face images from the dataset are used directly (normalized to [0, 1]). The network's output is a 2D representation of the 3D structure of the face, called the UV position map. Basically, a UV position map is a 2D projection of the 3D face structure onto UV space. To generate the UV position maps, I used the original authors' Python modules to process the 3DMM parameters into UV position maps.

The UV position map is a 128x128x3 array in which the three channels hold the 3D (x, y, z) coordinates; we can reconstruct the 3D structure from the UV map (in the form of a point cloud) simply by reshaping it into a (128*128)x3 array.
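
For illustration, below is a minimal sketch of that reshaping step, assuming the UV map is a NumPy array (the variable names here are hypothetical, not from the repo's code):

import numpy as np

# Stand-in for a predicted UV position map (e.g. the output of model.predict_uv)
uv_map = np.zeros((128, 128, 3), dtype=np.float32)

# Flatten the 128x128 spatial grid: each row becomes one (x, y, z) vertex
point_cloud = uv_map.reshape(-1, 3)
print(point_cloud.shape)  # (16384, 3)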

Below is a sample of an input-output pair:
[Figure: input face image and its corresponding UV position map]

In the original PRNet paper, the authors perform data augmentation (color scaling, translation, and rotation), which I haven't done yet. I plan to re-train the model with data augmentation soon and update the pre-trained model in this repo.
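
As a rough illustration, here is a minimal sketch of the color-scaling part of that augmentation; the scale range is an assumption, not taken from the paper or the repo. Note that translation and rotation would also have to be applied consistently to the ground-truth UV map so the coordinates stay aligned:

import numpy as np

def augment_color(image, low=0.6, high=1.4):
    # Randomly scale each RGB channel of a [0, 1]-normalized image;
    # the (low, high) range is an assumed choice
    scale = np.random.uniform(low, high, size=(1, 1, 3))
    return np.clip(image * scale, 0.0, 1.0)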

Network Architecture and Training

The architecture of L-PRNet is similar to PRNet's: a CNN autoencoder.
The encoder is composed of a stack of 2D depthwise separable convolutions, similar to the MobileNet architecture.

Conv2D-32 => DepthwiseSep2D-64 => DepthwiseSep2D-128 => DepthwiseSep2D-256 => DepthwiseSep2D-512

The decoder is composed of a stack of 2D transposed convolutions, which I made shallower than the original PRNet's to reduce computational cost.

Conv2DTranspose-512 => Conv2DTranspose-256 => Conv2DTranspose-128 => Conv2DTranspose-64 => Conv2DTranspose-32 => Conv2DTranspose-3
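
To make the layer stacks above concrete, here is a minimal Keras sketch of such an autoencoder. This is a hedged reading of the description, not the repo's actual code: kernel sizes, strides, and activations are assumptions.

import tensorflow as tf
from tensorflow.keras import layers, models

def build_lprnet():
    # 128x128x3 input, downsampled 5x by the encoder, upsampled back by the decoder
    inp = layers.Input(shape=(128, 128, 3))
    x = layers.Conv2D(32, 3, strides=2, padding='same', activation='relu')(inp)          # 64x64
    x = layers.SeparableConv2D(64, 3, strides=2, padding='same', activation='relu')(x)   # 32x32
    x = layers.SeparableConv2D(128, 3, strides=2, padding='same', activation='relu')(x)  # 16x16
    x = layers.SeparableConv2D(256, 3, strides=2, padding='same', activation='relu')(x)  # 8x8
    x = layers.SeparableConv2D(512, 3, strides=2, padding='same', activation='relu')(x)  # 4x4
    x = layers.Conv2DTranspose(512, 3, strides=2, padding='same', activation='relu')(x)  # 8x8
    x = layers.Conv2DTranspose(256, 3, strides=2, padding='same', activation='relu')(x)  # 16x16
    x = layers.Conv2DTranspose(128, 3, strides=2, padding='same', activation='relu')(x)  # 32x32
    x = layers.Conv2DTranspose(64, 3, strides=2, padding='same', activation='relu')(x)   # 64x64
    x = layers.Conv2DTranspose(32, 3, strides=2, padding='same', activation='relu')(x)   # 128x128
    out = layers.Conv2DTranspose(3, 3, strides=1, padding='same', activation='sigmoid')(x)
    return models.Model(inp, out)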

Adam is used as the optimizer; the learning rate starts at 0.0001 and is halved every 5 epochs, as done in the original PRNet, and the batch size is 16. The pre-trained model in this repo has been trained for 15 epochs. The loss function is the same as in the original PRNet: a weighted mean squared error.
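
For concreteness, here is a minimal sketch of that training setup, assuming the build_lprnet function from the architecture sketch above. The weight mask here is a uniform placeholder; PRNet's actual mask gives higher weight to regions such as the eyes, nose, and mouth.

import tensorflow as tf

def lr_schedule(epoch, lr):
    # Start at 1e-4 and halve every 5 epochs
    return 1e-4 * (0.5 ** (epoch // 5))

def make_weighted_mse(weight_mask):
    # weight_mask: a 128x128x1 map emphasizing important facial regions
    def loss(y_true, y_pred):
        return tf.reduce_mean(tf.square(y_true - y_pred) * weight_mask)
    return loss

model = build_lprnet()
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=1e-4),
              loss=make_weighted_mse(tf.ones((128, 128, 1))))  # placeholder mask
# model.fit(train_images, train_uv_maps, batch_size=16, epochs=15,
#           callbacks=[tf.keras.callbacks.LearningRateScheduler(lr_schedule)])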

Future Improvements

  • Re-train the model with the augmented data suggested in the original paper to handle more difficult situations
  • Evaluate the model with NME (normalized mean error), as done in the original PRNet paper
  • Create an applet for interactive 3D reconstruction
  • (This one's a bit of a long shot) Study applying a similar 2D UV representation to other human body parts, such as the full head, body, or hands
