The Jupyter Notebook autoencoder_mpl.ipynb contains the implementation of an Autoencoder Neural Network that builds latent spaces for two classes of human body shapes: class α shapes, which have the most common appearance for the human body, and class β shapes, bodies with a limb amputation (e.g. a leg or an arm).
The described network is part of my CS Master's thesis at Sapienza University titled Fairness in Geometry Processing.
The notebook needs to load the partial_shapes_matrix.npy and total_shapes_matrix.npy files included in the dataset directory.
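As a minimal sketch of the loading step, the two files can be read with NumPy. The file names and the dataset directory come from the text above; the function name `load_shape_matrices` is hypothetical, and nothing is assumed here about the array shapes or about which file holds which class.

```python
from pathlib import Path

import numpy as np


def load_shape_matrices(dataset_dir="dataset"):
    """Load both shape matrices from the dataset directory.

    The file names are taken from the README. The array layout and the
    class stored in each file are not stated there, so inspect the
    returned arrays (e.g. their .shape) before using them.
    """
    dataset_dir = Path(dataset_dir)
    partial = np.load(dataset_dir / "partial_shapes_matrix.npy")
    total = np.load(dataset_dir / "total_shapes_matrix.npy")
    return partial, total
```

When running in Colab, the `dataset_dir` argument should point to wherever the dataset directory was uploaded or mounted.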
For faster training, run the notebook in Colab with GPU hardware acceleration enabled.
- PyTorch
- NumPy
The thesis context is Fair Machine Learning, the study of correcting bias with respect to sensitive variables in automated decision processes based on ML models. Current methods for generating human body models generally produce bodies that conform to standard physical capabilities, and very little material exists on bodies considered a deviation from the norm. The objective is to work on geometric methods that favor a representation of all human bodies in their diversity.
In particular, we focused on the body modeling aspect of Virtual Humans and on its creation through statistical body models. Statistical body models are geometric models that describe human pose and body shape in a unified framework by leveraging an encoding for mesh surfaces; this technique quickly produces a 3D human model with a satisfactory level of detail. We chose SMPL as our statistical model.
We generated a 3D dataset with two classes: α shapes have the most common appearance for a human body; β shapes reproduce the appearance of a person who no longer has a limb. For a realistic appearance, it was important to avoid a clean cut near the point of amputation and to obtain a "smooth" deformation instead. Specifically, we applied a Conformalized Mean Curvature Flow, taking mkazhdan's code as a reference, so that convergence problems such as extreme expansion of the shape were avoided. For further details on this part go here.
Next, we trained an Autoencoder (AE) Neural Network on the created dataset to obtain a latent space containing the representation of the classes. We first experimented with a classic AE, whose purpose was to create a single latent space for both types of shapes, but this approach presented problems in shape reconstruction, as highlighted above.
Compared with a classic AE, our model reconstructs shapes with accuracy and realism: the decoded shape is very similar to the original input. Moreover, the model gives coherent results for the mapping of shapes, since the shapes reconstructed after this operation remain similar in appearance to the corresponding input shapes. We arrived at this solution after two separate training runs on a dataset containing 1000 instances for both classes.
The model architecture comprises two main AEs mirroring one another and two Multi-Layer Perceptrons (MLPs) connecting the latent spaces of the AEs. AEα has an Encoder Eα with two linear layers followed by tanh activation and a Decoder Dα with one linear layer, also followed by tanh activation. Dimensions are n × 3 → 512 → 256 → 256 → n × 3. AEβ has the same structure, with an Encoder Eβ and a Decoder Dβ. The MLP Nβα maps latent space vβ of AEβ to latent space vα of AEα. It has five linear layers followed by SELU activation and a batch normalization layer. Dimensions are f → 64 → 128 → 256 → 128 → 64 → f. The MLP Nαβ has the same structure.
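The layer sizes above can be sketched in PyTorch roughly as follows. This is an illustrative reading of the description, not the notebook's code: the class and function names are hypothetical, the latent size is assumed to be 256 (so f = 256), tanh is assumed after each encoder layer, and the MLP is assumed to be five Linear + SELU + BatchNorm blocks followed by a plain Linear layer back to size f, since the text does not spell out the exact layer order.

```python
import torch
from torch import nn


class ShapeAE(nn.Module):
    """One autoencoder branch (AE-alpha or AE-beta), using the sizes in the text."""

    def __init__(self, n_vertices, latent_dim=256):
        super().__init__()
        in_dim = n_vertices * 3  # flattened n x 3 vertex coordinates
        # Encoder: n*3 -> 512 -> latent_dim, tanh after each linear layer (assumed).
        self.encoder = nn.Sequential(
            nn.Linear(in_dim, 512), nn.Tanh(),
            nn.Linear(512, latent_dim), nn.Tanh(),
        )
        # Decoder: latent_dim -> n*3, one linear layer followed by tanh.
        self.decoder = nn.Sequential(nn.Linear(latent_dim, in_dim), nn.Tanh())

    def forward(self, x):
        v = self.encoder(x)
        return self.decoder(v), v


def make_latent_mapper(f=256, hidden=(64, 128, 256, 128, 64)):
    """MLP between latent spaces (N-beta-alpha or N-alpha-beta).

    Assumption: five Linear + SELU + BatchNorm blocks, then a final
    Linear layer mapping back to size f.
    """
    layers, prev = [], f
    for width in hidden:
        layers += [nn.Linear(prev, width), nn.SELU(), nn.BatchNorm1d(width)]
        prev = width
    layers.append(nn.Linear(prev, f))
    return nn.Sequential(*layers)
```

Under these assumptions, mapping a β shape into the α space and decoding it would read `ae_alpha.decoder(mapper(ae_beta.encoder(x_beta)))`, where `x_beta` is a batch of flattened shapes. Because of the batch normalization layers, the mapper needs more than one sample per batch in training mode.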
The training loss is the following:
A single AE trained on both datasets behaves worse than our model. Our proposal works well both for reconstructing shapes and for mapping them. This suggests that there is a good connection between the latent spaces and that the model has learned the similarity between shapes belonging to different spaces.
The reconstruction of the proposed model is nearly identical to the input shape, which indicates that our method works properly and has learned effectively. The classic AE, instead, does not produce a mesh with the same identity template as the input, and its limb reconstruction does not look plausible. Our method also gives visually good results for the mapping between latent spaces: the latent representation of class α is mapped to the opposite latent space and its decoding shows a body shape very similar to the input shape β, and vice versa for the other class.
Our model is a better solution than the classic AE, because producing implausible body shapes does not contribute to a fair appearance of body models. Our decoding and mapping results instead protect the uniqueness of the shapes, since the identity of a body is preserved and the represented body is guaranteed to be closer to reality.
We performed shape interpolation between models from a test dataset and we obtained their mapping to the opposite latent space. An interpolation is given by the following expression:
where
We observed in our experiments that a classic AE does not allow moving properly from one shape to another in a space with large variations: the interpolated results are unrealistic and uninteresting. In our model's space it is instead possible to move in a Euclidean way: we have an expressive latent space that may find application in a generative model. From a fairness point of view, this expressiveness would facilitate the creation of new types of data and would thus favor a more inclusive representation within human body generation methods.
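To illustrate what moving between two shapes in a Euclidean way can look like, here is a minimal sketch of straight-line blending between two latent vectors. It assumes plain linear interpolation, which may differ from the expression used in the thesis; the function name `interpolate_latents` is hypothetical.

```python
import numpy as np


def interpolate_latents(v_start, v_end, steps=5):
    """Linearly blend two latent vectors: (1 - t) * v_start + t * v_end.

    Returns an array of shape (steps, f), with t evenly spaced in [0, 1].
    Linear blending is an assumption made for this sketch.
    """
    v_start = np.asarray(v_start, dtype=float)
    v_end = np.asarray(v_end, dtype=float)
    ts = np.linspace(0.0, 1.0, steps)[:, None]  # column of blend weights
    return (1.0 - ts) * v_start + ts * v_end
```

Each row of the result would then be passed through a decoder to obtain the intermediate shape.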
We applied Principal Component Analysis (PCA) to the latent spaces to visualize them, plotting the obtained components on a graph. The latent variables are obtained from a dataset of 400 test shapes (200 from class α, 200 from class β) not seen at training time.
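A PCA projection of this kind can be sketched with NumPy alone. The helper names `fit_pca` and `project` are hypothetical, and fitting one shared basis for the point sets being compared is an assumption of this sketch, not necessarily what the notebook does.

```python
import numpy as np


def fit_pca(latents, n_components=2):
    """Return the mean and the top principal directions of `latents` (rows = samples)."""
    X = np.asarray(latents, dtype=float)
    mean = X.mean(axis=0)
    # Rows of vt are principal directions, sorted by decreasing variance.
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, vt[:n_components]


def project(latents, mean, components):
    """Project latent vectors onto previously fitted principal directions."""
    return (np.asarray(latents, dtype=float) - mean) @ components.T
```

Fitting on the concatenation of both point sets (for example, encoded vectors and mapped vectors) and then projecting each set separately keeps the two scatter plots on the same axes, which makes the visual comparison meaningful.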
The plots show two comparisons: latent space vα against the mapping of latent space vβ, and latent space vβ against the mapping of vα.
In the latent space visualizations of our model, the distribution is respected quite well, even though there is no perfect correspondence between points. In particular, mismatches occur mainly in the contour areas and not in the center. There are empty regions both in the latent space generated by the encoder and in the one generated by the mapping.
Here is the plot of the latent space for the classic AE, comparing the latent space of α shapes with that of β shapes.
In the right-most part of the plot, almost half of the class β data lies far from the class α data: the network draws a more marked division between the two classes, so shapes of different classes differ more from each other.
Our model has instead created a latent space with no evident separation between classes. This testifies that our proposal is indeed fair, since shapes from different classes, when living in the same space, are more similar to each other.