This blog introduces an interesting application of conditional generative adversarial networks (cGANs) to face aging: a cGAN can synthesize face images of the same person at different ages. In research, this method can improve the performance of “cross-age face recognition”. In everyday applications, beyond entertainment, it can also help find missing children.
The paper behind this method makes two main contributions:
- They design the Age Conditional Generative Adversarial Network (acGAN) to generate face images in a required age category.
- They propose a latent vector optimization approach that allows acGAN to reconstruct an input face image while preserving the original person’s identity.
2. Methods and Experiments
How does this acGAN work?
As shown in the following figure, once acGAN is trained, we first use Identity-Preserving Optimization to find an optimal latent vector z_star such that the reconstructed face image x_bar = G(z_star, y_0) is as close as possible to the original image x with age label y_0. acGAN then takes z_star together with a target age label y_target and generates the final face image at the target age.
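The three-step pipeline above (estimate a latent vector, refine it, regenerate at the target age) can be sketched as plain function composition. This is a minimal sketch, assuming hypothetical callables `encoder`, `refine`, and `generator` that stand in for the trained networks; the toy versions at the bottom are made up and exist only so the sketch runs.

```python
from typing import Callable, List

Vector = List[float]
NUM_AGE_CATEGORIES = 6


def one_hot(index: int, size: int = NUM_AGE_CATEGORIES) -> Vector:
    """Encode an age category as a one-hot condition vector y."""
    return [1.0 if i == index else 0.0 for i in range(size)]


def age_face(
    x: Vector,
    y0: int,
    y_target: int,
    encoder: Callable[[Vector, Vector], Vector],
    refine: Callable[[Vector, Vector, Vector], Vector],
    generator: Callable[[Vector, Vector], Vector],
) -> Vector:
    """Reconstruct x at its own age y0, then regenerate it at age y_target."""
    z0 = encoder(x, one_hot(y0))                  # initial latent estimate z_0
    z_star = refine(z0, x, one_hot(y0))           # identity-preserving refinement
    return generator(z_star, one_hot(y_target))   # face at the target age


# Hypothetical toy stand-ins: a 2-"pixel" image whose first pixel encodes age.
def toy_generator(z: Vector, y: Vector) -> Vector:
    age = y.index(1.0)
    return [z[0] + age, z[1]]


def toy_encoder(x: Vector, y: Vector) -> Vector:
    age = y.index(1.0)
    return [x[0] - age, x[1]]


def toy_refine(z0: Vector, x: Vector, y: Vector) -> Vector:
    return z0  # no-op placeholder for the refinement step
```

For example, `age_face([5.0, 3.0], 1, 4, toy_encoder, toy_refine, toy_generator)` returns `[8.0, 3.0]`: the toy encoder removes the age-1 offset, and the toy generator adds the age-4 offset back. Keeping the networks as parameters makes it easy to swap the refinement step between variants.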
2.1 Training acGAN
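Similar to a traditional cGAN, acGAN is trained by solving the following minimax optimization problem (1):

$$\min_{\theta_G} \max_{\theta_D} v(\theta_G, \theta_D) = \mathbb{E}_{x, y \sim p_{data}}\left[\log D(x, y)\right] + \mathbb{E}_{z \sim p_z(z),\, y \sim p_y}\left[\log\left(1 - D(G(z, y), y)\right)\right] \qquad (1)$$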
where theta_G and theta_D are the parameters of G and D respectively, z is a random latent vector, and y is the label (condition) attached to a training image x. In this project, y is a six-dimensional one-hot vector encoding six age categories.
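To make function (1) concrete, the sketch below builds the six-dimensional one-hot age condition and computes a batch (Monte Carlo) estimate of v(theta_G, theta_D) from discriminator outputs. It assumes D outputs probabilities strictly between 0 and 1; the batch values in the example are made up for illustration. D is trained to increase this value and G to decrease it.

```python
import math
from typing import List, Sequence

NUM_AGE_CATEGORIES = 6


def one_hot(category: int, size: int = NUM_AGE_CATEGORIES) -> List[float]:
    """Six-dimensional one-hot age condition y."""
    if not 0 <= category < size:
        raise ValueError(f"age category must be in [0, {size})")
    return [1.0 if i == category else 0.0 for i in range(size)]


def cgan_value(d_real: Sequence[float], d_fake: Sequence[float]) -> float:
    """Batch estimate of v(theta_G, theta_D).

    d_real: D(x, y) on real (image, age label) pairs.
    d_fake: D(G(z, y), y) on generated pairs; all values must lie in (0, 1).
    """
    real_term = sum(math.log(p) for p in d_real) / len(d_real)
    fake_term = sum(math.log(1.0 - p) for p in d_fake) / len(d_fake)
    return real_term + fake_term


if __name__ == "__main__":
    # Illustrative discriminator outputs, not real training data.
    print(cgan_value([0.9, 0.8], [0.2, 0.1]))
```

A confident discriminator (high D on real pairs, low D on generated ones) pushes the value toward 0, its maximum; a discriminator that outputs 0.5 everywhere gives 2 log 0.5.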
2.2 Approximative Face Reconstruction
To generate a face image from a given input face image and a target age label, we first need a mapping from an input image x with label y to a latent vector z, because a cGAN has no explicit mechanism for this inverse mapping. The authors therefore trained an encoder, a neural network that approximates the inverse mapping, on a synthetic dataset of 100K pairs of random latent vectors and the faces acGAN generates from them. The encoder is trained to minimize the Euclidean distance between its estimated latent vector z_0 and the ground-truth latent vector.
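The encoder-training recipe (sample latent vectors, generate faces, learn to map faces back to their latent vectors) can be illustrated at toy scale. This is a minimal sketch under loud assumptions: `toy_generator` is a hypothetical 1-D stand-in for acGAN's generator, the encoder is a linear model instead of a neural network, it is fit in closed form by least squares (which minimizes the same squared Euclidean distance) rather than by gradient training, and 1,000 pairs replace the 100K used by the authors.

```python
import random
from typing import List, Tuple

NUM_AGE_CATEGORIES = 6
Pair = Tuple[float, int, float]  # (generated image x, age category, ground-truth z)


def toy_generator(z: float, age: int) -> float:
    """Hypothetical stand-in for G(z, y): a 1-D 'image' shifted by age."""
    return 3.0 * z + age


def make_synthetic_pairs(n: int, seed: int = 0) -> List[Pair]:
    """Sample random z and age, then record the generated image x."""
    rng = random.Random(seed)
    pairs = []
    for _ in range(n):
        z = rng.gauss(0.0, 1.0)
        age = rng.randrange(NUM_AGE_CATEGORIES)
        pairs.append((toy_generator(z, age), age, z))
    return pairs


def fit_linear_encoder(pairs: List[Pair]) -> Tuple[float, float]:
    """Fit E(x, age) = w * x + b * age minimizing the sum of (E(x, age) - z)^2.

    Solves the 2x2 normal equations in closed form (Cramer's rule).
    """
    sxx = sum(x * x for x, _, _ in pairs)
    sxa = sum(x * a for x, a, _ in pairs)
    saa = sum(a * a for _, a, _ in pairs)
    sxz = sum(x * z for x, _, z in pairs)
    saz = sum(a * z for _, a, z in pairs)
    det = sxx * saa - sxa * sxa
    w = (sxz * saa - sxa * saz) / det
    b = (sxx * saz - sxa * sxz) / det
    return w, b


if __name__ == "__main__":
    w, b = fit_linear_encoder(make_synthetic_pairs(1_000))
    # Expect w close to 1/3 and b close to -1/3, since z = (x - age) / 3.
    print(f"w = {w:.3f}, b = {b:.3f}")
```

Because the toy generator is exactly invertible, the fitted encoder recovers z precisely; with a real generator the encoder only approximates the inverse, which is why its estimate z_0 still needs refinement.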
However, the authors found that although the initial estimate z_0 yields visually good face reconstructions, the identity of the original person is lost in about 50% of cases. They therefore proposed a novel “Identity-Preserving” optimization to refine z_0.
The key trick is to use a given face recognition neural network FR to embed the input face image x as FR(x) and the reconstructed image x_bar as FR(x_bar). Minimizing the Euclidean distance between these embeddings, rather than the distance between x and x_bar themselves (i.e., pixel-wise optimization), makes it possible to preserve the identity of the original face.
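The difference between the two objectives can be sketched as one refinement routine with a pluggable embedding: passing a face-recognition embedding gives the identity-preserving variant, and passing the identity function gives the pixel-wise variant. This is a minimal sketch, assuming hypothetical toy functions `toy_generator` and `toy_fr` in place of acGAN's generator and the FR network, and using plain gradient descent with finite-difference gradients as a simple stand-in optimizer, not necessarily the one the authors used.

```python
import math
from typing import Callable, List

Vector = List[float]


def squared_distance(a: Vector, b: Vector) -> float:
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def numerical_gradient(f: Callable[[Vector], float], z: Vector, eps: float = 1e-5) -> Vector:
    """Central finite-difference gradient of f at z."""
    grad = []
    for i in range(len(z)):
        z_plus, z_minus = list(z), list(z)
        z_plus[i] += eps
        z_minus[i] -= eps
        grad.append((f(z_plus) - f(z_minus)) / (2 * eps))
    return grad


def refine_latent(z0, x, y0, generator, embed, lr=0.1, steps=300):
    """Gradient descent on ||embed(x) - embed(G(z, y0))||^2, starting from z0.

    embed = FR gives the identity-preserving variant;
    embed = identity (lambda v: v) gives the pixel-wise variant.
    """
    target = embed(x)

    def loss(z: Vector) -> float:
        return squared_distance(target, embed(generator(z, y0)))

    z = list(z0)
    for _ in range(steps):
        grad = numerical_gradient(loss, z)
        z = [zi - lr * gi for zi, gi in zip(z, grad)]
    return z


# Hypothetical toy stand-ins for G and FR.
def toy_generator(z: Vector, age: int) -> Vector:
    return [z[0] + 0.1 * age, z[1], z[0] * z[1]]


def toy_fr(image: Vector) -> Vector:
    # Toy "identity" embedding that ignores the age-dependent first pixel.
    return [math.tanh(image[1]), math.tanh(image[2])]
```

Starting from a rough encoder estimate such as `z0 = [0.2, 0.1]`, `refine_latent(z0, x, 2, toy_generator, toy_fr)` lowers the embedding distance to `x`, while the pixel-wise call lowers the raw pixel distance instead; in the real method those two objectives favor different reconstructions.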
The following figure illustrates the difference between the “Pixel-Wise” and “Identity-Preserving (IP)” methods. Part (c) shows that facial expression and hairstyle are better preserved with the IP method. Part (d) shows the images acGAN generates when given the IP latent vector together with age labels.
The following figure shows faces generated by acGAN from a random latent vector combined with the age labels:
To quantify the difference between the IP and Pixel-Wise methods, the authors used the face recognition software “OpenFace” as the metric, checking whether each reconstructed face is recognized as the same person as the original. The table shows that the IP method preserves face identity much better: its FR score is much higher than those of the Pixel-Wise method and the initial reconstructions.
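One plausible way to turn such face-recognition checks into a percentage score is sketched below: each (original, reconstruction) pair counts as a match when the squared L2 distance between their embeddings falls below a threshold. This is an illustrative sketch, not OpenFace's API; the embeddings are assumed to come from some FR model, and the default threshold of 1.0 is a hypothetical value.

```python
from typing import List, Sequence

Vector = Sequence[float]


def same_identity(emb_a: Vector, emb_b: Vector, threshold: float) -> bool:
    """Verification decision: squared L2 distance between embeddings below threshold."""
    return sum((a - b) ** 2 for a, b in zip(emb_a, emb_b)) < threshold


def fr_score(
    originals: List[Vector], reconstructions: List[Vector], threshold: float = 1.0
) -> float:
    """Percentage of (original, reconstruction) pairs judged to be the same person."""
    if len(originals) != len(reconstructions) or not originals:
        raise ValueError("need two equally long, non-empty lists of embeddings")
    matches = sum(
        same_identity(o, r, threshold) for o, r in zip(originals, reconstructions)
    )
    return 100.0 * matches / len(originals)
```

With made-up 2-D embeddings where two of four reconstructions stay within the threshold, `fr_score` returns `50.0`. Comparing this score for z_0, pixel-wise, and IP reconstructions against the same originals gives a table of the kind described above.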
The main advantage of this approach is its “Identity-Preserving” latent vector optimization, which maintains the original person’s identity during reconstruction.
This method can also be used for synthetic augmentation of face datasets and for improving the robustness of face recognition solutions in cross-age scenarios.