Gordon Lecture: Deep Generative Models for Image Translation

Dr. Harry Yang is currently a Research Scientist at Facebook AI. He received his PhD in computer science from the University of Southern California. His research interests are deep generative models and their computer vision applications, such as image inpainting, image translation, and human retargeting.
Below is a summary of his presentation.

In this talk, Dr. Yang addressed the problem of translating faces and bodies between different identities without paired training data. Without such pairs, a translation module cannot be trained directly with supervised signals. Instead, they propose training a conditional variational auto-encoder (CVAE) to disentangle latent factors such as identity and expression. To achieve effective disentanglement, they further use multi-view information such as keypoints and facial landmarks to train multiple CVAEs. Because these simplified representations of the data are easier to disentangle, they can guide the disentanglement of the image itself. Experiments demonstrate the effectiveness of the method on multiple face and body datasets. They also showed that the model is a more robust image classifier and adversarial example detector than traditional multi-class neural networks.
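
For orientation, the standard training objective of a CVAE can be written as the lower bound below. The summary does not state the exact loss used in the talk, so this is the generic textbook form, with the symbols chosen here as assumptions: x is the image, c is the conditioning signal (for example an identity label or a set of keypoints), and z is the latent code that is meant to capture the remaining factors.

```latex
\log p_\theta(x \mid c) \;\ge\;
\mathbb{E}_{q_\phi(z \mid x, c)}\!\left[\log p_\theta(x \mid z, c)\right]
\;-\; \mathrm{KL}\!\left(q_\phi(z \mid x, c)\,\|\,p(z \mid c)\right)
```

The first term rewards faithful reconstruction of x from z and c; the second keeps the encoder's posterior close to the prior, which is what pushes information already carried by c out of z.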

To scale to new identities and generate higher-quality results, they further propose an alternative approach that uses self-supervised learning based on StyleGAN to factor out different attributes of face images, such as hair color, facial expression, and skin color, among others. Using a pre-trained StyleGAN combined with iterative style inference, they can easily manipulate facial expressions or combine the facial expressions of any two people without training a new model for each identity involved. This is one of the first scalable, high-quality approaches for generating DeepFake data, which serves as a critical first step toward learning a more robust and general classifier against adversarial examples.
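
The idea of inferring a style code iteratively against a frozen generator, and then recombining codes from two people, can be illustrated with a toy sketch. Everything here is an assumption for illustration: the "generator" is a fixed random linear map rather than StyleGAN, the inference is plain gradient descent on a squared reconstruction error, and `EXPRESSION_DIMS` is a hypothetical choice of which latent coordinates stand for expression. It is not the method presented in the talk.

```python
import numpy as np

rng = np.random.default_rng(0)
LATENT_DIM, IMAGE_DIM = 8, 64

# Toy stand-in for a frozen, pre-trained generator: a fixed linear map.
# (A real StyleGAN is a deep nonlinear network; this is only an illustration.)
A = rng.normal(size=(IMAGE_DIM, LATENT_DIM))


def generate(w):
    """Map a latent style code to a flattened 'image'."""
    return A @ w


def infer_style(target, steps=500):
    """Iteratively fit a latent code w so that generate(w) matches target."""
    w = np.zeros(LATENT_DIM)
    # Step size 1/L, where L is the Lipschitz constant of the gradient.
    step_size = 1.0 / np.linalg.norm(A, 2) ** 2
    for _ in range(steps):
        residual = generate(w) - target
        w -= step_size * (A.T @ residual)  # gradient of 0.5 * ||A w - target||^2
    return w


def mix_styles(w_identity, w_expression, expression_dims):
    """Keep one code, but overwrite the chosen coordinates from the other."""
    mixed = w_identity.copy()
    mixed[expression_dims] = w_expression[expression_dims]
    return mixed


# Two "people", each defined by a ground-truth latent code.
w_a_true = rng.normal(size=LATENT_DIM)
w_b_true = rng.normal(size=LATENT_DIM)
image_a, image_b = generate(w_a_true), generate(w_b_true)

# Recover their codes from the images alone; the generator is never retrained.
w_a = infer_style(image_a)
w_b = infer_style(image_b)

# Hypothetical: pretend the last two coordinates encode facial expression.
EXPRESSION_DIMS = [6, 7]
w_mixed = mix_styles(w_a, w_b, EXPRESSION_DIMS)
image_mixed = generate(w_mixed)  # person A's "identity" with person B's "expression"
```

The point of the sketch is the workflow: because the generator stays fixed and only the latent code is optimized, adding a new identity costs one inference run rather than a new trained model.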