· · ·

Introduction

The basic adversarial framework of the GAN architecture can be broken down into the following two players:

• A generative model $G$ that tries to capture the underlying data distribution.
• A discriminative model $D$, that estimates the probability that a sample came from training data rather than $G$.

The framework is adversarial in the sense that the training procedure for $G$ tries to maximize the probability of $D$ making a mistake. The framework thus corresponds to a minimax two-player game.

The paper contrasts this framework with earlier approaches to deep generative modeling:

• Restricted Boltzmann Machines
• Deep Boltzmann Machines
• Deep Belief Networks
• Denoising Autoencoders
• Contractive Autoencoders
• Generative Stochastic Networks

Notations

• GANs are easiest to implement when both the generator and the discriminator are multilayer perceptrons.
• $p_g$ is the generator’s distribution over data $x$.
• $p_z(z)$ is a prior over input noise variables $z$, and $G(z; \theta_g)$ is the mapping from noise to data space.
• $G$ is a differentiable function with parameters $\theta_g$.
• $D$ is another differentiable function, with parameters $\theta_d$, that outputs a single scalar.
• $D(x)$ represents the probability that $x$ came from the training data rather than from $p_g$; $D$ is trained to maximize the probability of assigning the correct label to both training examples and samples from $G$.
• $G$ is simultaneously trained to minimize $\log(1-D(G(z)))$.
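
To make the notation concrete, here is a minimal sketch of $G$ and $D$ as multilayer perceptrons. PyTorch, the layer widths, and the 784-dimensional data space are my own illustrative choices, not something the paper prescribes:

```python
import torch
import torch.nn as nn

# G(z; theta_g): maps noise z ~ p_z(z) to data space (here, flattened 28x28 images).
G = nn.Sequential(
    nn.Linear(100, 256), nn.ReLU(),
    nn.Linear(256, 784), nn.Tanh(),    # outputs scaled to [-1, 1]
)

# D(x; theta_d): outputs a single scalar, the probability that x came from the data.
D = nn.Sequential(
    nn.Linear(784, 256), nn.LeakyReLU(0.2),
    nn.Linear(256, 1), nn.Sigmoid(),
)

z = torch.randn(64, 100)   # a batch of input noise
x_fake = G(z)              # samples from p_g
p_real = D(x_fake)         # D's estimate that each sample came from the data
```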

Optimization Objective

The training framework between $D$ and $G$ can be represented by a two-player minimax game with value function $V(G,D)$,

$$\min_G \max_D V(D,G) = \mathbb{E}_{x\sim p_{data}(x)}\left[\log D(x)\right] + \mathbb{E}_{z\sim p_z(z)}\left[\log\left(1-D(G(z))\right)\right]$$
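
Assuming the $G$ and $D$ sketched above, a minibatch estimate of $V(D,G)$ is straightforward; `value_fn` below is a hypothetical helper, not part of the paper:

```python
import torch

def value_fn(D, G, x_real, z, eps=1e-8):
    """Minibatch estimate of V(D, G) = E[log D(x)] + E[log(1 - D(G(z)))]."""
    v_real = torch.log(D(x_real) + eps).mean()       # E_{x ~ p_data}[log D(x)]
    v_fake = torch.log(1.0 - D(G(z)) + eps).mean()   # E_{z ~ p_z}[log(1 - D(G(z)))]
    return v_real + v_fake                           # D ascends this value, G descends it

# Example with random stand-ins for a real minibatch and input noise.
x_real = torch.rand(64, 784) * 2 - 1
z = torch.randn(64, 100)
print(value_fn(D, G, x_real, z))
```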

Implementation Details

• $G$ and $D$ are trained alternately, one after the other.
• $D$ is not optimized to completion in the inner loop: that would be computationally prohibitive and, on finite datasets, would lead to overfitting.
• Instead, alternate between $k$ steps of optimizing $D$ and one step of optimizing $G$.
• This keeps $D$ near its optimum, so long as $G$ changes slowly enough.
• Early in learning, when $G$ is poor, $D$ can reject its samples with high confidence, which causes $\log(1-D(G(z)))$ to saturate.
• Instead of minimizing $\log(1-D(G(z)))$, $G$ can maximize $\log(D(G(z)))$, which gives much stronger gradients early in learning (see the training-loop sketch below).
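
Putting these points together, here is a rough sketch of the alternating training procedure, reusing the $G$ and $D$ from the earlier sketch. The optimizer settings and the stand-in `data_loader` are illustrative assumptions, not the paper's exact setup:

```python
import torch
import torch.nn as nn

# Stand-in for a loader of real minibatches (flattened images scaled to [-1, 1]).
data_loader = [torch.rand(64, 784) * 2 - 1 for _ in range(10)]

opt_D = torch.optim.SGD(D.parameters(), lr=1e-3, momentum=0.9)
opt_G = torch.optim.SGD(G.parameters(), lr=1e-3, momentum=0.9)
bce = nn.BCELoss()   # bce(p, t) = -[t*log(p) + (1-t)*log(1-p)], averaged over the batch

k = 1                # discriminator steps per generator step
for x_real in data_loader:
    n = x_real.size(0)
    real_labels = torch.ones(n, 1)
    fake_labels = torch.zeros(n, 1)

    # k steps of optimizing D: ascend log D(x) + log(1 - D(G(z))).
    for _ in range(k):
        z = torch.randn(n, 100)
        x_fake = G(z).detach()             # keep G fixed while updating D
        loss_D = bce(D(x_real), real_labels) + bce(D(x_fake), fake_labels)
        opt_D.zero_grad()
        loss_D.backward()
        opt_D.step()

    # One step of optimizing G with the non-saturating loss: maximize log D(G(z)).
    z = torch.randn(n, 100)
    loss_G = bce(D(G(z)), real_labels)     # equivalent to minimizing -log D(G(z))
    opt_G.zero_grad()
    loss_G.backward()
    opt_G.step()
```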

Theoretical Results

For a fixed $G$, the optimal discriminator can be found by differentiating the objective function w.r.t. $D(x)$. Writing $V(G,D)$ as a single integral over $x$,

$$V(G,D) = \int_x \left[\, p_{data}(x)\,\log D(x) + p_g(x)\,\log\left(1-D(x)\right) \right] dx,$$

the integrand for each $x$ is of the form,

$$f(y) = a\,\log(y) + b\,\log(1-y), \qquad a = p_{data}(x),\quad b = p_g(x),\quad y = D(x)$$

Differentiating w.r.t $y$ gives,

$$f'(y) = \frac{a}{y} - \frac{b}{1-y}$$

Since we are maximising this, the maximum can be found by setting the derivative to zero, i.e.,

$$\frac{a}{y} - \frac{b}{1-y} = 0 \quad\Longrightarrow\quad y = \frac{a}{a+b}$$

So the optimal discriminator for a fixed $G$ is given by,

$$D_G^*(x) = \frac{p_{data}(x)}{p_{data}(x) + p_g(x)}$$

For this maximized $D$, the optimization objective can be rewritten as,

$$C(G) = \max_D V(G,D) = \mathbb{E}_{x\sim p_{data}}\left[\log \frac{p_{data}(x)}{p_{data}(x)+p_g(x)}\right] + \mathbb{E}_{x\sim p_g}\left[\log \frac{p_g(x)}{p_{data}(x)+p_g(x)}\right]$$

We can show that this expression is minimized for $p_g=p_{data}$. The value of $D_G^*(x)$ is $1/2$ at $p_g=p_{data}$, and $C(G) = -\log 4$.

To see that this is the minimum possible value, consider the following modification to the $C(G)$ expression above,

$$C(G) = -\log 4 + KL\left(p_{data} \,\middle\|\, \frac{p_{data}+p_g}{2}\right) + KL\left(p_g \,\middle\|\, \frac{p_{data}+p_g}{2}\right) = -\log 4 + 2\cdot JSD\left(p_{data} \,\|\, p_g\right)$$

The last term is (twice) the Jensen-Shannon divergence between the two distributions, which is always non-negative and zero only when the two distributions are equal. So $C^* = -\log 4$ is the global minimum of $C(G)$, attained only at $p_g=p_{data}$, i.e. the generative model perfectly replicating the data distribution.
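
Since the derivation reduces to one-dimensional calculus plus a divergence identity, it can be checked numerically. A small NumPy sketch with arbitrary toy values:

```python
import numpy as np

# Confirm that f(y) = a*log(y) + b*log(1-y) peaks at y = a/(a+b),
# i.e. the optimal discriminator output for a single x.
a, b = 0.7, 0.3                                  # stand-ins for p_data(x) and p_g(x)
y = np.linspace(1e-4, 1 - 1e-4, 200001)
f = a * np.log(y) + b * np.log(1 - y)
print(y[np.argmax(f)], a / (a + b))              # both approximately 0.7

def jsd(p, q):
    """Jensen-Shannon divergence between two discrete distributions."""
    m = 0.5 * (p + q)
    kl_pm = np.sum(p * np.log(p / m))
    kl_qm = np.sum(q * np.log(q / m))
    return 0.5 * (kl_pm + kl_qm)

# Confirm that C(G) = -log 4 + 2*JSD(p_data, p_g) equals -log 4 only when p_g = p_data.
p_data = np.array([0.2, 0.3, 0.5])
print(-np.log(4) + 2 * jsd(p_data, p_data))                       # exactly -log 4
print(-np.log(4) + 2 * jsd(p_data, np.array([0.5, 0.25, 0.25])))  # strictly greater
```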

Complexity Comparison of Generative Models

• There is no explicit representation of $p_g(x)$.
• $G$ must be well synchronized with $D$ during training: if $D$ becomes too strong, the gradient for $G$ vanishes; if $D$ is too weak (i.e. $G$ is trained too much without updating $D$), $G$ may collapse many values of $z$ to the same value of $x$, losing the diversity needed to model $p_{data}$.

Follow-up Citations

• RBMs and DBMs
  • A fast learning algorithm for deep belief nets by Hinton et al.
  • Deep Boltzmann machines by Salakhutdinov et al.
  • Information processing in dynamical systems: Foundations of harmony theory by Smolensky
• MCMC
  • Better mixing via deep representations by Bengio et al.
  • Deep generative stochastic networks trainable by backprop by Bengio et al.
• Encodings
  • What is the best multi-stage architecture for object recognition? by Jarrett et al.
  • Generalized denoising auto-encoders as generative models by Bengio et al.
  • Deep sparse rectifier neural networks by Glorot et al.
  • Maxout networks by Goodfellow et al.
• Optimizations
  • Auto-encoding variational bayes by Kingma et al.
  • Stochastic backpropagation and approximate inference in deep generative models by Rezende et al.
  • Learning deep architectures for AI by Bengio Y.

· · ·