### Introduction

The basic **adversarial** framework of the GAN architecture can be broken down into the following **two players**:

- A **generative** model $G$, that tries to capture the data distribution.
- A **discriminative** model $D$, that estimates the probability that a sample came from the training data rather than from $G$.

The framework is adversarial in the sense that the training procedure for $G$ tries to **maximize the probability of $D$ making a mistake**. The framework thus corresponds to a minimax two-player game.

### Related Generative Models

- Restricted Boltzmann Machines
- Deep Boltzmann Machines
- Deep Belief Networks
- Denoising Autoencoders
- Contractive Autoencoders
- Generative Stochastic Networks

### Notations

- Easiest to implement GANs when both the generator and the discriminator are **multilayer perceptrons**.
- $p_g$ is the generator's distribution over data $x$.
- $p_z(z)$ is a prior over input noise variables and $G(z; \theta_g)$ is the mapping to data space.
- $G$ is a differentiable function represented by a multilayer perceptron with parameters $\theta_g$.
- $D(x; \theta_d)$ is another differentiable function that outputs a single scalar.
- $D$ is trained to maximize the probability of assigning the correct label to both training examples and samples from $G$.
- $G$ is simultaneously trained to minimize $\log(1 - D(G(z)))$.

### Optimization Objective

The training framework between $G$ and $D$ can be represented by a two-player minimax game with value function $V(G, D)$,

$$\min_G \max_D V(D, G) = \mathbb{E}_{x \sim p_{\text{data}}(x)}[\log D(x)] + \mathbb{E}_{z \sim p_z(z)}[\log(1 - D(G(z)))]$$
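The value function can be estimated by Monte Carlo. A minimal sketch, where the Gaussian data, the identity generator, and the constant discriminator are all made-up toy choices: a discriminator that always outputs $1/2$ yields $V = \log\frac{1}{2} + \log\frac{1}{2} = -\log 4$ for any $G$, the value of the game at the optimum.

```python
import numpy as np

# Monte Carlo estimate of the value function V(D, G).
# Toy assumptions (not from the paper): p_data = N(3, 1), p_z = N(0, 1).
rng = np.random.default_rng(0)

def V(D, G, n=100_000):
    x = rng.normal(3.0, 1.0, n)   # samples from p_data
    z = rng.normal(0.0, 1.0, n)   # samples from the noise prior p_z
    return np.mean(np.log(D(x))) + np.mean(np.log(1.0 - D(G(z))))

G = lambda z: z                            # a deliberately bad generator
D_blind = lambda x: np.full_like(x, 0.5)   # discriminator that always says 1/2

# With D(x) = 1/2 everywhere, V = log(1/2) + log(1/2) = -log 4 exactly.
print(V(D_blind, G))  # -1.3862943611... (= -log 4)
```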

### Implementation Details

- $D$ and $G$ are trained iteratively, one after the other.
- $D$ is not optimized to completion as it would lead to overfitting.
- Alternate between $k$ steps of optimizing $D$ and one step of optimizing $G$.
- This results in $D$ being maintained near its optimum, so long as $G$ changes slowly.
- Early in learning, when $G$ is poor, $D$ can reject samples with high confidence, which causes $\log(1 - D(G(z)))$ to saturate.
- Instead of training $G$ to minimize $\log(1 - D(G(z)))$, maximize $\log D(G(z))$ for stronger gradients early in the learning.
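The alternating procedure can be sketched on a toy 1-D problem. Everything below (Gaussian data, an affine generator, a logistic discriminator, the learning rate and batch size) is an illustrative assumption, not the paper's setup; the generator uses the non-saturating trick, ascending $\log D(G(z))$.

```python
import numpy as np

# Toy 1-D GAN sketch: real data ~ N(3, 1), generator G(z) = s*z + b,
# discriminator D(x) = sigmoid(w*x + c). All choices are illustrative.
rng = np.random.default_rng(1)
sigmoid = lambda t: 1.0 / (1.0 + np.exp(-t))

s, b = 1.0, 0.0            # generator parameters (theta_g)
w, c = 0.0, 0.0            # discriminator parameters (theta_d)
lr, batch, k = 0.05, 64, 1

for _ in range(3000):
    for _ in range(k):      # k steps of optimizing D ...
        xr = rng.normal(3.0, 1.0, batch)
        xf = s * rng.normal(size=batch) + b
        Dr, Df = sigmoid(w * xr + c), sigmoid(w * xf + c)
        # gradient ascent on log D(x) + log(1 - D(G(z)))
        w += lr * (np.mean((1 - Dr) * xr) - np.mean(Df * xf))
        c += lr * (np.mean(1 - Dr) - np.mean(Df))
    # ... then one step of optimizing G (ascend log D(G(z)))
    z = rng.normal(size=batch)
    xf = s * z + b
    Df = sigmoid(w * xf + c)
    s += lr * np.mean((1 - Df) * w * z)
    b += lr * np.mean((1 - Df) * w)

print(b)  # the generator's mean drifts toward the data mean of 3
```

With $k = 1$ this is the alternation described above; keeping $k$ small keeps $D$ near its optimum without optimizing it to completion.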

### Theoretical Results

For a fixed $G$, the optimal discriminator can be found by differentiating the objective function w.r.t. $D$. The objective function is of the form,

$$V(G, D) = \int_x p_{\text{data}}(x) \log D(x) + p_g(x) \log(1 - D(x)) \, dx$$

which, pointwise in $x$, has the form $f(y) = a \log y + b \log(1 - y)$ with $a = p_{\text{data}}(x)$, $b = p_g(x)$ and $y = D(x)$.

Differentiating w.r.t $y$ gives,

$$f'(y) = \frac{a}{y} - \frac{b}{1 - y}$$

Since we are maximising this, the maximum can be found by estimating the point of zero derivative, i.e,

$$\frac{a}{y} = \frac{b}{1 - y} \implies y = \frac{a}{a + b}$$

So the optimal discriminator for a fixed $G$ is given by,

$$D^*_G(x) = \frac{p_{\text{data}}(x)}{p_{\text{data}}(x) + p_g(x)}$$
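As a quick numerical sanity check, with arbitrary made-up stand-ins for the densities $a = p_{\text{data}}(x)$ and $b = p_g(x)$, the pointwise maximizer $y = a/(a+b)$ can be compared against a grid search:

```python
import numpy as np

# Verify that y* = a / (a + b) maximizes f(y) = a*log(y) + b*log(1 - y).
# a and b are arbitrary stand-ins for p_data(x) and p_g(x).
a, b = 0.7, 0.2
f = lambda y: a * np.log(y) + b * np.log(1.0 - y)

y_star = a / (a + b)
grid = np.linspace(0.001, 0.999, 9999)
y_best = grid[np.argmax(f(grid))]

print(y_star, y_best)  # both close to 0.7778
```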

For this maximized $D$, the optimization objective can be rewritten as,

$$C(G) = \max_D V(G, D) = \mathbb{E}_{x \sim p_{\text{data}}}\left[\log \frac{p_{\text{data}}(x)}{p_{\text{data}}(x) + p_g(x)}\right] + \mathbb{E}_{x \sim p_g}\left[\log \frac{p_g(x)}{p_{\text{data}}(x) + p_g(x)}\right]$$

We can show that this expression is minimized for $p_g = p_{\text{data}}$. The value of $C(G)$ at this point is $-\log 4$, and $D^*_G(x) = \frac{1}{2}$.

To see that this is the minimum possible value, consider the following modification to the expression above,

$$C(G) = -\log 4 + KL\left(p_{\text{data}} \,\middle\|\, \frac{p_{\text{data}} + p_g}{2}\right) + KL\left(p_g \,\middle\|\, \frac{p_{\text{data}} + p_g}{2}\right) = -\log 4 + 2 \cdot JSD(p_{\text{data}} \,\|\, p_g)$$

The last term is (twice) the Jensen-Shannon divergence between the two distributions, which is always non-negative and zero only when the two distributions are equal. So $-\log 4$ is the global minimum of $C(G)$, attained only at $p_g = p_{\text{data}}$, i.e. the generative model perfectly replicating the data distribution.
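The identity $C(G) = -\log 4 + 2 \cdot JSD(p_{\text{data}} \| p_g)$ can be checked numerically on two arbitrary discrete distributions (made up here for illustration):

```python
import numpy as np

# Check C(G) = -log 4 + 2 * JSD(p_data || p_g) on discrete distributions.
p = np.array([0.5, 0.3, 0.2])   # stands in for p_data
q = np.array([0.2, 0.2, 0.6])   # stands in for p_g
m = (p + q) / 2                  # the mixture (p_data + p_g) / 2

kl = lambda u, v: np.sum(u * np.log(u / v))
jsd = 0.5 * kl(p, m) + 0.5 * kl(q, m)

# C(G) evaluated directly from the maximized objective
C = np.sum(p * np.log(p / (p + q))) + np.sum(q * np.log(q / (p + q)))

assert np.isclose(C, -np.log(4) + 2 * jsd)
assert jsd >= 0.0               # zero only when p == q
print(C)
```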

### Complexity Comparison of Generative Models

### Disadvantages

- There is no explicit representation of $p_g(x)$.
- $D$ must be synchronized well with $G$ during training. There is a risk of $D$ being too strong, leading to zero gradient for $G$, or $D$ being too weak, which causes $G$ to collapse many values of $z$ to the same value of $x$, leaving samples without enough diversity to model $p_{\text{data}}$.

### Follow-up Citations

- RBMs and DBMs
  - A fast learning algorithm for deep belief nets by Hinton et al.
  - Deep Boltzmann machines by Salakhutdinov et al.
  - Information processing in dynamical systems: Foundations of harmony theory by Smolensky
- MCMC
  - Better mixing via deep representations by Bengio et al.
  - Deep generative stochastic networks trainable by backprop by Bengio et al.
- Encodings
  - What is the best multi-stage architecture for object recognition? by Jarrett et al.
  - Generalized denoising auto-encoders as generative models by Bengio et al.
  - Deep sparse rectifier neural networks by Glorot et al.
  - Maxout networks by Goodfellow et al.
- Optimizations
  - Auto-encoding variational Bayes by Kingma et al.
  - Stochastic backpropagation and approximate inference in deep generative models by Rezende et al.
- Learning deep architectures for AI by Bengio, Y.