GANs: Generative Adversarial Networks
Introduced by Ian Goodfellow and colleagues in 2014, Generative Adversarial Networks (GANs) are a class of machine learning frameworks in which two neural networks compete against each other in a game. This competition teaches the model to generate new, synthetic data that is, ideally, indistinguishable from real data.
1. The Adversarial Concept: The Forger and the Detective
A GAN consists of two distinct models that are trained simultaneously through competition:
- The Generator ($G$): Think of this as a forger. Its goal is to create realistic images (or data) from random noise to trick the discriminator.
- The Discriminator ($D$): Think of this as a detective. Its goal is to distinguish between "real" data (from the training set) and "fake" data (produced by the generator).
2. The Training Process: A Zero-Sum Game
The GAN training process is a "minimax" game where the Generator tries to minimize the probability that the Discriminator is correct, while the Discriminator tries to maximize it.
- The Generator takes random noise as input and produces a synthetic sample.
- The Discriminator receives both real samples and synthetic samples.
- Feedback Loop:
  - If the Detective ($D$) catches the Forger ($G$), $G$ learns how to improve its forgery.
  - If the Forger ($G$) tricks the Detective ($D$), $D$ learns how to be a better investigator.
In the ideal equilibrium, the Generator becomes so good that the Discriminator can do no better than guessing, classifying samples correctly only 50% of the time (equivalent to a coin flip).
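To make the feedback loop concrete, here is a minimal sketch of one alternating training step in PyTorch. The tiny stand-in networks, batch size, and noise dimension are illustrative assumptions (fuller model definitions appear in Section 7), and random tensors stand in for real data.

```python
import torch
import torch.nn as nn

# Illustrative dimensions (assumptions, not prescribed by the original paper)
noise_dim, data_dim, batch_size = 100, 784, 64

# Tiny stand-in networks; see Section 7 for fuller definitions
G = nn.Sequential(nn.Linear(noise_dim, data_dim), nn.Tanh())
D = nn.Sequential(nn.Linear(data_dim, 1), nn.Sigmoid())

opt_g = torch.optim.Adam(G.parameters(), lr=2e-4)
opt_d = torch.optim.Adam(D.parameters(), lr=2e-4)
bce = nn.BCELoss()

real = torch.randn(batch_size, data_dim)  # stand-in for a real data batch

# --- Step 1: train the Discriminator (the detective) ---
z = torch.randn(batch_size, noise_dim)
fake = G(z).detach()  # detach so this step does not update G
d_loss = bce(D(real), torch.ones(batch_size, 1)) + \
         bce(D(fake), torch.zeros(batch_size, 1))
opt_d.zero_grad()
d_loss.backward()
opt_d.step()

# --- Step 2: train the Generator (the forger) ---
z = torch.randn(batch_size, noise_dim)
g_loss = bce(D(G(z)), torch.ones(batch_size, 1))  # G wants D to say "real"
opt_g.zero_grad()
g_loss.backward()
opt_g.step()
```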
3. Mathematical Objective
The entire system can be described by the following minimax value function $V(D, G)$:

$$\min_G \max_D V(D, G) = \mathbb{E}_{x \sim p_{\text{data}}(x)}[\log D(x)] + \mathbb{E}_{z \sim p_z(z)}[\log(1 - D(G(z)))]$$

- $D(x)$: the Discriminator's estimate of the probability that real data $x$ is real.
- $G(z)$: the Generator's output for a given noise vector $z$.
- $D(G(z))$: the Discriminator's estimate of the probability that a fake sample is real.
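This objective also explains the coin-flip equilibrium from Section 2. For a fixed $G$, the original paper shows (Proposition 1) that the optimal Discriminator has a closed form:

$$D^*(x) = \frac{p_{\text{data}}(x)}{p_{\text{data}}(x) + p_g(x)}$$

When the Generator perfectly matches the data distribution ($p_g = p_{\text{data}}$), this reduces to $D^*(x) = \tfrac{1}{2}$ everywhere: the Discriminator can do no better than a coin flip.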
4. Architectural Flow (Mermaid)
The following diagram illustrates the interaction between the two networks and the data sources.
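```mermaid
flowchart LR
    Z[Random Noise z] --> G[Generator]
    G -->|Fake sample| D{Discriminator}
    R[(Real Training Data)] -->|Real sample| D
    D -->|"Real or Fake?"| V[Verdict]
    V -.->|Feedback to improve forgeries| G
    V -.->|Feedback to improve detection| D
```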
5. Challenges in Training GANs
Training GANs is notoriously difficult because of the delicate balance required between the two models:
- Mode Collapse: The Generator discovers a single "type" of output that reliably tricks the Discriminator and keeps producing only that (e.g., a model meant to generate all ten digits produces only "7"s).
- Vanishing Gradients: If the Discriminator becomes too good too quickly, $\log(1 - D(G(z)))$ saturates, the Generator's gradients shrink toward zero, and it stops learning.
- Convergence: Unlike standard supervised models, GANs may never settle at a stable equilibrium; the two losses can oscillate back and forth indefinitely.
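There is no single fix, but a few widely used heuristics help in practice. One is one-sided label smoothing (Salimans et al., 2016): training the Discriminator against soft targets such as 0.9 instead of 1.0 so it never becomes overconfident. A minimal sketch, assuming the BCE-based setup from the training-loop example above:

```python
import torch
import torch.nn as nn

bce = nn.BCELoss()

def d_loss_smoothed(d_real_out, d_fake_out, smooth=0.9):
    """Discriminator loss with one-sided label smoothing.

    Real samples are labeled `smooth` (e.g. 0.9) instead of 1.0, which
    keeps D from becoming overconfident; fake labels stay at 0.
    """
    real_targets = torch.full_like(d_real_out, smooth)
    fake_targets = torch.zeros_like(d_fake_out)
    return bce(d_real_out, real_targets) + bce(d_fake_out, fake_targets)
```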
6. Popular GAN Variants
| Variant | Key Feature | Use Case |
|---|---|---|
| DCGAN | Uses Convolutional layers instead of Dense layers. | Generating high-quality images. |
| CycleGAN | Learns to translate images from one domain to another without paired data. | Unpaired domain transfer (e.g., horse ↔ zebra, photo ↔ painting). |
| StyleGAN | Allows control over specific "styles" (age, hair color, etc.). | Generating hyper-realistic human faces. |
| Pix2Pix | Conditional GAN for image-to-image translation. | Converting sketches into realistic photos. |
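To illustrate the DCGAN row: the key change is swapping `nn.Linear` layers for (transposed) convolutions. A minimal sketch of one DCGAN-style generator block; the channel counts (128 → 64) are illustrative, not prescriptive.

```python
import torch.nn as nn

# One DCGAN-style generator block: a transposed convolution doubles the
# spatial resolution, followed by BatchNorm and ReLU (per the DCGAN paper).
upsample_block = nn.Sequential(
    nn.ConvTranspose2d(128, 64, kernel_size=4, stride=2, padding=1),
    nn.BatchNorm2d(64),
    nn.ReLU(),
)
```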
7. Implementation Sketch (PyTorch)
```python
import torch
import torch.nn as nn

# Simple Discriminator: maps a flattened input (784 values, e.g. a
# 28x28 image) to a single probability that the input is real.
discriminator = nn.Sequential(
    nn.Linear(784, 128),
    nn.LeakyReLU(0.2),
    nn.Linear(128, 1),
    nn.Sigmoid(),
)

# Simple Generator: maps a 100-dimensional noise vector to a
# flattened 784-value sample.
generator = nn.Sequential(
    nn.Linear(100, 256),
    nn.ReLU(),
    nn.Linear(256, 784),
    nn.Tanh(),  # outputs pixels between -1 and 1
)
```
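A quick smoke test (hypothetical, not from the original text): draw a batch of noise vectors, generate fake samples, and score them with the discriminator.

```python
z = torch.randn(16, 100)              # batch of 16 noise vectors
fake_images = generator(z)            # shape: (16, 784)
scores = discriminator(fake_images)   # probabilities in (0, 1)
print(fake_images.shape, scores.shape)
```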
References
- Original Paper: Generative Adversarial Networks (Goodfellow et al.)
- Google Developers: GANs Course
- This Person Does Not Exist: A showcase of StyleGAN capabilities
GANs are masters of generation, but they are hard to control. What if we wanted a model that can gradually "denoise" an image into existence?