One-line definition
A GAN (Goodfellow et al., 2014) trains two networks adversarially: a generator G that maps noise z to samples G(z), and a discriminator D that tries to distinguish G(z) from real samples x. The minimax objective:

$$\min_G \max_D V(D, G) = \mathbb{E}_{x \sim p_{\text{data}}}[\log D(x)] + \mathbb{E}_{z \sim p_z}[\log(1 - D(G(z)))]$$
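As a sanity check on the objective, V(D, G) = E_x[log D(x)] + E_z[log(1 − D(G(z)))] can be estimated from a batch of discriminator outputs. A minimal NumPy sketch (the D-outputs here are made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

def value_fn(d_real: np.ndarray, d_fake: np.ndarray) -> float:
    """Monte Carlo estimate of V(D, G):
    E_x[log D(x)] + E_z[log(1 - D(G(z)))]."""
    return float(np.mean(np.log(d_real)) + np.mean(np.log(1.0 - d_fake)))

# Hypothetical discriminator outputs on a batch of real and fake samples.
d_real = rng.uniform(0.6, 0.99, size=128)   # D fairly confident on real data
d_fake = rng.uniform(0.01, 0.4, size=128)   # ...and on fakes

print(value_fn(d_real, d_fake))             # D doing well -> V well above -log 4

# At the Nash equilibrium D outputs 1/2 everywhere, giving V = -log 4.
half = np.full(128, 0.5)
print(value_fn(half, half))                 # 2 * log(0.5) = -log 4 ≈ -1.386
```

The −log 4 equilibrium value is the quantity Goodfellow et al. relate to the Jensen–Shannon divergence between the data and generator distributions.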
Why it matters
GANs produced the sharpest, most realistic image samples of the deep learning era from 2015 to 2021, peaking with StyleGAN3 and BigGAN. By 2026 they have largely been displaced by diffusion models for image generation, but remain relevant in:
- Real-time / latency-critical generation (single forward pass vs diffusion’s iterative).
- Image-to-image translation (CycleGAN, pix2pix).
- Specialized domains (medical, super-resolution).
- As a discriminator-style critic in other systems (perceptual losses, adversarial robustness).
Knowing GAN training dynamics is also key to understanding why diffusion’s stable training is such an advantage.
The two players
- Generator G: maps noise z to a fake sample G(z). Trained to fool D.
- Discriminator D: binary classifier distinguishing real samples x from fakes G(z). Trained to maximize correct classification.
At equilibrium (Nash), G produces samples indistinguishable from real data, and D outputs 1/2 everywhere.
Why training is hard
The minimax game is unstable for many reasons:
- Mode collapse: G finds one or a few outputs that consistently fool D and ignores the rest of the distribution.
- Vanishing gradients: when D is much better than G, D(G(z)) → 0 and the generator's loss log(1 − D(G(z))) saturates. No learning signal.
- Non-convergence: minimax dynamics can cycle without converging to equilibrium.
- Sensitivity to architecture and hyperparameters: small changes make a working GAN diverge.
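The vanishing-gradient failure above is easy to verify numerically. With a sigmoid discriminator D = σ(a), the saturating generator loss log(1 − D(G(z))) has gradient −σ(a) with respect to the fake logit a, which goes to zero exactly when D confidently rejects fakes; Goodfellow et al.'s non-saturating alternative, maximizing log D(G(z)), keeps gradient σ(a) − 1:

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

# Discriminator output on a fake is D(G(z)) = sigmoid(a) for logit a.
# Saturating generator loss:      log(1 - D)  -> d/da = -sigmoid(a)
# Non-saturating generator loss: -log D       -> d/da = sigmoid(a) - 1
def grad_saturating(a):
    return -sigmoid(a)

def grad_nonsaturating(a):
    return sigmoid(a) - 1.0

# Early in training D easily spots fakes: D(G(z)) ~ 0, i.e. a very negative.
a = -8.0
print(grad_saturating(a))      # ~ -3e-4: almost no learning signal
print(grad_nonsaturating(a))   # ~ -1.0:  strong gradient
```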
A decade of research produced many stabilization techniques: spectral normalization, two-time-scale updates (TTUR), gradient penalty, WGAN/WGAN-GP, R1 regularization, progressive growing, StyleGAN’s mapping network. Each helps; none fully solves it.
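Spectral normalization, one of the most widely adopted of these, constrains each layer's Lipschitz constant by dividing its weight matrix by an estimate of its largest singular value, obtained cheaply via power iteration (Miyato et al., 2018). A minimal NumPy sketch of the idea (illustrative; real implementations cache one power-iteration step per training update rather than iterating to convergence):

```python
import numpy as np

rng = np.random.default_rng(1)

def spectral_normalize(W: np.ndarray, n_iter: int = 50):
    """Divide W by its largest singular value, estimated by power iteration."""
    u = rng.normal(size=W.shape[0])
    for _ in range(n_iter):
        v = W.T @ u
        v /= np.linalg.norm(v)
        u = W @ v
        u /= np.linalg.norm(u)
    sigma = u @ W @ v   # estimated top singular value
    return W / sigma, sigma

W = rng.normal(size=(64, 32))
W_sn, sigma = spectral_normalize(W)

# The estimate matches the true top singular value, and the normalized
# matrix has spectral norm 1 (hence is 1-Lipschitz as a linear map).
print(sigma, np.linalg.svd(W, compute_uv=False)[0])
print(np.linalg.svd(W_sn, compute_uv=False)[0])   # ~ 1.0
```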
Variants
| GAN | Innovation |
|---|---|
| DCGAN (Radford 2015) | Convolutional architecture for images |
| WGAN (Arjovsky 2017) | Wasserstein loss; weight clipping or gradient penalty (WGAN-GP) for Lipschitz constraint |
| Conditional GAN | Add class label or text embedding to both G and D |
| pix2pix, CycleGAN | Image-to-image translation (paired and unpaired) |
| BigGAN (Brock 2018) | Class-conditional ImageNet generation at scale |
| StyleGAN 1/2/3 (Karras 2018-2021) | Mapping network + AdaIN + alias-free design; SoTA face generation |
Why diffusion replaced GANs for image generation
| Property | GAN | Diffusion |
|---|---|---|
| Training stability | Notoriously unstable | Stable |
| Sample quality (FID) | Excellent | Excellent (better at scale) |
| Mode coverage | Mode collapse risk | Better coverage |
| Sample speed | One forward pass (fast) | Many denoising steps (slow) |
| Likelihood | None (implicit model) | Variational lower bound |
| Conditioning flexibility | Limited | Cross-attention conditioning is strong |
Diffusion’s training stability and conditioning flexibility (especially text-to-image with classifier-free guidance) tipped the balance.
Where GANs still win in 2026
- Real-time generation: single forward pass beats diffusion’s tens of steps.
- Style transfer / image-to-image: CycleGAN-style pipelines remain strong.
- Adversarial training as a regularizer: not for generation per se, but as a critic loss in distillation, super-resolution, domain adaptation.
Common pitfalls
- Treating Inception Score/FID as the only metrics. They miss diversity issues; complement with precision/recall or coverage metrics.
- Not using spectral normalization or gradient penalty. Vanilla GAN training without modern stabilization is very fragile.
- Comparing GAN samples to diffusion samples at matched compute without matching steps. GAN: 1 forward pass; diffusion: 50–1000. Per sample, GAN is much cheaper.
- Reading “GANs are mode collapsed” as universal. Modern StyleGAN-class models cover ImageNet diversity well.
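The precision/recall complement mentioned above can be sketched with the k-NN manifold estimate of Kynkäänniemi et al. (2019): precision asks whether fakes land inside the real data manifold, recall whether reals are covered by the fake manifold. A minimal NumPy version on toy 2-D data (illustrative only, not the reference implementation, which works in feature space):

```python
import numpy as np

rng = np.random.default_rng(2)

def knn_radius(X: np.ndarray, k: int = 3) -> np.ndarray:
    """Distance from each point in X to its k-th nearest neighbour in X."""
    d = np.linalg.norm(X[:, None] - X[None, :], axis=-1)
    return np.sort(d, axis=1)[:, k]  # column 0 is the point itself

def precision_recall(real: np.ndarray, fake: np.ndarray, k: int = 3):
    """Precision: fraction of fakes inside some real k-NN ball;
    recall: fraction of reals inside some fake k-NN ball."""
    r_real = knn_radius(real, k)
    r_fake = knn_radius(fake, k)
    d = np.linalg.norm(fake[:, None] - real[None, :], axis=-1)  # (n_fake, n_real)
    precision = np.mean((d <= r_real[None, :]).any(axis=1))
    recall = np.mean((d.T <= r_fake[None, :]).any(axis=1))
    return float(precision), float(recall)

real = rng.normal(size=(200, 2))
collapsed = rng.normal(scale=0.05, size=(200, 2))  # mode-collapsed "generator"
p, r = precision_recall(real, collapsed)
print(p, r)  # typically: high precision (fakes look real), low recall (modes missed)
```

A collapsed generator scores well on precision-like metrics while recall exposes the missing modes, which is exactly why FID alone can be misleading.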
Related
- Autoregressive vs. diffusion. Broader generative paradigm map.
- Variational autoencoders. Earlier alternative.