Generative adversarial networks (GANs)

Two networks compete: a generator produces samples, a discriminator distinguishes them from real data. Sharp samples, training instability, mostly displaced by diffusion in 2026.

One-line definition

A GAN (Goodfellow et al., 2014) trains two networks adversarially: a generator $G$ that maps noise $z \sim p_z$ to samples $G(z)$, and a discriminator $D$ that tries to distinguish $G(z)$ from real samples $x \sim p_\text{data}$. The minimax objective:

$$\min_G \max_D \; \mathbb{E}_{x \sim p_\text{data}}[\log D(x)] + \mathbb{E}_{z \sim p_z}[\log(1 - D(G(z)))]$$
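
In code, one adversarial update of this objective looks roughly like the following. This is a minimal PyTorch sketch, assuming `G` and `D` are `nn.Module`s with `D` outputting probabilities, and `opt_G`/`opt_D` their optimizers (all placeholder names):

```python
import torch
import torch.nn.functional as F

def gan_step(G, D, real, noise_dim, opt_G, opt_D):
    """One adversarial update; G, D, opt_G, opt_D are placeholders."""
    z = torch.randn(real.size(0), noise_dim)

    # Discriminator step: maximize log D(x) + log(1 - D(G(z))).
    opt_D.zero_grad()
    d_real = D(real)
    d_fake = D(G(z).detach())            # detach: no generator grads here
    loss_D = (F.binary_cross_entropy(d_real, torch.ones_like(d_real))
              + F.binary_cross_entropy(d_fake, torch.zeros_like(d_fake)))
    loss_D.backward()
    opt_D.step()

    # Generator step: fool D. (This is the non-saturating form used in
    # practice; the strict minimax form saturates -- see "Why training is hard".)
    opt_G.zero_grad()
    d_fake = D(G(z))
    loss_G = F.binary_cross_entropy(d_fake, torch.ones_like(d_fake))
    loss_G.backward()
    opt_G.step()
    return loss_D.item(), loss_G.item()
```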

Why it matters

GANs produced the sharpest, most realistic image samples of the deep learning era from 2015 to 2021, peaking with StyleGAN3 and BigGAN. They have largely been displaced by diffusion for image generation in 2026, but remain relevant in:

  • Real-time / latency-critical generation (a single forward pass vs. diffusion’s iterative sampling).
  • Image-to-image translation (CycleGAN, pix2pix).
  • Specialized domains (medical, super-resolution).
  • As a discriminator-style critic in other systems (perceptual losses, adversarial robustness).

Knowing GAN training dynamics is also key to understanding why diffusion’s stable training is such an advantage.

The two players

  • Generator $G$: maps noise $z$ to a fake sample $G(z)$. Trained to fool $D$.
  • Discriminator $D$: binary classifier distinguishing real samples from fakes $G(z)$. Trained to maximize correct classification.

At equilibrium (Nash), $G$ produces samples indistinguishable from real data, and $D$ outputs $1/2$ everywhere.
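
Concretely, the two players can be as small as a pair of MLPs. A minimal sketch for toy 2-D data (illustrative layer sizes, PyTorch assumed):

```python
import torch.nn as nn

noise_dim, data_dim, hidden = 16, 2, 128   # toy sizes, not tuned

# Generator: noise z -> fake sample G(z)
G = nn.Sequential(
    nn.Linear(noise_dim, hidden), nn.ReLU(),
    nn.Linear(hidden, data_dim),
)

# Discriminator: sample -> probability it is real
D = nn.Sequential(
    nn.Linear(data_dim, hidden), nn.LeakyReLU(0.2),
    nn.Linear(hidden, 1), nn.Sigmoid(),
)
```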

Why training is hard

The minimax game is unstable for many reasons:

  1. Mode collapse: $G$ finds one or a few outputs that consistently fool $D$ and ignores the rest of the data distribution.
  2. Vanishing gradients: when $D$ is much better than $G$, the saturating loss $\log(1 - D(G(z)))$ flattens and the generator gets no learning signal (see the sketch after this list).
  3. Non-convergence: minimax dynamics can cycle without ever converging to an equilibrium.
  4. Sensitivity to architecture and hyperparameters: small changes can make a working GAN diverge.
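
The vanishing-gradient problem (item 2) is easiest to see numerically. A small sketch, assuming PyTorch, comparing the gradient through the discriminator's logit for the original saturating loss vs. the non-saturating replacement used in practice:

```python
import torch

# D confidently rejects a fake: very negative logit a, so D(G(z)) = sigmoid(a) ~ 0.
a = torch.tensor(-8.0, requires_grad=True)   # discriminator logit on a fake

sat = torch.log(1 - torch.sigmoid(a))        # saturating minimax loss log(1 - D(G(z)))
sat.backward()
print(f"saturating grad:     {a.grad.item():.6f}")   # ~ -0.000335 (vanishes)

a.grad = None
nonsat = -torch.log(torch.sigmoid(a))        # non-saturating loss -log D(G(z))
nonsat.backward()
print(f"non-saturating grad: {a.grad.item():.6f}")   # ~ -0.999665 (healthy)
```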

A decade of research produced many stabilization techniques: spectral normalization, two-time-scale updates (TTUR), gradient penalty, WGAN/WGAN-GP, R1 regularization, progressive growing, StyleGAN’s mapping network. Each helps; none fully solves it.
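
As one concrete example from that list, a sketch of R1 regularization (Mescheder et al., 2018), which penalizes the discriminator's gradient norm on real samples. PyTorch assumed; `D` here is taken to return logits, and names are placeholders:

```python
import torch

def r1_penalty(D, real, gamma=10.0):
    """R1: penalize D's gradient norm on *real* samples (applied to logits).
    gamma is a hyperparameter; values around 10 are common, tuned per dataset."""
    real = real.detach().requires_grad_(True)
    logits = D(real)
    grad, = torch.autograd.grad(logits.sum(), real, create_graph=True)
    return (gamma / 2) * grad.pow(2).flatten(1).sum(1).mean()

# Used as: loss_D = bce_terms + r1_penalty(D, real_batch)
```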

Variants

| GAN | Innovation |
| --- | --- |
| DCGAN (Radford, 2015) | Convolutional architecture for images |
| WGAN (Arjovsky, 2017) | Wasserstein loss; weight clipping or gradient penalty (WGAN-GP) for the Lipschitz constraint |
| Conditional GAN | Add a class label or text embedding to both $G$ and $D$ |
| pix2pix, CycleGAN | Image-to-image translation (paired and unpaired) |
| BigGAN (Brock, 2018) | Class-conditional ImageNet generation at scale |
| StyleGAN 1/2/3 (Karras, 2018–2021) | Mapping network + AdaIN + alias-free design; SoTA face generation |
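
The conditional-GAN row is simple to make concrete: both networks receive the label. A hypothetical sketch (PyTorch assumed) that embeds the class label and concatenates it with the noise:

```python
import torch
import torch.nn as nn

class CondGenerator(nn.Module):
    """Hypothetical conditional generator: embed the label, concatenate
    with the noise, then generate as usual. D is conditioned the same way."""
    def __init__(self, noise_dim=64, n_classes=10, embed_dim=16, data_dim=2):
        super().__init__()
        self.embed = nn.Embedding(n_classes, embed_dim)
        self.net = nn.Sequential(
            nn.Linear(noise_dim + embed_dim, 128), nn.ReLU(),
            nn.Linear(128, data_dim),
        )

    def forward(self, z, y):                      # y: integer class labels
        return self.net(torch.cat([z, self.embed(y)], dim=1))
```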

Why diffusion replaced GANs for image generation

| Property | GAN | Diffusion |
| --- | --- | --- |
| Training stability | Notoriously unstable | Stable |
| Sample quality (FID) | Excellent | Excellent (better at scale) |
| Mode coverage | Mode-collapse risk | Better coverage |
| Sampling speed | One forward pass (fast) | Many denoising steps (slow) |
| Likelihood | None | Variational lower bound |
| Conditioning flexibility | Limited | Strong (cross-attention) |

Diffusion’s training stability and conditioning flexibility (especially text-to-image with classifier-free guidance) tipped the balance.

Where GANs still win in 2026

  • Real-time generation: single forward pass beats diffusion’s tens of steps.
  • Style transfer / image-to-image: CycleGAN-style pipelines remain strong.
  • Adversarial training as a regularizer: not for generation per se, but as a critic loss in distillation, super-resolution, domain adaptation.

Common pitfalls

  • Treating GAN inception/FID scores as the only metric. They miss diversity issues; complement with precision/recall or coverage metrics.
  • Not using spectral normalization or gradient penalty. Vanilla GAN training without modern stabilization is very fragile.
  • Comparing GAN and diffusion samples without accounting for sampling cost: a GAN uses one forward pass per sample, diffusion 50–1000 denoising steps, so per sample the GAN is much cheaper (see the timing sketch below).
  • Reading “GANs are mode collapsed” as universal. Modern StyleGAN-class models cover ImageNet diversity well.
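
To make that cost pitfall concrete, a rough latency sketch (PyTorch assumed; a dummy same-size network stands in for both a generator and a denoiser, with arbitrary sizes):

```python
import time
import torch
import torch.nn as nn

# Dummy network standing in for both a generator and a denoiser of equal size.
net = nn.Sequential(nn.Linear(512, 512), nn.ReLU(), nn.Linear(512, 512))
x = torch.randn(64, 512)

def bench(passes):
    start = time.perf_counter()
    with torch.no_grad():
        h = x
        for _ in range(passes):   # diffusion pays one pass per denoising step
            h = net(h)
    return (time.perf_counter() - start) * 1e3

print(f"GAN-style, 1 pass:          {bench(1):.1f} ms")
print(f"diffusion-style, 50 passes: {bench(50):.1f} ms")
```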