Generative Models

Models that learn to generate new data samples from a learned distribution. From GANs to diffusion models - the technology behind image generation, style transfer, and data augmentation.

GANs (Generative Adversarial Networks)

Two networks competing:

  • Generator G: creates fake samples from random noise
  • Discriminator D: distinguishes real from fake

Training: D tries to correctly classify, G tries to fool D. Adversarial game drives both to improve.
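The two objectives can be sketched numerically. This is a minimal NumPy illustration of the losses only — the `d_real`/`d_fake` arrays stand in for real network outputs, and the "non-saturating" generator loss shown is the variant commonly used in practice:

```python
import numpy as np

def d_loss(d_real, d_fake, eps=1e-8):
    """Discriminator loss: push D(real) toward 1 and D(fake) toward 0."""
    return -np.mean(np.log(d_real + eps) + np.log(1.0 - d_fake + eps))

def g_loss(d_fake, eps=1e-8):
    """Non-saturating generator loss: push D(G(z)) toward 1 (fool D)."""
    return -np.mean(np.log(d_fake + eps))

# Toy discriminator outputs (probabilities of "real"):
d_real = np.array([0.9, 0.8])  # D is fairly confident real data is real
d_fake = np.array([0.2, 0.1])  # D mostly spots the fakes

print(d_loss(d_real, d_fake))  # low: D is currently winning
print(g_loss(d_fake))          # high: G's gradient pushes D(G(z)) upward
```

When D wins too decisively, G's gradients vanish under the original minimax loss; the non-saturating form above keeps them useful, which is one reason training still works at all.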

GAN Variants

| Variant         | Key Idea                                                  |
|-----------------|-----------------------------------------------------------|
| Conditional GAN | Generator and discriminator conditioned on class label    |
| CycleGAN        | Unpaired image-to-image translation (A->B and B->A)       |
| StyleGAN        | Style-based generator for high-quality face synthesis     |
| Pix2Pix         | Paired image-to-image translation                         |

CycleGAN

Learns bidirectional mapping without paired data. Applications: age progression, style transfer, domain adaptation.
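The key trick is the cycle-consistency loss: translating A->B->A should recover the original. A NumPy sketch, where the lambdas are hypothetical stand-ins for the two generator networks:

```python
import numpy as np

def cycle_consistency_loss(G, F, x, y):
    """L1 cycle loss: F(G(x)) should recover x, and G(F(y)) should recover y."""
    return np.mean(np.abs(F(G(x)) - x)) + np.mean(np.abs(G(F(y)) - y))

# Toy "translators": shift a signal between domains (stand-ins for real networks).
G = lambda a: a + 1.0   # domain A -> B
F = lambda b: b - 1.0   # domain B -> A (perfect inverse of G)

rng = np.random.default_rng(0)
x = rng.random((4, 8))  # batch from domain A
y = rng.random((4, 8))  # batch from domain B

print(cycle_consistency_loss(G, F, x, y))  # -> 0.0: a perfect cycle
```

Because no paired examples exist, this loss is what pins the two mappings down; without it, G could map every horse to the same zebra and still fool the discriminator.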

GAN Challenges

  • Mode collapse: generator produces limited variety
  • Training instability: oscillations, failure to converge
  • Catastrophic forgetting: learning new categories erases old ones. Fixes:
      • Generative Replay: replay samples generated from the old model during training
      • EWC (Elastic Weight Consolidation): penalize changes to weights important for earlier tasks
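The EWC idea fits in a few lines. A NumPy sketch of the quadratic penalty, with a hypothetical diagonal Fisher estimate as the importance weights:

```python
import numpy as np

def ewc_penalty(theta, theta_star, fisher, lam=1.0):
    """EWC regularizer: (lam/2) * sum_i F_i * (theta_i - theta*_i)^2.
    fisher approximates how important each weight was for the old task."""
    return 0.5 * lam * np.sum(fisher * (theta - theta_star) ** 2)

theta_star = np.array([1.0, -2.0, 0.5])   # weights after the old task
fisher     = np.array([10.0, 0.1, 0.1])   # first weight matters, the rest barely do

# Moving an important weight costs far more than moving an unimportant one:
print(ewc_penalty(theta_star + np.array([0.5, 0.0, 0.0]), theta_star, fisher))  # 1.25
print(ewc_penalty(theta_star + np.array([0.0, 0.5, 0.0]), theta_star, fisher))  # 0.0125
```

The total training loss becomes the new-task loss plus this penalty, so the network stays free to move unimportant weights while anchoring the ones the old task depends on.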

VAE (Variational Autoencoder)

Encoder maps the input to a latent distribution (mean + variance); a latent vector is sampled via the reparameterization trick (z = mu + sigma * eps, eps ~ N(0, I)), and the decoder reconstructs the input from it.

Loss = reconstruction loss + KL divergence (keeps the latent distribution close to the standard normal prior N(0, I))
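Both terms have simple closed forms for a Gaussian encoder. A NumPy sketch using MSE reconstruction and the analytic KL between N(mu, sigma^2) and N(0, I):

```python
import numpy as np

def vae_loss(x, x_recon, mu, logvar):
    """Negative ELBO: reconstruction MSE + KL(N(mu, diag(sigma^2)) || N(0, I))."""
    recon = np.sum((x - x_recon) ** 2)
    kl = -0.5 * np.sum(1.0 + logvar - mu ** 2 - np.exp(logvar))
    return recon + kl

x = np.array([0.5, -1.0, 2.0])

# Perfect reconstruction with the latent exactly at the prior: zero loss.
print(vae_loss(x, x, mu=np.zeros(2), logvar=np.zeros(2)))          # -> 0.0
# Pushing the latent mean away from 0 is penalized by the KL term.
print(vae_loss(x, x, mu=np.array([2.0, 0.0]), logvar=np.zeros(2)))  # -> 2.0
```

The KL term is what forces the latent space to stay smooth and centered, which is why interpolating between two latent codes yields plausible intermediate samples.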

Advantages over GAN: stable training, smooth latent space interpolation, explicit density model. Disadvantage: outputs tend to be blurrier than GANs.

Diffusion Models

Iteratively denoise from pure Gaussian noise to generate samples. Current state-of-the-art for image quality.

Forward process: gradually add noise to data over T steps. Reverse process: learn to denoise at each step. A neural network is trained to predict the noise that was added, so sampling can subtract it step by step.
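The forward process has a closed form, so any noise level can be sampled directly without stepping through all T. A NumPy sketch using the linear beta schedule from DDPM:

```python
import numpy as np

def forward_diffuse(x0, t, betas, rng):
    """Sample x_t ~ q(x_t | x_0) in one shot:
    x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * eps."""
    alpha_bar = np.cumprod(1.0 - betas)[t]  # product of (1 - beta_s) up to step t
    eps = rng.standard_normal(x0.shape)
    return np.sqrt(alpha_bar) * x0 + np.sqrt(1.0 - alpha_bar) * eps, eps

rng = np.random.default_rng(0)
betas = np.linspace(1e-4, 0.02, 1000)  # linear schedule, as in DDPM
x0 = rng.standard_normal(16)           # toy "data" sample

x_early, eps = forward_diffuse(x0, 10, betas, rng)   # barely noised
x_late, eps = forward_diffuse(x0, 999, betas, rng)   # nearly pure Gaussian noise
# Training target: the network sees (x_t, t) and learns to predict eps.
```

Since alpha_bar shrinks toward 0 as t grows, the signal fades and the variance approaches 1, which is exactly why generation can start from pure Gaussian noise.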

Key models: DDPM, Stable Diffusion, DALL-E 2/3, Midjourney.

Advantages: superior sample quality, stable training, flexible conditioning. Disadvantage: slow generation (many denoising steps), though distillation methods help.

3D Generative

  • NeRF (Neural Radiance Fields): learn 3D scene from 2D images, render novel views
  • Point cloud generation: PointNet-based generative models
  • 3D-aware GANs: generate 3D-consistent images

Applications

  • Image generation: faces, art, product images
  • Data augmentation: generate training samples for rare classes
  • Style transfer: apply artistic style to photos
  • Super-resolution: upscale low-resolution images
  • Inpainting: fill missing regions in images
  • Text-to-image: generate images from text descriptions

Gotchas

  • GAN training requires careful hyperparameter tuning and monitoring
  • Generated images can contain artifacts (extra fingers, text distortion)
  • Evaluation is hard - FID score is standard but imperfect
  • Copyright and ethical concerns with training data
  • Diffusion models need significant GPU memory and time for generation

See Also

  • [[cnn-computer-vision]] - CNN architectures used in generators
  • [[neural-networks]] - training fundamentals
  • [[transfer-learning]] - fine-tuning generative models