Image Restoration - Approaches and Models¶
Overview of image restoration approaches: from classical to diffusion-based. Key insight: diffusion models bring generalization across degradation types but at higher compute cost.
Classical / CNN-Based¶
SwinIR (ICCV 2021)¶
- Swin Transformer blocks for image restoration
- Strong baseline for denoise, super-res, JPEG artifact removal
- Lightweight (~12M params), fast inference
- Limitation: task-specific training, one model per degradation
NAFNet (ECCV 2022)¶
- "Nonlinear Activation Free Network" - removes GELU/Softmax
- SOTA on SIDD (40.30 dB PSNR) and GoPro deblurring
- Very efficient: ~67M params, simple architecture
- Uses SimpleGate + Simplified Channel Attention
Restormer (CVPR 2022)¶
- Multi-scale Transformer for high-res restoration
- Transposed attention: key/value along channel dim (not spatial)
- Strong on real noise removal, motion deblur, defocus deblur
Diffusion-Based¶
[[RealRestorer]] (March 2026)¶
- 9 degradation types on [[Step1X-Edit]] backbone
- Prompt-driven: specify degradation type in text
-
1 open-source on RealIR-Bench (FS=0.146), close to GPT-Image-1.5¶
- ~34 GB VRAM, 28 steps
- Weights: non-commercial academic only
Palette (Google, 2022)¶
- First diffusion model for image-to-image restoration
- Concatenates degraded image with noise as conditioning
- Showed diffusion can match/beat task-specific models
IR-SDE (NeurIPS 2023)¶
- Treats restoration as SDE reverse process
- Mean-reverting SDE: starts from degraded image, not pure noise
- Better than starting from noise for restoration tasks
Degradation Types¶
| Type | Classical SOTA | Diffusion SOTA |
|---|---|---|
| Gaussian noise | NAFNet (40.3 dB) | RealRestorer |
| Real noise (SIDD) | NAFNet / Restormer | RealRestorer |
| JPEG artifacts | SwinIR | RealRestorer |
| Motion blur | NAFNet / Restormer | RealRestorer |
| Low light | RetinexNet / SNR-Net | RealRestorer |
| Rain removal | MPRNet | RealRestorer |
| Haze removal | DehazeFormer | RealRestorer |
| Super-resolution | SwinIR / Real-ESRGAN | StableSR |
| Moire | DMCNN | RealRestorer |
SANA-Denoiser Approach¶
Our approach: repurpose [[SANA]] 1.6B DiT as restoration model via [[Paired Training for Restoration]]: - Channel concat conditioning (degraded latent + noise) - [[DC-AE]] 32x compression keeps token count low - Linear attention O(N) enables high-res processing - [[Temporal Tiling]] for context-aware tile processing at 4K+
Advantages over RealRestorer: - 10x fewer params (1.6B vs ~15B Step1X-Edit backbone) - Linear attention vs quadratic (much faster at high-res) - 32x VAE compression vs 8x (4x fewer tokens)
Standard Benchmarks¶
| Benchmark | Images | Degradation | Notes |
|---|---|---|---|
| SIDD | 320 val patches | Real smartphone noise | Gold standard for denoising |
| DND | 50 images | Real camera noise | No GT, online submission |
| DIV2K | 100 val | Synthetic (for super-res) | 2K resolution |
| Urban100 | 100 | Synthetic | Repetitive structures |
| Set14 | 14 | Synthetic | Quick sanity check |
| RealIR-Bench | 464 | 9 real degradation types | RealRestorer benchmark |