Skip to content

Text-to-LoRA (Sakana AI)

Hypernetwork that generates LoRA adapter weights from a natural language task description in a single forward pass. No training data needed for downstream tasks. Sub-second adapter generation.

Paper: arXiv:2506.06105 (June 2025, ICML 2025). Follow-up: Doc-to-LoRA (arXiv:2602.15902, Feb 2026). Authors: Sakana AI, Tokyo (Google Brain alumni).

Note: company is Sakana AI, not "Cakana". Founded by David Ha and Llion Jones.

Architecture

Task description (text) → Text encoder → embedding
Concat with: module-type embedding (q_proj, v_proj, etc.)
           + layer-index embedding (which transformer layer)
MLP blocks → LoRA matrices A and B
             (repeated for all target modules/layers)

Generated LoRAs: q_proj and v_proj, rank 8, ~3.4M adapter params.

Hypernetwork Sizes

Size Parameters
Large 55M
Medium 34M
Small 5M

Doc-to-LoRA extension: Perceiver-based (~309M params), consumes variable-length documents.

Training

Two approaches: 1. LoRA Reconstruction: train T2L to reconstruct pre-trained oracle LoRAs from task descriptions 2. SFT: end-to-end optimization through downstream task loss (better generalization)

Training data: 479 diverse tasks from Lots-of-LoRAs dataset. Scaling 16→479 tasks substantially improved zero-shot.

Supported Base Models

Mistral-7B (67%), Llama-3.1-8B (77%), Gemma-2-2b (66%).

Key Innovation

Eliminates training data + compute bottleneck for LoRA fine-tuning. Describe task in text → get working adapter instantly.

Even with random/meaningless descriptions, SFT-trained T2L generates "reasonable" LoRAs — it learns task patterns beyond just text.

Limitations

  • LLM only — does NOT generate LoRAs for image/video diffusion models
  • Fixed rank-8 LoRAs (bottleneck for complex tasks)
  • Meta-training expensive (days on multi-GPU), inference is sub-second
  • Out-of-distribution tasks remain difficult

License

Apache 2.0 — fully commercial.

Relevance for [[LoRA Fine-Tuning for Editing Models|Editing Model LoRAs]]

Currently LLM-only, but the principle of hypernetwork-generated LoRAs could extend to diffusion models. If adapted for [[MMDiT]], could enable instant task-specific editing adapters from text descriptions — e.g., "remove jewelry scratches" → LoRA weights.

  • GitHub: github.com/SakanaAI/text-to-lora
  • GitHub (D2L): github.com/SakanaAI/doc-to-lora