AI-Powered Video Production

A full pipeline for producing commercial video with AI tools. It chains LLM scripting, image generation, video synthesis, and audio generation.

Tool Chain

| Tool | Role | Best For |
|---|---|---|
| GPT-5 | Script, prompts, iteration | One project per video, one chat per session |
| Seedream 4 | Image generation from scratch | Photorealistic people/scenes |
| NanoBanana Pro | Edit/stylize existing frames | Reference reproduction, geometry preservation |
| Kling 2.5 | Video generation | Draft at 720p (free), final at 1080p |
| ElevenLabs | Voice generation | Voiceovers, character speech |
| Suno | Music generation | Background music |

Pipeline Stages

1. Script & Storyboard

  • Brief GPT-5 with the concept, characters, setting, and visual references
  • Structure into ~6 scenes, ~25 sec total
  • Assign shot types: wide / medium / close-up / detail
  • Never use the same shot type in two consecutive scenes
  • Account for AI limitations: faces render poorly in distant shots
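
The storyboard rules above are easy to check mechanically before any frames are generated. A minimal sketch, assuming each scene is recorded with a shot type and a planned length; the `Scene` class, the `storyboard_problems` name, and the 3-second tolerance around the ~25 sec target are illustrative assumptions, not part of any tool in the chain.

```python
from dataclasses import dataclass

# Shot types listed in the storyboard stage.
SHOT_TYPES = {"wide", "medium", "close-up", "detail"}


@dataclass
class Scene:
    description: str
    shot_type: str
    seconds: float


def storyboard_problems(scenes, target_seconds=25.0, tolerance=3.0):
    """Return a list of problems; an empty list means the board passes.

    target_seconds and tolerance are assumed values for illustration.
    """
    problems = []
    for i, scene in enumerate(scenes):
        if scene.shot_type not in SHOT_TYPES:
            problems.append(f"scene {i + 1}: unknown shot type '{scene.shot_type}'")
        # Rule: never repeat the same shot type consecutively.
        if i > 0 and scene.shot_type == scenes[i - 1].shot_type:
            problems.append(f"scene {i + 1}: repeats shot type '{scene.shot_type}'")
    total = sum(s.seconds for s in scenes)
    if abs(total - target_seconds) > tolerance:
        problems.append(f"total length {total:.1f}s is far from ~{target_seconds:.0f}s")
    return problems
```

Running it on a draft board gives a quick list of scenes to re-plan before spending time on stills.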

2. Still Frame Generation

  • Seedream: Fashion Photo style, 16:9, Unlimited Mode
  • Disable "AI Prompt / Improve short prompts"
  • Generate without references first, then refine with references
  • Edit with NanoBanana for geometry-preserving corrections

3. Draft Animatic

  • Kling 2.5 at 720p Unlimited (0 credits) for testing
  • Review: movement speed, proportions, background consistency
  • Fix via re-generation with refined prompts

4. Video Prompt Structure

1. Scene description: "wide shot of a lone man walking..."
2. Character movement: "takes 2-3 small steps then stops"
3. Camera movement: "camera slowly dollies forward, no shaking"
4. Atmosphere: "cold, dramatic grading, blue haze"
5. Negative: "no extra limbs, no face blur, no acid colors"
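
The five-part structure can be kept consistent across scenes by assembling prompts from named fields rather than writing them freehand. A minimal sketch, assuming the parts are joined as short sentences in the order listed; `build_video_prompt` is a hypothetical helper, and the joining format is an assumption rather than anything Kling requires.

```python
def build_video_prompt(scene, movement, camera, atmosphere, negatives):
    """Join the parts in a fixed order: scene, movement, camera, atmosphere, negative."""
    parts = [scene, movement, camera, atmosphere]
    if any(not p.strip() for p in parts):
        raise ValueError("scene, movement, camera and atmosphere are all required")
    # Negatives are passed as bare phrases and prefixed with "no".
    negative = ", ".join(f"no {n}" for n in negatives)
    if negative:
        parts.append(negative)
    return ". ".join(p.strip().rstrip(".") for p in parts) + "."


prompt = build_video_prompt(
    scene="wide shot of a lone man walking",
    movement="takes 2-3 small steps then stops",
    camera="camera slowly dollies forward, no shaking",
    atmosphere="cold, dramatic grading, blue haze",
    negatives=["extra limbs", "face blur", "acid colors"],
)
print(prompt)
```

Requiring all four descriptive fields means a prompt cannot be built without a camera instruction.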

5. Audio & Lipsync

  • ElevenLabs for voiceover generation
  • Lipsync in Higgsfield (video + audio alignment)
  • English lipsync is more stable than other languages
  • Duration matching is critical: the video and audio clips must match in length
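
The duration rule can be checked before a clip goes to lipsync. A minimal sketch, assuming both durations have already been measured in seconds by whatever media tool is at hand; the `duration_gap` name and the 0.1 s tolerance are assumptions for illustration.

```python
def duration_gap(video_seconds, audio_seconds, tolerance=0.1):
    """Return audio minus video in seconds, or 0.0 when within tolerance."""
    gap = audio_seconds - video_seconds
    return gap if abs(gap) > tolerance else 0.0


gap = duration_gap(video_seconds=5.0, audio_seconds=6.5)
if gap > 0:
    print(f"audio runs {gap:.1f}s longer than the video")
elif gap < 0:
    print(f"video runs {-gap:.1f}s longer than the audio")
```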

Seedream vs NanoBanana

| Aspect | Seedream 4 | NanoBanana Pro |
|---|---|---|
| From scratch | Excellent | Good |
| Reference reproduction | Poor | Excellent |
| Close-up details | Contrasty, hard | Soft, realistic |
| Text/branding | Poor | Much better |

Credit Management

  • All stills and experiments: Unlimited mode (free)
  • Draft videos: 720p Unlimited (0 credits)
  • Final generation only: 1080p (500 credits per generation; budget 3-5 attempts per scene)
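
As a rough budget, the final-render cost follows directly from these figures. A small worked example, assuming the ~6-scene structure from the storyboard stage; `final_render_credits` is an illustrative helper name.

```python
def final_render_credits(scenes, attempts_per_scene, credits_per_generation=500):
    """Credits spent on 1080p finals; stills and 720p drafts cost nothing."""
    return scenes * attempts_per_scene * credits_per_generation


low = final_render_credits(scenes=6, attempts_per_scene=3)
high = final_render_credits(scenes=6, attempts_per_scene=5)
print(low, high)  # 9000 15000
```

So a six-scene video lands somewhere between 9,000 and 15,000 credits under these assumptions.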

Gotchas

  • Long "poetic" prompts confuse video models - keep structured and specific
  • Always specify camera behavior; left unspecified, the camera defaults to random "flying" motion
  • Subjects far from the camera get worse face quality - use a separate full-body reference
  • Color grading: override GPT's default "golden hour" for dramatic looks
  • Teeth, tongue, and mouth are the main artifact zones in lipsync
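
Some of these gotchas can be caught with a simple text check before a prompt is submitted. A minimal sketch, assuming plain keyword matching is good enough; `lint_video_prompt` and the 60-word limit are hypothetical choices, not thresholds from any of the tools.

```python
def lint_video_prompt(prompt, max_words=60):
    """Return warnings for a video prompt; an empty list means nothing was flagged."""
    warnings = []
    text = prompt.lower()
    # Unspecified camera behavior tends to produce random "flying" motion.
    if "camera" not in text:
        warnings.append("no camera behavior specified")
    # Long prompts confuse video models; max_words is an assumed cutoff.
    if len(prompt.split()) > max_words:
        warnings.append(f"prompt longer than {max_words} words")
    # LLM-written prompts often default to golden-hour grading.
    if "golden hour" in text:
        warnings.append("'golden hour' grading present; confirm it is intended")
    return warnings
```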

See Also