Product Demo Video Creator

Create polished demo videos with voiceover and text overlays — fully automated, zero cost.

Stack

Tool	Purpose	Install
Puppeteer	Headless browser recording	`npm i -g puppeteer`
edge-tts	Microsoft Neural TTS (free)	`pip3 install edge-tts`
PIL/Pillow	Text overlays on frames	`pip3 install Pillow` (usually pre-installed)
FFmpeg	Video encoding	Static build from johnvansickle.com or package manager
Chromium	Browser engine	System package

Quick Start

Install dependencies (see scripts/install-deps.sh)
Define scenes in scripts/record-demo.mjs (copy template, customize)
Run: node scripts/record-demo.mjs
Output: MP4 with voiceover + text overlays

Workflow

Define Scenes → Generate Voiceover → Record Browser → Add Text Overlays → Compile Video
     ↓               ↓                    ↓                  ↓                ↓
  scenes[]      edge-tts MP3      Puppeteer frames     PIL drawtext      FFmpeg concat

Step 1: Define Scenes

Each scene has: id, title, subtitle, narration, url, type, actions.

{
  id: 'json',
  title: 'JSON Formatter',
  subtitle: 'Paste messy JSON, get formatted output',
  narration: 'Paste any messy JSON and get it formatted instantly.',
  url: 'https://example.com/json-formatter/',
  type: 'tool',       // 'intro' | 'tool' | 'outro'
  actions: async (page) => {
    await reactSetValue(page, 'textarea', '{"key":"value"}');
    await wait(1000);
    await clickButton(page, ['Format']);
    await wait(2000);
  }
}

Step 2: Scene Types and Overlay Positioning

Type	Overlay Position	Use For
`intro`	Bottom bar (centered, large title)	Opening scene with product name
`tool`	Bottom bar (left-aligned title + right badge)	Individual feature demos
`outro`	Bottom bar (centered CTA)	Closing with call-to-action

Critical: Use bottom bar, not top bar — top overlays conflict with website navigation.

Step 3: React App Interaction

React controlled components ignore direct .value assignment. Use reactSetValue():

async function reactSetValue(page, selector, value) {
  await page.evaluate((sel, val) => {
    const el = document.querySelector(sel);
    const setter = Object.getOwnPropertyDescriptor(
      el.tagName === 'TEXTAREA'
        ? window.HTMLTextAreaElement.prototype
        : window.HTMLInputElement.prototype, 'value'
    )?.set;
    if (setter) setter.call(el, val);
    else el.value = val;
    el.dispatchEvent(new Event('input', { bubbles: true }));
    el.dispatchEvent(new Event('change', { bubbles: true }));
  }, selector, value);
}

Step 4: Voiceover

edge-tts --voice en-US-AndrewNeural --rate=+5% --text "Your narration" --write-media output.mp3

Recommended voices: - en-US-AndrewNeural — Male, warm, confident (best for product demos) - en-US-AriaNeural — Female, positive, confident - en-US-BrianNeural — Male, approachable, casual

Step 5: Text Overlays

PIL adds text to frames. Key design rules: - Solid dark background (alpha ≥ 230) — semi-transparent looks cheap - Green accent line (2px) separating bar from content - "100% Client-Side" or equivalent badge in green (#4ade80) - Font: Noto Sans Bold for titles, Regular for subtitles

Step 6: Video Compilation

# Frames to video per scene
ffmpeg -framerate 6 -i frames/frame_%05d.png -c:v libx264 -preset slow -crf 20 -pix_fmt yuv420p -r 24 scene.mp4

# Add audio
ffmpeg -i scene.mp4 -i narration.mp3 -c:v copy -c:a aac -b:a 128k -shortest scene_final.mp4

# Normalize for concatenation
ffmpeg -i scene_final.mp4 -c:v libx264 -preset slow -crf 20 -r 24 -c:a aac -b:a 128k -ar 44100 -ac 2 scene_norm.mp4

# Concatenate
ffmpeg -f concat -safe 0 -i concat.txt -c copy output.mp4

Timing Guidelines

Action	Wait Time
Page load	600ms
After setting input value	800-1000ms
After clicking action button	2000-2500ms
Between password generations	1500ms
QR code / image rendering	2500ms
Show final result	2000-3000ms

Scene duration = max(8s, ceil(audio_duration) + 2s)

Capture Settings

Setting	Value	Notes
Resolution	1280×720	Standard for PH/social
Capture FPS	6	Balance quality/performance
Output FPS	24	Smooth playback
CRF	20	Good quality
JPEG quality	N/A (PNG frames)	Lossless capture

Troubleshooting

Empty tool outputs: React apps need reactSetValue(), not .value =
Dropdown menus open: Click body after interactions to dismiss
FFmpeg "No such filter: drawtext": Static FFmpeg builds lack it — use PIL instead
edge-tts fails: Check network; it calls Microsoft servers
Chromium won't start: Need --no-sandbox --disable-gpu --disable-dev-shm-usage

References

references/demo-planning.md — Demo structure, pacing, what makes demos compelling
scripts/record-demo.mjs — Complete working template (customize scenes for your app)
scripts/overlay.py — PIL text overlay processor
scripts/install-deps.sh — One-command dependency setup

product-demo-video

Installation