Flow Voice — Voice Cloning for OpenClaw

Clone any voice from a 3–30 second audio sample and generate speech from text. Powered by LuxTTS, which runs locally at up to 150x realtime (hardware-dependent), fits in 1 GB of VRAM, and also works on CPU and Apple Silicon (MPS). No API key, no cloud, no cost.

Output directory: ~/clawd/output/voice/

Commands

What you say	What it does
"clone this voice [audio file]"	Encode a voice profile from a sample
"speak as [name]: [text]"	Generate speech using a saved voice profile
"add voiceover to [video]: [text]"	Generate speech + bake into video with ffmpeg
"list voices"	Show saved voice profiles
"clone voice from URL [url]"	Download audio from URL, then clone
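As a rough illustration of what "list voices" could do, here is a minimal Python sketch. It assumes profiles are stored as <name>.pkl files under ~/clawd/output/voice/profiles/, as described for the clone step. The function name list_voices and the directory constant are hypothetical, not taken from the skill's source.

```python
from pathlib import Path

# Assumption: profiles live in this directory as <name>.pkl files.
# The skill's real "list voices" logic may differ.
DEFAULT_PROFILE_DIR = Path.home() / "clawd" / "output" / "voice" / "profiles"


def list_voices(profile_dir: Path = DEFAULT_PROFILE_DIR) -> list[str]:
    """Return saved voice names (file stems), sorted alphabetically."""
    if not profile_dir.is_dir():
        # No profiles directory yet means no saved voices.
        return []
    return sorted(p.stem for p in profile_dir.glob("*.pkl"))


if __name__ == "__main__":
    for name in list_voices():
        print(name)
```

Each printed name is a value you could pass to --voice.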

Workflow

Step 1: Clone a voice

uv run ~/clawd/skills/flow-voice/scripts/clone.py \
  --sample /path/to/sample.wav \
  --name "eric"

Saves encoded profile to ~/clawd/output/voice/profiles/eric.pkl. Requires at least 3 seconds of clean audio. 10–30 seconds is ideal.
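Before cloning, it can help to confirm that a sample meets the length guidance above. The following is a minimal sketch using Python's standard wave module. It only handles PCM WAV files (not MP3), and the helper names and the "too short" / "usable" / "ideal" labels are illustrative, not part of the skill.

```python
import wave

MIN_SECONDS = 3.0            # minimum length stated above
IDEAL_RANGE = (10.0, 30.0)   # ideal length range stated above


def wav_duration_seconds(path: str) -> float:
    """Duration of a PCM WAV file, computed as frames / sample rate."""
    with wave.open(path, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def check_sample(path: str) -> str:
    """Classify a sample as 'too short', 'usable', or 'ideal' by length."""
    seconds = wav_duration_seconds(path)
    if seconds < MIN_SECONDS:
        return "too short"
    if IDEAL_RANGE[0] <= seconds <= IDEAL_RANGE[1]:
        return "ideal"
    return "usable"
```

For example, check_sample("/path/to/sample.wav") could be called before clone.py to reject clips shorter than 3 seconds. This checks length only; it says nothing about how clean the audio is.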

Step 2: Generate speech

uv run ~/clawd/skills/flow-voice/scripts/speak.py \
  --voice "eric" \
  --text "Hello, this is a test of voice cloning." \
  --output ~/clawd/output/voice/output.wav

Outputs a 48 kHz WAV file. Use --speed to adjust the pace (default 1.0; 0.8 is slower, 1.2 is faster).

Step 3: Bake into video (optional)

uv run ~/clawd/skills/flow-voice/scripts/speak.py \
  --voice "eric" \
  --text "Your agent can think. Now teach it to draw." \
  --output /tmp/vo.wav

ffmpeg -i input.mp4 -i /tmp/vo.wav \
  -map 0:v:0 -map 1:a:0 \
  -c:v copy -c:a aac -shortest output_with_voice.mp4
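If you want to script this step, the sketch below builds an equivalent ffmpeg argument list in Python and runs it with subprocess. It mirrors the ffmpeg invocation above, with explicit -map options so the video comes from the first input and the audio from the voiceover even when the source video already has an audio track. The function names are hypothetical, and ffmpeg is assumed to be on PATH.

```python
import subprocess


def build_mux_command(video: str, voiceover: str, output: str) -> list[str]:
    """Build an ffmpeg argument list that puts the voiceover on the video.

    The video stream is copied without re-encoding, the voiceover is
    encoded as AAC, and the output stops at the shorter of the two inputs.
    """
    return [
        "ffmpeg",
        "-i", video,
        "-i", voiceover,
        "-map", "0:v:0",   # video from the first input
        "-map", "1:a:0",   # audio from the voiceover file
        "-c:v", "copy",
        "-c:a", "aac",
        "-shortest",
        output,
    ]


def mux(video: str, voiceover: str, output: str) -> None:
    """Run ffmpeg and raise CalledProcessError if it exits non-zero."""
    subprocess.run(build_mux_command(video, voiceover, output), check=True)
```

Passing the arguments as a list avoids shell quoting problems with paths that contain spaces.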

One-Shot: Clone + Speak in one command

uv run ~/clawd/skills/flow-voice/scripts/flow_voice.py \
  --sample /path/to/sample.wav \
  --text "Beautiful diagrams, from a single prompt." \
  --output ~/clawd/output/voice/result.wav

No profile is saved unless you also pass --name; the voice is cloned and used immediately.

Bake voiceover directly into a video

uv run ~/clawd/skills/flow-voice/scripts/flow_voice.py \
  --sample /path/to/sample.wav \
  --text "Your agent can think. Now teach it to draw." \
  --video /path/to/animation.mp4 \
  --output ~/clawd/output/voice/final_with_voice.mp4

Parameters

Flag	Default	Description
`--sample`	required unless `--voice` is set	Reference audio file (WAV or MP3, at least 3 seconds)
`--text`	required	Text to speak
`--output`	auto-named	Output file path
`--video`	none	If set, bakes audio into this video
`--voice`	none	Use saved profile instead of --sample
`--name`	none	Save cloned profile with this name
`--speed`	1.0	Speech speed (0.8 = slower, 1.2 = faster)
`--steps`	4	Inference steps (3–4 recommended)
`--t-shift`	0.9	Sampling param (higher = potentially better quality)
`--smooth`	false	Add smoothing (reduces metallic artifacts)
`--device`	auto	Force cpu / mps / cuda
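To make the defaults in the table concrete, here is a hypothetical argparse sketch that mirrors the flags listed above. It is not the actual source of flow_voice.py. In particular, treating --sample and --voice as mutually exclusive is an assumption drawn from the "instead of --sample" wording.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the documented flags and defaults."""
    parser = argparse.ArgumentParser(prog="flow_voice.py")

    # Assumption: exactly one voice source is given, either a raw
    # reference sample or the name of a previously saved profile.
    source = parser.add_mutually_exclusive_group(required=True)
    source.add_argument("--sample", help="reference audio file (wav/mp3, min 3s)")
    source.add_argument("--voice", help="name of a saved voice profile")

    parser.add_argument("--text", required=True, help="text to speak")
    parser.add_argument("--output", default=None, help="output path (auto-named if omitted)")
    parser.add_argument("--video", default=None, help="bake audio into this video")
    parser.add_argument("--name", default=None, help="save the cloned profile under this name")
    parser.add_argument("--speed", type=float, default=1.0)
    parser.add_argument("--steps", type=int, default=4)
    parser.add_argument("--t-shift", type=float, default=0.9)
    parser.add_argument("--smooth", action="store_true")
    parser.add_argument("--device", choices=["auto", "cpu", "mps", "cuda"], default="auto")
    return parser


if __name__ == "__main__":
    print(build_parser().parse_args())
```

Note that argparse exposes --t-shift as the attribute t_shift on the parsed namespace.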

Tips

Minimum 3 seconds of audio for cloning — 10–30s is ideal
If you hear metallic artifacts, add --smooth
On Apple Silicon (M1/M2/M3), --device defaults to mps automatically
First run downloads the model (~200MB) to ~/.cache/huggingface/
Clean audio works best — no background music or noise in the reference sample
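The automatic device choice mentioned in the tips could work roughly like the sketch below. It assumes the model runs on PyTorch and that the preference order is cuda, then mps, then cpu. Both points are assumptions rather than details confirmed here, and pick_device is a hypothetical helper name.

```python
def pick_device(requested: str = "auto") -> str:
    """Resolve a --device value to 'cuda', 'mps', or 'cpu'.

    Assumed preference order for 'auto': cuda, then mps, then cpu.
    """
    if requested != "auto":
        # An explicit choice is passed through unchanged.
        return requested
    try:
        import torch
    except ImportError:
        # Without PyTorch installed there is nothing to probe.
        return "cpu"
    if torch.cuda.is_available():
        return "cuda"
    # Older PyTorch builds have no MPS backend, so look it up defensively.
    mps = getattr(torch.backends, "mps", None)
    if mps is not None and mps.is_available():
        return "mps"
    return "cpu"


if __name__ == "__main__":
    print(pick_device())
```

Passing --device cpu explicitly is a quick way to rule out GPU-specific problems when debugging.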

Examples

Clone Eric's voice from a recording:

uv run ~/clawd/skills/flow-voice/scripts/flow_voice.py \
  --sample ~/recordings/eric-30s.wav \
  --name eric \
  --text "FlowStay is live. Book your room with AI." \
  --output ~/clawd/output/voice/flowstay-promo.wav

Add voiceover to a Flow Visual Explainer animation:

uv run ~/clawd/skills/flow-voice/scripts/flow_voice.py \
  --voice eric \
  --text "Your agent can think. Now teach it to draw." \
  --video ~/clawd/2026-03-10-flowvisual-c3-magic-wand-comp.mp4 \
  --output ~/clawd/output/voice/flowvisual-voiced.mp4

Quick one-shot from a downloaded audio clip:

yt-dlp -x --audio-format wav -o "/tmp/ref.%(ext)s" "https://www.instagram.com/reel/..."
uv run ~/clawd/skills/flow-voice/scripts/flow_voice.py \
  --sample /tmp/ref.wav \
  --text "Hello from OpenClaw." \
  --output ~/clawd/output/voice/test.wav

Powered by LuxTTS (ysharma3501/LuxTTS, ZipVoice-based) — Free, local, no API key required. Packaged for OpenClaw by Flow — March 2026