SkillHub

clawvox

v1.0.0

ClawVox - ElevenLabs voice studio for OpenClaw. Generate speech, transcribe audio, clone voices, create sound effects, and more.

Sourced from ClawHub, Authored by abhishek-official1

Installation

Install the `clawvox` skill from the SkillHub official store:

npx skills add abhishek-official1/clawvox

ClawVox

Transform your OpenClaw assistant into a professional voice production studio with ClawVox - powered by ElevenLabs.

Quick Reference

| Action | Command | Description |
| --- | --- | --- |
| Speak | {baseDir}/scripts/speak.sh 'text' | Convert text to speech |
| Transcribe | {baseDir}/scripts/transcribe.sh audio.mp3 | Speech to text |
| Clone | {baseDir}/scripts/clone.sh --name "Voice" sample.mp3 | Clone a voice |
| SFX | {baseDir}/scripts/sfx.sh "thunder storm" | Generate sound effects |
| Voices | {baseDir}/scripts/voices.sh list | List available voices |
| Dub | {baseDir}/scripts/dub.sh --target es audio.mp3 | Translate audio |
| Isolate | {baseDir}/scripts/isolate.sh audio.mp3 | Remove background noise |

Setup

  1. Get your API key from elevenlabs.io/app/settings/api-keys
  2. Configure in ~/.openclaw/openclaw.json:
{
  skills: {
    entries: {
      "clawvox": {
        apiKey: "YOUR_ELEVENLABS_API_KEY",
        config: {
          defaultVoice: "Rachel",
          defaultModel: "eleven_turbo_v2_5",
          outputDir: "~/.openclaw/audio"
        }
      }
    }
  }
}

Or set the environment variable:

export ELEVENLABS_API_KEY="your_api_key_here"
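If you use the environment variable route, a small check before calling any script can catch a missing or truncated key early. The sketch below is a hypothetical helper, not part of ClawVox; the function name `check_api_key` is an assumption, and the 20-character minimum is taken from the Troubleshooting section.

```shell
# Hypothetical helper (not shipped with ClawVox): validate the key in the environment.
check_api_key() {
  # The variable must be set and non-empty.
  if [ -z "${ELEVENLABS_API_KEY:-}" ]; then
    echo "ELEVENLABS_API_KEY not set" >&2
    return 1
  fi
  # Troubleshooting mentions a minimum length of 20 characters.
  if [ "${#ELEVENLABS_API_KEY}" -lt 20 ]; then
    echo "ELEVENLABS_API_KEY is shorter than 20 characters" >&2
    return 1
  fi
  return 0
}

# Example:
# check_api_key && {baseDir}/scripts/voices.sh list
```

This only checks the shape of the key. Whether the key is actually valid is reported by the API itself (a 401 response).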

Voice Generation (TTS)

Basic Text-to-Speech

# Quick speak with default voice (Rachel)
{baseDir}/scripts/speak.sh 'Hello, I am your personal AI assistant.'

# Specify voice by name
{baseDir}/scripts/speak.sh --voice Adam 'Hello from Adam'

# Save to file
{baseDir}/scripts/speak.sh --out ~/audio/greeting.mp3 'Welcome to the show'

# Use specific model
{baseDir}/scripts/speak.sh --model eleven_multilingual_v2 'Bonjour'

# Adjust voice settings
{baseDir}/scripts/speak.sh --stability 0.5 --similarity 0.8 'Expressive speech'

# Adjust speed
{baseDir}/scripts/speak.sh --speed 1.2 'Faster speech'

# Use multilingual model for other languages
{baseDir}/scripts/speak.sh --model eleven_multilingual_v2 --voice Rachel 'Hola, que tal'
{baseDir}/scripts/speak.sh --model eleven_multilingual_v2 --voice Adam 'Guten Tag'

Voice Models

| Model | Latency | Languages | Best For |
| --- | --- | --- | --- |
| eleven_flash_v2_5 | ~75ms | 32 | Real-time, streaming |
| eleven_turbo_v2_5 | ~250ms | 32 | Balanced quality/speed |
| eleven_multilingual_v2 | ~500ms | 29 | Long-form, highest quality |
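One way to use the latency figures above is to pick the highest-quality model that still fits a latency budget. The following is a minimal sketch under that assumption; `pick_model` is a hypothetical helper, and the thresholds are the approximate latencies listed above rather than guarantees.

```shell
# Hypothetical helper: choose a model for a latency budget given in milliseconds.
pick_model() {
  budget_ms=$1
  if [ "$budget_ms" -ge 500 ]; then
    # Enough headroom for the slowest, highest-quality model.
    echo "eleven_multilingual_v2"
  elif [ "$budget_ms" -ge 250 ]; then
    echo "eleven_turbo_v2_5"
  else
    # Tight budgets fall back to the fastest model.
    echo "eleven_flash_v2_5"
  fi
}

# Example:
# {baseDir}/scripts/speak.sh --model "$(pick_model 300)" 'Hello'
```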

Available Voices

Premade voices: Rachel, Adam, Antoni, Bella, Domi, Elli, Josh, Sam, Callum, Charlie, George, Liam, Matilda, Alice, Bill, Brian, Chris, Daniel, Eric, Jessica, Laura, Lily, River, Roger, Sarah, Will

Long-Form Content

# Generate audio from text file
{baseDir}/scripts/speak.sh --input chapter.txt --voice "George" --out audiobook.mp3

Speech-to-Text (Transcription)

Basic Transcription

# Transcribe audio file
{baseDir}/scripts/transcribe.sh recording.mp3

# Save to file
{baseDir}/scripts/transcribe.sh --out transcript.txt audio.mp3

# Transcribe with language hint
{baseDir}/scripts/transcribe.sh --language es spanish_audio.mp3

# Include timestamps
{baseDir}/scripts/transcribe.sh --timestamps podcast.mp3

Supported Formats

  • MP3, MP4, MPEG, MPGA, M4A, WAV, WebM
  • Maximum file size: 100MB
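The format and size limits above can be checked locally before uploading. This is a sketch only: `can_transcribe` is a hypothetical function, and it assumes that 100MB means 100 * 1024 * 1024 bytes and that the file extension reflects the real format.

```shell
# Hypothetical pre-check: accepted extension and size within the 100MB limit.
can_transcribe() {
  file=$1
  [ -f "$file" ] || { echo "not a file: $file" >&2; return 1; }
  # Compare the lowercased extension against the supported list.
  ext=$(printf '%s' "${file##*.}" | tr '[:upper:]' '[:lower:]')
  case "$ext" in
    mp3|mp4|mpeg|mpga|m4a|wav|webm) ;;
    *) echo "unsupported format: .$ext" >&2; return 1 ;;
  esac
  # wc -c reports the size in bytes; the arithmetic expansion strips padding.
  size=$(( $(wc -c < "$file") ))
  [ "$size" -le $((100 * 1024 * 1024)) ] || { echo "file too large" >&2; return 1; }
}

# Example:
# can_transcribe podcast.mp3 && {baseDir}/scripts/transcribe.sh podcast.mp3
```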

Voice Cloning

Instant Voice Clone

# Clone from single sample (minimum 30 seconds recommended)
{baseDir}/scripts/clone.sh --name MyVoice recording.mp3

# Clone with description
{baseDir}/scripts/clone.sh --name BusinessVoice \
  --description 'Professional male voice' \
  sample.mp3

# Clone with labels
{baseDir}/scripts/clone.sh --name MyVoice \
  --labels '{"gender":"male","age":"adult"}' \
  sample.mp3

# Remove background noise during cloning
{baseDir}/scripts/clone.sh --name CleanVoice \
  --remove-bg-noise \
  sample.mp3

# Test cloned voice
{baseDir}/scripts/speak.sh --voice MyVoice 'Testing my cloned voice'

Voice Library Management

# List all available voices
{baseDir}/scripts/voices.sh list

# Get voice details
{baseDir}/scripts/voices.sh info --name Rachel
{baseDir}/scripts/voices.sh info --id 21m00Tcm4TlvDq8ikWAM

# Search voices (filter output with grep)
{baseDir}/scripts/voices.sh list | grep -i "female"

# Filter by category
{baseDir}/scripts/voices.sh list --category premade
{baseDir}/scripts/voices.sh list --category cloned

# Download voice preview
{baseDir}/scripts/voices.sh preview --name Rachel -o preview.mp3

# Delete custom voice
{baseDir}/scripts/voices.sh delete --id "voice_id"

Sound Effects

# Generate sound effect
{baseDir}/scripts/sfx.sh 'Heavy rain on a tin roof'

# With duration
{baseDir}/scripts/sfx.sh --duration 5 'Forest ambiance with birds'

# With prompt influence (higher = more accurate)
{baseDir}/scripts/sfx.sh --influence 0.8 'Sci-fi laser gun firing'

# Save to file
{baseDir}/scripts/sfx.sh --out effects/thunder.mp3 'Rolling thunder'

Note: Duration range is 0.5 to 22 seconds (rounded to nearest 0.5)
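To see what a requested duration would become under the rule in the note, the arithmetic can be sketched as below. `sfx_duration` is a hypothetical helper, and it is an assumption that halves round up and that `--duration` accepts a value such as `5.5`.

```shell
# Hypothetical helper: round to the nearest 0.5s, then clamp to the 0.5-22s range.
sfx_duration() {
  awk -v d="$1" 'BEGIN {
    r = int(d * 2 + 0.5) / 2   # nearest half second (halves round up)
    if (r < 0.5) r = 0.5       # lower bound from the note
    if (r > 22)  r = 22        # upper bound from the note
    printf "%.1f\n", r
  }'
}

# Example:
# {baseDir}/scripts/sfx.sh --duration "$(sfx_duration 5.3)" 'Forest ambiance with birds'
```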

Voice Isolation

# Remove background noise and isolate voice
{baseDir}/scripts/isolate.sh noisy_recording.mp3

# Save to specific file
{baseDir}/scripts/isolate.sh --out clean_voice.mp3 meeting_recording.mp3

# Don't tag audio events
{baseDir}/scripts/isolate.sh --no-audio-events recording.mp3

Requirements:

  • Minimum duration: 4.6 seconds
  • Supported formats: MP3, WAV, M4A, OGG, FLAC

Dubbing (Multi-Language Translation)

# Dub audio to Spanish
{baseDir}/scripts/dub.sh --target es audio.mp3

# Dub with source language specified
{baseDir}/scripts/dub.sh --source en --target ja video.mp4

# Check dubbing status
{baseDir}/scripts/dub.sh --status --id "dubbing_id"

# Download dubbed audio
{baseDir}/scripts/dub.sh --download --id "dubbing_id" --out dubbed.mp3
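Because dubbing is checked with a separate status command before downloading, a polling loop is a natural fit. The sketch below is a generic, hypothetical `poll_until` helper. The exact text printed by `dub.sh --status` is not documented here, so the pattern to wait for (for example `dubbed`) is an assumption you would need to confirm against real output.

```shell
# Hypothetical helper: rerun a command until its output matches a pattern.
# Usage: poll_until <pattern> <max_attempts> <delay_seconds> <command...>
poll_until() {
  pattern=$1; attempts=$2; delay=$3
  shift 3
  i=1
  while [ "$i" -le "$attempts" ]; do
    # "$@" is the status command; stdout and stderr are both searched.
    if "$@" 2>&1 | grep -q -- "$pattern"; then
      return 0
    fi
    # Wait between attempts, but not after the last one.
    if [ "$i" -lt "$attempts" ]; then
      sleep "$delay"
    fi
    i=$((i + 1))
  done
  return 1
}

# Example (assumes the status output contains "dubbed" when the job is finished):
# poll_until dubbed 30 10 {baseDir}/scripts/dub.sh --status --id "dubbing_id" \
#   && {baseDir}/scripts/dub.sh --download --id "dubbing_id" --out dubbed.mp3
```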

Supported languages: en, es, fr, de, it, pt, pl, hi, ar, zh, ja, ko, nl, ru, tr, vi, sv, da, fi, cs, el, he, id, ms, no, ro, uk, hu, th

API Usage Examples

All scripts use curl under the hood, so you can also call the API directly:

# Direct TTS API call
curl -X POST "https://api.elevenlabs.io/v1/text-to-speech/VOICE_ID" \
  -H "xi-api-key: $ELEVENLABS_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"text": "Hello world", "model_id": "eleven_turbo_v2_5"}' \
  --output speech.mp3

Error Handling

All scripts provide helpful error messages:

  • 401: Authentication failed - Check your API key
  • 403: Permission denied - Your API key may not have access
  • 429: Rate limit exceeded - Wait before trying again
  • 500/502/503: ElevenLabs API issues - Try again later
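If you call the API directly with curl, the same status codes can be handled in your own wrapper. The sketch below is hypothetical and does not show how the bundled scripts are implemented: `explain_status` mirrors the list above, and `is_retryable` encodes the assumption that only 429 and the 5xx codes are worth retrying.

```shell
# Hypothetical helper: map an HTTP status code to the hints listed above.
explain_status() {
  case "$1" in
    401) echo "Authentication failed - check your API key" ;;
    403) echo "Permission denied - your API key may not have access" ;;
    429) echo "Rate limit exceeded - wait before trying again" ;;
    500|502|503) echo "ElevenLabs API issue - try again later" ;;
    2??) echo "OK" ;;
    *) echo "Unexpected status $1" ;;
  esac
}

# Assumption: 429 and 5xx are transient; 401/403 need a configuration fix first.
is_retryable() {
  case "$1" in
    429|500|502|503) return 0 ;;
    *) return 1 ;;
  esac
}

# Example: curl's -w '%{http_code}' prints the status while -o saves the body.
# status=$(curl -s -o speech.mp3 -w '%{http_code}' -X POST \
#   "https://api.elevenlabs.io/v1/text-to-speech/VOICE_ID" \
#   -H "xi-api-key: $ELEVENLABS_API_KEY" -H "Content-Type: application/json" \
#   -d '{"text": "Hello world", "model_id": "eleven_turbo_v2_5"}')
# explain_status "$status"
```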

Testing

Run the test suite to verify everything works:

{baseDir}/test.sh YOUR_API_KEY

Or with environment variable:

export ELEVENLABS_API_KEY="your_key"
{baseDir}/test.sh

Troubleshooting

Common Issues

  1. "exec host not allowed (requested gateway)"
       • The skill needs to run commands in a sandbox environment
       • Configure OpenClaw to use sandbox: tools.exec.host: "sandbox"
       • Or enable sandboxing in your OpenClaw config
       • Alternative: Configure exec approvals for gateway host (see OpenClaw docs)

  2. Parse errors with quotes or exclamation marks
       • Use single quotes instead of double quotes: 'Hello world' not "Hello world!"
       • Avoid exclamation marks (!) in text when using double quotes
       • For complex text, use the --input option with a file

  3. "ELEVENLABS_API_KEY not set"
       • Ensure ELEVENLABS_API_KEY is set or configured in openclaw.json
       • Check that the API key is at least 20 characters long

  4. "jq is required but not installed"
       • Install jq: apt-get install jq (Linux) or brew install jq (macOS)

  5. "Rate limited"
       • Check your ElevenLabs plan quota at elevenlabs.io/app/usage
       • Free tier: ~10,000 characters/month

  6. "Voice not found"
       • Use {baseDir}/scripts/voices.sh list to see available voices
       • Check if the voice ID is correct

  7. "Dubbing failed"
       • Ensure source audio is clear and audible
       • Check supported language codes

  8. "File too large"
       • Transcription: 100MB max
       • Dubbing: 500MB max
       • Voice cloning: 50MB per file
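For the quoting problems listed above, one workaround is to write the text to a file and pass it with --input, so the shell never reinterprets it. The sketch below assumes a hypothetical `text_to_file` helper; only the `--input` and `--out` options come from this document.

```shell
# Hypothetical helper: write text verbatim to a temp file and print the file's path.
text_to_file() {
  tmp=$(mktemp)
  # printf '%s\n' copies the argument as-is; nothing in it is reinterpreted.
  printf '%s\n' "$1" > "$tmp"
  echo "$tmp"
}

# Single quotes keep "!" and double quotes literal. An apostrophe inside
# single-quoted text is written as '\'' (close quote, escaped quote, reopen).
path=$(text_to_file 'She said "Wow!" and it'\''s done')

# Then pass the file instead of an inline string:
# {baseDir}/scripts/speak.sh --input "$path" --out quote_test.mp3
```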

Debug Mode

# Enable verbose output
DEBUG=1 {baseDir}/scripts/speak.sh 'test'

# Show API request details
DEBUG=1 {baseDir}/scripts/transcribe.sh audio.mp3

Pricing Notes

ElevenLabs API pricing (approximate):

  • Flash v2.5: ~$0.06/min
  • Turbo v2.5: ~$0.06/min
  • Multilingual v2: ~$0.12/min
  • Voice cloning: Included in plan
  • Sound effects: ~$0.02/generation
  • Transcription: ~$0.02/min (Scribe v1)

Free tier: ~10,000 characters/month
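For a rough budget, the per-minute figures above can be multiplied by the expected audio length. The sketch below is a hypothetical `estimate_cost` helper. It assumes the model IDs from the Voice Models section correspond to the pricing names above, and it inherits the approximate nature of those rates.

```shell
# Hypothetical helper: estimated USD cost for a number of minutes of generated speech.
estimate_cost() {
  model=$1; minutes=$2
  case "$model" in
    eleven_flash_v2_5|eleven_turbo_v2_5) rate=0.06 ;;   # ~$0.06/min
    eleven_multilingual_v2)              rate=0.12 ;;   # ~$0.12/min
    *) echo "unknown model: $model" >&2; return 1 ;;
  esac
  # awk does the floating-point multiplication and rounds to cents.
  awk -v r="$rate" -v m="$minutes" 'BEGIN { printf "%.2f\n", r * m }'
}

# Example: a 10-minute chapter with the multilingual model
# estimate_cost eleven_multilingual_v2 10
```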
