Zvukogram TTS

Speech generation via Zvukogram API with SSML markup support.

Requirements

To use this skill, you need: - Zvukogram API token — get it at https://zvukogram.com/ - Zvukogram account email

Setup

Create file ~/.config/zvukogram/config.json:

mkdir -p ~/.config/zvukogram

{
  "token": "your_api_token_here",
  "email": "[email protected]"
}

Or use environment variables:

export ZVUKOGRAM_TOKEN=your_api_token_here
export [email protected]

Quick Start

# Simple TTS
python3 scripts/tts.py --text "Hello, world!" --voice Алена --output hello.mp3

# With +20% speed
python3 scripts/tts.py --text "Fast text" --voice Алена --speed 1.2 --output fast.mp3

# Check balance
python3 scripts/balance.py

Features

TTS generation — text to speech
SSML support — stress marks, pauses, speed
Audio merging — combine fragments via ffmpeg
Transcription — proper pronunciation of English words

SSML Markup

Stress Marks

Use + before stressed vowel:

З+амок — stress on "a"
зам+ок — stress on "o"

Aliases (Transcription)

<sub alias="Оупен Эй Ай">OpenAI</sub>
<sub alias="Самсунг">Samsung</sub>
<sub alias="Ал+ьтман">Альтман</sub>

Speed

<prosody rate="1.2">20% faster</prosody>
<prosody rate="fast">Fast text</prosody>

Pauses

<break time="500ms"/>

Available Voices

Алена — female, neutral (recommended)
Андрей — male, neutral (recommended)
Александра — female, soft
Антон — male, business

Full list: see references/VOICES.md

Examples

See references/EXAMPLES.md for: - Dialogs and podcasts - News voiceover - Voice notifications - Long texts

Transcription

See references/TRANSCRIPTION.md for proper pronunciation: - OpenAI → Оупен Эй Ай - GPT → Джи Пи Ти - Samsung → Самсунг - Altman → Ал+ьтман

SSML Reference

Full, agent-readable reference (recommended): references/SSML.md
Quick lookup: references/SSML_CHEATSHEET.md
Official Zvukogram SSML docs: https://zvukogram.com/node/ssml/

Troubleshooting

See references/TROUBLESHOOTING.md for: - API errors - Audio issues - Diagnostics

API Limitations

Max 1000 characters per request (/text)
Up to 1M characters via /longtext
Do not rely on <voice> / <speak> wrappers for API usage. For multi-voice, generate and merge fragments (one request per voice).

zvukogram

Installation