SkillHub

text2speech

v1.0.0

SenseAudio Text-to-Speech (TTS) API for converting text to natural speech. Supports synchronous and SSE streaming modes, multiple voices, emotion control, speed/pitch/volume adjustment, and multi-language (Chinese/English).

Sourced from ClawHub, Authored by scikkk

Installation

Please help me install the skill `text2speech` from SkillHub official store. npx skills add scikkk/text2speech

SenseAudio Text-to-Speech (TTS)

SenseAudio TTS converts text to natural, emotionally rich speech using a large language model. Supports 10+ emotions, streaming output (SSE), and fine-grained voice control.

Endpoint: POST https://api.senseaudio.cn/v1/t2a_v2 Auth: Authorization: Bearer $SENSEAUDIO_API_KEY Max text length: 10,000 characters


Request Parameters

Headers

Header Required Value
Authorization yes Bearer YOUR_API_KEY
Content-Type yes application/json

Body

Parameter Type Required Description
model string yes SenseAudio-TTS-1.0
text string yes Text to synthesize. Supports <break time=500> pause tags
stream boolean yes false = sync response; true = SSE streaming
voice_setting object yes Voice configuration (see below)
audio_setting object no Audio format configuration (see below)
dictionary array no Polyphonic character corrections (cloned voices + TTS-1.5 only)

voice_setting

Parameter Type Default Range Description
voice_id string - - Voice ID (system or cloned)
speed float 1.0 [0.5, 2.0] Speech speed
vol float 1.0 [0, 10] Volume
pitch int 0 [-12, 12] Pitch adjustment
latex_read boolean false - Read LaTeX/MathML formulas aloud

audio_setting

Parameter Type Default Options
format string mp3 mp3, wav, pcm, flac
sample_rate int 32000 8000, 16000, 22050, 24000, 32000, 44100
bitrate int 128000 32000, 64000, 128000, 256000 (MP3 only)
channel int 2 1 (mono), 2 (stereo)

<break> Pause Tag

Insert pauses in text:

你好<break time=500>欢迎使用我们的服务
  • time unit: milliseconds, min 100ms

Non-Streaming Response

{
  "data": {
    "audio": "hex-encoded audio data...",
    "status": 2
  },
  "extra_info": {
    "audio_length": 3500,
    "audio_sample_rate": 32000,
    "audio_size": 56000,
    "bitrate": 128000,
    "audio_format": "mp3",
    "audio_channel": 1,
    "word_count": 24,
    "usage_characters": 30
  },
  "base_resp": {"status_code": 0, "status_msg": "success"}
}

data.audio is hex-encoded. Decode: bytes.fromhex(audio_hex)


SSE Streaming Response

Each chunk: data: {"data":{"audio":"hex...","status":1},...}

Final chunk has status: 2 and includes extra_info.


Code Examples

curl (non-streaming)

curl -X POST https://api.senseaudio.cn/v1/t2a_v2 
  -H "Authorization: Bearer $SENSEAUDIO_API_KEY" 
  -H "Content-Type: application/json" 
  -d '{
    "model": "SenseAudio-TTS-1.0",
    "text": "道可道,非常道。名可名,非常名。",
    "stream": false,
    "voice_setting": {"voice_id": "male_0004_a"}
  }' -o response.json

jq -r '.data.audio' response.json | xxd -r -p > output.mp3

Python (non-streaming)

import requests

resp = requests.post(
    "https://api.senseaudio.cn/v1/t2a_v2",
    headers={"Authorization": "Bearer YOUR_API_KEY", "Content-Type": "application/json"},
    json={
        "model": "SenseAudio-TTS-1.0",
        "text": "道可道,非常道。",
        "stream": False,
        "voice_setting": {"voice_id": "male_0004_a"}
    }
)
result = resp.json()
audio_bytes = bytes.fromhex(result["data"]["audio"])
with open("output.mp3", "wb") as f:
    f.write(audio_bytes)

Python (SSE streaming)

import requests, json

with requests.post(
    "https://api.senseaudio.cn/v1/t2a_v2",
    headers={"Authorization": "Bearer YOUR_API_KEY", "Content-Type": "application/json"},
    json={"model": "SenseAudio-TTS-1.0", "text": "这是流式输出示例。", "stream": True,
          "voice_setting": {"voice_id": "male_0004_a"}},
    stream=True
) as r:
    with open("output.mp3", "wb") as f:
        for line in r.iter_lines():
            if line:
                line_str = line.decode("utf-8")
                if line_str.startswith("data: "):
                    chunk = json.loads(line_str[6:])
                    if chunk.get("data", {}).get("audio"):
                        f.write(bytes.fromhex(chunk["data"]["audio"]))


SenseAudio 文本转语音(TTS)

SenseAudio TTS 基于千亿参数大模型,将文字转化为自然流畅、情感丰富的语音。支持 10+ 种情感、流式输出(SSE)及精细化语音控制。

接口地址: POST https://api.senseaudio.cn/v1/t2a_v2 鉴权: Authorization: Bearer $SENSEAUDIO_API_KEY 最大文本长度: 10,000 字符


请求参数

请求头

参数名 必填 说明
Authorization Bearer YOUR_API_KEY
Content-Type application/json

请求体

参数名 类型 必填 说明
model string SenseAudio-TTS-1.0
text string 待合成文本,支持 <break time=500> 停顿符
stream boolean false 同步;true SSE 流式
voice_setting object 音色设置(见下表)
audio_setting object 音频格式设置(见下表)
dictionary array 多音字纠正(仅克隆音色 + TTS-1.5)

voice_setting(音色设置)

参数名 类型 默认值 范围 说明
voice_id string - - 音色 ID(系统音色或克隆音色)
speed float 1.0 [0.5, 2.0] 语速
vol float 1.0 [0, 10] 音量
pitch int 0 [-12, 12] 音调
latex_read boolean false - 数学公式朗读

audio_setting(音频设置)

参数名 类型 默认值 选项
format string mp3 mp3, wav, pcm, flac
sample_rate int 32000 8000/16000/22050/24000/32000/44100
bitrate int 128000 32000/64000/128000/256000(仅 MP3)
channel int 2 1(单声道), 2(双声道)

<break> 停顿符

在文本中插入停顿:

你好<break time=500>欢迎使用我们的服务
  • time 单位为毫秒,最小值 100ms

非流式响应

{
  "data": {"audio": "hex编码音频...", "status": 2},
  "extra_info": {
    "audio_length": 3500,
    "audio_sample_rate": 32000,
    "audio_size": 56000,
    "audio_format": "mp3",
    "word_count": 24,
    "usage_characters": 30
  },
  "base_resp": {"status_code": 0, "status_msg": "success"}
}

data.audio 为 hex 编码,解码:bytes.fromhex(audio_hex)


SSE 流式响应

每个数据块:data: {"data":{"audio":"hex...","status":1},...}

最后一个 chunk status: 2,包含完整 extra_info


代码示例

curl(非流式)

curl -X POST https://api.senseaudio.cn/v1/t2a_v2 
  -H "Authorization: Bearer $SENSEAUDIO_API_KEY" 
  -H "Content-Type: application/json" 
  -d '{
    "model": "SenseAudio-TTS-1.0",
    "text": "道可道,非常道。名可名,非常名。",
    "stream": false,
    "voice_setting": {"voice_id": "male_0004_a"}
  }' -o response.json

jq -r '.data.audio' response.json | xxd -r -p > output.mp3

Python(非流式)

import requests

resp = requests.post(
    "https://api.senseaudio.cn/v1/t2a_v2",
    headers={"Authorization": "Bearer YOUR_API_KEY", "Content-Type": "application/json"},
    json={
        "model": "SenseAudio-TTS-1.0",
        "text": "道可道,非常道。",
        "stream": False,
        "voice_setting": {"voice_id": "male_0004_a"}
    }
)
audio_bytes = bytes.fromhex(resp.json()["data"]["audio"])
open("output.mp3", "wb").write(audio_bytes)