WebSocket Streaming
Real-time streaming. Send text, receive audio chunks. Sentence-level buffering for LLM token streaming.
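The sentence-level buffering described above can also be mirrored on the client when feeding LLM tokens into the socket. The sketch below is an illustration of that buffering idea, not Telnyx code: it accumulates tokens and releases text only at sentence boundaries.

```python
# Minimal client-side sketch of sentence-level buffering for LLM token
# streams: accumulate tokens, emit complete sentences, and keep the
# remainder for the next token. Illustration only, not the Telnyx API.

SENTENCE_ENDINGS = (".", "!", "?")


class SentenceBuffer:
    """Accumulates streamed tokens and yields text at sentence boundaries."""

    def __init__(self) -> None:
        self._buf = ""

    def feed(self, token: str) -> list[str]:
        """Add a token; return any complete sentences now ready to send."""
        self._buf += token
        sentences = []
        while True:
            # Find the earliest sentence-ending punctuation in the buffer.
            ends = [i for i in (self._buf.find(p) for p in SENTENCE_ENDINGS)
                    if i != -1]
            if not ends:
                break
            cut = min(ends) + 1
            sentences.append(self._buf[:cut].strip())
            self._buf = self._buf[cut:]
        return sentences

    def flush(self) -> str:
        """Return any remaining partial text (e.g. to send with flush: true)."""
        rest, self._buf = self._buf.strip(), ""
        return rest
```

A caller would send each returned sentence over the WebSocket as it arrives, and call `flush()` when the LLM stream ends.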
REST API
HTTP POST for batch synthesis. Binary, base64, or async output. Includes Ultra model.
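A batch request is a single HTTP POST; the response body is either raw audio bytes or a base64 payload depending on the requested output mode. The sketch below uses only the standard library; the endpoint path and field names (`text`, `voice`, `output_format`) are assumptions for illustration, so check the Telnyx API reference for the exact schema.

```python
# Hedged sketch of a batch-synthesis POST. URL path and payload field
# names are assumed for illustration, not taken from the Telnyx docs.
import base64
import json
import urllib.request


def build_speech_request(text: str, voice: str, api_key: str,
                         output: str = "binary") -> urllib.request.Request:
    """Build an HTTP POST for batch synthesis (URL and fields illustrative)."""
    payload = {"text": text, "voice": voice, "output_format": output}
    return urllib.request.Request(
        "https://api.telnyx.com/v2/text-to-speech/speech",  # assumed path
        data=json.dumps(payload).encode(),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def decode_audio(response_body: bytes, output: str) -> bytes:
    """Return raw audio bytes from a binary or base64 response body."""
    if output == "base64":
        return base64.b64decode(response_body)
    return response_body
```

Sending the request with `urllib.request.urlopen(...)` and passing the body through `decode_audio` covers both synchronous output modes; async output would instead return a job to poll.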
In-Call Playback
TTS during live calls via the Call Control `speak` command or TeXML `<Say>`.
Models
| Model | WebSocket | REST | Description |
|---|---|---|---|
| Natural | Yes | Yes | Fast, English-only |
| NaturalHD | Yes | Yes | Higher quality, multilingual |
| Ultra | No | Yes | Sub-100ms latency, 44 languages, emotion/speed/volume controls |
| Qwen3TTS | Yes | Yes | Voice cloning |
| KokoroTTS | Yes | Yes | Lightweight |
Voice Format
Voices use the `Provider.Model.VoiceId` format.
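A dotted voice name can be taken apart with a plain split; the sketch below assumes only the three-part shape stated above (the example voice ID in it is hypothetical).

```python
from typing import NamedTuple


class Voice(NamedTuple):
    provider: str
    model: str
    voice_id: str


def parse_voice(name: str) -> Voice:
    """Split a Provider.Model.VoiceId string into its three components.

    maxsplit=2 keeps any further dots inside the voice ID intact.
    """
    provider, model, voice_id = name.split(".", 2)
    return Voice(provider, model, voice_id)
```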
Key Behaviors
- Text buffering — WebSocket accumulates text and synthesizes at sentence boundaries. Use `flush: true` to force immediate synthesis.
- Markdown stripping — Headers, bold, code blocks, links, and emoji are automatically stripped. Safe for LLM output.
- Audio caching — Identical requests return cached audio. Disable with `disable_cache: true`.
- Pronunciation dictionaries — Custom word replacements applied before synthesis.
- OpenAI SDK compatible — Use the OpenAI Audio API with Telnyx as the base URL.
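Because the endpoint is OpenAI Audio API compatible, a request takes the standard `/audio/speech` shape (`model`, `voice`, `input`). The stdlib sketch below shows that wire format; the base URL value and the model/voice strings are assumptions for illustration.

```python
# Hedged sketch of an OpenAI-style /audio/speech request aimed at an
# OpenAI-compatible base URL. Base URL and model/voice values are assumed.
import json
import urllib.request


def build_openai_speech_request(base_url: str, api_key: str, *,
                                model: str, voice: str,
                                text: str) -> urllib.request.Request:
    """Build an OpenAI Audio API speech POST against a compatible base URL."""
    payload = {"model": model, "voice": voice, "input": text}
    return urllib.request.Request(
        base_url.rstrip("/") + "/audio/speech",
        data=json.dumps(payload).encode(),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

With the official OpenAI Python SDK, the equivalent call is `OpenAI(base_url=..., api_key=...).audio.speech.create(model=..., voice=..., input=...)`, pointing `base_url` at the Telnyx endpoint.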