WebSocket Streaming

Real-time streaming. Send text, receive audio chunks. Sentence-level buffering for LLM token streaming.

REST API

HTTP POST for batch synthesis. Binary, base64, or async output. Includes the Ultra model.

In-Call Playback

TTS during live calls via the Call Control speak command or TeXML <Say>.

Models

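| Model     | WebSocket | REST | Description                                                     |
|-----------|-----------|------|-----------------------------------------------------------------|
| Natural   | Yes       | Yes  | Fast, English-only                                              |
| NaturalHD | Yes       | Yes  | Higher quality, multilingual                                    |
| Ultra     | No        | Yes  | Sub-100ms latency, 44 languages, emotion/speed/volume controls  |
| Qwen3TTS  | Yes       | Yes  | Voice cloning                                                   |
| KokoroTTS | Yes       | Yes  | Lightweight                                                     |
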
Additional providers: AWS Polly, Azure Speech, ElevenLabs, Minimax, Rime, Resemble, Inworld.
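
The transport support listed in the model table can be captured as a small lookup, which is handy for rejecting an unsupported model/transport pair (such as Ultra over WebSocket) before making a request. This is only a sketch: the `TtsModel` class and `models_for` helper are hypothetical names for illustration and are not part of any Telnyx SDK. The rows are copied from the table.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TtsModel:
    name: str
    websocket: bool
    rest: bool
    description: str


# Rows copied from the model table; the structure itself is illustrative.
MODELS = {
    m.name: m
    for m in (
        TtsModel("Natural", True, True, "Fast, English-only"),
        TtsModel("NaturalHD", True, True, "Higher quality, multilingual"),
        TtsModel("Ultra", False, True, "Sub-100ms latency, 44 languages"),
        TtsModel("Qwen3TTS", True, True, "Voice cloning"),
        TtsModel("KokoroTTS", True, True, "Lightweight"),
    )
}


def models_for(transport: str) -> list[str]:
    """Return the model names available over 'websocket' or 'rest'."""
    if transport not in ("websocket", "rest"):
        raise ValueError(f"unknown transport: {transport!r}")
    return [m.name for m in MODELS.values() if getattr(m, transport)]
```

For example, `models_for("websocket")` omits Ultra, which the table lists as REST-only.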

Voice Format

Voices use the Provider.Model.VoiceId format. Some third-party providers use a different number of segments, as the last two examples show:
Telnyx.NaturalHD.astra
Telnyx.Ultra.aura
aws.Polly.Generative.Lucia
azure.en-US-AvaMultilingualNeural
Browse voices via the Voices API or Voice Design Lab.
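
A voice identifier can be split into its parts on the client side, for instance to group voices by provider. The sketch below assumes that the first dot-separated segment is the provider, the last is the voice ID, and anything in between is the model. That rule is inferred from the four examples above and is not a documented grammar. `VoiceRef` and `parse_voice` are hypothetical names.

```python
from typing import NamedTuple


class VoiceRef(NamedTuple):
    provider: str
    model: str  # may be empty, or contain dots (e.g. "Polly.Generative")
    voice_id: str


def parse_voice(voice: str) -> VoiceRef:
    """Split a dotted voice identifier (assumed rule, see lead-in)."""
    parts = voice.split(".")
    # Need at least a provider and a voice ID, with no empty segments.
    if len(parts) < 2 or not all(parts):
        raise ValueError(f"unrecognized voice identifier: {voice!r}")
    return VoiceRef(parts[0], ".".join(parts[1:-1]), parts[-1])
```

Under this rule `Telnyx.NaturalHD.astra` yields provider `Telnyx`, model `NaturalHD`, and voice ID `astra`, while `azure.en-US-AvaMultilingualNeural` yields an empty model.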

Key Behaviors

  • Text buffering: The WebSocket endpoint accumulates incoming text and synthesizes it at sentence boundaries. Set flush: true to force immediate synthesis.
  • Markdown stripping: Headers, bold, code blocks, links, and emoji are stripped automatically, so LLM output can be sent as-is.
  • Audio caching: Identical requests return cached audio. Disable with disable_cache: true.
  • Pronunciation dictionaries: Custom word replacements are applied before synthesis.
  • OpenAI SDK compatible: Use the OpenAI Audio API with Telnyx as the base URL.
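
To make the text-buffering behavior concrete, here is a local sketch of sentence-level buffering with a flush option. It illustrates the idea only and is not Telnyx's server-side implementation: the `SentenceBuffer` class is hypothetical, and the boundary rule (`.`, `!`, or `?` followed by whitespace) is an assumption.

```python
import re

# Assumed boundary rule: '.', '!' or '?' followed by whitespace.
_SENTENCE_END = re.compile(r"(?<=[.!?])\s+")


class SentenceBuffer:
    """Accumulates streamed text and releases it one sentence at a time."""

    def __init__(self) -> None:
        self._pending = ""

    def push(self, text: str, flush: bool = False) -> list[str]:
        """Add text; return the sentences that are ready for synthesis."""
        self._pending += text
        pieces = _SENTENCE_END.split(self._pending)
        # The last piece may be an unfinished sentence, so hold it back.
        self._pending = pieces.pop()
        ready = [p for p in pieces if p]
        if flush:
            # Mirrors the idea of flush: true by releasing the remainder.
            remainder = self._pending.strip()
            if remainder:
                ready.append(remainder)
            self._pending = ""
        return ready
```

Feeding LLM tokens such as `"Hel"`, `"lo the"`, `"re. How"` releases `"Hello there."` once the text after the period arrives, and the trailing fragment stays buffered until more text or a flush comes in.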