Voice Format
Telnyx
| Model | WebSocket | REST | Description |
|---|---|---|---|
| Natural | Yes | Yes | Fast, English-only |
| NaturalHD | Yes | Yes | Higher quality, multilingual (en, fr, de, es, ar, hi, ja, he, pt) |
| Ultra | No | Yes | Sub-100ms latency, 44 languages, emotion/speed/volume control |
| KokoroTTS | Yes | Yes | Lightweight synthesis |
| Qwen3TTS | Yes | Yes | Voice cloning (en, zh, fr, de, it, ja, ko, pt, ru, es) |
Ultra
Ultra supports 44 languages and expressive controls:
| Parameter | Type | Range | Description |
|---|---|---|---|
| emotion | string | — | Emotion tag: neutral, happy, sad, angry, etc. |
| speed | float | 0.5–2.0 | Speech rate (1.0 = normal) |
| volume | float | 0.0–2.0 | Volume level (1.0 = normal) |
Use the language parameter to force a specific language (ISO 639-1 code or full name).
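The ranges above can be checked on the client before a request is sent. The following is a minimal sketch, not part of any Telnyx SDK: `build_ultra_params` is a hypothetical helper, and the flat dictionary it returns is an assumed shape that only reuses the parameter names from the table.

```python
from typing import Optional


def build_ultra_params(
    emotion: Optional[str] = None,
    speed: float = 1.0,
    volume: float = 1.0,
    language: Optional[str] = None,
) -> dict:
    """Hypothetical helper: validate Ultra controls against the documented ranges."""
    # speed is documented as 0.5-2.0, with 1.0 meaning normal rate
    if not 0.5 <= speed <= 2.0:
        raise ValueError(f"speed must be within 0.5-2.0, got {speed}")
    # volume is documented as 0.0-2.0, with 1.0 meaning normal level
    if not 0.0 <= volume <= 2.0:
        raise ValueError(f"volume must be within 0.0-2.0, got {volume}")

    params = {"speed": speed, "volume": volume}
    # emotion and language are optional, so include them only when set
    if emotion is not None:
        params["emotion"] = emotion
    if language is not None:
        params["language"] = language
    return params


# Example: a slightly faster, happy voice forced to French
ultra_params = build_ultra_params(emotion="happy", speed=1.2, language="fr")
```

The emotion value is passed through unchecked because the table lists its tags as open-ended ("etc.").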
Qwen3TTS
Requires a voice clone created via the Voice Design Lab. The voice_id is the clone name.
Cloned voice usage requires L2 identity verification or the voice.cloned_voice_usage: allowed capability on your organization.
AWS Polly
Voice format: aws.Polly.<Engine>.<VoiceId>
Engines: standard, neural, generative, long-form.
Supports SSML via text_type: "ssml".
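As an illustration of the voice format, the sketch below assembles a Polly voice string from an engine and a voice ID. `polly_voice` is a hypothetical helper, not a Telnyx or AWS function, and it assumes the engine segment is written exactly as in the engine list above (lowercase); the casing the API actually expects is not confirmed here.

```python
# Engines listed for AWS Polly in this section
POLLY_ENGINES = {"standard", "neural", "generative", "long-form"}


def polly_voice(engine: str, voice_id: str) -> str:
    """Hypothetical helper: build an aws.Polly.<Engine>.<VoiceId> voice string."""
    if engine not in POLLY_ENGINES:
        raise ValueError(f"unknown Polly engine: {engine!r}")
    if not voice_id:
        raise ValueError("voice_id must not be empty")
    # Assumption: the engine is inserted verbatim, without changing its case
    return f"aws.Polly.{engine}.{voice_id}"


# Example with Joanna, one of the standard Polly voice IDs
voice = polly_voice("neural", "Joanna")
```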
Azure Speech
Voice format: azure.<VoiceId>
Supports SSML and audio effects (eq_car, eq_telecomhp8k).
ElevenLabs
Requires an ElevenLabs API key. Models: v2, v3, MultiPL.v2.
Minimax
Supports system voices and organization-scoped voice clones.
Rime
Model: ArcanaV3.
Resemble
Self-hosted synthesis. Model: Turbo (default). Supports wav and mp3.
Inworld
Models: inworld-tts-1.5-mini (faster), inworld-tts-1.5-max (higher quality).
OpenAI SDK Mapping
| OpenAI model | Telnyx Provider/Model |
|---|---|
| tts-1 | Telnyx.NaturalHD |
| tts-1-hd | Telnyx.NaturalHD |
| gpt-4o-mini-tts | Telnyx.Ultra |
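For client code that wants to know in advance which Telnyx model an OpenAI model name resolves to, the table can be mirrored as a lookup. This is only a sketch: `OPENAI_TO_TELNYX` and `resolve_model` are hypothetical names, and how the platform handles model names outside the table is not covered here, so the helper simply rejects them.

```python
# Mirror of the OpenAI-to-Telnyx model table above
OPENAI_TO_TELNYX = {
    "tts-1": "Telnyx.NaturalHD",
    "tts-1-hd": "Telnyx.NaturalHD",
    "gpt-4o-mini-tts": "Telnyx.Ultra",
}


def resolve_model(openai_model: str) -> str:
    """Hypothetical helper: return the Telnyx provider/model for an OpenAI model name."""
    try:
        return OPENAI_TO_TELNYX[openai_model]
    except KeyError:
        # Assumption: unmapped names are treated as errors in this sketch
        raise ValueError(f"no Telnyx mapping for OpenAI model {openai_model!r}") from None


# Example lookup
telnyx_model = resolve_model("gpt-4o-mini-tts")
```

Note that tts-1 and tts-1-hd both resolve to Telnyx.NaturalHD, so the distinction between them is not preserved by the mapping.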