Voice Format

Provider.Model.VoiceId
Examples:
  • Telnyx.NaturalHD.astra
  • aws.Polly.Generative.Lucia
  • azure.en-US-AvaMultilingualNeural
  • elevenlabs.v3.Adam
Dots are allowed within model IDs. The voice parser handles multi-segment names like aws.Polly.Generative.Lucia correctly.
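
As a rough illustration of how a multi-segment voice string could be split, here is a minimal sketch. It assumes the first segment is the provider, the last segment is the voice ID, and everything in between (dots included) is the model. The `Voice` type and `parse_voice` function are hypothetical names, and lowercasing the provider is also an assumption; the actual parser may apply different rules.

```python
from typing import NamedTuple, Optional


class Voice(NamedTuple):
    provider: str
    model: Optional[str]
    voice_id: str


def parse_voice(voice: str) -> Voice:
    """Split a voice string into provider, model, and voice ID.

    Assumption: first segment = provider, last segment = voice ID,
    any segments in between are joined back together as the model.
    """
    parts = voice.split(".")
    if len(parts) < 2 or not all(parts):
        raise ValueError(f"expected Provider.Model.VoiceId, got {voice!r}")
    provider, *model_parts, voice_id = parts
    # Two-segment strings such as azure.<VoiceId> carry no model.
    model = ".".join(model_parts) or None
    # Assumption: provider keys are matched case-insensitively.
    return Voice(provider.lower(), model, voice_id)
```

Under these assumptions, `aws.Polly.Generative.Lucia` yields the model `Polly.Generative` and the voice ID `Lucia`, while `azure.en-US-AvaMultilingualNeural` yields no model at all.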

Provider Summary

| Provider | Key | Models | Audio Delivery |
|---|---|---|---|
| Telnyx | telnyx | Natural, NaturalHD, KokoroTTS, Qwen3TTS | Streamed |
| AWS Polly | aws | standard, neural, generative, long-form | Concatenated |
| Azure Speech | azure | Neural voices | Concatenated |
| ElevenLabs | elevenlabs | v2, v3, MultiPL.v2 | Direct relay |
| Minimax | minimax | | Streamed |
| Rime | rime | ArcanaV3 | Streamed |
| Resemble | resemble | Turbo (default) | Streamed |
| Inworld | inworld | inworld-tts-1.5-mini, inworld-tts-1.5-max | Streamed |
Streamed providers send audio in incremental frames; on these providers the audio field of the text-bearing chunk is null. Concatenated providers return the full audio in a single chunk. Direct relay means frames are forwarded to the upstream provider’s WebSocket.
Telnyx Ultra is not available over WebSocket. Use the REST API for Ultra.
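
A client may want to treat the delivery modes uniformly. The sketch below records the delivery mode per provider key from the summary table and gathers audio from a sequence of chunks, skipping any chunk whose audio field is null. It assumes each chunk has already been decoded into a dict and that audio is carried as a base64 string; neither detail is stated here, and `DELIVERY` and `collect_audio` are hypothetical names.

```python
import base64
from typing import Iterable

# Delivery mode per provider key, copied from the summary table.
DELIVERY = {
    "telnyx": "streamed",
    "aws": "concatenated",
    "azure": "concatenated",
    "elevenlabs": "direct relay",
    "minimax": "streamed",
    "rime": "streamed",
    "resemble": "streamed",
    "inworld": "streamed",
}


def collect_audio(chunks: Iterable[dict]) -> bytes:
    """Concatenate the audio carried by a sequence of chunks.

    Assumption: each chunk is a dict whose "audio" value is either
    None (text-bearing chunk) or a base64-encoded string.
    """
    out = bytearray()
    for chunk in chunks:
        audio = chunk.get("audio")
        if audio is None:
            # Text-bearing chunks from streamed providers carry no audio.
            continue
        out += base64.b64decode(audio)
    return bytes(out)
```

The same loop covers both cases: a concatenated provider contributes one audio chunk, a streamed provider contributes many.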

Telnyx

| Model | Description | Languages |
|---|---|---|
| Natural | Fast, low-latency synthesis | English |
| NaturalHD | Higher quality, supports multiple languages | en, fr, de, es, ar, hi, ja, he, pt |
| KokoroTTS | Lightweight model | |
| Qwen3TTS | Voice cloning. Requires a cloned voice name as voice_id. | en, zh, fr, de, it, ja, ko, pt, ru, es |
Voice IDs for Natural/NaturalHD correspond to pre-built voices. Browse available voices via the Voices API endpoint or the Voice Design Lab. Qwen3TTS voices require a voice clone created in the Voice Design Lab. The voice_id is the clone name. Cloned voice usage may require identity verification.

AWS Polly

Voice format: aws.Polly.<Engine>.<VoiceId>
Engines: standard, neural, generative, long-form.
Example: aws.Polly.Generative.Lucia
Engine is parsed from the voice ID suffix (e.g., a voice ending in -longform maps to the long-form engine).
Supports SSML input via text_type: "ssml" in voice settings.
Voices: AWS Polly voice list

Azure Speech

Voice format: azure.<VoiceId>
Example: azure.en-US-AvaMultilingualNeural
Default voice: en-US-AvaMultilingualNeural.
Default output format: audio-24khz-160kbitrate-mono-mp3.
Supports SSML input and audio effects (eq_car, eq_telecomhp8k).
Voices: Azure Speech voices

ElevenLabs

ElevenLabs connections are relayed directly to the ElevenLabs WebSocket API. Frames bypass the standard text buffering pipeline.
Requires an ElevenLabs API key (configured in voice settings or account config).
Voices: ElevenLabs voice library

Minimax

Supports voice cloning. Cloned voices are scoped to your organization.
Voice settings: speed (float), vol (float), pitch (integer), language_boost (string).
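
To make the Minimax setting types concrete, here is a small sketch of a settings container that emits only the fields that were set. The field names and types come from the list above; the `MinimaxVoiceSettings` class, its `to_payload` method, and the idea of omitting unset fields are assumptions for illustration, as are the values in the usage line.

```python
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class MinimaxVoiceSettings:
    speed: Optional[float] = None
    vol: Optional[float] = None
    pitch: Optional[int] = None
    language_boost: Optional[str] = None

    def to_payload(self) -> dict:
        """Return the settings as a dict, leaving out unset fields."""
        # pitch is documented as an integer; reject floats and bools.
        if self.pitch is not None and (
            isinstance(self.pitch, bool) or not isinstance(self.pitch, int)
        ):
            raise TypeError("pitch must be an integer")
        return {k: v for k, v in asdict(self).items() if v is not None}


# Hypothetical values, chosen only to show the shape of the result.
payload = MinimaxVoiceSettings(speed=1.2, pitch=2).to_payload()
```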

Rime

Voice format: Rime.ArcanaV3.<VoiceId>

Resemble

Self-hosted synthesis engine.
Voice settings: precision (PCM_16, PCM_24, PCM_32, MULAW), sample_rate (8000–48000), format (wav, mp3).
Default model: Turbo. Default format: mp3.
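
The Resemble options can be checked on the client before a request is sent. The following sketch validates the three settings against the values listed above. It assumes the sample_rate bounds are inclusive and that unset optional fields are simply omitted; `resemble_voice_settings` is a hypothetical helper, not part of any SDK.

```python
from typing import Optional

PRECISIONS = ("PCM_16", "PCM_24", "PCM_32", "MULAW")
FORMATS = ("wav", "mp3")


def resemble_voice_settings(
    precision: Optional[str] = None,
    sample_rate: Optional[int] = None,
    audio_format: str = "mp3",  # documented default format
) -> dict:
    """Validate Resemble voice settings and return them as a dict."""
    if precision is not None and precision not in PRECISIONS:
        raise ValueError(f"precision must be one of {PRECISIONS}")
    # Assumption: both ends of the 8000-48000 range are allowed.
    if sample_rate is not None and not 8000 <= sample_rate <= 48000:
        raise ValueError("sample_rate must be between 8000 and 48000")
    if audio_format not in FORMATS:
        raise ValueError(f"format must be one of {FORMATS}")
    settings = {"format": audio_format}
    if precision is not None:
        settings["precision"] = precision
    if sample_rate is not None:
        settings["sample_rate"] = sample_rate
    return settings
```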

Inworld

Models: inworld-tts-1.5-mini (faster), inworld-tts-1.5-max (higher quality).
Aliases: Mini, Max.
Encodings: MP3, LINEAR16.
Default: LINEAR16 for WebSocket, MP3 for REST.
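
The alias and default-encoding rules above can be expressed as two lookups. This is a sketch only: it assumes Mini maps to inworld-tts-1.5-mini and Max to inworld-tts-1.5-max, and the `resolve_inworld` function and the "websocket" / "rest" transport labels are hypothetical names introduced here.

```python
from typing import Optional, Tuple

# Assumption: each alias maps to the model with the matching suffix.
INWORLD_ALIASES = {
    "Mini": "inworld-tts-1.5-mini",
    "Max": "inworld-tts-1.5-max",
}
ENCODINGS = ("MP3", "LINEAR16")
DEFAULT_ENCODING = {"websocket": "LINEAR16", "rest": "MP3"}


def resolve_inworld(
    model: str, transport: str, encoding: Optional[str] = None
) -> Tuple[str, str]:
    """Return (full model ID, encoding) for an Inworld request."""
    model_id = INWORLD_ALIASES.get(model, model)
    if model_id not in INWORLD_ALIASES.values():
        raise ValueError(f"unknown Inworld model: {model!r}")
    # Fall back to the transport's documented default encoding.
    chosen = encoding or DEFAULT_ENCODING[transport]
    if chosen not in ENCODINGS:
        raise ValueError(f"encoding must be one of {ENCODINGS}")
    return model_id, chosen
```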

Voices API

List available voices:
GET https://api.telnyx.com/v2/text-to-speech/voices
Filter by provider:
GET https://api.telnyx.com/v2/text-to-speech/voices?provider=telnyx
Get a specific voice:
GET https://api.telnyx.com/v2/text-to-speech/voices?voice_id=Telnyx.NaturalHD.astra
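
The three requests above differ only in their query string, so they can be built by one helper. The sketch below uses the Python standard library and only constructs the request object. The Bearer-token Authorization header is an assumption about authentication that is not stated in this section, and `voices_request` is a hypothetical helper name.

```python
from typing import Optional
from urllib.parse import urlencode
from urllib.request import Request

VOICES_URL = "https://api.telnyx.com/v2/text-to-speech/voices"


def voices_request(
    api_key: str,
    provider: Optional[str] = None,
    voice_id: Optional[str] = None,
) -> Request:
    """Build a GET request for the Voices API with optional filters."""
    filters = (("provider", provider), ("voice_id", voice_id))
    params = {name: value for name, value in filters if value}
    url = f"{VOICES_URL}?{urlencode(params)}" if params else VOICES_URL
    # Assumption: the API key is sent as a Bearer token.
    headers = {"Authorization": f"Bearer {api_key}"}
    return Request(url, headers=headers, method="GET")
```

Passing the result to `urllib.request.urlopen` would send the request; the response format is not described in this section, so parsing it is left out.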