Stream text to speech over WebSocket

AsyncAPI specification for the Telnyx Text-to-Speech WebSocket endpoint, which provides real-time speech synthesis: the client streams text and receives audio chunks in return.

Supported Providers

telnyx - Telnyx native voices (Natural, NaturalHD, Qwen3TTS)
aws - Amazon Polly
azure - Microsoft Azure TTS
elevenlabs - ElevenLabs voices
minimax - MiniMax voices
rime - Rime voices
resemble - Resemble AI voices
xai - xAI voices (Eve, Ara, Rex, Sal, Leo)
inworld - Inworld AI voices
fishaudio - Fish Audio voices (s2.1-pro, s2-pro, s1 models)

Connection Flow

Open a WebSocket connection to wss://api.telnyx.com/v2/text-to-speech/speech, passing connection settings as query parameters.
Send an initial handshake message {"text": " "} (a single space), optionally including voice_settings.
Send each piece of text to synthesize as {"text": "Hello world"}.
Receive audio chunks as JSON frames containing base64-encoded audio.
A final frame with isFinal: true marks the end of audio for the current text.
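The flow above can be sketched in Python as follows. This is a minimal, library-agnostic sketch under stated assumptions: `send` and `recv` are hypothetical callables (`send` writes one text frame, `recv` returns the next server frame as a JSON string) that any WebSocket client could be adapted to, and the name of the base64 audio field (`audio` here) is an assumption, because this page does not name it.

```python
import base64
import json
from typing import Callable, Optional

Send = Callable[[str], None]  # hypothetical: writes one text frame to the socket
Recv = Callable[[], str]      # hypothetical: returns the next server frame (JSON)

AUDIO_FIELD = "audio"  # assumption: the field that carries base64 audio


def open_session(send: Send, voice_settings: Optional[dict] = None) -> None:
    """Send the initial handshake: a single space, plus optional voice_settings."""
    frame: dict = {"text": " "}
    if voice_settings is not None:
        frame["voice_settings"] = voice_settings
    send(json.dumps(frame))


def speak(send: Send, recv: Recv, text: str) -> bytes:
    """Send one text message and collect decoded audio until isFinal is true."""
    send(json.dumps({"text": text}))
    audio = bytearray()
    while True:
        frame = json.loads(recv())
        # Decode audio if this frame carries any (text may be null for some providers).
        if frame.get(AUDIO_FIELD):
            audio.extend(base64.b64decode(frame[AUDIO_FIELD]))
        # A frame with isFinal: true ends the audio for this text.
        if frame.get("isFinal"):
            return bytes(audio)
```

Because the final frame leaves the connection open, `speak` can be called again on the same socket for the next piece of text without repeating the handshake.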

Authentication

The endpoint requires authentication with a Bearer token (a Telnyx API v2 key).
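A minimal sketch of the authentication header, assuming the key is sent in the standard HTTP `Authorization: Bearer <key>` header of the WebSocket opening request. The `TELNYX_API_KEY` environment variable used as a fallback is a hypothetical convention, not something this page defines.

```python
import os
from typing import Dict, Optional


def auth_headers(api_key: Optional[str] = None) -> Dict[str, str]:
    """Return the Authorization header for the WebSocket opening request.

    Falls back to the TELNYX_API_KEY environment variable (a hypothetical
    naming convention) when no key is passed explicitly.
    """
    key = api_key or os.environ.get("TELNYX_API_KEY")
    if not key:
        raise RuntimeError("a Telnyx API v2 key is required")
    return {"Authorization": f"Bearer {key}"}
```

Most WebSocket client libraries accept a dictionary of extra headers for the opening handshake; the exact parameter name differs between libraries.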


Messages

bearerAuth (type: http)

Telnyx API v2 Bearer token authentication.

query (type: object)

Query parameters passed when opening the WebSocket connection.
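Since the query parameters are fixed when the connection opens, a client builds the full URL before connecting. The sketch below uses the standard library's `urlencode`; the individual parameter names are not listed on this page, so any name passed in (such as `voice` in the usage note) is hypothetical.

```python
from typing import Mapping, Union
from urllib.parse import urlencode

BASE_URL = "wss://api.telnyx.com/v2/text-to-speech/speech"


def build_url(params: Mapping[str, Union[str, int, float]]) -> str:
    """Append connection-time query parameters to the endpoint URL.

    urlencode percent-encodes the values, so spaces become '+'.
    """
    if not params:
        return BASE_URL
    return f"{BASE_URL}?{urlencode(params)}"
```

For example, `build_url({"voice": "..."})` would produce the connection URL for a hypothetical `voice` parameter; changing a query parameter later requires opening a new connection.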

Text Frame (type: object)

Client-to-server frame containing text to synthesize. The initial handshake message should be {"text": " "} (a single space), optionally including voice_settings. Subsequent messages contain the text to synthesize. To interrupt synthesis mid-stream, send {"force": true}.
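The three client frames described above can be serialized as in this sketch. The contents of voice_settings are provider-specific and not listed on this page, so the sketch passes the dictionary through unchanged; the helper names are illustrative only.

```python
import json
from typing import Optional


def handshake_frame(voice_settings: Optional[dict] = None) -> str:
    """First client frame: a single space, with voice_settings if provided."""
    frame: dict = {"text": " "}
    if voice_settings is not None:
        frame["voice_settings"] = voice_settings
    return json.dumps(frame)


def text_frame(text: str) -> str:
    """Subsequent client frames carry the text to synthesize."""
    return json.dumps({"text": text})


def interrupt_frame() -> str:
    """Frame that interrupts synthesis mid-stream."""
    return json.dumps({"force": True})
```

Note that `json.dumps` writes Python's `True` as the JSON literal `true`, so `interrupt_frame()` yields exactly `{"force": true}`.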

Audio Chunk Frame (type: object)

Server-to-client frame containing a base64-encoded audio chunk. For providers that stream audio in real time (Telnyx Natural/NaturalHD, Rime, MiniMax, Resemble, Inworld, Fish Audio), text is null, because audio is streamed before full text alignment is available, and cached is false. For other providers, text contains the text segment that corresponds to the chunk.
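A sketch of decoding one Audio Chunk Frame that models text as optional, since it is null for real-time providers. The name of the base64 field (`audio`) is an assumption, as this page does not give it; text and cached are the field names used above.

```python
import base64
import json
from dataclasses import dataclass
from typing import Optional

AUDIO_FIELD = "audio"  # assumption: field name for the base64 payload


@dataclass
class AudioChunk:
    audio: bytes            # decoded audio bytes
    text: Optional[str]     # None for providers that stream in real time
    cached: Optional[bool]  # False for real-time providers; None if absent


def parse_audio_chunk(raw: str) -> AudioChunk:
    """Decode one Audio Chunk Frame received as a JSON string."""
    frame = json.loads(raw)
    return AudioChunk(
        audio=base64.b64decode(frame[AUDIO_FIELD]),
        text=frame.get("text"),
        cached=frame.get("cached"),
    )
```

Code that shows captions or aligns audio with text should treat `text is None` as the normal case for real-time providers, not as an error.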

Final Frame (type: object)

Server-to-client frame indicating synthesis is complete for the current text. The connection remains open for additional text messages.

Error Frame (type: object)

Server-to-client frame indicating an error during synthesis. The server closes the connection shortly after sending this frame.
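A client loop must tell the three server frames apart: audio chunks keep arriving, a Final Frame ends the current text but keeps the connection open, and an Error Frame precedes the connection closing. This sketch assumes an Error Frame is recognizable by an `error` key, which this page does not confirm; the `SynthesisError` class is hypothetical.

```python
import json


class SynthesisError(Exception):
    """Raised when the server sends an Error Frame."""


def handle_frame(raw: str) -> bool:
    """Classify one server frame; return True when the current text is finished.

    Assumption: an Error Frame is recognized by an "error" key.
    """
    frame = json.loads(raw)
    if "error" in frame:
        # The server closes the connection shortly after this frame,
        # so the client should stop sending text.
        raise SynthesisError(str(frame["error"]))
    # isFinal: true marks a Final Frame; anything else is an audio chunk.
    return frame.get("isFinal") is True
```

Raising on errors, rather than returning a status, keeps the normal loop short: the caller reads frames until `handle_frame` returns True, then may send more text on the same connection.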
