Telnyx

Overview

The Telnyx Text-to-Speech (TTS) WebSocket API provides real-time audio synthesis from text input. This streaming endpoint allows you to send text and receive synthesized audio incrementally, enabling low-latency voice generation for real-time applications.

WebSocket endpoint

Connection URL

wss://api.telnyx.com/v2/text-to-speech/speech?voice={voice_id}

Query parameters

Parameter	Type	Required	Description
`voice`	string	Yes	Voice identifier (e.g., `Telnyx.NaturalHD.astra`)
`inactivity_timeout`	integer	No	Seconds the WebSocket stays open without receiving a message before it is closed (default: 20)
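As a minimal sketch of how the two query parameters combine into a connection URL, the helper below URL-encodes them with the standard library. The function name `build_tts_url` and the timeout value of 60 are illustrative assumptions, not part of the Telnyx API.

```python
from urllib.parse import urlencode

# Endpoint taken from the "Connection URL" section.
TTS_ENDPOINT = "wss://api.telnyx.com/v2/text-to-speech/speech"

def build_tts_url(voice, inactivity_timeout=None):
    """Return the connection URL with URL-encoded query parameters."""
    params = {"voice": voice}  # required parameter
    if inactivity_timeout is not None:
        # Optional parameter, expressed in seconds.
        params["inactivity_timeout"] = int(inactivity_timeout)
    return f"{TTS_ENDPOINT}?{urlencode(params)}"

print(build_tts_url("Telnyx.NaturalHD.astra", inactivity_timeout=60))
```

Omitting `inactivity_timeout` leaves it out of the URL, so the server default applies.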

Authentication

Include your Telnyx API key as an Authorization header in the connection request:

Authorization: Bearer YOUR_API_KEY
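To avoid hard-coding the key, the header can be built from an environment variable. This is a sketch only: the variable name `TELNYX_API_KEY` and the helper `auth_headers` are assumptions made for illustration.

```python
import os

def auth_headers(api_key=None):
    """Build the Authorization header, falling back to the environment for the key."""
    # TELNYX_API_KEY is a hypothetical variable name chosen for this sketch.
    key = api_key or os.environ.get("TELNYX_API_KEY")
    if not key:
        raise ValueError("No API key provided")
    return {"Authorization": f"Bearer {key}"}

print(auth_headers("YOUR_API_KEY"))
```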

Example connection

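import asyncio
import websockets

url = "wss://api.telnyx.com/v2/text-to-speech/speech?voice=Telnyx.NaturalHD.astra"
headers = {
    "Authorization": "Bearer YOUR_API_KEY"
}

async def main():
    # websockets 14+ names this keyword additional_headers instead of extra_headers
    async with websockets.connect(url, extra_headers=headers) as websocket:
        pass  # send and receive frames here

asyncio.run(main())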

Connection flow

The TTS WebSocket follows this lifecycle:

1. Connect - Establish the WebSocket connection with authentication.
2. Initialize - Send the initialization frame containing a single space character.
3. Send Text - Send one or more text frames to synthesize.
4. Receive Audio - Receive audio frames with base64-encoded mp3 data.
5. Stop - Send an empty text frame to signal completion.
6. Close - The connection closes after processing completes.
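The client-side half of this lifecycle can be sketched as a generator that emits the frames in the required order. The name `frame_sequence` is hypothetical; the frame payloads are the ones this page describes.

```python
import json

def frame_sequence(texts):
    """Yield serialized outbound frames in the order the lifecycle requires."""
    yield json.dumps({"text": " "})       # initialization frame: a single space
    for text in texts:
        yield json.dumps({"text": text})  # one text frame per input string
    yield json.dumps({"text": ""})        # stop frame: an empty string

for frame in frame_sequence(["Hello", "world"]):
    print(frame)
```

Each yielded string could be passed directly to the WebSocket's `send` call.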

Flow diagram

Client                          Server
  |                               |
  |------- Connect -------------->|
  |<------ Connected -------------|
  |                               |
  |------- Init Frame ----------->|
  |       {"text": " "}           |
  |                               |
  |------- Text Frame ----------->|
  |       {"text": "Hello"}       |
  |                               |
  |<------ Audio Frame -----------|
  |       {"audio": "base64..."}  |
  |<------ Audio Frame -----------|
  |       {"audio": "base64..."}  |
  |                               |
  |------- Stop Frame ----------->|
  |       {"text": ""}            |
  |                               |
  |<------ Close -----------------|

Frame types

Outbound frames (Client → Server)

All outbound frames are JSON text messages with a single `text` field; its value determines the frame type:

1. Initialization frame

Purpose: Initialize the TTS session.

Format:

{
  "text": " "
}

Example:

import json

init_frame = {"text": " "}
await websocket.send(json.dumps(init_frame))

Notes:

Must be sent first after connection.
Contains a single space character.
Required to begin the session.

2. Text frame

Purpose: Send text content to be synthesized into speech.

Format:

{
  "text": "Your text content here"
}

Example:

text_frame = {"text": "Hello, this is a test of the Telnyx TTS service."}
await websocket.send(json.dumps(text_frame))

Multiple Text Frames:

import asyncio
import json

# You can send multiple text frames sequentially
frames = [
    {"text": "First sentence."},
    {"text": "Second sentence."},
    {"text": "Third sentence."}
]

for frame in frames:
    await websocket.send(json.dumps(frame))
    await asyncio.sleep(0.5)  # brief pause between frames

Notes:

Can send multiple text frames in one session.
Each frame is processed and synthesized separately.
Audio is returned incrementally for each text frame.
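Because each frame is synthesized separately, longer input is often split into sentence-sized frames before sending. The sketch below uses a simple regular expression for that; the helper name `split_into_frames` and the splitting rule are assumptions, not a Telnyx recommendation. It also drops empty pieces, since an empty `text` value would be read as the stop frame.

```python
import re

def split_into_frames(text):
    """Split text on sentence boundaries and wrap each piece as a text frame."""
    # Break after '.', '!' or '?' when followed by whitespace.
    pieces = re.split(r"(?<=[.!?])\s+", text.strip())
    # Drop empty pieces: {"text": ""} would signal the end of the session.
    return [{"text": piece} for piece in pieces if piece]

print(split_into_frames("First sentence. Second sentence! Third?"))
```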

3. Stop frame

Purpose: Signal completion of text input and end the session.

Format:

{
  "text": ""
}

Example:

stop_frame = {"text": ""}
await websocket.send(json.dumps(stop_frame))

Notes:

Contains an empty string.
Signals the server to finish processing.
Should be sent after all text frames.
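Since the three outbound frame types differ only in the value of `text`, a small classifier can help when logging or testing a client. This is an illustrative sketch; `classify_frame` is a hypothetical helper.

```python
import json

def classify_frame(raw):
    """Return 'init', 'stop', or 'text' for a serialized outbound frame."""
    text = json.loads(raw)["text"]
    if text == " ":
        return "init"   # a single space initializes the session
    if text == "":
        return "stop"   # an empty string ends the session
    return "text"       # anything else is content to synthesize

for raw in ('{"text": " "}', '{"text": "Hello"}', '{"text": ""}'):
    print(classify_frame(raw))
```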

Inbound frames (Server → Client)

The server sends JSON text messages containing synthesized audio data.

Audio frame

Purpose: Deliver synthesized audio data.

Format:

{
  "audio": "UklGRiQAAABXQVZFZm10IBAAAAABAAEAQB8AAAB9AAACABAAZGF0YQAAAAA="
}

Processing Audio:

import base64
import json

async for message in websocket:
    data = json.loads(message)

    if "audio" in data:
        # Decode base64 audio
        audio_bytes = base64.b64decode(data["audio"])

        # Append each chunk to the output file in the order received
        with open("output.mp3", "ab") as f:
            f.write(audio_bytes)

Audio Specifications:

Property	Value
Format	mp3
Sample Rate	16 kHz
Bit Depth	16-bit
Channels	Mono (1)
Encoding	Base64

Notes:

Multiple audio frames may be received for a single text input.
Each audio chunk is a complete mp3 file with headers.
Chunks should be concatenated in the order received.
Use append mode when saving to file to preserve all audio.
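When the audio is needed in memory rather than on disk, the decode-and-concatenate step can be sketched as below. The helper names `extract_audio` and `collect_audio` are hypothetical, and the inbound messages are faked locally so the snippet runs without a connection.

```python
import base64
import json

def extract_audio(message):
    """Return decoded audio bytes from an inbound message, or None if it has no audio."""
    data = json.loads(message)
    if "audio" not in data:
        return None
    return base64.b64decode(data["audio"])

def collect_audio(messages):
    """Concatenate the audio from a sequence of inbound messages, in order."""
    chunks = (extract_audio(m) for m in messages)
    return b"".join(c for c in chunks if c is not None)

# Two fake inbound frames standing in for server output.
fake = [
    json.dumps({"audio": base64.b64encode(part).decode("ascii")})
    for part in (b"abc", b"def")
]
print(collect_audio(fake))  # b'abcdef'
```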

Complete example

Here’s a complete example showing all frame types in sequence:

import asyncio
import json
import base64
import websockets

async def tts_example():
    # 1. Connect to WebSocket
    url = "wss://api.telnyx.com/v2/text-to-speech/speech?voice=Telnyx.NaturalHD.astra"
    headers = {
        "Authorization": "Bearer YOUR_API_KEY"
    }

    # websockets 14+ names this keyword additional_headers instead of extra_headers
    async with websockets.connect(url, extra_headers=headers) as ws:
        print("Connected to TTS WebSocket")

        # 2. Send initialization frame
        init_frame = {"text": " "}
        await ws.send(json.dumps(init_frame))
        print("Sent: Initialization frame")

        # 3. Send text frame
        text_frame = {"text": "Hello, welcome to Telnyx Text-to-Speech streaming."}
        await ws.send(json.dumps(text_frame))
        print("Sent: Text frame")

        # 4. Send stop frame once all text frames have been sent
        stop_frame = {"text": ""}
        await ws.send(json.dumps(stop_frame))
        print("Sent: Stop frame")

        # 5. Receive audio frames until the server closes the connection
        audio_count = 0
        async for message in ws:
            data = json.loads(message)

            if "audio" in data:
                audio_count += 1
                audio_bytes = base64.b64decode(data["audio"])

                # Append audio chunks to file
                with open("output.mp3", "ab") as f:
                    f.write(audio_bytes)

                print(f"Received: Audio frame #{audio_count} ({len(audio_bytes)} bytes)")

    print("Connection closed")

asyncio.run(tts_example())

Expected Output (frame count and sizes will vary):

Connected to TTS WebSocket
Sent: Initialization frame
Sent: Text frame
Sent: Stop frame
Received: Audio frame #1 (8192 bytes)
Received: Audio frame #2 (6144 bytes)
Received: Audio frame #3 (4096 bytes)
Connection closed

Configuration summary

Required configuration

# WebSocket URL
ENDPOINT = "wss://api.telnyx.com/v2/text-to-speech/speech"
VOICE_ID = "Telnyx.NaturalHD.astra"
URL = f"{ENDPOINT}?voice={VOICE_ID}"

# Authentication Header
HEADERS = {
    "Authorization": "Bearer YOUR_API_KEY"
}

Message sequence

# 1. Initialization
{"text": " "}

# 2. Text to synthesize (can send multiple)
{"text": "Your text here"}

# 3. Stop signal
{"text": ""}

Demo project

A complete Python implementation is available on GitHub.

Troubleshooting

Issue	Solution
Connection fails	Verify the header format: `Authorization: Bearer YOUR_API_KEY`
No audio received	Ensure the initialization frame is sent first
Audio is garbled	Check base64 decoding and confirm the file is opened in append mode
Empty audio file	Confirm the text frame contains valid content
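Several of these checks can be run before connecting. The sketch below inspects the headers and the planned outbound frames and reports anything that matches the issues in the table; `preflight` is a hypothetical helper, not part of any Telnyx SDK.

```python
import json

def preflight(headers, frames):
    """Return a list of problems found in the headers and the planned frame order."""
    problems = []
    auth = headers.get("Authorization", "")
    # The header must be "Bearer " followed by a non-empty key.
    if not auth.startswith("Bearer ") or len(auth) <= len("Bearer "):
        problems.append("Authorization header must look like 'Bearer YOUR_API_KEY'")
    # The first frame must be the initialization frame (a single space).
    if not frames or json.loads(frames[0]).get("text") != " ":
        problems.append("first frame must be the initialization frame")
    # At least one frame must carry non-whitespace text to synthesize.
    if not any(json.loads(f).get("text", "").strip() for f in frames):
        problems.append("no text frame with content")
    return problems

planned = ['{"text": " "}', '{"text": "Hi"}', '{"text": ""}']
print(preflight({"Authorization": "Bearer YOUR_API_KEY"}, planned))  # []
```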
