Overview
The Telnyx Text-to-Speech (TTS) WebSocket API provides real-time audio synthesis from text input. This streaming endpoint allows you to send text and receive synthesized audio incrementally, enabling low-latency voice generation for real-time applications.
Video Demos
Watch these demonstrations to see Telnyx Text-to-Speech in action:
- Convert text to speech in REAL TIME | Python | TTS websocket streaming
- Telnyx Text-to-Speech API Use-case Demo
- Telnyx TTS Audio Reader
WebSocket Endpoint
Connection URL
Query Parameters
| Parameter | Type | Required | Description |
|---|---|---|---|
| voice | string | Yes | Voice identifier (e.g., Telnyx.NaturalHD.astra) |
| inactivity_timeout | integer | No | How long, in seconds, the WebSocket stays open without receiving a message (default: 20) |
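For example, a connection URL with both query parameters might look like the following; the host and path here are placeholders, since the actual endpoint is given in the Connection URL section above:

```python
# Placeholder host and path; substitute the URL from the "Connection URL" section.
WS_URL = "wss://<telnyx-tts-endpoint>?voice=Telnyx.NaturalHD.astra&inactivity_timeout=60"
```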
Authentication
Include your Telnyx API token in the Authorization header of the connection request (format: Bearer YOUR_TOKEN).
Example Connection
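A minimal connection sketch, assuming the third-party websocket-client package; the endpoint below is a placeholder for the URL in the Connection URL section above, and only the bearer-token header and the voice parameter are taken from this page:

```python
import websocket  # pip install websocket-client

TELNYX_API_KEY = "YOUR_TOKEN"
# Placeholder endpoint; use the URL from the "Connection URL" section above.
WS_URL = "wss://<telnyx-tts-endpoint>?voice=Telnyx.NaturalHD.astra"

# Open the WebSocket with the token in the Authorization header.
ws = websocket.create_connection(
    WS_URL,
    header={"Authorization": f"Bearer {TELNYX_API_KEY}"},
)
```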
Available Voices
Telnyx offers high-quality text-to-speech voices across multiple models, languages, and voice types. Use the interactive explorer below to browse and filter voices by model, language, and gender characteristics. You can search for specific voice names or IDs to quickly find the perfect voice for your application.
Connection Flow
The TTS WebSocket follows this lifecycle:
- Connect - Establish WebSocket connection with authentication.
- Initialize - Send initialization frame with space character.
- Send Text - Send one or more text frames to synthesize.
- Receive Audio - Receive audio frames with base64-encoded mp3 data.
- Stop - Send empty text frame to signal completion.
- Close - Connection closes after processing completes.
Flow Diagram
Frame Types
Outbound Frames (Client → Server)
All outbound frames are JSON text messages with the following structure:
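The exact schema is not reproduced on this page; the sketches in this section assume the payload is a JSON object whose single text field carries the string to synthesize (the field name is an assumption). Continuing with the ws connection opened in the example above:

```python
import json

# Hedged sketch of an outbound frame; the "text" field name is an assumption.
ws.send(json.dumps({"text": "string to synthesize"}))
```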
1. Initialization Frame
Purpose: Initialize the TTS session
Format (shown below):
- Must be sent first after connection.
- Contains a single space character.
- Required to begin the session.
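A minimal initialization frame under the same assumed field name, continuing from the examples above:

```python
# Initialization frame: a single space character (assumed "text" field).
ws.send(json.dumps({"text": " "}))
```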
2. Text Frame
Purpose: Send text content to be synthesized into speech
Format (shown below):
- Can send multiple text frames in one session.
- Each frame is processed and synthesized separately.
- Audio is returned incrementally for each text frame.
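A text frame sketch under the same assumed field name:

```python
# Text frame: the content to synthesize (assumed "text" field).
ws.send(json.dumps({"text": "Hello from the Telnyx TTS WebSocket API."}))
```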
3. Stop Frame
Purpose: Signal completion of text input and end the session
Format (shown below):
- Contains an empty string.
- Signals the server to finish processing.
- Should be sent after all text frames.
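A stop frame sketch under the same assumed field name:

```python
# Stop frame: an empty string signals that no more text will follow (assumed "text" field).
ws.send(json.dumps({"text": ""}))
```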
Inbound Frames (Server → Client)
The server sends JSON text messages containing synthesized audio data.
Audio Frame
Purpose: Deliver synthesized audio data
Format:
| Property | Value |
|---|---|
| Format | mp3 |
| Sample Rate | 16 kHz |
| Bit Depth | 16-bit |
| Channels | Mono (1) |
| Encoding | Base64 |
- Multiple audio frames may be received for a single text input.
- Each audio chunk is a complete mp3 file with headers.
- Chunks should be concatenated in the order received.
- Use append mode when saving to file to preserve all audio.
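A hedged sketch of receiving and saving audio, assuming the base64 payload arrives in an audio field (the inbound schema is not reproduced on this page) and reusing the websocket-client connection from the examples above:

```python
import base64
import json

import websocket

# Append mode preserves audio from earlier chunks, concatenated in arrival order.
with open("output.mp3", "ab") as f:
    while True:
        try:
            message = ws.recv()
        except websocket.WebSocketConnectionClosedException:
            break  # the server closed the connection after processing
        if not message:
            break
        frame = json.loads(message)
        # The "audio" field name is an assumption; inspect the actual payload.
        audio_b64 = frame.get("audio")
        if audio_b64:
            f.write(base64.b64decode(audio_b64))
```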
Complete Example
Here’s a complete example showing all frame types in sequence:
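The original example is not reproduced on this page; the sketch below strings the documented steps together under the same assumptions as the earlier examples (websocket-client package, placeholder endpoint, outbound text field, inbound audio field):

```python
import base64
import json

import websocket  # pip install websocket-client

TELNYX_API_KEY = "YOUR_TOKEN"
# Placeholder endpoint; use the URL from the "Connection URL" section above.
WS_URL = "wss://<telnyx-tts-endpoint>?voice=Telnyx.NaturalHD.astra"

# 1. Connect with the token in the Authorization header.
ws = websocket.create_connection(
    WS_URL,
    header={"Authorization": f"Bearer {TELNYX_API_KEY}"},
)

# 2. Initialization frame: a single space character.
ws.send(json.dumps({"text": " "}))

# 3. Text frame(s): the content to synthesize.
ws.send(json.dumps({"text": "Hello from the Telnyx TTS WebSocket API."}))

# 4. Stop frame: an empty string signals completion.
ws.send(json.dumps({"text": ""}))

# 5. Receive audio frames and append the decoded MP3 chunks to a file.
with open("output.mp3", "ab") as f:
    while True:
        try:
            message = ws.recv()
        except websocket.WebSocketConnectionClosedException:
            break  # 6. The server closes the connection after processing.
        if not message:
            break
        frame = json.loads(message)
        audio_b64 = frame.get("audio")  # assumed field name
        if audio_b64:
            f.write(base64.b64decode(audio_b64))

ws.close()
```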
Configuration Summary
Required Configuration
Message Sequence
Demo Project
A complete Python implementation is available at the link.
Troubleshooting
| Issue | Solution |
|---|---|
| Connection fails | Verify token format: Bearer YOUR_TOKEN |
| No audio received | Ensure initialization frame sent first |
| Audio is garbled | Check base64 decoding and file append mode |
| Empty audio file | Confirm text frame contains valid content |