
Endpoint

wss://api.telnyx.com/v2/text-to-speech/speech
Query parameters select the voice and other configuration options. See Parameters.

Authentication

Pass your API key via the Authorization header during the WebSocket upgrade:
Authorization: Bearer YOUR_API_KEY
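As a minimal sketch, the URL and upgrade headers can be assembled like this before handing them to whichever WebSocket client you use. The parameter name for extra headers varies by library, so it is not assumed here. Only `voice` is used as a query parameter because it is the one shown on this page. The other parameter names are documented under Parameters. `build_connection` is a hypothetical helper, not part of any Telnyx SDK.

```python
from urllib.parse import urlencode

# Base endpoint from the Endpoint section.
BASE_URL = "wss://api.telnyx.com/v2/text-to-speech/speech"


def build_connection(api_key: str, **query: str) -> tuple[str, dict[str, str]]:
    """Return the WebSocket URL and the headers to send with the upgrade request.

    Hypothetical helper: configuration goes in the query string,
    the API key goes in the Authorization header.
    """
    url = f"{BASE_URL}?{urlencode(query)}" if query else BASE_URL
    headers = {"Authorization": f"Bearer {api_key}"}
    return url, headers


# Example: select the voice used elsewhere on this page.
url, headers = build_connection("YOUR_API_KEY", voice="Telnyx.NaturalHD.astra")
```

Keeping the key in a header rather than the URL also keeps it out of any logs that record request URLs.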

Connection Flow

Client                                          Server
  |                                               |
  |--- GET /v2/text-to-speech/speech?voice=... -->|
  |<-- 101 Switching Protocols -------------------|
  |                                               |
  |--- {"text": " "} (handshake) ---------------->|
  |                                               |
  |--- {"text": "Hello, welcome."} -------------->|
  |                                  (buffering)  |
  |--- {"text": " How are you?"} ---------------->|
  |                            (sentence ready)   |
  |<-- {"audio":"<b64>","isFinal":false} ---------|  (streamed chunks)
  |<-- {"audio":"<b64>","isFinal":false} ---------|
  |<-- {"audio":null,"text":"...","isFinal":false}|  (concatenated — may be null)
  |<-- {"audio":null,"text":"","isFinal":true} ---|  (synthesis complete)
  |                                               |
  |--- {"text": ""} (end of sequence) ----------->|
  |<-- remaining audio + final frame -------------|
  |                          connection closes    |
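The server frames in the diagram can be decoded with a small helper. This sketch assumes only the field names shown in the diagram (`audio`, `text`, `isFinal`), and it assumes `audio` is either a base64 string or null. The audio encoding itself is not specified on this page, so the decoded bytes are returned unchanged. `parse_server_frame` is a hypothetical name.

```python
import base64
import json
from typing import Optional


def parse_server_frame(raw: str) -> tuple[Optional[bytes], str, bool]:
    """Decode one server frame into (audio_bytes_or_None, text, is_final)."""
    frame = json.loads(raw)
    audio_b64 = frame.get("audio")
    # "audio" can be null (text-only and final frames), so guard before decoding.
    audio = base64.b64decode(audio_b64) if audio_b64 else None
    # "text" may be absent or null; normalise it to an empty string.
    text = frame.get("text") or ""
    return audio, text, bool(frame.get("isFinal", False))
```

A receive loop would append every non-None `audio` value to a playback buffer and use `is_final` only as a progress signal.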

Handshake

There are two ways to establish a connection:

Direct WebSocket connection

You can connect directly to the WebSocket endpoint by passing all configuration as query parameters in the wss:// URL:
wss://api.telnyx.com/v2/text-to-speech/speech?voice=Telnyx.NaturalHD.astra
Most WebSocket clients and libraries support this natively: open a WebSocket connection to the URL and begin the message flow. The client performs the HTTP upgrade for you, so you do not need to issue a separate HTTP request.

HTTP upgrade

Alternatively, initiate the connection as an HTTP GET request that upgrades to a WebSocket via the standard 101 Switching Protocols handshake. This is what happens under the hood when a WebSocket client connects, and may be relevant if you need fine-grained control over the upgrade (e.g., setting custom headers in environments where the WebSocket library doesn’t expose them directly).
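For the HTTP upgrade route, the request is the standard RFC 6455 opening handshake plus the Authorization header from the Authentication section. The sketch below only builds the request text and computes the `Sec-WebSocket-Accept` value that a `101 Switching Protocols` response must carry. It does not open the TLS socket or implement WebSocket framing, which you would normally leave to a library. The header names come from RFC 6455, not from Telnyx-specific documentation, and both function names are hypothetical.

```python
import base64
import hashlib
import os

# Fixed GUID defined by RFC 6455 for deriving Sec-WebSocket-Accept.
WS_GUID = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11"


def build_upgrade_request(host: str, path_and_query: str, api_key: str) -> tuple[str, str]:
    """Return (raw HTTP/1.1 upgrade request, Sec-WebSocket-Key)."""
    # The key is 16 random bytes, base64-encoded, as RFC 6455 requires.
    key = base64.b64encode(os.urandom(16)).decode("ascii")
    request = (
        f"GET {path_and_query} HTTP/1.1\r\n"
        f"Host: {host}\r\n"
        "Upgrade: websocket\r\n"
        "Connection: Upgrade\r\n"
        f"Sec-WebSocket-Key: {key}\r\n"
        "Sec-WebSocket-Version: 13\r\n"
        f"Authorization: Bearer {api_key}\r\n"
        "\r\n"
    )
    return request, key


def expected_accept(key: str) -> str:
    """Value the server must return in Sec-WebSocket-Accept for this key."""
    digest = hashlib.sha1((key + WS_GUID).encode("ascii")).digest()
    return base64.b64encode(digest).decode("ascii")
```

After sending the request, compare the response's `Sec-WebSocket-Accept` header against `expected_accept(key)` before treating the connection as upgraded.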

Initialization frame

Regardless of how the connection is established, send an initialization frame before any text:
{"text": " "}
The initialization frame may include voice_settings to configure provider-specific parameters:
{
  "text": " ",
  "voice_settings": {
    "voice_speed": 1.2
  }
}
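A minimal sketch of serialising the initialization frame follows. The valid keys inside `voice_settings` are provider-specific, and `voice_speed` is the only one shown on this page, so the helper passes the dictionary through without validating it. `init_frame` is a hypothetical name.

```python
import json
from typing import Any, Optional


def init_frame(voice_settings: Optional[dict[str, Any]] = None) -> str:
    """Serialise the initialization frame: a single space, plus optional voice_settings."""
    frame: dict[str, Any] = {"text": " "}
    if voice_settings:
        # Provider-specific keys are forwarded as-is; nothing is validated client-side.
        frame["voice_settings"] = voice_settings
    return json.dumps(frame)


# Sent once, right after the connection opens and before any real text.
first_message = init_frame({"voice_speed": 1.2})
```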

Text Buffering

Text frames are not necessarily synthesized as soon as they arrive. The server accumulates text and synthesizes it once it detects a complete sentence boundary (period, question mark, exclamation point, etc.). This means:
  • Fragments without sentence-ending punctuation stay buffered until more text completes the sentence, a flush is requested, or input ends
  • A sentence like "Hello." synthesizes immediately (complete sentence)
  • A fragment like "Hello, my name" waits for the sentence to complete
To force immediate synthesis of buffered text regardless of sentence boundaries:
{"text": "partial sentence", "flush": true}
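To reason about when audio will start, it can help to model this buffering on the client. The class below is a simplified, hypothetical approximation of the behaviour described above, not the server's implementation. It treats only `.`, `?` and `!` as boundaries, so unlike a real sentence detector it would split a decimal such as "3.14".

```python
import re

# Toy boundary rule: ., ? and ! end a sentence. The server's real detection
# is not documented here and is likely more nuanced.
_BOUNDARY = re.compile(r"[.?!]")


class SentenceBuffer:
    """Hypothetical client-side model of the server's text buffering."""

    def __init__(self) -> None:
        self._pending = ""

    def push(self, text: str, flush: bool = False) -> list[str]:
        """Add text; return the segments that would be released for synthesis."""
        self._pending += text
        if flush:
            # flush: true releases everything, regardless of boundaries.
            released, self._pending = self._pending, ""
            return [released] if released.strip() else []
        ends = [m.end() for m in _BOUNDARY.finditer(self._pending)]
        if not ends:
            return []  # no complete sentence yet; keep buffering
        # Release up to and including the last boundary; keep the tail buffered.
        cut = ends[-1]
        released, self._pending = self._pending[:cut], self._pending[cut:]
        return [released]
```

Usage: pushing `"Hello, my name"` releases nothing, a later `" is Ada. How"` releases `"Hello, my name is Ada."`, and `" are you"` with `flush=True` releases the remaining `" How are you"`.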

Interruption

To stop current synthesis and start fresh (barge-in):
{"force": true}
This kills the active synthesis worker, starts a new one, and replays the original handshake. Any text sent after this frame goes to the new worker.
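A barge-in handler might look like the sketch below. `send` stands in for your WebSocket client's send method, and `playback_queue` is a hypothetical local buffer of audio not yet played. The text above says the server replays the original handshake to the new worker, so this sketch does not resend the initialization frame. Whether audio frames already in flight can still arrive after the force frame is not specified here, so a production client may need extra bookkeeping.

```python
import json
from typing import Callable


def barge_in(send: Callable[[str], None], new_text: str, playback_queue: list) -> None:
    """Interrupt the current synthesis and start speaking new_text instead."""
    # Drop audio already received for the interrupted utterance (local buffer only).
    playback_queue.clear()
    # Ask the server to discard the active synthesis worker and start a new one.
    send(json.dumps({"force": True}))
    # Text sent after the force frame goes to the new worker.
    send(json.dumps({"text": new_text}))
```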

Text Preprocessing

Before synthesis, text is automatically preprocessed:
  • Markdown stripping: headers, bold, italics, code blocks, inline code, links, lists, blockquotes, horizontal rules, and emoji are removed, leaving plain text
  • Pronunciation dictionary: if pronunciation_dict_id is set, custom word replacements are applied (SSML-safe)
This is useful when synthesizing LLM output that contains markdown formatting.
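Because the server does this preprocessing, clients can send LLM output unchanged. If you want to log or preview roughly what will be spoken, a crude client-side approximation such as the one below may help. It is a hypothetical sketch covering only headers, blockquotes, list markers, links, inline code, and asterisk bold/italics. It does not model code blocks, horizontal rules, underscore emphasis, emoji, or the pronunciation dictionary, and the server's actual rules may differ.

```python
import re

# Hypothetical approximation of server-side Markdown stripping, for previews only.
_RULES = [
    (re.compile(r"^[ \t]{0,3}#{1,6}[ \t]+", re.M), ""),           # ATX headers
    (re.compile(r"^[ \t]{0,3}>[ \t]?", re.M), ""),                # blockquotes
    (re.compile(r"^[ \t]*(?:[-*+]|\d+\.)[ \t]+", re.M), ""),      # list markers
    (re.compile(r"\[([^\]]*)\]\([^)]*\)"), r"\1"),                # [text](url) -> text
    (re.compile(r"`([^`]*)`"), r"\1"),                            # inline code
    (re.compile(r"\*\*(.+?)\*\*"), r"\1"),                        # **bold**
    (re.compile(r"\*(.+?)\*"), r"\1"),                            # *italics*
]


def preview_plain_text(markdown: str) -> str:
    """Apply each rule in order and return an approximate plain-text version."""
    text = markdown
    for pattern, replacement in _RULES:
        text = pattern.sub(replacement, text)
    return text
```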

Teardown

Send an empty text frame to signal end of input:
{"text": ""}
The server synthesizes any remaining buffered text, sends a final frame, and closes the connection. The connection also closes on:
  • Client WebSocket close
  • Server error (see Errors)
  • Inactivity timeout
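Putting the steps together, a complete session might look like the sketch below. `send` and `recv` are hypothetical wrappers around whatever WebSocket client you use, and it is an assumption of this sketch that `recv` returns None once the server closes the connection. The diagram shows an `isFinal: true` frame before the end-of-sequence frame, with more audio after it. For that reason, the loop reads until the connection closes instead of stopping at the first final frame.

```python
import base64
import json
from typing import Callable, Iterable, Optional


def run_session(
    send: Callable[[str], None],
    recv: Callable[[], Optional[str]],
    texts: Iterable[str],
) -> bytes:
    """Send init, text and teardown frames, then collect audio until the socket closes."""
    send(json.dumps({"text": " "}))           # initialization frame
    for chunk in texts:
        send(json.dumps({"text": chunk}))     # text frames (buffered server-side)
    send(json.dumps({"text": ""}))            # end of input: synthesize the rest, then close

    audio = bytearray()
    while (raw := recv()) is not None:        # None means the connection has closed
        frame = json.loads(raw)
        if frame.get("audio"):                # audio can be null; skip those frames
            audio += base64.b64decode(frame["audio"])
    return bytes(audio)
```

This version sends all text before reading, which keeps a short script simple. A streaming client would usually send and receive concurrently so that playback can begin as soon as the first sentence is ready.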