Endpoint
Authentication
Pass your API key via the `Authorization` header during the WebSocket upgrade:
Connection Flow
Handshake
There are two ways to establish a connection:
Direct WebSocket connection
Connect directly to the WebSocket endpoint by passing all configuration as query parameters in the `wss://` URL:
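As a rough illustration of the query-parameter approach, the sketch below assembles a `wss://` URL with the standard library. The host, path, and parameter names (`voice_id`, `sample_rate`, `language`) are placeholders invented for this example, not the documented ones.

```python
from urllib.parse import urlencode

# Hypothetical endpoint; substitute the real URL from the Endpoint section.
BASE_URL = "wss://api.example.com/v1/tts/stream"


def build_stream_url(base_url: str, **params: object) -> str:
    """Append configuration to the URL as a percent-encoded query string."""
    # Skip options left as None so the server can apply its own defaults.
    query = urlencode({k: v for k, v in params.items() if v is not None})
    return f"{base_url}?{query}" if query else base_url


# Parameter names here are illustrative assumptions only.
url = build_stream_url(BASE_URL, voice_id="demo-voice", sample_rate=24000, language=None)
print(url)
```

Encoding through `urlencode` rather than string concatenation keeps values containing spaces or reserved characters from corrupting the URL.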
HTTP upgrade
Alternatively, initiate the connection as an HTTP GET request that upgrades to a WebSocket via the standard `101 Switching Protocols` handshake. This is what happens under the hood whenever a WebSocket client connects, and it is relevant if you need fine-grained control over the upgrade (e.g., setting custom headers in environments where the WebSocket library doesn’t expose them directly).
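For readers building the upgrade by hand, here is a minimal sketch of the client side of the handshake as defined by RFC 6455: the request headers and the `Sec-WebSocket-Accept` value the server should return with its `101` response. The host, path, and credential string are placeholders, and the exact format of the `Authorization` value is not assumed here.

```python
import base64
import hashlib
import os

# Fixed GUID that RFC 6455 defines for deriving Sec-WebSocket-Accept.
WS_GUID = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11"


def make_key() -> str:
    """Random 16-byte nonce, base64-encoded, for the Sec-WebSocket-Key header."""
    return base64.b64encode(os.urandom(16)).decode("ascii")


def expected_accept(key: str) -> str:
    """Value the server must send back in Sec-WebSocket-Accept for this key."""
    digest = hashlib.sha1((key + WS_GUID).encode("ascii")).digest()
    return base64.b64encode(digest).decode("ascii")


def build_upgrade_request(host: str, path: str, key: str, authorization: str) -> bytes:
    """Serialize the HTTP/1.1 GET request that asks for a WebSocket upgrade."""
    lines = [
        f"GET {path} HTTP/1.1",
        f"Host: {host}",
        "Upgrade: websocket",
        "Connection: Upgrade",
        f"Sec-WebSocket-Key: {key}",
        "Sec-WebSocket-Version: 13",
        f"Authorization: {authorization}",  # credential format is service-specific
    ]
    return ("\r\n".join(lines) + "\r\n\r\n").encode("ascii")


# Placeholder host, path, and credential for illustration.
key = make_key()
request = build_upgrade_request("api.example.com", "/v1/tts/stream", key, "<your-credential>")
```

After sending `request` over a TLS socket, a client would check that the response status is `101` and that its `Sec-WebSocket-Accept` header equals `expected_accept(key)` before exchanging frames.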
Initialization frame
Regardless of how the connection is established, send an initialization frame before any text:
Use `voice_settings` to configure provider-specific parameters:
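The helper below sketches how a client might serialize such a frame. It assumes the frame is a JSON object carrying a `voice_settings` key; every other field name and all values shown (`voice_id`, `stability`, `speed`) are hypothetical and depend on the provider.

```python
import json


def build_init_frame(voice_settings: dict, **extra: object) -> str:
    """Serialize an initialization frame to send before the first text frame.

    Assumes a JSON object payload; `extra` holds any other top-level fields.
    """
    frame = dict(extra)
    frame["voice_settings"] = voice_settings
    return json.dumps(frame)


# Illustrative values only; the accepted settings are provider-specific.
init_frame = build_init_frame({"stability": 0.5, "speed": 1.0}, voice_id="demo-voice")
print(init_frame)
```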
Text Buffering
Text frames are not synthesized immediately. The server accumulates text and synthesizes when it detects a complete sentence boundary (period, question mark, exclamation mark, etc.). This means:
- Short fragments without punctuation will buffer until more text arrives
- A sentence like `"Hello."` synthesizes immediately (complete sentence)
- A fragment like `"Hello, my name"` waits for the sentence to complete
Interruption
To stop current synthesis and start fresh (barge-in):
Text Preprocessing
Before synthesis, text is automatically preprocessed:
- Markdown stripping: headers, bold, italics, code blocks, inline code, links, lists, blockquotes, horizontal rules, and emoji are stripped to plain text
- Pronunciation dictionary: if `pronunciation_dict_id` is set, custom word replacements are applied (SSML-safe)
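The two preprocessing steps can be approximated locally to preview what the synthesizer will receive. This is a simplified regex sketch under stated assumptions, not the service's preprocessor: it handles only a subset of Markdown (no horizontal rules or emoji), uses a plain in-memory mapping in place of a stored pronunciation dictionary, and does not attempt SSML handling.

```python
import re

FENCE = "`" * 3  # three backticks, built this way to keep the example readable

# (pattern, replacement) pairs applied in order; a subset of Markdown only.
_MD_RULES = [
    (re.compile(FENCE + r"[^\n]*\n(.*?)" + FENCE, re.DOTALL), r"\1"),  # fenced code
    (re.compile(r"`([^`]+)`"), r"\1"),                                  # inline code
    (re.compile(r"\[([^\]]+)\]\([^)]*\)"), r"\1"),                      # links
    (re.compile(r"^#{1,6}[ \t]+", re.MULTILINE), ""),                   # headers
    (re.compile(r"^>[ \t]?", re.MULTILINE), ""),                        # blockquotes
    (re.compile(r"^[ \t]*(?:[-*+]|\d+\.)[ \t]+", re.MULTILINE), ""),    # list markers
    (re.compile(r"(\*\*|\*)(.+?)\1"), r"\2"),                           # bold, italics
]


def strip_markdown(text: str) -> str:
    """Reduce common Markdown constructs to their plain-text content."""
    for pattern, replacement in _MD_RULES:
        text = pattern.sub(replacement, text)
    return text


def apply_pronunciations(text: str, replacements: dict[str, str]) -> str:
    """Replace whole words (case-sensitive) using a word-to-spoken-form mapping."""
    for word, spoken in replacements.items():
        pattern = re.compile(r"\b" + re.escape(word) + r"\b")
        # A function replacement avoids backslash interpretation in `spoken`.
        text = pattern.sub(lambda _m, s=spoken: s, text)
    return text


def preprocess(text: str, replacements: dict[str, str]) -> str:
    """Strip Markdown first, then apply pronunciation replacements."""
    return apply_pronunciations(strip_markdown(text), replacements)


print(preprocess("# Intro\nUse **SQL** with [docs](https://example.com).", {"SQL": "sequel"}))
```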
Teardown
Send an empty text frame to signal end of input:
The connection is also closed on any of the following:
- Client WebSocket close
- Server error (see Errors)
- Inactivity timeout