JavaScript

import Telnyx from 'telnyx';

const client = new Telnyx({
  apiKey: process.env['TELNYX_API_KEY'], // This is the default and can be omitted
});

await client.textToSpeech.stream();

{
  "text": " ",
  "voice_settings": {
    "voice_speed": 1.2
  }
}

Stream text to speech over WebSocket

Open a WebSocket connection to stream text and receive synthesized audio in real time. Authentication is provided via the standard Authorization: Bearer <API_KEY> header. Send JSON frames with text to synthesize; receive JSON frames containing base64-encoded audio chunks.

Supported providers: aws, telnyx, azure, murfai, minimax, rime, resemble, elevenlabs, inworld.

Connection flow:

1. Open a WebSocket with query parameters specifying provider, voice, and model.
2. Send an initial handshake message {"text": " "} (single space) with optional voice_settings to initialize the session.
3. Send text messages as {"text": "Hello world"}.
4. Receive audio chunks as JSON frames with base64-encoded audio.
5. A final frame with isFinal: true indicates the end of audio for the current text.

To interrupt and restart synthesis mid-stream, send {"force": true}. The server stops the current synthesis worker and starts a new one.
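
The connection flow above can be sketched without committing to a particular WebSocket library. This is a minimal sketch, not the SDK's API: createTtsSession and its send callback are hypothetical names, and the audio field on incoming chunk frames is an assumption (this page only says the frames carry base64-encoded audio). The text, voice_settings, force, and isFinal fields come from this page.

```javascript
// Hypothetical helper: `send` is any function that writes a string to an open
// WebSocket (for example, a bound ws.send).
function createTtsSession(send, voiceSettings) {
  const chunks = []; // decoded audio for the text currently being synthesized

  // Handshake: a single space, plus optional voice_settings.
  const handshake = { text: ' ' };
  if (voiceSettings) handshake.voice_settings = voiceSettings;
  send(JSON.stringify(handshake));

  return {
    // Send text to synthesize.
    speak(text) {
      send(JSON.stringify({ text }));
    },
    // Interrupt: drop locally buffered audio and ask the server to restart.
    interrupt() {
      chunks.length = 0;
      send(JSON.stringify({ force: true }));
    },
    // Feed every incoming message here. Returns a Buffer with the complete
    // audio once a frame with isFinal: true arrives, otherwise null.
    handleMessage(raw) {
      const frame = JSON.parse(raw);
      // ASSUMPTION: chunk frames carry base64 audio in a field named `audio`.
      if (typeof frame.audio === 'string') {
        chunks.push(Buffer.from(frame.audio, 'base64'));
      }
      if (frame.isFinal === true) {
        const audio = Buffer.concat(chunks);
        chunks.length = 0;
        return audio;
      }
      return null;
    },
  };
}
```

Keeping the transport behind a send callback means the same logic can be exercised with a fake sender, and the assumed audio field name only needs changing in one place once the actual frame schema is confirmed.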

GET text-to-speech

Authorizations

Authorization

string

header

required

Bearer authentication header of the form Bearer <token>, where <token> is your auth token.

Query Parameters

voice

string

Voice identifier in the format provider.model_id.voice_id or provider.voice_id (e.g. telnyx.NaturalHD.Telnyx_Alloy or azure.en-US-AvaMultilingualNeural). When provided, the provider, model_id, and voice_id are extracted automatically. Takes precedence over individual provider/model_id/voice_id parameters.
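
To illustrate how a voice identifier breaks down into its parts, here is a small client-side sketch. parseVoice is a hypothetical helper, and the rule for identifiers with more than two dots (last segment is the voice_id, the rest is the model_id) is an assumption; the server's own extraction logic may differ.

```javascript
// Illustrative only: splits "provider.model_id.voice_id" or "provider.voice_id".
function parseVoice(voice) {
  const first = voice.indexOf('.');
  if (first <= 0 || first === voice.length - 1) {
    throw new Error(`Expected provider.voice_id or provider.model_id.voice_id, got "${voice}"`);
  }
  const provider = voice.slice(0, first);
  const rest = voice.slice(first + 1);

  // ASSUMPTION: if dots remain, the last segment is the voice_id and
  // everything before it is the model_id, so a dotted model name such as
  // Polly.Generative stays in one piece.
  const last = rest.lastIndexOf('.');
  if (last === -1) return { provider, voice_id: rest };
  if (last === 0 || last === rest.length - 1) {
    throw new Error(`Empty model_id or voice_id in "${voice}"`);
  }
  return {
    provider,
    model_id: rest.slice(0, last),
    voice_id: rest.slice(last + 1),
  };
}
```

For the two examples above, this yields provider telnyx, model_id NaturalHD, voice_id Telnyx_Alloy, and provider azure, voice_id en-US-AvaMultilingualNeural with no model_id.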

provider

enum<string>

default:telnyx

TTS provider. Defaults to telnyx if not specified. Ignored when voice is provided.

Available options: aws, telnyx, azure, elevenlabs, minimax, murfai, rime, resemble, inworld

model_id

string

Model identifier for the chosen provider. Examples: Natural, NaturalHD (Telnyx); Polly.Generative (AWS).

voice_id

string

Voice identifier for the chosen provider.

disable_cache

boolean

default:false

When true, bypass the audio cache and generate fresh audio.

audio_format

enum<string>

Audio output format override. Supported for Telnyx Natural/NaturalHD models only. Accepted values: pcm, wav.

Available options: pcm, wav

socket_id

string

Client-provided socket identifier for tracking. If not provided, one is generated server-side.
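
The query parameters above can be assembled with the standard URL API. This is a sketch under stated assumptions: buildStreamUrl is a hypothetical helper, and the base URL is a placeholder, since this page does not state the WebSocket host. Dropping provider, model_id, and voice_id when voice is set is a client-side choice that mirrors the precedence rule described for voice.

```javascript
// ASSUMPTION: placeholder endpoint; substitute the real WebSocket URL.
const TTS_WS_BASE = 'wss://example.invalid/text-to-speech';

// Query parameters documented on this page.
const QUERY_KEYS = [
  'voice', 'provider', 'model_id', 'voice_id',
  'disable_cache', 'audio_format', 'socket_id',
];

function buildStreamUrl(params = {}, base = TTS_WS_BASE) {
  const url = new URL(base);
  const hasVoice = params.voice !== undefined && params.voice !== null;
  for (const key of QUERY_KEYS) {
    const value = params[key];
    if (value === undefined || value === null) continue;
    // voice takes precedence, so the individual identifiers are omitted.
    if (hasVoice && ['provider', 'model_id', 'voice_id'].includes(key)) continue;
    url.searchParams.set(key, String(value));
  }
  return url.toString();
}
```

With the Node.js ws package, the Authorization header can be passed through the headers option of the client constructor; the browser WebSocket constructor does not accept custom headers, so this sketch targets server-side use.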

Response

WebSocket connection established. Communication proceeds via JSON frames.

Client → Server: See ClientTextFrame schema.

Server → Client: See AudioChunkFrame, FinalFrame, and ErrorFrame schemas.

Client-to-server frame containing text to synthesize.

text

string

required

Text to convert to speech. Send " " (single space) as an initial handshake with optional voice_settings. Subsequent messages contain the actual text to synthesize.

voice_settings

object

Provider-specific voice settings sent with the initial handshake. The accepted keys vary by provider: for example, {"speed": 1.2} for Minimax and {"voice_speed": 1.5} for Telnyx.

force

boolean

When true, stops the current synthesis worker and starts a new one. Used to interrupt speech mid-stream and begin synthesizing new text.
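
Because the speed key in voice_settings differs between providers, a client may want to build the handshake frame from a provider name. The sketch below is illustrative: buildHandshake is a hypothetical helper, and the key table contains only the two examples given above. Keys for other providers are not listed on this page, so the helper refuses to guess.

```javascript
// Speed key per provider, taken only from the two examples on this page.
const SPEED_KEY = new Map([
  ['telnyx', 'voice_speed'],
  ['minimax', 'speed'],
]);

// Builds the initial handshake frame: a single space plus optional settings.
function buildHandshake(provider, speed) {
  const frame = { text: ' ' };
  if (speed === undefined) return frame; // voice_settings is optional
  const key = SPEED_KEY.get(provider);
  if (!key) {
    throw new Error(`No known speed setting for provider "${provider}"`);
  }
  frame.voice_settings = { [key]: speed };
  return frame;
}
```

The returned object is meant to be serialized with JSON.stringify and sent as the first message after the socket opens.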
