GET /text-to-speech
JavaScript
import Telnyx from 'telnyx';

const client = new Telnyx({
  apiKey: process.env['TELNYX_API_KEY'], // This is the default and can be omitted
});

await client.textToSpeech.stream();
{
  "text": " ",
  "voice_settings": {
    "voice_speed": 1.2
  }
}

Authorizations

Authorization
string
header
required

Bearer authentication header of the form Bearer <token>, where <token> is your auth token.
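
As a minimal sketch of the header form described above, the helper below builds the Authorization value from a token. The name bearerHeader is hypothetical and not part of the Telnyx SDK. How the header is attached depends on the WebSocket client in use, and the example token is a placeholder.

```javascript
// Hypothetical helper (not part of the Telnyx SDK): builds the
// Authorization header in the documented "Bearer <token>" form.
function bearerHeader(token) {
  if (typeof token !== 'string' || token.trim() === '') {
    throw new Error('A non-empty auth token is required');
  }
  return { Authorization: `Bearer ${token.trim()}` };
}

// Read the key from the same environment variable the SDK example uses;
// 'example-token' is a placeholder fallback for illustration only.
const authHeaders = bearerHeader(process.env.TELNYX_API_KEY || 'example-token');
```

The resulting object can be handed to any client that accepts custom request headers during the WebSocket handshake.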

Query Parameters

voice
string

Voice identifier in the format provider.model_id.voice_id or provider.voice_id (e.g. telnyx.NaturalHD.Telnyx_Alloy or azure.en-US-AvaMultilingualNeural). When provided, the provider, model_id, and voice_id are extracted automatically. Takes precedence over individual provider/model_id/voice_id parameters.

provider
enum<string>
default:telnyx

TTS provider. Defaults to telnyx if not specified. Ignored when voice is provided.

Available options: aws, telnyx, azure, elevenlabs, minimax, murfai, rime, resemble

model_id
string

Model identifier for the chosen provider. Examples: Natural, NaturalHD (Telnyx); Polly.Generative (AWS).

voice_id
string

Voice identifier for the chosen provider.

disable_cache
boolean
default:false

When true, bypass the audio cache and generate fresh audio.

audio_format
enum<string>

Audio output format override. Supported for Telnyx Natural/NaturalHD models only. Accepted values: pcm, wav.

Available options: pcm, wav

socket_id
string

Client-provided socket identifier for tracking. If not provided, one is generated server-side.
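
The query parameters above can be assembled as in the following sketch. The helper name buildStreamUrl and the base URL are assumptions for illustration; the real host and path should come from your Telnyx account documentation. The sketch leaves out provider, model_id, and voice_id whenever voice is set, mirroring the documented precedence rule.

```javascript
// Hypothetical helper: assembles the connection URL from the documented
// query parameters. The base URL is supplied by the caller (assumption).
function buildStreamUrl(baseUrl, params = {}) {
  const url = new URL(baseUrl);
  const { voice, provider, model_id, voice_id, disable_cache, audio_format, socket_id } = params;

  if (voice) {
    // `voice` takes precedence, so the individual fields are omitted.
    url.searchParams.set('voice', voice);
  } else {
    if (provider) url.searchParams.set('provider', provider);
    if (model_id) url.searchParams.set('model_id', model_id);
    if (voice_id) url.searchParams.set('voice_id', voice_id);
  }

  // disable_cache defaults to false, so it is only sent when true.
  if (disable_cache === true) url.searchParams.set('disable_cache', 'true');

  if (audio_format) {
    // Only pcm and wav are documented as accepted values.
    if (!['pcm', 'wav'].includes(audio_format)) {
      throw new Error(`Unsupported audio_format: ${audio_format}`);
    }
    url.searchParams.set('audio_format', audio_format);
  }

  if (socket_id) url.searchParams.set('socket_id', socket_id);
  return url.toString();
}

// Placeholder host; replace with the real endpoint.
const streamUrl = buildStreamUrl('wss://example.invalid/text-to-speech', {
  voice: 'telnyx.NaturalHD.Telnyx_Alloy',
  provider: 'aws', // dropped, because voice is present
  audio_format: 'wav',
});
```

Validating audio_format on the client is a design choice in this sketch; the server remains the authority on which provider and model combinations accept the override.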

Response

WebSocket connection established. Communication proceeds via JSON frames.

Client → Server: See ClientTextFrame schema.
Server → Client: See AudioChunkFrame, FinalFrame, and ErrorFrame schemas.

Client-to-server frame containing text to synthesize.

text
string
required

Text to convert to speech. Send " " (single space) as an initial handshake with optional voice_settings. Subsequent messages contain the actual text to synthesize.

voice_settings
object

Provider-specific voice settings sent with the initial handshake. Contents vary by provider: for example, {"speed": 1.2} for Minimax and {"voice_speed": 1.5} for Telnyx.

force
boolean

When true, stops the current synthesis worker and starts a new one. Used to interrupt speech mid-stream and begin synthesizing new text.
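
The client frame fields above suggest a three-step message sequence: a handshake, ordinary text, and an interruption. The sketch below builds those frames as JSON strings. The helper names handshakeFrame and textFrame are hypothetical, and sending force together with the replacement text in a single frame is an assumption based on the field description, not confirmed behavior.

```javascript
// Hypothetical helpers that serialize ClientTextFrame payloads.

// Initial handshake: text is a single space, voice_settings is optional.
function handshakeFrame(voiceSettings) {
  const frame = { text: ' ' };
  if (voiceSettings) frame.voice_settings = voiceSettings;
  return JSON.stringify(frame);
}

// Subsequent frames carry the text to synthesize. Setting force asks the
// server to stop the current synthesis and start on this text instead
// (assumed to travel in the same frame as the new text).
function textFrame(text, { force = false } = {}) {
  if (typeof text !== 'string' || text.length === 0) {
    throw new Error('text is required');
  }
  const frame = { text };
  if (force) frame.force = true;
  return JSON.stringify(frame);
}

// Frames in the order a client might send them over an open socket.
const clientFrames = [
  handshakeFrame({ voice_speed: 1.2 }),
  textFrame('Hello from the streaming endpoint.'),
  textFrame('Actually, say this instead.', { force: true }),
];
```

Each string would be passed to the send method of an open WebSocket connection. Keeping frame construction separate from the socket makes the payloads easy to inspect and test.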