TTS & STT Services

Telnyx provides Text-to-Speech (TTS) and Speech-to-Text (STT) services for standalone WebSocket streaming, in-call speech features, and batch transcription of recordings.

Service overview

TTS WebSocket Streaming

Convert text to natural-sounding audio in real time over WebSocket. Use it in your apps without a phone call.

Available Voices

Browse all TTS voices — Telnyx native voices plus AWS, Azure, ElevenLabs, MiniMax, ResembleAI, Inworld, and Rime.

STT WebSocket Streaming

Stream audio and receive real-time transcription over WebSocket. Supports Telnyx, Google, Deepgram, and Azure engines.

Voice Design Lab

Design custom AI voices from natural language prompts or clone from audio recordings.

API Reference

Full REST and WebSocket API reference for TTS and STT services.

How you can use TTS & STT

Standalone real-time streaming

Use TTS and STT over WebSocket connections independently of any phone call. This is ideal for:

Voice bots & assistants — synthesize responses or transcribe user audio in your own application.
Content creation — generate voiceovers, narrations, or audio versions of text.
Live captioning & subtitles — transcribe audio streams in real time.
Accessibility — convert text to audio or audio to text on the fly.
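For streaming use cases such as live captioning, audio is usually sent over the socket in small fixed-duration frames rather than as one large buffer. The following is a minimal sketch of that framing step only, not of the Telnyx protocol itself. The helper name `pcm_frames` is hypothetical, and the 16 kHz, 16-bit mono format and 20 ms frame length are assumptions; check the STT WebSocket reference for the audio formats and frame sizes the service actually accepts.

```python
from typing import Iterator


def pcm_frames(audio: bytes, sample_rate: int = 16000,
               sample_width: int = 2, frame_ms: int = 20) -> Iterator[bytes]:
    """Yield fixed-duration chunks of raw mono PCM audio.

    The defaults (16 kHz, 2 bytes per sample, 20 ms) are illustrative
    assumptions, not values taken from the Telnyx documentation.
    """
    if sample_rate <= 0 or sample_width <= 0 or frame_ms <= 0:
        raise ValueError("sample_rate, sample_width and frame_ms must be positive")
    # Whole samples per frame, so a frame never splits a sample in half.
    samples_per_frame = sample_rate * frame_ms // 1000
    frame_bytes = samples_per_frame * sample_width
    if frame_bytes == 0:
        raise ValueError("frame_ms is too short for this sample rate")
    for start in range(0, len(audio), frame_bytes):
        # The final frame may be shorter than frame_bytes.
        yield audio[start:start + frame_bytes]


# 1600 bytes of silence at the defaults: two full 640-byte frames and a 320-byte tail.
sizes = [len(frame) for frame in pcm_frames(bytes(1600))]
print(sizes)  # [640, 640, 320]
```

Each yielded frame would then be written to the open WebSocket as a binary message, with transcription results read back from the same connection.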

File-based services

TTS and STT are also available as REST APIs for non-streaming use cases:

File-based transcription — submit audio files and receive text transcriptions. Ideal for post-call analytics, media processing, and converting audio archives into searchable text.
File-based text-to-speech — send text via REST and receive synthesized audio files. Use for generating voiceovers, pre-recorded prompts, or audio content in batch.
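A file-based text-to-speech call comes down to an authenticated JSON POST. The sketch below only builds the request object and does not send it. The base URL and Bearer-token header are assumptions about the Telnyx REST API, and `TTS_PATH`, the `text` and `voice` field names, and the helper `build_tts_request` are hypothetical placeholders; take the real endpoint and body schema from the API Reference.

```python
import json
import urllib.request

# Assumption: REST base URL for the Telnyx API.
API_BASE = "https://api.telnyx.com/v2"
# Hypothetical path; replace with the endpoint listed in the API Reference.
TTS_PATH = "/text-to-speech"


def build_tts_request(api_key: str, text: str, voice: str) -> urllib.request.Request:
    """Build, but do not send, a JSON POST request for speech synthesis.

    The body field names ("text", "voice") are illustrative assumptions.
    """
    body = json.dumps({"text": text, "voice": voice}).encode("utf-8")
    return urllib.request.Request(
        API_BASE + TTS_PATH,
        data=body,
        headers={
            "Authorization": f"Bearer {api_key}",  # assumed auth scheme
            "Content-Type": "application/json",
        },
        method="POST",
    )


req = build_tts_request("YOUR_API_KEY", "Your call is important to us.", "example-voice")
print(req.get_method(), req.full_url)
```

Passing the request to `urllib.request.urlopen(req)` would send it; the response body would then carry the synthesized audio or a reference to it, depending on the actual API.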

Call recording transcription

Telnyx can transcribe call recordings using Google or Deepgram STT engines. There are two approaches:

Automatic transcription during recording — set the transcription engine when you start recording a call. The transcript is generated automatically once the recording completes and delivered via webhook. See the Recording Start command to get started.
File-based transcription of existing audio — submit a previously recorded audio file (WAV, MP3, FLAC, OGG, and more) to the Speech-to-Text REST API and receive a text transcription. Ideal for post-call analytics or transcribing audio archives.

Both approaches support provider choice, speaker diarization, and multi-language transcription.
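With automatic transcription, your application receives the transcript as a webhook, so the receiving endpoint needs to pick the text out of the event body. The sketch below shows one defensive way to do that. The event name in `TRANSCRIPT_EVENT`, the `data` / `event_type` / `payload` envelope, and the `transcription_text` field are all assumptions made for illustration; confirm the real event name and payload shape in the webhook documentation for the Recording Start command.

```python
import json
from typing import Optional

# Hypothetical event name; use the one documented for recording transcription.
TRANSCRIPT_EVENT = "recording.transcription.completed"


def extract_transcript(raw_body: bytes,
                       event_type: str = TRANSCRIPT_EVENT) -> Optional[str]:
    """Return the transcript text from a webhook body, or None if it is absent.

    Assumes an envelope of the form
    {"data": {"event_type": ..., "payload": {"transcription_text": ...}}}.
    """
    try:
        event = json.loads(raw_body)
    except ValueError:  # covers invalid JSON and undecodable bytes
        return None
    data = event.get("data") if isinstance(event, dict) else None
    if not isinstance(data, dict) or data.get("event_type") != event_type:
        return None  # not the event we are waiting for
    payload = data.get("payload")
    if not isinstance(payload, dict):
        return None
    text = payload.get("transcription_text")
    return text if isinstance(text, str) else None


sample = json.dumps({
    "data": {
        "event_type": TRANSCRIPT_EVENT,
        "payload": {"transcription_text": "Thanks for calling."},
    }
}).encode("utf-8")
print(extract_transcript(sample))  # Thanks for calling.
```

Returning `None` for unrelated or malformed events lets the same endpoint receive other webhook types without raising.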

In-call TTS & STT

TTS and STT are also available during live phone calls through the Telnyx Voice API:

In-call TTS — play synthesized speech to callers using the speak command (Voice API TTS guide).
In-call STT — transcribe caller speech in real time during a call (Voice API Speech-to-Text guide).
Gather with AI — use STT to capture caller input with natural language understanding (Gather using AI guide).

Supported providers

Text-to-Speech

Provider	Description
Telnyx Natural (Kokoro)	Budget-friendly, great for IVR and high-volume use
Telnyx NaturalHD	Refined prosody and disfluency handling
Telnyx Ultra	Premium expressive voices with sub-100ms TTFB, SSML emotion control, and 36-language support
xAI Grok	Expressive voices with xAI speech tags for pauses, laughter, whispers, emphasis, pitch, pace, and intensity
AWS Neural	Amazon Polly neural voices
Azure Neural / HD	Microsoft Azure neural TTS
ElevenLabs	Expressive AI voices
MiniMax	Multilingual, expressive tones
ResembleAI	Emotion-preserving AI voices
Inworld	Expressive multilingual AI voices with Mini and Max models
Rime	Multilingual AI voices with native codeswitching

Speech-to-Text

Engine	Description
Telnyx	In-house engine — high accuracy, low latency
Google	Google STT with interim results support
Deepgram	Nova-2, Nova-3, and Flux models
Azure	Strong multilingual and accent support

Get started

Get your API key

Create an API key in the Telnyx Mission Control Portal.

Choose your approach

Real-time streaming → TTS WebSocket or STT WebSocket.
File transcription → STT REST API.
In-call speech → See the Voice API TTS and STT guides.

Connect and stream

Open a WebSocket connection, authenticate with your API key, and start streaming — or submit files via REST.
