Telnyx provides Text-to-Speech (TTS) and Speech-to-Text (STT) services for several scenarios: standalone WebSocket streaming, speech features during live calls, and batch transcription of recordings.

Service overview

TTS WebSocket Streaming

Convert text to natural-sounding audio in real time over a WebSocket connection, directly from your application. No phone call is required.
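As a minimal sketch, the helpers below prepare what a TTS WebSocket client typically needs: the connection URL, a Bearer authorization header, and a JSON message carrying the text to synthesize. The endpoint `wss://api.telnyx.com/v2/text-to-speech/speech`, the `voice` query parameter, and the `text` message field are assumptions for illustration only; take the real values from the API Reference.

```python
import json
from urllib.parse import urlencode

# HYPOTHETICAL endpoint: confirm the real path and parameters in the API Reference.
TTS_WS_BASE = "wss://api.telnyx.com/v2/text-to-speech/speech"


def build_tts_connection(api_key: str, voice: str) -> tuple[str, dict[str, str]]:
    """Return the WebSocket URL and handshake headers for one TTS session."""
    # 'voice' as a query parameter is an assumption made for this sketch.
    url = f"{TTS_WS_BASE}?{urlencode({'voice': voice})}"
    headers = {"Authorization": f"Bearer {api_key}"}
    return url, headers


def build_text_message(text: str) -> str:
    """Serialize one chunk of text to synthesize (hypothetical 'text' field)."""
    return json.dumps({"text": text})
```

With a WebSocket client library of your choice, you would open `url` with `headers` in the handshake, send one or more text messages, and read the synthesized audio frames back.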

Available Voices

Browse all TTS voices — Telnyx native voices plus AWS, Azure, ElevenLabs, MiniMax, ResembleAI, Inworld, and Rime.

STT WebSocket Streaming

Stream audio and receive real-time transcription over WebSocket. Supports Telnyx, Google, Deepgram, and Azure engines.
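Streaming clients usually send audio in small, fixed-duration frames. The sketch below computes the frame size and slices a buffer, assuming raw 16-bit mono PCM at 16 kHz; which encodings and sample rates each engine accepts is an assumption here and should be checked in the API Reference. At 16 kHz, a 20 ms frame is 320 samples, or 640 bytes.

```python
from typing import Iterator

# ASSUMPTION: raw 16-bit little-endian mono PCM at 16 kHz.
SAMPLE_RATE_HZ = 16_000
BYTES_PER_SAMPLE = 2  # 16-bit samples
CHANNELS = 1


def frame_size_bytes(frame_ms: int,
                     sample_rate: int = SAMPLE_RATE_HZ,
                     bytes_per_sample: int = BYTES_PER_SAMPLE,
                     channels: int = CHANNELS) -> int:
    """Bytes of PCM audio covering frame_ms milliseconds."""
    # Count whole samples first so every frame ends on a sample boundary.
    samples = sample_rate * frame_ms // 1000
    return samples * bytes_per_sample * channels


def pcm_frames(audio: bytes, frame_ms: int = 20) -> Iterator[bytes]:
    """Yield consecutive frames to send as binary messages; the last may be shorter."""
    size = frame_size_bytes(frame_ms)
    for start in range(0, len(audio), size):
        yield audio[start:start + size]
```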

Voice Design Lab

Design custom AI voices from natural-language prompts, or clone voices from audio recordings.

API Reference

Full REST and WebSocket API reference for TTS and STT services.

How you can use TTS & STT

Standalone real-time streaming

Use TTS and STT over WebSocket connections independently of any phone call. This is ideal for:
  • Voice bots & assistants — synthesize responses or transcribe user audio in your own application.
  • Content creation — generate voiceovers, narrations, or audio versions of text.
  • Live captioning & subtitles — transcribe audio streams in real time.
  • Accessibility — convert text to audio or audio to text on the fly.

File-based services

TTS and STT are also available as REST APIs for non-streaming use cases:
  • File-based transcription — submit audio files and receive text transcriptions. Ideal for post-call analytics, media processing, and converting audio archives into searchable text.
  • File-based text-to-speech — send text via REST and receive synthesized audio files. Use for generating voiceovers, pre-recorded prompts, or audio content in batch.
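A minimal sketch of preparing a file-based TTS request with the Python standard library. The URL `https://api.telnyx.com/v2/text-to-speech/speech` and the `text` and `voice` body fields are hypothetical placeholders; check the API Reference for the actual endpoint, fields, and response format. The function only builds the request and does not send it.

```python
import json
import urllib.request

# HYPOTHETICAL endpoint for illustration; see the API Reference.
TTS_REST_URL = "https://api.telnyx.com/v2/text-to-speech/speech"


def build_tts_file_request(api_key: str, text: str, voice: str) -> urllib.request.Request:
    """Build (but do not send) a POST request asking for synthesized audio."""
    # 'text' and 'voice' are assumed field names for this sketch.
    body = json.dumps({"text": text, "voice": voice}).encode("utf-8")
    return urllib.request.Request(
        TTS_REST_URL,
        data=body,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

To send it, you could pass the request to `urllib.request.urlopen` and write the response bytes to an audio file, assuming the endpoint returns audio directly.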

Call recording transcription

Telnyx can transcribe call recordings using Google or Deepgram STT engines. There are two approaches:
  • Automatic transcription during recording — set the transcription engine when you start recording a call. The transcript is generated automatically once the recording completes and delivered via webhook. See the Recording Start command to get started.
  • File-based transcription of existing audio — submit a previously recorded audio file (WAV, MP3, FLAC, OGG, and more) to the Speech-to-Text REST API and receive a text transcription. Ideal for post-call analytics or transcribing audio archives.
Both approaches support provider choice, speaker diarization, and multi-language transcription.
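A minimal sketch of the automatic approach, assuming the Recording Start command is a `POST` to `/v2/calls/{call_control_id}/actions/record_start` and accepts the `format`, `channels`, `transcription`, `transcription_engine`, and `transcription_language` fields shown. These names and values are assumptions; verify each one against the Recording Start command in the API Reference. The helper only builds the request path and JSON body; sending it with your API key is left to your HTTP client.

```python
import json

# ASSUMED path and field names: verify against the Recording Start reference.
RECORD_START_PATH = "/v2/calls/{call_control_id}/actions/record_start"


def build_record_start(call_control_id: str, engine: str, language: str) -> tuple[str, str]:
    """Return the request path and JSON body for recording with transcription."""
    path = RECORD_START_PATH.format(call_control_id=call_control_id)
    body = {
        "format": "wav",                    # recording file format
        "channels": "dual",                 # one channel per call leg
        "transcription": True,              # ask for a transcript after recording
        "transcription_engine": engine,     # accepted values: see the API Reference
        "transcription_language": language,
    }
    return path, json.dumps(body)
```

The transcript itself arrives later through a webhook, so your application also needs an endpoint that receives and stores it.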

In-call TTS & STT

TTS and STT are also available during live phone calls through the Telnyx Voice API.
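As an illustration, a text or SSML prompt could be spoken into a live call with a command body like the one built below. The `/v2/calls/{call_control_id}/actions/speak` path and the `payload`, `payload_type`, and `voice` fields are assumptions to confirm in the API Reference. When sending SSML, the sketch XML-escapes the text so characters like `&` and `<` don't break the markup.

```python
import json
from xml.sax.saxutils import escape

# ASSUMED path and field names: confirm in the API Reference.
SPEAK_PATH = "/v2/calls/{call_control_id}/actions/speak"


def build_speak(call_control_id: str, text: str, voice: str,
                ssml: bool = False) -> tuple[str, str]:
    """Return the request path and JSON body for speaking text into a call."""
    path = SPEAK_PATH.format(call_control_id=call_control_id)
    if ssml:
        # Escape &, < and > so user text cannot corrupt the SSML document.
        payload = f"<speak>{escape(text)}</speak>"
        payload_type = "ssml"
    else:
        payload = text
        payload_type = "text"
    body = {"payload": payload, "payload_type": payload_type, "voice": voice}
    return path, json.dumps(body)
```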

Supported providers

Text-to-Speech

| Provider | Description |
| --- | --- |
| Telnyx Natural (Kokoro) | Budget-friendly; well suited to IVR and high-volume use |
| Telnyx NaturalHD | Refined prosody and disfluency handling |
| Telnyx Ultra | Premium expressive voices with sub-100 ms time to first byte (TTFB), SSML emotion control, and support for 36 languages |
| AWS Neural | Amazon Polly neural voices |
| Azure Neural / HD | Microsoft Azure neural TTS |
| ElevenLabs | Expressive AI voices |
| MiniMax | Multilingual voices with expressive tones |
| ResembleAI | Emotion-preserving AI voices |
| Inworld | Expressive multilingual AI voices with Mini and Max models |
| Rime | Multilingual AI voices with native code-switching |

Speech-to-Text

| Engine | Description |
| --- | --- |
| Telnyx | In-house engine with high accuracy and low latency |
| Google | Google STT with interim results support |
| Deepgram | Nova-2, Nova-3, and Flux models |
| Azure | Strong multilingual and accent support |

Get started

1. Get your API key

Create an API key in the Telnyx Mission Control Portal.

2. Choose your approach

3. Connect and stream

For streaming, open a WebSocket connection, authenticate with your API key, and start sending text or audio. For file-based services, submit files through the REST API instead.
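A small sketch of the authentication step. It assumes you keep the key in an environment variable (the name `TELNYX_API_KEY` is our choice, not a platform requirement) and that the API accepts the key as a Bearer token in the `Authorization` header for both REST requests and the WebSocket handshake; confirm this in the API Reference. Failing early on a missing key gives a clearer error than a rejected connection.

```python
import os


def telnyx_auth_headers(env_var: str = "TELNYX_API_KEY") -> dict[str, str]:
    """Build Bearer auth headers from an API key stored in an environment variable."""
    # TELNYX_API_KEY is a hypothetical variable name chosen for this sketch.
    api_key = os.environ.get(env_var, "").strip()
    if not api_key:
        raise RuntimeError(f"Set {env_var} to your Telnyx API key")
    return {"Authorization": f"Bearer {api_key}"}
```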