Service overview
TTS WebSocket Streaming
Convert text to natural-sounding audio in real time over WebSocket. Use it in your apps without a phone call.
Available Voices
Browse all TTS voices — Telnyx native voices plus AWS, Azure, ElevenLabs, MiniMax, ResembleAI, Inworld, and Rime.
STT WebSocket Streaming
Stream audio and receive real-time transcription over WebSocket. Supports Telnyx, Google, Deepgram, and Azure engines.
Voice Design Lab
Design custom AI voices from natural language prompts or clone from audio recordings.
API Reference
Full REST and WebSocket API reference for TTS and STT services.
How you can use TTS & STT
Standalone real-time streaming
Use TTS and STT over WebSocket connections independently of any phone call. This is ideal for:
- Voice bots & assistants — synthesize responses or transcribe user audio in your own application.
- Content creation — generate voiceovers, narrations, or audio versions of text.
- Live captioning & subtitles — transcribe audio streams in real time.
- Accessibility — convert text to audio or audio to text on the fly.
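Streaming clients usually send audio to an STT WebSocket in small, fixed-duration chunks rather than as one large buffer. The sketch below covers only the chunking step. It assumes raw 16 kHz, 16-bit, mono linear PCM and 20 ms frames. These values, and the helper names `frame_size_bytes` and `pcm_frames`, are illustrative assumptions and not documented Telnyx requirements. The connection URL and message format are defined in the API Reference and are not shown here.

```python
from typing import Iterator


def frame_size_bytes(sample_rate: int, sample_width: int, channels: int, frame_ms: int) -> int:
    """Return the number of bytes in one frame of linear PCM audio."""
    samples_per_frame = sample_rate * frame_ms // 1000
    return samples_per_frame * sample_width * channels


def pcm_frames(audio: bytes, sample_rate: int = 16000, sample_width: int = 2,
               channels: int = 1, frame_ms: int = 20) -> Iterator[bytes]:
    """Yield consecutive fixed-duration frames; the final frame may be shorter.

    The defaults (16 kHz, 16-bit, mono, 20 ms) are assumptions for this sketch,
    not values required by the Telnyx STT WebSocket.
    """
    size = frame_size_bytes(sample_rate, sample_width, channels, frame_ms)
    for start in range(0, len(audio), size):
        yield audio[start:start + size]


if __name__ == "__main__":
    one_second = bytes(16000 * 2)  # 1 s of silence at 16 kHz, 16-bit mono
    frames = list(pcm_frames(one_second))
    print(len(frames), len(frames[0]))  # prints "50 640"
```

Each frame would then be sent as one message over the open WebSocket, in whatever envelope the API Reference specifies.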
File-based services
TTS and STT are also available as REST APIs for non-streaming use cases:
- File-based transcription — submit audio files and receive text transcriptions. Ideal for post-call analytics, media processing, and converting audio archives into searchable text.
- File-based text-to-speech — send text via REST and receive synthesized audio files. Use for generating voiceovers, pre-recorded prompts, or audio content in batch.
Call recording transcription
Telnyx can transcribe call recordings using Google or Deepgram STT engines. There are two approaches:
- Automatic transcription during recording — set the transcription engine when you start recording a call. The transcript is generated automatically once the recording completes and delivered via webhook. See the Recording Start command to get started.
- File-based transcription of existing audio — submit a previously recorded audio file (WAV, MP3, FLAC, OGG, and more) to the Speech-to-Text REST API and receive a text transcription. Ideal for post-call analytics or transcribing audio archives.
In-call TTS & STT
TTS and STT are also available during live phone calls through the Telnyx Voice API:
- In-call TTS — play synthesized speech to callers using the `speak` command (Voice API TTS guide).
- In-call STT — transcribe caller speech in real time during a call (Voice API Speech-to-Text guide).
- Gather with AI — use STT to capture caller input with natural language understanding (Gather using AI guide).
Supported providers
Text-to-Speech
| Provider | Description |
|---|---|
| Telnyx Natural (Kokoro) | Budget-friendly, great for IVR and high-volume use |
| Telnyx NaturalHD | Refined prosody and disfluency handling |
| Telnyx Ultra | Premium expressive voices with sub-100ms TTFB, SSML emotion control, and 36-language support |
| AWS Neural | Amazon Polly neural voices |
| Azure Neural / HD | Microsoft Azure neural TTS |
| ElevenLabs | Expressive AI voices |
| MiniMax | Multilingual, expressive tones |
| ResembleAI | Emotion-preserving AI voices |
| Inworld | Expressive multilingual AI voices with Mini and Max models |
| Rime | Multilingual AI voices with native codeswitching |
Speech-to-Text
| Engine | Description |
|---|---|
| Telnyx | In-house engine — high accuracy, low latency |
| Google | STT with interim results support |
| Deepgram | Nova-2, Nova-3, and Flux models |
| Azure | Strong multilingual and accent support |
Get started
Get your API key
Create an API key in the Telnyx Mission Control Portal.
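To keep the key out of source code, load it from the environment at runtime. Below is a minimal sketch. It assumes an environment variable named `TELNYX_API_KEY` (a name chosen for this example) and bearer-token authentication in the `Authorization` header. Confirm in the API Reference how each REST and WebSocket endpoint expects the key to be passed.

```python
import os
from typing import Dict


def auth_headers(env_var: str = "TELNYX_API_KEY") -> Dict[str, str]:
    """Build request headers from an API key stored in the environment.

    Assumptions: the env var name is illustrative, and the key is sent as a
    bearer token in the Authorization header.
    """
    api_key = os.environ.get(env_var)
    if not api_key:
        # Fail early with a clear message instead of sending an unauthenticated request.
        raise RuntimeError(f"Set {env_var} to an API key from the Mission Control Portal")
    return {"Authorization": f"Bearer {api_key}"}
```

Pass the returned dictionary as the headers of your HTTP client or WebSocket handshake, if the endpoint accepts header-based authentication.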
Choose your approach
- Real-time streaming → Use the TTS WebSocket or STT WebSocket.
- File transcription → Use the STT REST API.
- In-call speech → See the Voice API TTS and STT guides.