Service overview
TTS WebSocket Streaming
Convert text to natural-sounding audio in real time over WebSocket. Use it in your apps without a phone call.
Available Voices
Browse all TTS voices — Telnyx native voices plus AWS, Azure, ElevenLabs, MiniMax, and ResembleAI.
STT WebSocket Streaming
Stream audio and receive real-time transcription over WebSocket. Supports Telnyx, Google, Deepgram, and Azure engines.
API Reference
Full REST and WebSocket API reference for TTS and STT services.
How you can use TTS & STT
Standalone real-time streaming
Use TTS and STT over WebSocket connections independently of any phone call. This is ideal for:
- Voice bots & assistants: synthesize responses or transcribe user audio in your own application
- Content creation: generate voiceovers, narrations, or audio versions of text
- Live captioning & subtitles: transcribe audio streams in real time
- Accessibility: convert text to audio or audio to text on the fly
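When streaming audio for transcription, a client typically sends small fixed-size chunks rather than one large payload. The sketch below shows only that chunking step in plain Python and does not call any Telnyx API. The 16 kHz, 16-bit mono format and the 20 ms frame duration are illustrative assumptions, so check the STT WebSocket reference for the formats and chunk sizes the service actually accepts.

```python
from typing import Iterator

SAMPLE_RATE_HZ = 16_000   # assumed sample rate, not taken from the Telnyx docs
BYTES_PER_SAMPLE = 2      # 16-bit PCM
FRAME_MS = 20             # assumed frame duration in milliseconds


def frame_size_bytes(sample_rate_hz: int = SAMPLE_RATE_HZ,
                     bytes_per_sample: int = BYTES_PER_SAMPLE,
                     frame_ms: int = FRAME_MS) -> int:
    """Return the number of bytes in one mono PCM frame."""
    return sample_rate_hz * bytes_per_sample * frame_ms // 1000


def iter_frames(pcm: bytes, size: int) -> Iterator[bytes]:
    """Yield consecutive chunks of at most `size` bytes; the last may be shorter."""
    if size <= 0:
        raise ValueError("size must be positive")
    for start in range(0, len(pcm), size):
        yield pcm[start:start + size]
```

Each yielded frame could then be passed to whatever send call your WebSocket client provides. Keeping the chunking separate from the transport makes it easy to adjust the frame size once you know what the chosen engine expects.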
File-based services
TTS and STT are also available as REST APIs for non-streaming use cases:
- File-based transcription: submit audio files and receive text transcriptions. Ideal for post-call analytics, media processing, and converting audio archives into searchable text
- File-based text-to-speech: send text via REST and receive synthesized audio files. Use for generating voiceovers, pre-recorded prompts, or audio content in batch
Call recording transcription
Telnyx can automatically transcribe call recordings generated through the Voice API. Combine call recording with STT to get transcripts delivered alongside your recordings, with no extra integration needed. See the Recording Start command to get started.
In-call TTS & STT
TTS and STT are also available during live phone calls through the Telnyx Voice API:
- In-call TTS: play synthesized speech to callers using the `speak` command (Voice API TTS guide)
- In-call STT: transcribe caller speech in real time during a call (Voice API Speech-to-Text guide)
- Gather with AI: use STT to capture caller input with natural language understanding (Gather using AI guide)
Supported providers
Text-to-Speech
| Provider | Description |
|---|---|
| Telnyx Natural (Kokoro) | Budget-friendly, great for IVR and high-volume use |
| Telnyx NaturalHD | Refined prosody and disfluency handling |
| AWS Neural | Amazon Polly neural voices |
| Azure Neural / HD | Microsoft Azure neural TTS |
| ElevenLabs | Expressive AI voices |
| MiniMax | Multilingual, expressive tones |
| ResembleAI | Emotion-preserving AI voices |
Speech-to-Text
| Engine | Description |
|---|---|
| Telnyx | In-house engine — high accuracy, low latency |
| Google | STT with interim results support |
| Deepgram | Nova-2, Nova-3, and Flux models |
| Azure | Strong multilingual and accent support |
Get started
Get your API key
Create an API key in the Telnyx Mission Control Portal.
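Once you have a key, keep it out of source code and attach it to each request. The following is a minimal sketch that assumes the key is sent as a Bearer token in the `Authorization` header and that it is stored in an environment variable named `TELNYX_API_KEY`. Both the variable name and the helper functions are illustrative choices, not part of any Telnyx SDK, so confirm the authentication scheme in the API reference.

```python
import os


def load_api_key(env_var: str = "TELNYX_API_KEY") -> str:
    """Read the API key from the environment (the variable name is an assumption)."""
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(f"Set {env_var} before making requests")
    return key


def build_auth_headers(api_key: str) -> dict[str, str]:
    """Return HTTP headers carrying the key as a Bearer token (assumed scheme)."""
    if not api_key:
        raise ValueError("API key is empty")
    return {"Authorization": f"Bearer {api_key}"}
```

A typical call would be `headers = build_auth_headers(load_api_key())`, with the resulting dictionary passed to your HTTP or WebSocket client.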
Choose your approach
- Real-time streaming → TTS WebSocket or STT WebSocket
- File transcription → Use the STT REST API
- In-call speech → See the Voice API TTS and STT guides