Comparison
| Engine | Model (WebSocket) | Model (REST) | Latency | Languages | Best for |
|---|---|---|---|---|---|
| Deepgram | nova-3 | deepgram/nova-3 | Low | 40+ (reference) | Recommended. Highest English accuracy, diarization, word timestamps |
| Deepgram | nova-2 | deepgram/nova-2 | Low | 40+ | Legacy — use nova-3 unless you have a specific reason |
| Deepgram | flux | — | Lowest | English only | Voice agents — built-in end-of-turn detection (WebSocket only) |
| Telnyx | openai/whisper-large-v3-turbo | openai/whisper-large-v3-turbo | Medium | 50+ (reference) | Multilingual transcription |
| Telnyx | openai/whisper-tiny | openai/whisper-tiny | Low | 50+ | Lightweight, on-network |
| Google | latest_long | — | Medium | 125+ (reference) | Long-form multilingual audio (WebSocket only) |
| Azure | azure/fast | — | Medium | 100+ (reference) | Broad language and accent coverage (WebSocket only) |
| xAI | xai/grok-stt | — | Low | 25 languages | Grok STT for real-time transcription (WebSocket and Voice API only) |
| AssemblyAI | assemblyai/universal-streaming | — | Low | 6 languages | Universal-Streaming for voice agents with low latency and turn detection (WebSocket and Voice API only) |
| Speechmatics | speechmatics/standard | — | Low | 17+ languages | High-accuracy real-time transcription with bilingual and multilingual packs (WebSocket and Voice API only) |
Engine Details
- Deepgram
- Telnyx
- Google
- Azure
- xAI
- AssemblyAI
- Speechmatics
Deepgram is the default WebSocket engine. Best English accuracy and the richest feature set. For REST, you must explicitly set model="deepgram/nova-3"; the REST default is openai/whisper-large-v3-turbo.
Models:
- nova-3: Latest and most accurate. Supports diarization, word-level timestamps, smart formatting, numerals, and punctuation via model_config. Use this unless you need the lowest possible latency.
- nova-2: Previous generation. Still supported, but nova-3 is better in all benchmarks.
- flux: Purpose-built for voice agents. Lowest latency with built-in end-of-turn detection, which tells you when the speaker has finished so your agent can respond. WebSocket only.
multi mode (10 languages with code-switching). Flux is English only. See Deepgram languages.
How to Choose
Need the highest accuracy for English?
→ Deepgram nova-3: best WER (word error rate) across all English variants.
Building a voice agent that needs to know when the user stopped talking?
→ Deepgram flux — lowest latency with built-in end-of-turn detection.
Need to transcribe files in 50+ languages?
→ Telnyx openai/whisper-large-v3-turbo via REST API.
Need diarization (who said what)?
→ Deepgram nova-3 with model_config.diarize: true.
Need broad accent/dialect support?
→ Azure azure/fast — strong coverage across regional accents.
Need Grok STT for real-time calls?
→ xAI xai/grok-stt via WebSocket or Voice API.
Need low-latency streaming for voice agents?
→ AssemblyAI assemblyai/universal-streaming via WebSocket or Voice API.
Need high-accuracy multilingual with bilingual packs?
→ Speechmatics speechmatics/standard via WebSocket or Voice API.
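The guidance above can be read as a small decision procedure. The sketch below is one possible encoding of a few of those rules, not an official helper: the `Needs` record, its field names, and the order in which the rules are checked are all assumptions made for illustration. Only the engine and model names come from this page.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Needs:
    """Hypothetical requirements record; field names are illustrative only."""
    end_of_turn: bool = False         # agent must know when the user stopped talking
    diarization: bool = False         # who said what
    multilingual_files: bool = False  # file transcription in 50+ languages
    broad_accents: bool = False       # regional accent and dialect coverage


def choose_model(needs: Needs) -> tuple[str, str]:
    """Return (engine, model) for a set of needs.

    The check order is an assumption; the page does not rank the rules.
    """
    if needs.end_of_turn:
        return ("Deepgram", "flux")
    if needs.diarization:
        return ("Deepgram", "nova-3")
    if needs.multilingual_files:
        return ("Telnyx", "openai/whisper-large-v3-turbo")
    if needs.broad_accents:
        return ("Azure", "azure/fast")
    # Fallback: the model the comparison table marks as recommended.
    return ("Deepgram", "nova-3")


print(choose_model(Needs(end_of_turn=True)))
```

Falling back to Deepgram nova-3 mirrors the "Recommended" label in the comparison table. A real selector would also need to account for transport, since several models are WebSocket only.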
Specifying the Engine and Model
WebSocket: set the engine and model via query parameters.
REST: set the model via the model body parameter.
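As a rough illustration of both ways of specifying the model, the sketch below builds a WebSocket URL with query parameters and a set of REST body fields. Several details are assumptions rather than documented facts: the base URL is a placeholder, the query parameter names `transcription_engine` and `model` are hypothetical, and the helper names are invented here. Only the `model` body parameter and the model identifiers appear on this page, so check the API reference for the real endpoint and parameter names.

```python
from urllib.parse import urlencode

# Placeholder endpoint, NOT the real service URL.
WS_BASE = "wss://stt.example.invalid/transcription"


def websocket_url(engine: str, model: str) -> str:
    """Build a WebSocket URL; the query parameter names are assumed."""
    query = urlencode({"transcription_engine": engine, "model": model})
    return f"{WS_BASE}?{query}"


def rest_fields(model: str = "deepgram/nova-3") -> dict[str, str]:
    """Return REST body fields with the model set explicitly.

    Setting it explicitly avoids silently falling back to the REST
    default, openai/whisper-large-v3-turbo.
    """
    return {"model": model}


print(websocket_url("Deepgram", "nova-3"))
print(rest_fields())
```

Note that the WebSocket model name is the short form (`nova-3`), while the REST identifier carries the provider prefix (`deepgram/nova-3`), as in the comparison table.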