Comparison

| Engine | Model (WebSocket) | Model (REST) | Latency | Languages | Best for |
|---|---|---|---|---|---|
| Deepgram | nova-3 | deepgram/nova-3 | Low | 40+ (reference) | Recommended. Highest English accuracy, diarization, word timestamps |
| Deepgram | nova-2 | deepgram/nova-2 | Low | 40+ | Legacy; use nova-3 unless you have a specific reason |
| Deepgram | flux | n/a | Lowest | English only | Voice agents; built-in end-of-turn detection (WebSocket only) |
| Telnyx | openai/whisper-large-v3-turbo | openai/whisper-large-v3-turbo | Medium | 50+ (reference) | Multilingual transcription |
| Telnyx | openai/whisper-tiny | openai/whisper-tiny | Low | 50+ | Lightweight, on-network |
| Google | latest_long | n/a | Medium | 125+ (reference) | Long-form multilingual audio (WebSocket only) |
| Azure | azure/fast | n/a | Medium | 100+ (reference) | Broad language and accent coverage (WebSocket only) |
| xAI | xai/grok-stt | n/a | Low | 25 languages | Grok STT for real-time transcription (WebSocket and Voice API only) |
| AssemblyAI | assemblyai/universal-streaming | n/a | Low | 6 languages | Universal-Streaming for voice agents with low latency and turn detection (WebSocket and Voice API only) |
| Speechmatics | speechmatics/standard | n/a | Low | 17+ languages | High-accuracy real-time transcription with bilingual and multilingual packs (WebSocket and Voice API only) |

Engine Details

Deepgram is the default WebSocket engine, with the best English accuracy and the richest feature set. For REST, you must explicitly set model="deepgram/nova-3"; the REST default is openai/whisper-large-v3-turbo.
Models:
  • nova-3 — Latest and most accurate. Supports diarization, word-level timestamps, smart formatting, numerals, and punctuation via model_config. Use this unless you need the lowest possible latency.
  • nova-2 — Previous generation. Still supported but nova-3 is better in all benchmarks.
  • flux — Purpose-built for voice agents. Lowest latency with built-in end-of-turn detection — tells you when the speaker has finished so your agent can respond. WebSocket only.
Languages: 40+ languages. Nova-3 supports multi mode (10 languages with code-switching). Flux is English only. See Deepgram languages.

How to Choose

  • Need the highest accuracy for English? → Deepgram nova-3: best WER (word error rate) across all English variants.
  • Building a voice agent that needs to know when the user stopped talking? → Deepgram flux: lowest latency with built-in end-of-turn detection.
  • Need to transcribe files in 50+ languages? → Telnyx openai/whisper-large-v3-turbo via REST API.
  • Need diarization (who said what)? → Deepgram nova-3 with model_config.diarize: true.
  • Need broad accent/dialect support? → Azure azure/fast: strong coverage across regional accents.
  • Need Grok STT for real-time calls? → xAI xai/grok-stt via WebSocket or Voice API.
  • Need low-latency streaming for voice agents? → AssemblyAI assemblyai/universal-streaming via WebSocket or Voice API.
  • Need high-accuracy multilingual with bilingual packs? → Speechmatics speechmatics/standard via WebSocket or Voice API.
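
The guidance above can be captured as a small lookup table in application code. This is a minimal sketch, not part of any Telnyx SDK: the requirement keys (such as "english_accuracy") and the recommend helper are hypothetical names invented for illustration, while the engine and model names are taken from the guidance above.

```python
from typing import NamedTuple


class Recommendation(NamedTuple):
    engine: str
    model: str


# Keys are hypothetical labels made up for this sketch.
# Values restate the engine/model pairs from the guidance above.
RECOMMENDATIONS = {
    "english_accuracy": Recommendation("Deepgram", "nova-3"),
    "end_of_turn": Recommendation("Deepgram", "flux"),
    "multilingual_files": Recommendation("Telnyx", "openai/whisper-large-v3-turbo"),
    "diarization": Recommendation("Deepgram", "nova-3"),
    "accent_coverage": Recommendation("Azure", "azure/fast"),
    "grok_realtime": Recommendation("xAI", "xai/grok-stt"),
    "low_latency_streaming": Recommendation("AssemblyAI", "assemblyai/universal-streaming"),
    "bilingual_packs": Recommendation("Speechmatics", "speechmatics/standard"),
}


def recommend(need: str) -> Recommendation:
    """Return the engine/model pair suggested for a requirement key."""
    try:
        return RECOMMENDATIONS[need]
    except KeyError:
        known = ", ".join(sorted(RECOMMENDATIONS))
        raise ValueError(f"unknown need {need!r}; expected one of: {known}") from None
```

For example, recommend("end_of_turn") returns the Deepgram flux pair, and an unrecognized key raises ValueError listing the known keys.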

Specifying the Engine and Model

WebSocket — set via query parameters:
wss://api.telnyx.com/v2/speech-to-text/transcription?transcription_engine=Deepgram&model=nova-3
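
When the engine and model are chosen at runtime, it can help to build the URL from parameters rather than concatenating strings by hand. The following is a minimal sketch using only the Python standard library; build_stt_url is a hypothetical helper name, and the base URL and the two query parameter names are the ones shown above.

```python
from urllib.parse import urlencode

# Base URL as shown in the WebSocket example above.
BASE_URL = "wss://api.telnyx.com/v2/speech-to-text/transcription"


def build_stt_url(engine: str, model: str) -> str:
    """Build the WebSocket URL with the engine and model as query parameters."""
    # urlencode percent-encodes reserved characters, so a model ID such as
    # "azure/fast" is sent as "azure%2Ffast".
    query = urlencode({"transcription_engine": engine, "model": model})
    return f"{BASE_URL}?{query}"


print(build_stt_url("Deepgram", "nova-3"))
# wss://api.telnyx.com/v2/speech-to-text/transcription?transcription_engine=Deepgram&model=nova-3
```

Percent-encoding the slash in model IDs is standard URL behavior; whether the service also accepts an unencoded slash is not stated here.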
REST API — set via the model body parameter:
curl -X POST https://api.telnyx.com/v2/ai/audio/transcriptions \
  -H "Authorization: Bearer YOUR_TELNYX_API_KEY" \
  -F model="deepgram/nova-3" \
  -F file=@audio.mp3
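
Because several models in the comparison are WebSocket only, a client may want to validate the model ID before issuing a REST request. The sketch below assumes the set of REST model IDs is exactly the "Model (REST)" column of the comparison table and that the REST default is openai/whisper-large-v3-turbo, as stated under Engine Details. The helper names resolve_rest_model and curl_args are hypothetical; the second one only assembles the argument list for the curl command shown above and does not send anything.

```python
from typing import List, Optional

# REST model IDs listed in the comparison table (assumed complete for this sketch).
REST_MODELS = {
    "deepgram/nova-3",
    "deepgram/nova-2",
    "openai/whisper-large-v3-turbo",
    "openai/whisper-tiny",
}
# Stated under Engine Details as the REST default.
DEFAULT_REST_MODEL = "openai/whisper-large-v3-turbo"
ENDPOINT = "https://api.telnyx.com/v2/ai/audio/transcriptions"


def resolve_rest_model(model: Optional[str] = None) -> str:
    """Return a REST model ID, falling back to the documented default."""
    if model is None:
        return DEFAULT_REST_MODEL
    if model not in REST_MODELS:
        raise ValueError(f"{model!r} has no REST model ID in the comparison table")
    return model


def curl_args(api_key: str, audio_path: str, model: Optional[str] = None) -> List[str]:
    """Assemble the argument list for the curl command shown above."""
    return [
        "curl", "-X", "POST", ENDPOINT,
        "-H", f"Authorization: Bearer {api_key}",
        "-F", f"model={resolve_rest_model(model)}",
        "-F", f"file=@{audio_path}",
    ]
```

The list returned by curl_args can be passed to subprocess.run, which avoids shell quoting issues. Passing a WebSocket-only model such as "flux" raises ValueError before any request is made.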