Telnyx

Comparison

Engine	Model (WebSocket)	Model (REST)	Latency	Languages	Best for
Deepgram	`nova-3`	`deepgram/nova-3`	Low	40+ (reference)	Recommended. Highest English accuracy, diarization, word timestamps
Deepgram	`nova-2`	`deepgram/nova-2`	Low	40+	Legacy — use nova-3 unless you have a specific reason
Deepgram	`flux`	—	Lowest	10 languages	Voice agents — built-in end-of-turn detection (WebSocket only)
Telnyx	`openai/whisper-large-v3-turbo`	`openai/whisper-large-v3-turbo`	Medium	50+ (reference)	Multilingual transcription
Telnyx	`openai/whisper-tiny`	`openai/whisper-tiny`	Low	50+	Lightweight, on-network
Google	`latest_long`	—	Medium	125+ (reference)	Long-form multilingual audio (WebSocket only)
Azure	`azure/fast`	—	Medium	100+ (reference)	Broad language and accent coverage (WebSocket only)
xAI	`xai/grok-stt`	—	Low	25 languages	Grok STT for real-time transcription (WebSocket and Voice API only)
AssemblyAI	`assemblyai/universal-streaming`	—	Low	6 languages	Universal-Streaming for voice agents with low latency and turn detection (WebSocket and Voice API only)
Speechmatics	`speechmatics/standard`	—	Low	17+ languages	High-accuracy real-time transcription with bilingual and multilingual packs (WebSocket and Voice API only)
Soniox	`soniox/stt-rt-v4`	—	Low	Auto-detect	Real-time transcription with interim results and endpointing (WebSocket and Voice API only)
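
The table above can also be kept as a small lookup in client code, so that a WebSocket-only model is never sent to the REST endpoint by mistake. The following is a minimal sketch, not part of any Telnyx SDK: the names `REST_MODEL` and `rest_model_for` are hypothetical, and the entries are copied from the table, with `None` standing for a "—" in the REST column.

```python
from typing import Optional

# (engine, WebSocket model) -> REST model name.
# None means the table lists no REST model for that row.
REST_MODEL = {
    ("Deepgram", "nova-3"): "deepgram/nova-3",
    ("Deepgram", "nova-2"): "deepgram/nova-2",
    ("Deepgram", "flux"): None,
    ("Telnyx", "openai/whisper-large-v3-turbo"): "openai/whisper-large-v3-turbo",
    ("Telnyx", "openai/whisper-tiny"): "openai/whisper-tiny",
    ("Google", "latest_long"): None,
    ("Azure", "azure/fast"): None,
    ("xAI", "xai/grok-stt"): None,
    ("AssemblyAI", "assemblyai/universal-streaming"): None,
    ("Speechmatics", "speechmatics/standard"): None,
    ("Soniox", "soniox/stt-rt-v4"): None,
}


def rest_model_for(engine: str, ws_model: str) -> Optional[str]:
    """Return the REST model name for a table row, or None if the row has none.

    Raises KeyError for an (engine, model) pair that is not in the table.
    """
    return REST_MODEL[(engine, ws_model)]
```

For example, `rest_model_for("Deepgram", "nova-3")` gives the REST name `deepgram/nova-3`, while `rest_model_for("Deepgram", "flux")` gives `None` because flux is WebSocket only.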

Engine Details

Deepgram is the default WebSocket engine, with the best English accuracy and the richest feature set. For REST, you must explicitly set model="deepgram/nova-3"; the REST default is openai/whisper-large-v3-turbo.

Models:

nova-3 — Latest and most accurate. Supports diarization, word-level timestamps, smart formatting, numerals, and punctuation via model_config. Use this unless you need the lowest possible latency.
nova-2 — Previous generation. Still supported but nova-3 is better in all benchmarks.
flux — Purpose-built for voice agents. Lowest latency with built-in end-of-turn detection — tells you when the speaker has finished so your agent can respond. WebSocket only.

Languages: 40+ languages across Deepgram models. Nova-3 supports multi mode (10 languages with code-switching). Flux supports English, Spanish, French, German, Hindi, Russian, Portuguese, Japanese, Italian, and Dutch. See Deepgram languages.

Telnyx runs Whisper models on-network.

Models:

openai/whisper-large-v3-turbo — Multilingual (50+ languages, auto-detected). Returns text only — no timestamps regardless of response format.
openai/whisper-tiny — Lightweight, lowest resource usage.

Languages: 50+ languages, auto-detected. Use auto_detect to skip the language hint. See the Whisper language list.

Limitations: No diarization. No word-level timestamps.

Google Cloud Speech-to-Text integration.

Model: latest_long

Languages: 125+ languages/locales. See Google Cloud STT languages.

Microsoft Azure Speech Services integration.

Model: azure/fast

Languages: 100+ languages/locales with strong accent and dialect coverage. See Azure Speech languages.

xAI Grok STT integration for real-time transcription.

Model: xai/grok-stt

Languages: 25 languages, including Arabic, English, French, German, Hindi, Japanese, Korean, Portuguese, Spanish, and Vietnamese.

AssemblyAI Universal-Streaming integration for real-time voice agent transcription.

Model: assemblyai/universal-streaming

Languages: English, Spanish, German, French, Portuguese, and Italian.

Speechmatics real-time transcription integration with high accuracy and multilingual support including bilingual packs.

Model: speechmatics/standard

Languages: English, Spanish, plus bilingual/multilingual packs including Arabic–English, Mandarin–English, English–Malay, English–Tamil, Tagalog, and Spanish–English bilingual. Also supports Basque, Galician, Irish, Maltese, Mongolian, Swahili, Uyghur, and Welsh.

Features: Supports interim results (partial transcripts) and graceful CloseStream shutdown.

Soniox real-time transcription integration with automatic language detection.

Model: soniox/stt-rt-v4

Languages: Automatic detection, no language hint required.

Features: Supports interim results (partial transcripts), endpointing, and graceful CloseStream shutdown.

How to Choose

Need the highest accuracy for English? → Deepgram nova-3: best WER (word error rate) across all English variants.
Building a voice agent that needs to know when the user stopped talking? → Deepgram flux: lowest latency with built-in end-of-turn detection.
Need to transcribe files in 50+ languages? → Telnyx openai/whisper-large-v3-turbo via REST API.
Need diarization (who said what)? → Deepgram nova-3 with model_config.diarize: true.
Need broad accent/dialect support? → Azure azure/fast: strong coverage across regional accents.
Need Grok STT for real-time calls? → xAI xai/grok-stt via WebSocket or Voice API.
Need low-latency streaming for voice agents? → AssemblyAI assemblyai/universal-streaming via WebSocket or Voice API.
Need high-accuracy multilingual with bilingual packs? → Speechmatics speechmatics/standard via WebSocket or Voice API.
Need real-time transcription with automatic language detection? → Soniox soniox/stt-rt-v4 via WebSocket or Voice API.
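
A few of these recommendations can be sketched as a small decision helper. This is only an illustration: the function name `choose_model`, its flag names, and the order in which the flags are checked are all assumptions made here, not something the guidance prescribes. It covers only the Deepgram and Telnyx cases and falls back to nova-3, which the comparison table marks as recommended.

```python
from typing import Tuple


def choose_model(
    *,
    needs_end_of_turn: bool = False,
    needs_diarization: bool = False,
    multilingual_files: bool = False,
) -> Tuple[str, str]:
    """Return an (engine, model) pair for a few of the cases listed above.

    The priority order below is an assumption of this sketch.
    """
    if needs_end_of_turn:
        # Voice agents that must know when the speaker has finished.
        return ("Deepgram", "flux")
    if needs_diarization:
        # "Who said what" is listed as a nova-3 capability.
        return ("Deepgram", "nova-3")
    if multilingual_files:
        # File transcription in 50+ languages via the REST API.
        return ("Telnyx", "openai/whisper-large-v3-turbo")
    # Fallback: the model marked as recommended in the comparison table.
    return ("Deepgram", "nova-3")
```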

Specifying the Engine and Model

WebSocket — set via query parameters:

wss://api.telnyx.com/v2/speech-to-text/transcription?transcription_engine=Deepgram&model=nova-3
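
In client code, the same URL can be assembled from its query parameters rather than by string concatenation. The sketch below uses only the Python standard library; `build_ws_url` is a hypothetical helper name. Only the `Deepgram` / `nova-3` pair appears in the example above, so the `transcription_engine` values to use for other engines are an assumption to check against the API reference.

```python
from urllib.parse import urlencode

# Base WebSocket endpoint, taken from the example URL above.
WS_BASE = "wss://api.telnyx.com/v2/speech-to-text/transcription"


def build_ws_url(engine: str, model: str) -> str:
    """Build the streaming URL with the engine and model as query parameters."""
    # urlencode percent-encodes reserved characters in the values.
    query = urlencode({"transcription_engine": engine, "model": model})
    return f"{WS_BASE}?{query}"


if __name__ == "__main__":
    print(build_ws_url("Deepgram", "nova-3"))
```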

REST API — set via the model body parameter:

curl -X POST https://api.telnyx.com/v2/ai/audio/transcriptions \
  -H "Authorization: Bearer YOUR_TELNYX_API_KEY" \
  -F model="deepgram/nova-3" \
  -F file=@audio.mp3
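
The same request can be made from Python. The following is a minimal sketch that mirrors the curl command, assuming the third-party `requests` package is installed and that the endpoint returns a JSON body; the response shape is not described here, so the function returns the parsed JSON as is. The helper names `build_request` and `transcribe_file` are hypothetical.

```python
from typing import Any, Dict

# Endpoint taken from the curl example above.
TRANSCRIPTIONS_URL = "https://api.telnyx.com/v2/ai/audio/transcriptions"


def build_request(api_key: str, model: str = "deepgram/nova-3") -> Dict[str, Any]:
    """Assemble the URL, headers, and form fields without sending anything."""
    return {
        "url": TRANSCRIPTIONS_URL,
        "headers": {"Authorization": f"Bearer {api_key}"},
        # Set the model explicitly: the REST default is
        # openai/whisper-large-v3-turbo, not nova-3.
        "data": {"model": model},
    }


def transcribe_file(path: str, api_key: str, model: str = "deepgram/nova-3") -> Any:
    """Upload an audio file as multipart/form-data and return the parsed JSON.

    Assumes the endpoint responds with JSON; check the API reference.
    """
    import requests  # third-party; imported lazily so build_request works without it

    req = build_request(api_key, model)
    with open(path, "rb") as audio:
        resp = requests.post(
            req["url"],
            headers=req["headers"],
            data=req["data"],
            files={"file": audio},  # corresponds to -F file=@audio.mp3
            timeout=60,
        )
    resp.raise_for_status()
    return resp.json()
```

Calling `transcribe_file("audio.mp3", "YOUR_TELNYX_API_KEY")` sends the same fields as the curl command.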

