Your choice of model determines which audio formats are accepted, what language values are valid, and what response fields are available.
| | distil-whisper/distil-large-v2 | openai/whisper-large-v3-turbo | deepgram/nova-3 |
|---|---|---|---|
| Default | Yes | | |
| Audio formats | All 10 | All 10 | mp3, wav only |
| Language | English only (rejects language with 400) | 80+ languages, auto-detected | English variants only (en, en-US, en-GB, en-AU, en-NZ, en-IN) |
| Timestamps | Segment-level (verbose_json) | No | Word-level (via model_config) |
| Diarization | No | No | Yes (via model_config) |
| Smart formatting | No | No | Yes (via model_config) |
| model_config | Returns 400 | Returns 400 | Deepgram pass-through |
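
The constraints in the table can be checked on the client before a request is sent, which avoids a round trip that would end in a 400. The following is a minimal sketch under assumptions: the names CAPABILITIES and check_request are hypothetical helpers, not part of the API; the file extension is used as a stand-in for the audio format; and language codes are compared case-sensitively, which the table does not confirm.

```python
# Capability data transcribed from the comparison table.
# "formats": None means the table says "All 10" without listing them,
# so no format check is attempted for that model.
CAPABILITIES = {
    "distil-whisper/distil-large-v2": {
        "formats": None,
        "accepts_language": False,
        "accepts_model_config": False,
    },
    "openai/whisper-large-v3-turbo": {
        "formats": None,
        "accepts_language": True,
        "accepts_model_config": False,
    },
    "deepgram/nova-3": {
        "formats": {"mp3", "wav"},
        "accepts_language": True,
        "accepts_model_config": True,
    },
}

# English variants listed for deepgram/nova-3 in the table.
NOVA3_LANGUAGES = {"en", "en-US", "en-GB", "en-AU", "en-NZ", "en-IN"}


def check_request(model, filename, language=None, model_config=None):
    """Return a list of problems; an empty list means no known conflict."""
    caps = CAPABILITIES.get(model)
    if caps is None:
        return [f"unknown model: {model}"]

    problems = []

    # Assumption: the file extension reflects the actual audio format.
    ext = filename.rsplit(".", 1)[-1].lower() if "." in filename else ""
    if caps["formats"] is not None and ext not in caps["formats"]:
        allowed = ", ".join(sorted(caps["formats"]))
        problems.append(f"{model} accepts only {allowed}; got '{ext}'")

    if language is not None and not caps["accepts_language"]:
        problems.append(f"{model} rejects the language parameter with 400")

    if (
        model == "deepgram/nova-3"
        and language is not None
        and language not in NOVA3_LANGUAGES
    ):
        problems.append(f"{model} supports English variants only; got '{language}'")

    if model_config is not None and not caps["accepts_model_config"]:
        problems.append(f"{model} returns 400 when model_config is set")

    return problems
```

For example, check_request("deepgram/nova-3", "call.flac") reports one problem (the unsupported format), while a wav file with language "en-GB" and a model_config passes the check.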

distil-whisper/distil-large-v2

Lowest latency. Runs on-device. English only: setting the language parameter returns a 400 error, even if the value is an English code.
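
Because this model rejects the language parameter outright, a client that shares one code path across models needs to leave the field out for it. A minimal sketch, assuming the request is assembled as a dictionary of fields; form_fields is a hypothetical helper, and the field names model and language are taken from this page rather than from a verified request schema.

```python
DISTIL_MODEL = "distil-whisper/distil-large-v2"


def form_fields(model, language=None):
    """Build request fields, never sending language to the English-only model."""
    fields = {"model": model}

    if model == DISTIL_MODEL:
        # Fail loudly on a non-English request instead of silently
        # transcribing with an English-only model.
        if language is not None and language.lower().split("-")[0] != "en":
            raise ValueError(f"{model} is English only; got language '{language}'")
        # Even an English code is dropped, since sending it returns 400.
        return fields

    if language is not None:
        fields["language"] = language
    return fields
```

Raising on a non-English code is a design choice of this sketch: dropping the field quietly would produce an English transcription of non-English audio with no visible error.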

openai/whisper-large-v3-turbo

Multilingual. The language is auto-detected if the language parameter is omitted. See the Whisper docs for the full language list. Returns text only: no timestamps are included, regardless of response_format.

deepgram/nova-3

Highest accuracy for English. Advanced features (diarization, word timestamps, smart formatting, numerals, punctuation) are available via model_config. The language defaults to en if omitted. The language can also be set inside model_config, but the top-level field takes precedence. See the Deepgram language docs for details.
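
The language precedence described above (top-level field, then model_config, then the en default) can be sketched as a small resolver. This is an illustration of the stated rule, not the service's implementation; effective_language is a hypothetical name. The keys in the example model_config (diarize, smart_format, punctuate, numerals) are Deepgram's own parameter names, and assuming they are accepted unchanged through the pass-through is exactly that, an assumption.

```python
def effective_language(top_level=None, model_config=None):
    """Resolve the language deepgram/nova-3 would use, per the documented precedence."""
    # 1. The top-level language field wins when present.
    if top_level is not None:
        return top_level
    # 2. Otherwise fall back to a language set inside model_config.
    if model_config and model_config.get("language") is not None:
        return model_config["language"]
    # 3. Otherwise the documented default applies.
    return "en"


# Example model_config; key names are assumed to pass through to Deepgram as-is.
example_model_config = {
    "diarize": True,
    "smart_format": True,
    "punctuate": True,
    "numerals": True,
    "language": "en-AU",
}
```

With this config, effective_language("en-GB", example_model_config) resolves to "en-GB", and effective_language(None, example_model_config) resolves to "en-AU".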