Your choice of model determines which audio formats are accepted, what language values are valid, and what response fields are available.
| | openai/whisper-large-v3-turbo | openai/whisper-tiny | deepgram/nova-3 |
| --- | --- | --- | --- |
| Default | Yes | No | No |
| Audio formats | All 10 | All 10 | mp3, wav only |
| Language | 80+ languages, auto-detected | 50+ languages, auto-detected | English variants only (en, en-US, en-GB, en-AU, en-NZ, en-IN) |
| Timestamps | No | No | Word-level (via model_config) |
| Diarization | No | No | Yes (via model_config) |
| Smart formatting | No | No | Yes (via model_config) |
| model_config | Returns 400 | Returns 400 | Deepgram pass-through |
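
The constraints in the comparison above can be checked on the client before a request is sent. The following is a minimal sketch, not part of any official SDK: the `MODELS` table and the `check_request` helper are hypothetical names introduced here for illustration. Because the full list of 10 audio formats and the Whisper language lists are not reproduced on this page, the sketch treats them as unrestricted (`None`) and only enforces the limits that are stated explicitly. It also assumes that any `model_config` value, including an empty one, triggers the 400 on Whisper models.

```python
# Hypothetical client-side capability table, transcribed from the comparison above.
# None means "not restricted by this sketch" (the full list is not given on this page).
ENGLISH_VARIANTS = {"en", "en-US", "en-GB", "en-AU", "en-NZ", "en-IN"}

MODELS = {
    "openai/whisper-large-v3-turbo": {
        "formats": None, "languages": None, "model_config": False,
    },
    "openai/whisper-tiny": {
        "formats": None, "languages": None, "model_config": False,
    },
    "deepgram/nova-3": {
        "formats": {"mp3", "wav"}, "languages": ENGLISH_VARIANTS, "model_config": True,
    },
}


def check_request(model, audio_format, language=None, model_config=None):
    """Return a list of problems; an empty list means no known conflict."""
    caps = MODELS.get(model)
    if caps is None:
        return [f"unknown model: {model}"]

    problems = []
    if caps["formats"] is not None and audio_format.lower() not in caps["formats"]:
        problems.append(f"{model} does not accept '{audio_format}' audio")
    if (
        language is not None
        and caps["languages"] is not None
        and language not in caps["languages"]
    ):
        problems.append(f"{model} does not accept language '{language}'")
    # Assumption: any model_config value (even an empty dict) is rejected.
    if model_config is not None and not caps["model_config"]:
        problems.append(f"{model} rejects model_config (HTTP 400)")
    return problems
```

For example, `check_request("deepgram/nova-3", "flac", "fr")` reports two problems (format and language), while the same call against a Whisper model reports none.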

openai/whisper-large-v3-turbo

Default model. Multilingual; the language is auto-detected if language is omitted. See Whisper docs for the full language list. Returns text only, with no timestamps regardless of response_format.

openai/whisper-tiny

Lightweight model with the lowest resource usage. Multilingual (50+ languages, auto-detected). Returns text only, with no timestamps.

deepgram/nova-3

Highest accuracy for English. Advanced features (diarization, word timestamps, smart formatting, numerals, punctuation) are available via model_config. language defaults to en if omitted. language can also be set inside model_config; if both are set, the top-level field takes precedence. See Deepgram language docs for details.
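
The language precedence described above can be sketched as a small resolver, followed by an illustrative request body. This is a sketch under stated assumptions: `effective_language` is a hypothetical helper, the top-level `model` field name is assumed, and the `model_config` keys (`diarize`, `smart_format`, `numerals`, `punctuate`) are assumed to match Deepgram's own option names because the field is described as a pass-through. Check the Deepgram documentation before relying on them.

```python
def effective_language(language=None, model_config=None):
    """Resolve the language deepgram/nova-3 would use, per the rules above.

    Order: top-level language, then model_config["language"], then "en".
    """
    if language:
        return language
    if model_config and model_config.get("language"):
        return model_config["language"]
    return "en"


# Illustrative request body; key names inside model_config are assumptions.
payload = {
    "model": "deepgram/nova-3",   # assumed top-level field name
    "language": "en-GB",          # top-level value wins over model_config
    "model_config": {
        "language": "en-US",      # ignored here because the top-level field is set
        "diarize": True,          # speaker diarization
        "smart_format": True,     # smart formatting
        "numerals": True,         # numerals
        "punctuate": True,        # punctuation
    },
}

resolved = effective_language(payload["language"], payload["model_config"])
print(resolved)
```

With the payload above, the resolver returns the top-level value; dropping the top-level `language` would fall back to the value inside `model_config`, and dropping both would yield `en`.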