Speech-to-Text REST API models

Your choice of model determines which audio formats are accepted, what language values are valid, and what response fields are available.

| | `openai/whisper-large-v3-turbo` | `openai/whisper-tiny` | `deepgram/nova-3` |
|---|---|---|---|
| Default | Yes | No | No |
| Audio formats | All 10 | All 10 | mp3, wav only |
| Language | 80+ languages, auto-detected | 50+ languages, auto-detected | English variants only (`en`, `en-US`, `en-GB`, `en-AU`, `en-NZ`, `en-IN`) |
| Timestamps | No | No | Word-level (via `model_config`) |
| Diarization | No | No | Yes (via `model_config`) |
| Smart formatting | No | No | Yes (via `model_config`) |
| `model_config` | Returns 400 | Returns 400 | Deepgram pass-through |
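The constraints in the table can be mirrored in a small client-side check before a request is sent. The following is a minimal sketch, not part of the API: `ModelCapabilities`, `CAPABILITIES`, and `check_request` are hypothetical names, and the full list of 10 audio formats is not reproduced here, so formats are only checked for the model that restricts them.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Optional


@dataclass(frozen=True)
class ModelCapabilities:
    # None means "all supported formats"; the list of 10 is not part of this sketch.
    restricted_formats: Optional[FrozenSet[str]]
    # None means multilingual with auto-detection.
    allowed_languages: Optional[FrozenSet[str]]
    accepts_model_config: bool


# Values transcribed from the comparison table above.
CAPABILITIES = {
    "openai/whisper-large-v3-turbo": ModelCapabilities(None, None, False),
    "openai/whisper-tiny": ModelCapabilities(None, None, False),
    "deepgram/nova-3": ModelCapabilities(
        frozenset({"mp3", "wav"}),
        frozenset({"en", "en-US", "en-GB", "en-AU", "en-NZ", "en-IN"}),
        True,
    ),
}


def check_request(
    model: str,
    audio_format: str,
    language: Optional[str] = None,
    model_config: Optional[dict] = None,
) -> List[str]:
    """Return a list of problems; an empty list means the request matches the table."""
    caps = CAPABILITIES.get(model)
    if caps is None:
        return [f"unknown model: {model}"]

    problems = []
    if caps.restricted_formats is not None and audio_format.lower() not in caps.restricted_formats:
        problems.append(f"{model} does not accept '{audio_format}' audio")
    if caps.allowed_languages is not None and language is not None and language not in caps.allowed_languages:
        problems.append(f"{model} does not accept language '{language}'")
    # Assumption: any model_config, even an empty one, triggers the 400 on Whisper models.
    if model_config is not None and not caps.accepts_model_config:
        problems.append(f"{model} rejects model_config (HTTP 400)")
    return problems
```

For example, `check_request("deepgram/nova-3", "flac", language="fr")` reports two problems, while the same file sent to a Whisper model without `model_config` reports none.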

`openai/whisper-large-v3-turbo`

Default model. Multilingual (80+ languages); the language is auto-detected if `language` is omitted. See the Whisper docs for the full language list. Returns text only, with no timestamps regardless of `response_format`.

`openai/whisper-tiny`

Lightweight model with the lowest resource usage. Multilingual (50+ languages, auto-detected). Returns text only, with no timestamps.

`deepgram/nova-3`

Highest accuracy for English. Advanced features (diarization, word timestamps, smart formatting, numerals, punctuation) are available via `model_config`. `language` defaults to `en` if omitted. You can also set `language` inside `model_config`; the top-level field takes precedence. See the Deepgram language docs for details.
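The language precedence for `deepgram/nova-3` can be illustrated with a short sketch. `effective_language` is a hypothetical helper, not an API function, and it assumes the key inside `model_config` is named `language`, as the description above suggests.

```python
from typing import Optional


def effective_language(
    top_level_language: Optional[str] = None,
    model_config: Optional[dict] = None,
) -> str:
    """Resolve the language deepgram/nova-3 would use, per the rules described above."""
    # 1. The top-level field wins whenever it is present.
    if top_level_language is not None:
        return top_level_language
    # 2. Otherwise fall back to a language set inside model_config (assumed key name).
    if model_config and model_config.get("language") is not None:
        return model_config["language"]
    # 3. With neither set, the model defaults to English.
    return "en"
```

With this helper, `effective_language("en-GB", {"language": "en-US"})` resolves to `en-GB`, because the top-level field takes precedence over the value in `model_config`.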
