response_format parameter.
json (Default)
Text only.
verbose_json
Adds duration (seconds) and timestamped segments — only when using deepgram/nova-3. The Whisper models (openai/whisper-large-v3-turbo, openai/whisper-tiny) return text only regardless of response_format. See the Timestamp Availability table below.
Example response with model=deepgram/nova-3:
Pass timestamp_granularities[]=segment alongside response_format=verbose_json to request segment timestamps. Sending timestamp_granularities without verbose_json returns HTTP 400.
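You can check the rule that timestamp_granularities requires verbose_json on the client, before anything is sent. This is a minimal sketch, assuming a multipart/form-data upload whose fields are built as a list of (name, value) tuples. The helper name build_transcription_fields is hypothetical and not part of any official SDK. The ValueError only mirrors the documented server-side 400.

```python
from typing import List, Optional, Tuple


def build_transcription_fields(
    model: str,
    response_format: str = "json",
    timestamp_granularities: Optional[List[str]] = None,
) -> List[Tuple[str, str]]:
    """Return (name, value) form fields for a transcription request (hypothetical helper)."""
    granularities = timestamp_granularities or []
    # Mirror the documented rule: granularities without verbose_json -> HTTP 400.
    if granularities and response_format != "verbose_json":
        raise ValueError("timestamp_granularities requires response_format=verbose_json")
    fields = [("model", model), ("response_format", response_format)]
    # The bracketed key is repeated once per requested granularity.
    fields.extend(("timestamp_granularities[]", g) for g in granularities)
    return fields


fields = build_transcription_fields("deepgram/nova-3", "verbose_json", ["segment"])
print(fields)
```

The fields are a list of tuples rather than a dict, so repeated timestamp_granularities[] keys survive. HTTP clients such as requests accept this shape for the data argument.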
Segment Fields
| Field | Type | Description |
|---|---|---|
| id | integer | Zero-indexed segment number |
| text | string | Segment transcript |
| start | float | Start time in seconds |
| end | float | End time in seconds |
| words | array | Word-level timestamps (present when the backend provides them; Deepgram only) |
| speakers | array | Speaker labels (present when diarize=true in model_config; Deepgram only) |
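The segment fields above map onto a small typed structure. This is a minimal sketch, assuming the segments arrive in a top-level segments array of the verbose_json body. That key name is an assumption because the example response is not reproduced here. The element shape of words and speakers is not documented, so both are kept as untyped lists. The sample values are illustrative, not real API output.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class Segment:
    id: int
    text: str
    start: float  # seconds
    end: float    # seconds
    # Optional arrays: present only when the backend supplies them (Deepgram).
    words: List[Any] = field(default_factory=list)
    speakers: List[Any] = field(default_factory=list)


def parse_segments(response: Dict[str, Any]) -> List[Segment]:
    """Convert the (assumed) 'segments' array into Segment objects; [] if absent."""
    return [
        Segment(
            id=int(s["id"]),
            text=s["text"],
            start=float(s["start"]),
            end=float(s["end"]),
            words=s.get("words", []),
            speakers=s.get("speakers", []),
        )
        for s in response.get("segments", [])
    ]


# Illustrative body only; the field values are made up.
sample = {"segments": [{"id": 0, "text": "hello world", "start": 0.0, "end": 1.2}]}
for seg in parse_segments(sample):
    print(seg.id, seg.text, seg.start, seg.end)
```

Defaulting words and speakers to empty lists lets the same code handle segments with or without word timings or diarization.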
Timestamp Availability by Model
| Model | verbose_json timestamps |
|---|---|
| openai/whisper-large-v3-turbo | None (backend returns text only) |
| openai/whisper-tiny | None (backend returns text only) |
| deepgram/nova-3 | Segment-level and word-level (from the Deepgram response) |
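The Whisper models accept verbose_json but still return text only. Client code may therefore want to decide up front whether to expect segments. This is a minimal sketch that copies the model set from the Timestamp Availability table. The names TIMESTAMP_MODELS and expects_timestamps are hypothetical, and the set must be updated by hand if backend support changes.

```python
# Models whose verbose_json responses include timestamps, per the availability table.
TIMESTAMP_MODELS = frozenset({"deepgram/nova-3"})


def expects_timestamps(model: str, response_format: str) -> bool:
    """True only when verbose_json is requested from a model that returns segments."""
    return response_format == "verbose_json" and model in TIMESTAMP_MODELS


for m in ("openai/whisper-large-v3-turbo", "openai/whisper-tiny", "deepgram/nova-3"):
    print(m, expects_timestamps(m, "verbose_json"))
```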
Streaming Response (Undocumented)
Sending an Accept: application/stream+json header returns newline-delimited JSON chunks as segments are transcribed. Each line: