response_format parameter.
json (Default)
Text only.
verbose_json
Adds duration (seconds) and timestamped segments.
timestamp_granularities[]=segment alongside response_format=verbose_json. Using timestamp_granularities without verbose_json returns 400.
Segment Fields
| Field | Type | Description |
|---|---|---|
id | integer | Zero-indexed segment number |
text | string | Segment transcript |
start | float | Start time in seconds |
end | float | End time in seconds |
words | array | Word-level timestamps (present when the backend provides them — Deepgram only) |
speakers | array | Speaker labels (present when diarize=true in model_config — Deepgram only) |
Timestamp Availability by Model
| Model | verbose_json timestamps |
|---|---|
distil-whisper/distil-large-v2 | Segment-level start/end (from Bumblebee streaming) |
openai/whisper-large-v3-turbo | No timestamps — backend returns text only |
deepgram/nova-3 | Segment-level + word-level (from Deepgram response) |
Streaming Response (Undocumented)
SendingAccept: application/stream+json returns newline-delimited JSON chunks as segments are transcribed. Each line: