The response format is controlled by the response_format parameter.

json (Default)

Returns the transcript text only:
{
  "text": "The quick brown fox jumps over the lazy dog."
}

verbose_json

Adds duration (seconds) and timestamped segments.
{
  "text": "The quick brown fox jumps over the lazy dog.",
  "duration": 3.42,
  "segments": [
    {
      "id": 0,
      "text": "The quick brown fox jumps over the lazy dog.",
      "start": 0.0,
      "end": 3.42
    }
  ]
}
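A verbose_json response can be consumed with nothing but the standard library. A minimal sketch, assuming the payload matches the example above:

```python
import json

# Example verbose_json payload (copied from the sample response above).
payload = """
{
  "text": "The quick brown fox jumps over the lazy dog.",
  "duration": 3.42,
  "segments": [
    {"id": 0, "text": "The quick brown fox jumps over the lazy dog.",
     "start": 0.0, "end": 3.42}
  ]
}
"""

resp = json.loads(payload)
print(f"duration: {resp['duration']}s")
for seg in resp["segments"]:
    # Each segment carries its own start/end offsets in seconds.
    print(f"[{seg['start']:.2f}-{seg['end']:.2f}] {seg['text']}")
```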
Request segment timestamps by setting timestamp_granularities[]=segment alongside response_format=verbose_json. Sending timestamp_granularities without verbose_json returns a 400 error.

Segment Fields

| Field | Type | Description |
| --- | --- | --- |
| id | integer | Zero-indexed segment number |
| text | string | Segment transcript |
| start | float | Start time in seconds |
| end | float | End time in seconds |
| words | array | Word-level timestamps (Deepgram only; present when the backend provides them) |
| speakers | array | Speaker labels (Deepgram only; present when diarize=true in model_config) |
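The start/end fields map directly onto subtitle formats. A minimal sketch that renders segments as SRT cues; the helper names (fmt_ts, to_srt) are illustrative, not part of the API:

```python
def fmt_ts(seconds: float) -> str:
    """Format seconds as an SRT timestamp: HH:MM:SS,mmm."""
    ms = round(seconds * 1000)
    h, rem = divmod(ms, 3_600_000)
    m, rem = divmod(rem, 60_000)
    s, ms = divmod(rem, 1000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"

def to_srt(segments: list[dict]) -> str:
    """Render verbose_json segments as an SRT document.

    SRT cue numbers are 1-based, so the zero-indexed segment id is offset.
    """
    cues = []
    for seg in segments:
        cues.append(
            f"{seg['id'] + 1}\n"
            f"{fmt_ts(seg['start'])} --> {fmt_ts(seg['end'])}\n"
            f"{seg['text']}\n"
        )
    return "\n".join(cues)

print(to_srt([{"id": 0, "start": 0.0, "end": 3.42,
               "text": "The quick brown fox jumps over the lazy dog."}]))
```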

Timestamp Availability by Model

| Model | verbose_json timestamps |
| --- | --- |
| distil-whisper/distil-large-v2 | Segment-level start/end (from Bumblebee streaming) |
| openai/whisper-large-v3-turbo | None; backend returns text only |
| deepgram/nova-3 | Segment-level and word-level (from Deepgram response) |

Streaming Response (Undocumented)

Sending Accept: application/stream+json returns newline-delimited JSON chunks as segments are transcribed. Each line has the form:
{"text": "segment text", "start": 0.0, "end": 3.42}
This is used internally but is not part of the public OpenAPI (OAS) spec.
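Since each line of the stream is a complete JSON object, a client can accumulate segments as they arrive. A minimal sketch over a simulated stream body (the two-chunk split shown here is invented for illustration):

```python
import json

# Simulated newline-delimited stream body, one JSON chunk per segment,
# matching the line shape shown above.
stream_body = (
    '{"text": "The quick brown fox", "start": 0.0, "end": 1.8}\n'
    '{"text": "jumps over the lazy dog.", "start": 1.8, "end": 3.42}\n'
)

segments = []
for line in stream_body.splitlines():
    if not line.strip():
        continue  # tolerate blank lines between chunks
    chunk = json.loads(line)
    segments.append(chunk)
    print(f"[{chunk['start']:.2f}-{chunk['end']:.2f}] {chunk['text']}")

# Joining the chunk texts reconstructs the full transcript.
full_text = " ".join(c["text"] for c in segments)
```

With a real HTTP client, the same loop would run over the response's line iterator instead of splitlines().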