Controlled by the response_format parameter.

json (Default)

Text only.
{
  "text": "The quick brown fox jumps over the lazy dog."
}

verbose_json

Adds duration (in seconds) and timestamped segments, but only when using deepgram/nova-3. The Whisper models (openai/whisper-large-v3-turbo, openai/whisper-tiny) return text only, regardless of response_format. See the Timestamp Availability by Model table below. Example response with model=deepgram/nova-3:
{
  "text": "The quick brown fox jumps over the lazy dog.",
  "duration": 3.42,
  "segments": [
    {
      "id": 0,
      "text": "The quick brown fox jumps over the lazy dog.",
      "start": 0.0,
      "end": 3.42
    }
  ]
}
Set timestamp_granularities[]=segment alongside response_format=verbose_json. Using timestamp_granularities without verbose_json returns 400.
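
As a minimal sketch of the rule above, the following helper assembles the form fields for a request and rejects the invalid combination on the client before it can produce a 400. The function name build_transcription_fields and the choice to raise ValueError are illustrative assumptions, not part of the API; only the parameter names come from this page.

```python
def build_transcription_fields(model, response_format="json", timestamp_granularities=None):
    """Build (name, value) form fields for a transcription request.

    Hypothetical helper: mirrors the documented rule that
    timestamp_granularities[] is only valid with response_format=verbose_json.
    """
    granularities = list(timestamp_granularities or [])
    if granularities and response_format != "verbose_json":
        # The server would answer 400 here; fail early instead.
        raise ValueError("timestamp_granularities requires response_format=verbose_json")

    fields = [("model", model), ("response_format", response_format)]
    # Repeat the bracketed field name once per requested granularity.
    fields.extend(("timestamp_granularities[]", g) for g in granularities)
    return fields
```

The returned list of pairs can be passed as form data to whichever HTTP client is in use; the endpoint URL and authentication are deliberately left out because this page does not state them.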

Segment Fields

| Field | Type | Description |
| id | integer | Zero-indexed segment number |
| text | string | Segment transcript |
| start | float | Start time in seconds |
| end | float | End time in seconds |
| words | array | Word-level timestamps (present when the backend provides them; Deepgram only) |
| speakers | array | Speaker labels (present when diarize=true in model_config; Deepgram only) |
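
One way to consume these fields is sketched below. It assumes a parsed verbose_json response is already available as a dict; the Segment class and parse_segments function are hypothetical names. Because this page does not specify the shape of the words and speakers items, they are kept as raw lists, and both default to None when absent.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Optional


@dataclass
class Segment:
    id: int
    text: str
    start: float  # seconds
    end: float    # seconds
    words: Optional[List[Any]] = None     # only present for some backends
    speakers: Optional[List[Any]] = None  # only present when diarization is on


def parse_segments(response: Dict[str, Any]) -> List[Segment]:
    """Return the segments of a response, or [] for text-only responses."""
    return [
        Segment(
            id=s["id"],
            text=s["text"],
            start=float(s["start"]),
            end=float(s["end"]),
            words=s.get("words"),
            speakers=s.get("speakers"),
        )
        for s in response.get("segments", [])
    ]
```

Returning an empty list for responses without a segments key lets the same code path handle the text-only Whisper responses.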

Timestamp Availability by Model

| Model | verbose_json timestamps |
| openai/whisper-large-v3-turbo | No timestamps; backend returns text only |
| openai/whisper-tiny | No timestamps; backend returns text only |
| deepgram/nova-3 | Segment-level + word-level (from Deepgram response) |
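
The table can be mirrored in client code so callers know what to expect before sending a request. This is only a sketch: the TIMESTAMP_LEVELS mapping and timestamp_levels function are hypothetical, hand-copied from the table above, and would need updating if the model list changes.

```python
# Timestamp levels each model returns with response_format=verbose_json,
# copied by hand from the availability table.
TIMESTAMP_LEVELS = {
    "openai/whisper-large-v3-turbo": (),
    "openai/whisper-tiny": (),
    "deepgram/nova-3": ("segment", "word"),
}


def timestamp_levels(model):
    """Return the tuple of timestamp levels for a model, or None if unknown."""
    return TIMESTAMP_LEVELS.get(model)
```

An empty tuple means the model is known and returns text only, while None means the model is not listed here.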

Streaming Response (Undocumented)

Sending Accept: application/stream+json returns newline-delimited JSON chunks as segments are transcribed. Each line:
{"text": "segment text", "start": 0.0, "end": 3.42}
This format is used internally but is not part of the public OpenAPI (OAS) spec.
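
A minimal sketch of reading such a stream follows. It assumes the response body is available as an iterable of lines (str or bytes), which most HTTP clients can provide; iter_stream_segments is a hypothetical name. Since the streaming format is undocumented, the chunk shape is taken only from the sample line above and may change.

```python
import json
from typing import Any, Dict, Iterable, Iterator, Union


def iter_stream_segments(lines: Iterable[Union[str, bytes]]) -> Iterator[Dict[str, Any]]:
    """Yield one dict per non-empty newline-delimited JSON line."""
    for raw in lines:
        if isinstance(raw, bytes):
            raw = raw.decode("utf-8")
        line = raw.strip()
        if not line:
            # Skip blank or keep-alive lines between chunks.
            continue
        yield json.loads(line)
```

Because the function is a generator, each segment can be handled as soon as its line arrives rather than after the whole transcription finishes.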