The response format is controlled by the response_format parameter.

json (Default)

Returns the transcript text only:
{
  "text": "The quick brown fox jumps over the lazy dog."
}

verbose_json

Adds duration (seconds) and timestamped segments.
{
  "text": "The quick brown fox jumps over the lazy dog.",
  "duration": 3.42,
  "segments": [
    {
      "id": 0,
      "text": "The quick brown fox jumps over the lazy dog.",
      "start": 0.0,
      "end": 3.42
    }
  ]
}
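A verbose_json response can be consumed with nothing but the standard library. A minimal sketch, assuming the payload matches the example above:

```python
import json

# Example verbose_json payload (copied from the sample response above).
payload = """
{
  "text": "The quick brown fox jumps over the lazy dog.",
  "duration": 3.42,
  "segments": [
    {"id": 0, "text": "The quick brown fox jumps over the lazy dog.",
     "start": 0.0, "end": 3.42}
  ]
}
"""

resp = json.loads(payload)
print(f"duration: {resp['duration']}s")
for seg in resp["segments"]:
    # Each segment carries its own start/end offsets in seconds.
    print(f"[{seg['start']:.2f}-{seg['end']:.2f}] {seg['text']}")
```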
Request segment timestamps by setting timestamp_granularities[]=segment alongside response_format=verbose_json. Sending timestamp_granularities without verbose_json returns a 400 error.

Segment Fields

| Field | Type | Description |
| --- | --- | --- |
| id | integer | Zero-indexed segment number |
| text | string | Segment transcript |
| start | float | Start time in seconds |
| end | float | End time in seconds |
| words | array | Word-level timestamps (Deepgram only; present when the backend provides them) |
| speakers | array | Speaker labels (Deepgram only; present when diarize=true in model_config) |
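The start/end fields map directly onto subtitle formats. A minimal sketch that renders segments as SRT cues; the helper names (fmt_ts, to_srt) are illustrative, not part of the API:

```python
def fmt_ts(seconds: float) -> str:
    """Format seconds as an SRT timestamp: HH:MM:SS,mmm."""
    ms = round(seconds * 1000)
    h, rem = divmod(ms, 3_600_000)
    m, rem = divmod(rem, 60_000)
    s, ms = divmod(rem, 1000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"

def to_srt(segments: list[dict]) -> str:
    """Render verbose_json segments as an SRT document.

    SRT cue numbers are 1-based, so the zero-indexed segment id is offset.
    """
    cues = []
    for seg in segments:
        cues.append(
            f"{seg['id'] + 1}\n"
            f"{fmt_ts(seg['start'])} --> {fmt_ts(seg['end'])}\n"
            f"{seg['text']}\n"
        )
    return "\n".join(cues)

print(to_srt([{"id": 0, "start": 0.0, "end": 3.42,
               "text": "The quick brown fox jumps over the lazy dog."}]))
```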

Timestamp Availability by Model

| Model | verbose_json timestamps |
| --- | --- |
| distil-whisper/distil-large-v2 | Segment-level start/end (from Bumblebee streaming) |
| openai/whisper-large-v3-turbo | None; backend returns text only |
| deepgram/nova-3 | Segment-level and word-level (from Deepgram response) |

Streaming Response (Undocumented)

Sending Accept: application/stream+json returns newline-delimited JSON chunks as segments are transcribed. Each line has the form:
{"text": "segment text", "start": 0.0, "end": 3.42}
This is used internally but is not part of the public OpenAPI (OAS) spec.
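Since each line of the stream is a complete JSON object, a client can accumulate segments as they arrive. A minimal sketch over a simulated stream body (the two-chunk split shown here is invented for illustration):

```python
import json

# Simulated newline-delimited stream body, one JSON chunk per segment,
# matching the line shape shown above.
stream_body = (
    '{"text": "The quick brown fox", "start": 0.0, "end": 1.8}\n'
    '{"text": "jumps over the lazy dog.", "start": 1.8, "end": 3.42}\n'
)

segments = []
for line in stream_body.splitlines():
    if not line.strip():
        continue  # tolerate blank lines between chunks
    chunk = json.loads(line)
    segments.append(chunk)
    print(f"[{chunk['start']:.2f}-{chunk['end']:.2f}] {chunk['text']}")

# Joining the chunk texts reconstructs the full transcript.
full_text = " ".join(c["text"] for c in segments)
```

With a real HTTP client, the same loop would run over the response's line iterator instead of splitlines().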