
Transcribe speech to text (BETA)

POST /ai/audio/transcriptions

Transcribe speech to text. This endpoint is consistent with the OpenAI Transcription API and may be used with the OpenAI JS or Python SDK.
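Because the endpoint mirrors the OpenAI Transcription API, a call can be sketched with the OpenAI Python SDK. This is a minimal sketch: `transcription_kwargs` is a hypothetical helper, and the base URL and API key in the commented call are placeholders, not values from this page.

```python
def transcription_kwargs(want_timestamps: bool = False) -> dict:
    """Build keyword arguments for an OpenAI-SDK-style
    audio.transcriptions.create(...) call against this endpoint."""
    kwargs = {
        # Only model currently available on this endpoint.
        "model": "distil-whisper/distil-large-v2",
        # verbose_json is required to get timestamps back.
        "response_format": "verbose_json" if want_timestamps else "json",
    }
    if want_timestamps:
        # segment is currently the only supported granularity.
        kwargs["timestamp_granularities"] = ["segment"]
    return kwargs


# The actual call (requires the `openai` package; BASE_URL and API_KEY are
# placeholders -- substitute your deployment's values):
#
#   from openai import OpenAI
#   client = OpenAI(base_url=BASE_URL, api_key=API_KEY)
#   with open("speech.mp3", "rb") as f:
#       result = client.audio.transcriptions.create(
#           file=f, **transcription_kwargs(want_timestamps=True))
#   print(result.text)
```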

Request

Body required

    file binary required

    The audio file object to transcribe, in one of these formats: flac, mp3, mp4, mpeg, mpga, m4a, ogg, wav, or webm. File uploads are limited to 100 MB.

    model string required

    Possible values: [distil-whisper/distil-large-v2]

    ID of the model to use. Only distil-whisper/distil-large-v2 is currently available.

    response_format string

    Possible values: [json, verbose_json]

    Default value: json

    The format of the transcript output. Use verbose_json to take advantage of timestamps.

    timestamp_granularities[] string

    Possible values: [segment]

    The timestamp granularities to populate for this transcription. response_format must be set to verbose_json to use timestamp granularities. Currently, only segment is supported.
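The parameter rules above can be checked client-side before uploading. A minimal sketch; `validate_request` is a hypothetical helper, not part of the API, and the 100 MB limit is read here as binary megabytes (an assumption).

```python
ALLOWED_FORMATS = {"flac", "mp3", "mp4", "mpeg", "mpga", "m4a", "ogg", "wav", "webm"}
MAX_BYTES = 100 * 1024 * 1024  # "100 MB" read as binary megabytes (assumption)

def validate_request(filename, size_bytes, model,
                     response_format="json", timestamp_granularities=None):
    """Return a list of problems with a transcription request, per the
    parameter rules documented above (empty list means it looks valid)."""
    problems = []
    ext = filename.rsplit(".", 1)[-1].lower()
    if ext not in ALLOWED_FORMATS:
        problems.append(f"unsupported file format: {ext}")
    if size_bytes > MAX_BYTES:
        problems.append("file exceeds the 100 MB upload limit")
    if model != "distil-whisper/distil-large-v2":
        problems.append("only distil-whisper/distil-large-v2 is currently available")
    if response_format not in {"json", "verbose_json"}:
        problems.append("response_format must be json or verbose_json")
    if timestamp_granularities and response_format != "verbose_json":
        problems.append("timestamp_granularities requires response_format=verbose_json")
    return problems
```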

Responses

200: Successful Response

Schema

    text string required

    The transcribed text for the audio file.

    duration number

    The duration of the audio file in seconds. This is only included if response_format is set to verbose_json.

    segments object[]

    Segments of the transcribed text and their corresponding details. This is only included if response_format is set to verbose_json.

    Each element of the array contains:

    id number required

    Unique identifier of the segment.

    start number required

    Start time of the segment in seconds.

    end number required

    End time of the segment in seconds.

    text string required

    Text content of the segment.
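A verbose_json response can be unpacked into per-segment lines using the schema above. A sketch; the payload is a fabricated illustration shaped like the 200 schema, not real model output.

```python
import json

def segment_lines(response):
    """Render the segments of a verbose_json transcription response as
    '[start-end] text' lines (segments appear only with verbose_json)."""
    return [f"[{s['start']:.2f}-{s['end']:.2f}] {s['text']}"
            for s in response.get("segments", [])]

# Fabricated payload shaped like the 200 schema above (not real model output):
sample = json.loads("""
{"text": "hello world",
 "duration": 1.5,
 "segments": [{"id": 0, "start": 0.0, "end": 1.5, "text": "hello world"}]}
""")
print(segment_lines(sample))  # → ['[0.00-1.50] hello world']
```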

422: Validation Error

Schema

    detail object[]

    List of validation errors. Each element of the array contains:

    loc string[] required

    Location of the error.

    msg Message required

    The error message.

    type Error Type required

    The error type.
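A 422 body can be flattened for logging. A sketch assuming the error shape above; `format_validation_errors` is a hypothetical helper and the sample payload is illustrative, not a real server response.

```python
def format_validation_errors(body):
    """Flatten a 422 response body into 'loc: msg (type)' strings."""
    return [
        f"{'.'.join(str(part) for part in err['loc'])}: {err['msg']} ({err['type']})"
        for err in body.get("detail", [])
    ]

# Illustrative 422 body (field names per the schema above; values invented):
error_body = {"detail": [{"loc": ["body", "model"],
                          "msg": "field required",
                          "type": "value_error.missing"}]}
print(format_validation_errors(error_body))
```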
