Authorizations
Bearer authentication header of the form Bearer <token>, where <token> is your auth token.
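As a minimal sketch of the header format described above, the helper below builds the Authorization header from a token. The function name `auth_headers` is hypothetical and used only for illustration.

```python
from typing import Dict


def auth_headers(token: str) -> Dict[str, str]:
    """Return the Authorization header in the form 'Bearer <token>'."""
    if not token:
        raise ValueError("an auth token is required")
    return {"Authorization": f"Bearer {token}"}
```

The returned dictionary can be passed as the headers argument of whichever HTTP client you use.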
Body
ID of the model to use. distil-whisper/distil-large-v2 has lower latency but is English-only. openai/whisper-large-v3-turbo is multilingual but has slightly higher latency.
Available options: distil-whisper/distil-large-v2, openai/whisper-large-v3-turbo. Example: "distil-whisper/distil-large-v2"
The audio file object to transcribe, in one of these formats: flac, mp3, mp4, mpeg, mpga, m4a, ogg, wav, or webm. File uploads are limited to 100 MB. Cannot be used together with file_url.
Link to an audio file in one of these formats: flac, mp3, mp4, mpeg, mpga, m4a, ogg, wav, or webm. Hosted files are limited to 100 MB. Cannot be used together with file.
Example: "https://example.com/file.mp3"
The format of the transcript output. Use verbose_json to include timestamps in the response.
Available options: json, verbose_json. Example: "json"
The timestamp granularities to populate for this transcription. response_format must be set to verbose_json to use timestamp granularities. Currently, only segment is supported.
Available options: segment. Example: "segment"
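The constraints on the body parameters above can be checked before a request is sent. The following is a sketch only: the helper name `build_transcription_fields` is hypothetical, the field names `model` and `timestamp_granularities` are assumptions (only file, file_url, and response_format are named in this section), and the sketch assumes that exactly one of file or file_url must be supplied.

```python
from typing import Dict, List, Optional

ALLOWED_MODELS = {
    "distil-whisper/distil-large-v2",
    "openai/whisper-large-v3-turbo",
}
ALLOWED_FORMATS = {"json", "verbose_json"}


def build_transcription_fields(
    model: str,
    file_path: Optional[str] = None,
    file_url: Optional[str] = None,
    response_format: str = "json",  # assumed default for this sketch
    timestamp_granularities: Optional[List[str]] = None,
) -> Dict[str, object]:
    """Validate the body parameters and return the non-file form fields."""
    if model not in ALLOWED_MODELS:
        raise ValueError(f"unsupported model: {model}")
    if response_format not in ALLOWED_FORMATS:
        raise ValueError(f"unsupported response_format: {response_format}")
    # file and file_url cannot be combined; we also assume one is required.
    if (file_path is None) == (file_url is None):
        raise ValueError("provide exactly one of file_path or file_url")
    # Timestamp granularities are only valid with verbose_json.
    if timestamp_granularities and response_format != "verbose_json":
        raise ValueError("timestamp granularities require verbose_json")

    fields: Dict[str, object] = {
        "model": model,  # assumed field name
        "response_format": response_format,
    }
    if file_url is not None:
        fields["file_url"] = file_url
    if timestamp_granularities:
        # Assumed field name; the exact key is not given in this section.
        fields["timestamp_granularities"] = list(timestamp_granularities)
    return fields
```

When file_path is used instead of file_url, the audio itself would be attached separately as a file upload by your HTTP client; the returned dictionary covers only the remaining fields.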
Response
Successful Response
The transcribed text for the audio file.
The duration of the audio file in seconds. This is only included if response_format is set to verbose_json.
Segments of the transcribed text and their corresponding details. This is only included if response_format is set to verbose_json.
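A response with the fields described above could be read as in the sketch below. The JSON keys `text`, `duration`, and `segments` are assumptions inferred from the descriptions, and the contents of each segment are left as untyped dictionaries because this section does not specify them.

```python
import json
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class Transcription:
    text: str
    # Present only when response_format is verbose_json.
    duration: Optional[float] = None
    segments: List[Dict[str, Any]] = field(default_factory=list)


def parse_transcription(body: str) -> Transcription:
    """Parse a JSON response body into a Transcription."""
    data = json.loads(body)
    return Transcription(
        text=data["text"],
        duration=data.get("duration"),
        segments=data.get("segments") or [],
    )
```

With response_format set to json, only the text is expected, so duration stays None and segments stays empty.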