POST /ai/audio/transcriptions
JavaScript
import Telnyx from 'telnyx';

const client = new Telnyx({
  apiKey: process.env['TELNYX_API_KEY'], // This is the default and can be omitted
});

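const response = await client.ai.audio.transcribe({
  model: 'distil-whisper/distil-large-v2',
  file_url: 'https://example.com/file.mp3', // placeholder URL; pass either file or file_url, not both
});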

console.log(response.text);
{
  "text": "<string>",
  "duration": 123,
  "segments": [
    {
      "id": 123,
      "start": 123,
      "end": 123,
      "text": "<string>"
    }
  ]
}

Authorizations

Authorization
string
header
required

Bearer authentication header of the form Bearer <token>, where <token> is your auth token.
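
For callers that do not use the SDK, the header described above could be built as in the following minimal sketch. The helper name buildAuthHeaders is hypothetical and not part of any Telnyx library.

```javascript
// Hypothetical helper (not part of the Telnyx SDK): builds the
// Authorization header in the "Bearer <token>" form described above.
function buildAuthHeaders(token) {
  if (typeof token !== 'string' || token.trim() === '') {
    throw new Error('A non-empty auth token is required');
  }
  return { Authorization: `Bearer ${token.trim()}` };
}
```

The returned object can be passed as the headers option of an HTTP client such as fetch.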

Body

multipart/form-data
model
enum<string>
default:distil-whisper/distil-large-v2
required

ID of the model to use. distil-whisper/distil-large-v2 has lower latency but supports English only. openai/whisper-large-v3-turbo is multilingual but has slightly higher latency. deepgram/nova-3 supports English variants (en, en-US, en-GB, en-AU, en-NZ, en-IN) and accepts only mp3 and wav files.

Available options:
distil-whisper/distil-large-v2,
openai/whisper-large-v3-turbo,
deepgram/nova-3
Example:

"distil-whisper/distil-large-v2"

file
file

The audio file object to transcribe, in one of these formats: flac, mp3, mp4, mpeg, mpga, m4a, ogg, wav, or webm. File uploads are limited to 100 MB. Cannot be used together with file_url. Note: deepgram/nova-3 only supports mp3 and wav formats.

file_url
string

URL of the audio file to transcribe, in one of these formats: flac, mp3, mp4, mpeg, mpga, m4a, ogg, wav, or webm. Hosted files are limited to 100 MB. Cannot be used together with file. Note: deepgram/nova-3 only supports mp3 and wav formats.

Example:

"https://example.com/file.mp3"

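The file and file_url rules above can be checked on the client before a request is sent. The following is a rough sketch, not the API's own validation: checkAudioSource and extensionOf are hypothetical helpers, the format is guessed from the file extension only, and the sketch assumes that one of file or file_url has to be supplied.

```javascript
// Formats listed in the file and file_url descriptions above.
const ALL_FORMATS = ['flac', 'mp3', 'mp4', 'mpeg', 'mpga', 'm4a', 'ogg', 'wav', 'webm'];
const NOVA3_FORMATS = ['mp3', 'wav'];

// Hypothetical helper: returns the lowercase extension of a file name or URL path.
function extensionOf(nameOrUrl) {
  let path = nameOrUrl;
  try {
    path = new URL(nameOrUrl).pathname;
  } catch {
    // Not a URL; treat the input as a plain file name.
  }
  const dot = path.lastIndexOf('.');
  return dot === -1 ? '' : path.slice(dot + 1).toLowerCase();
}

// Hypothetical helper: returns a list of problems (empty when none are found).
function checkAudioSource({ model, fileName, fileUrl }) {
  const errors = [];
  if (fileName && fileUrl) errors.push('file and file_url cannot be used together');
  // Assumption: the page does not mark either field as required,
  // but a request without any audio source is treated as an error here.
  if (!fileName && !fileUrl) errors.push('one of file or file_url is needed');

  const allowed = model === 'deepgram/nova-3' ? NOVA3_FORMATS : ALL_FORMATS;
  for (const source of [fileName, fileUrl]) {
    if (!source) continue;
    const ext = extensionOf(source);
    // Sources without an extension are not checked.
    if (ext && !allowed.includes(ext)) errors.push(`${ext} is not accepted by ${model}`);
  }
  return errors;
}
```

For example, checkAudioSource({ model: 'deepgram/nova-3', fileName: 'call.flac' }) reports one problem, because that model accepts only mp3 and wav.
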
response_format
enum<string>
default:json

The format of the transcript output. Use verbose_json to take advantage of timestamps.

Available options:
json,
verbose_json
Example:

"json"

timestamp_granularities[]
enum<string>

The timestamp granularities to populate for this transcription. response_format must be set to verbose_json to use timestamp granularities. Currently, only segment is supported.

Available options:
segment
Example:

"segment"

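Because the body is multipart/form-data, a request for segment timestamps pairs response_format with timestamp_granularities[]. The sketch below only assembles the form fields; buildVerboseForm is a hypothetical helper, and it relies on the FormData global available in browsers and Node.js 18+.

```javascript
// Hypothetical helper: builds a multipart body that asks for segment timestamps.
// Field names are taken from the Body section; nothing is sent here.
function buildVerboseForm({ model, fileUrl }) {
  const form = new FormData();
  form.append('model', model);
  form.append('file_url', fileUrl);
  // timestamp_granularities[] only takes effect with verbose_json.
  form.append('response_format', 'verbose_json');
  form.append('timestamp_granularities[]', 'segment');
  return form;
}
```

The resulting form can be used as the request body of a POST to this endpoint, together with the Authorization header.
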
language
string

The language of the audio to be transcribed. deepgram/nova-3 supports only English variants: en, en-US, en-GB, en-AU, en-NZ, en-IN. openai/whisper-large-v3-turbo supports multiple languages. distil-whisper/distil-large-v2 does not support the language parameter.

Example:

"en-US"

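The per-model language rules can be expressed as a small lookup. This is an illustrative sketch: languageFieldFor is a hypothetical helper, and silently dropping the parameter for distil-whisper/distil-large-v2 (instead of raising an error) is a design choice made here, not documented API behavior.

```javascript
// English variants that deepgram/nova-3 accepts, as listed above.
const NOVA3_LANGUAGES = ['en', 'en-US', 'en-GB', 'en-AU', 'en-NZ', 'en-IN'];

// Hypothetical helper: returns the language field to include in the request,
// or an empty object when the field should be left out.
function languageFieldFor(model, language) {
  if (language === undefined) return {};
  switch (model) {
    case 'distil-whisper/distil-large-v2':
      // This model does not support the language parameter; omit it.
      return {};
    case 'deepgram/nova-3':
      if (!NOVA3_LANGUAGES.includes(language)) {
        throw new RangeError(`${language} is not supported by deepgram/nova-3`);
      }
      return { language };
    default:
      // openai/whisper-large-v3-turbo is multilingual; pass the value through.
      return { language };
  }
}
```
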
model_config
object

Additional model-specific configuration parameters. Only allowed with the deepgram/nova-3 model. Can include Deepgram-specific options such as smart_format, punctuate, diarize, utterance, numerals, and language. If language is provided both as a top-level parameter and in model_config, the top-level parameter takes precedence.

Example:
{ "smart_format": true, "punctuate": true }
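
The precedence rule for language can be mirrored on the client to predict which value applies. The sketch below is an illustration of the rule as stated, under the assumption that a model_config sent with any other model should be rejected; resolveLanguage is a hypothetical helper.

```javascript
// Hypothetical helper: returns the language that applies to a request,
// following the rule that the top-level parameter wins over model_config.
function resolveLanguage({ model, language, model_config }) {
  if (model_config !== undefined && model !== 'deepgram/nova-3') {
    throw new Error('model_config is only allowed with deepgram/nova-3');
  }
  // Top-level language takes precedence; fall back to model_config.language.
  return language ?? model_config?.language;
}
```

With both values present, for example language 'en-GB' and model_config.language 'en-US', the function returns 'en-GB'.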

Response

Successful Response

text
string
required

The transcribed text for the audio file.

duration
number

The duration of the audio file in seconds. This is only included if response_format is set to verbose_json.

segments
object[]

Segments of the transcribed text and their corresponding details. This is only included if response_format is set to verbose_json.
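
A verbose_json response can be turned into timestamped lines by walking the segments array. This is a minimal sketch: formatSegments and toClock are hypothetical helpers, and start and end are assumed to be in seconds, like duration.

```javascript
// Hypothetical helper: formats a number of seconds as MM:SS (fractions dropped).
function toClock(seconds) {
  const total = Math.floor(seconds);
  const mm = String(Math.floor(total / 60)).padStart(2, '0');
  const ss = String(total % 60).padStart(2, '0');
  return `${mm}:${ss}`;
}

// Hypothetical helper: one line per segment. Returns an empty list when the
// response has no segments (for example with response_format "json").
function formatSegments(response) {
  return (response.segments ?? []).map(
    (segment) => `[${toClock(segment.start)} - ${toClock(segment.end)}] ${segment.text.trim()}`
  );
}
```

A segment with start 65, end 70.9, and text "Bye." is rendered as "[01:05 - 01:10] Bye.".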