Transcribe speech to text. This endpooint is compatible with the OpenAI Transcription API and may be used with the OpenAI JS or Python SDK.
import fs from 'fs';
import Telnyx from 'telnyx';

const client = new Telnyx({
  apiKey: process.env['TELNYX_API_KEY'], // This is the default and can be omitted
});

const response = await client.ai.audio.transcribe({
  model: 'distil-whisper/distil-large-v2',
  file: fs.createReadStream('audio.mp3'), // either file or file_url is required
});
console.log(response.text);

Example response:

{
  "text": "<string>",
  "duration": 123,
  "segments": [
    {
      "id": 123,
      "start": 123,
      "end": 123,
      "text": "<string>"
    }
  ],
  "words": [
    {
      "word": "<string>",
      "start": 123,
      "end": 123,
      "confidence": 123,
      "speaker": 123
    }
  ]
}
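As a sketch of how the verbose_json fields fit together, the segment list can be walked to rebuild a timestamped transcript. The response object here is illustrative sample data, not real SDK output:

```javascript
// Illustrative verbose_json-style response; values are sample data, not SDK output.
const response = {
  text: 'hello world this is a test',
  duration: 4.2,
  segments: [
    { id: 0, start: 0.0, end: 2.1, text: 'hello world' },
    { id: 1, start: 2.1, end: 4.2, text: 'this is a test' },
  ],
};

// Rebuild a timestamped transcript, one line per segment.
const lines = response.segments.map(
  (s) => `[${s.start.toFixed(1)}s - ${s.end.toFixed(1)}s] ${s.text}`
);
console.log(lines.join('\n'));
// [0.0s - 2.1s] hello world
// [2.1s - 4.2s] this is a test
```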
Authorization — Bearer authentication header of the form Bearer <token>, where <token> is your auth token.
model — ID of the model to use. distil-whisper/distil-large-v2 is lower latency but English-only. openai/whisper-large-v3-turbo is multilingual but has slightly higher latency. deepgram/nova-3 supports English variants (en, en-US, en-GB, en-AU, en-NZ, en-IN) and only accepts mp3/wav files.
Available options: distil-whisper/distil-large-v2, openai/whisper-large-v3-turbo, deepgram/nova-3. Example: "distil-whisper/distil-large-v2"
file — The audio file object to transcribe, in one of these formats: flac, mp3, mp4, mpeg, mpga, m4a, ogg, wav, or webm. File uploads are limited to 100 MB. Cannot be used together with file_url. Note: deepgram/nova-3 only supports mp3 and wav formats.
file_url — Link to an audio file in one of these formats: flac, mp3, mp4, mpeg, mpga, m4a, ogg, wav, or webm. Hosted files are limited to 100 MB. Cannot be used together with file. Note: deepgram/nova-3 only supports mp3 and wav formats.
Example: "https://example.com/file.mp3"
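Because deepgram/nova-3 accepts only mp3 and wav while the other models take the full format list, a client-side pre-check can catch a bad pairing before uploading. This helper is a hypothetical sketch based on the constraints above, not part of the Telnyx SDK:

```javascript
// Hypothetical client-side pre-check; not part of the Telnyx SDK.
const SUPPORTED_FORMATS = ['flac', 'mp3', 'mp4', 'mpeg', 'mpga', 'm4a', 'ogg', 'wav', 'webm'];
const NOVA3_FORMATS = ['mp3', 'wav']; // deepgram/nova-3 only accepts these

function isFormatSupported(model, filename) {
  const ext = filename.split('.').pop().toLowerCase();
  const allowed = model === 'deepgram/nova-3' ? NOVA3_FORMATS : SUPPORTED_FORMATS;
  return allowed.includes(ext);
}

console.log(isFormatSupported('deepgram/nova-3', 'call.ogg'));               // false
console.log(isFormatSupported('openai/whisper-large-v3-turbo', 'call.ogg')); // true
```

Running this check locally avoids a round trip to the API just to learn the file type is rejected for the chosen model.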
response_format — The format of the transcript output. Use verbose_json to take advantage of timestamps.
Available options: json, verbose_json. Example: "json"
timestamp_granularities — The timestamp granularities to populate for this transcription. response_format must be set to verbose_json to use timestamp granularities. Currently only segment is supported.
Available options: segment. Example: "segment"
language — The language of the audio to be transcribed. deepgram/nova-3 supports only English variants: en, en-US, en-GB, en-AU, en-NZ, en-IN. openai/whisper-large-v3-turbo supports multiple languages. distil-whisper/distil-large-v2 does not support the language parameter.
Example: "en-US"
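The per-model language rules, together with the documented precedence of a top-level language over model_config.language, can be sketched as a small client-side resolver. This is a hypothetical illustration of the rules stated in this reference, not SDK behavior:

```javascript
// Hypothetical sketch of the documented language rules; not SDK behavior.
const NOVA3_LANGUAGES = ['en', 'en-US', 'en-GB', 'en-AU', 'en-NZ', 'en-IN'];

function resolveLanguage(model, language, modelConfig = {}) {
  // A top-level language takes precedence over model_config.language.
  const effective = language ?? modelConfig.language;
  if (model === 'distil-whisper/distil-large-v2' && effective !== undefined) {
    throw new Error('distil-whisper/distil-large-v2 does not accept a language parameter');
  }
  if (model === 'deepgram/nova-3' && effective !== undefined && !NOVA3_LANGUAGES.includes(effective)) {
    throw new Error(`deepgram/nova-3 only supports English variants: ${NOVA3_LANGUAGES.join(', ')}`);
  }
  return effective;
}

console.log(resolveLanguage('deepgram/nova-3', 'en-US', { language: 'en-GB' })); // 'en-US'
```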
model_config — Additional model-specific configuration parameters. Only allowed with the deepgram/nova-3 model. Can include Deepgram-specific options such as smart_format, punctuate, diarize, utterance, numerals, and language. If language is provided both as a top-level parameter and in model_config, the top-level parameter takes precedence.
Example: { "smart_format": true, "punctuate": true }

Successful Response
Response fields vary by model. distil-whisper/distil-large-v2 returns text, duration, and segments in verbose_json mode. openai/whisper-large-v3-turbo returns text only. deepgram/nova-3 returns text and, depending on model_config, may include words with per-word timestamps and speaker labels.
text — The transcribed text for the audio file.
duration — The duration of the audio file in seconds. Returned by distil-whisper/distil-large-v2 and deepgram/nova-3 when response_format is verbose_json. Not returned by openai/whisper-large-v3-turbo.
segments — Segments of the transcribed text and their corresponding details. Returned by distil-whisper/distil-large-v2 when response_format is verbose_json. Not returned by openai/whisper-large-v3-turbo.
words — Word-level timestamps and optional speaker labels. Only returned by deepgram/nova-3 when word-level output is enabled via model_config.