GET /speech-to-text
JavaScript
import Telnyx from 'telnyx';

const client = new Telnyx({
  apiKey: process.env['TELNYX_API_KEY'], // This is the default and can be omitted
});

await client.speechToText.transcribe({ input_format: 'mp3', transcription_engine: 'Azure' });
{
  "type": "transcript",
  "transcript": "<string>",
  "is_final": true,
  "confidence": 0.95
}

Authorizations

Authorization
string
header
required

Bearer authentication header of the form Bearer <token>, where <token> is your auth token.
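The header format described above can be sketched as a small helper. This is a minimal illustration, not part of the Telnyx SDK: the function name `buildAuthHeaders` is hypothetical, and how the headers are attached to the connection depends on the WebSocket client in use.

```javascript
// Hypothetical helper (not an SDK function): builds the Authorization
// header in the documented "Bearer <token>" form.
function buildAuthHeaders(token) {
  if (typeof token !== 'string' || token.trim() === '') {
    throw new Error('An auth token is required');
  }
  return { Authorization: `Bearer ${token.trim()}` };
}
```

A caller would typically pass `process.env['TELNYX_API_KEY']` as the token, matching the SDK example at the top of this page.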

Query Parameters

transcription_engine
enum<string>
required

The transcription engine to use for processing the audio stream.

Available options: Azure, Deepgram, Google, Telnyx

input_format
enum<string>
required

The format of the input audio stream.

Available options: mp3, wav

language
string

The language spoken in the audio stream.

interim_results
boolean

Whether to receive interim transcription results.

model
enum<string>

The specific model to use within the selected transcription engine.

Available options: fast
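For clients that open the WebSocket without the SDK, the query parameters above could be assembled as follows. This is a sketch under stated assumptions: `buildSttUrl` is a hypothetical helper, and the base URL is supplied by the caller because this page does not state the host.

```javascript
// Allowed values copied from the parameter list above.
const ENGINES = ['Azure', 'Deepgram', 'Google', 'Telnyx'];
const INPUT_FORMATS = ['mp3', 'wav'];

// Hypothetical helper: validates the two required parameters and appends
// only the optional parameters that were actually provided.
function buildSttUrl(baseUrl, options) {
  const { transcription_engine, input_format, language, interim_results, model } = options;
  if (!ENGINES.includes(transcription_engine)) {
    throw new Error(`transcription_engine must be one of: ${ENGINES.join(', ')}`);
  }
  if (!INPUT_FORMATS.includes(input_format)) {
    throw new Error(`input_format must be one of: ${INPUT_FORMATS.join(', ')}`);
  }
  const url = new URL(baseUrl);
  url.searchParams.set('transcription_engine', transcription_engine);
  url.searchParams.set('input_format', input_format);
  if (language !== undefined) url.searchParams.set('language', language);
  if (interim_results !== undefined) {
    url.searchParams.set('interim_results', String(interim_results));
  }
  if (model !== undefined) url.searchParams.set('model', model);
  return url.toString();
}
```

For example, `buildSttUrl('wss://example.invalid/speech-to-text', { transcription_engine: 'Telnyx', input_format: 'wav', interim_results: true })` yields a URL carrying the three parameters; the `example.invalid` host is a placeholder, not a real endpoint.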

Body

application/octet-stream

Binary audio frames in mp3 or wav format, sent by the client over the WebSocket. See the SttClientEvent schema.
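One way to turn an audio buffer into binary frames is to slice it into fixed-size chunks before sending. The sketch below is illustrative only: `chunkAudio` is a hypothetical helper, and the 4096-byte default is an assumption, since this page does not specify a frame size.

```javascript
// Hypothetical helper: splits audio bytes into frames of at most
// `chunkSize` bytes. The default size is an assumption, not a documented limit.
function chunkAudio(audio, chunkSize = 4096) {
  if (!Number.isInteger(chunkSize) || chunkSize <= 0) {
    throw new RangeError('chunkSize must be a positive integer');
  }
  const frames = [];
  for (let offset = 0; offset < audio.length; offset += chunkSize) {
    // subarray creates a view over the same memory and clamps the end index,
    // so the last frame may be shorter than chunkSize.
    frames.push(audio.subarray(offset, offset + chunkSize));
  }
  return frames;
}
```

Each returned frame is a `Uint8Array` view (or a `Buffer` view when a `Buffer` is passed in) that can be handed to a WebSocket client's `send` method as a binary message.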

Response

WebSocket connection established. The client sends binary audio frames (mp3 or wav), and the server replies with JSON frames. See the TranscriptFrame and SttErrorFrame schemas.

Union of all server-to-client WebSocket events for STT streaming.

type
string
required

Frame type identifier.

Allowed value: "transcript"

transcript
string
required

The transcribed text from the audio.

is_final
boolean

Whether this is a final transcription result. When false, this is an interim result that may be refined.

confidence
number

Confidence score of the transcription, ranging from 0 to 1.
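The transcript fields above suggest a simple way to consume server frames: keep the latest interim text and append to the transcript only when a result is final. The sketch below is one possible approach. `createTranscriptCollector` is a hypothetical name, and treating a missing `is_final` as interim is an assumption, since the field is optional. Frames that are not transcript frames (for example error frames, whose fields are not listed on this page) are skipped rather than interpreted.

```javascript
// Hypothetical collector for server-to-client JSON frames.
function createTranscriptCollector() {
  const finals = [];
  let interim = '';
  return {
    // Returns true when the frame was a transcript frame, false otherwise.
    push(raw) {
      const frame = JSON.parse(raw);
      if (frame === null || typeof frame !== 'object') return false;
      if (frame.type !== 'transcript' || typeof frame.transcript !== 'string') {
        return false;
      }
      if (frame.is_final === true) {
        // Final result: commit it and clear the interim text it supersedes.
        finals.push(frame.transcript);
        interim = '';
      } else {
        // Assumption: anything not explicitly final is an interim result
        // that a later frame may refine.
        interim = frame.transcript;
      }
      return true;
    },
    finalText() {
      return finals.join(' ');
    },
    interimText() {
      return interim;
    },
  };
}
```

A message handler would call `push` with each text message received from the server and render `finalText()` followed by `interimText()`.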