GET /speech-to-text
JavaScript
import Telnyx from 'telnyx';

const client = new Telnyx({
  apiKey: process.env['TELNYX_API_KEY'], // This is the default and can be omitted
});

await client.speechToText.transcribe({ input_format: 'mp3', transcription_engine: 'Azure' });
{
  "type": "<string>",
  "transcript": "<string>",
  "is_final": true,
  "confidence": 0.95
}

Authorizations

Authorization
string
header
required

Bearer authentication header of the form Bearer <token>, where <token> is your auth token.
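As a minimal sketch of the header format described above, the helper below builds the `Authorization` value from a token. The function name `buildAuthHeaders` and the `YOUR_API_KEY` fallback are hypothetical and used only for illustration.

```javascript
// Build the Authorization header in the form "Bearer <token>".
// buildAuthHeaders is a hypothetical helper, not part of any SDK.
function buildAuthHeaders(token) {
  if (typeof token !== 'string' || token.trim() === '') {
    throw new Error('An auth token is required');
  }
  return { Authorization: `Bearer ${token.trim()}` };
}

// Example: read the token from the environment, as the SDK sample does.
const headers = buildAuthHeaders(process.env.TELNYX_API_KEY || 'YOUR_API_KEY');
console.log(Object.keys(headers)); // the only key is Authorization
```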

Query Parameters

transcription_engine
enum<string>
required

The transcription engine to use for processing the audio stream.

Available options: Azure, Deepgram, Google, Telnyx
input_format
enum<string>
required

The format of the input audio stream.

Available options: mp3, wav
language
string

The language spoken in the audio stream.

interim_results
boolean

Whether to receive interim transcription results.

model

The specific model to use within the selected transcription engine.

Available options: fast
endpointing
integer

Silence duration (in milliseconds) that triggers end-of-speech detection. When set, the engine uses this value to determine when a speaker has stopped talking. Not all engines support this parameter.

redact
string

Enable redaction of sensitive information (e.g., PCI data, SSN) from transcription results. Supported values depend on the transcription engine.

keyterm
string

A key term to boost in the transcription. The engine will be more likely to recognize this term. Can be specified multiple times for multiple terms.

keywords
string

Comma-separated list of keywords to boost in the transcription. The engine will prioritize recognition of these words.
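The query parameters above can be assembled as in the following sketch. It assumes a placeholder base URL (`wss://example.invalid/speech-to-text`), since the real host is not stated in this section, and the helper name `buildSttUrl` plus the `keyterms` and `keywords` array options are hypothetical conveniences. It repeats `keyterm` once per term and joins `keywords` with commas, as the parameter descriptions specify.

```javascript
// Placeholder base URL: the real host is not given in this section.
const BASE_URL = 'wss://example.invalid/speech-to-text';

// Build a connection URL from the documented query parameters.
// `keyterms` and `keywords` are hypothetical array options that this helper
// maps onto the documented `keyterm` and `keywords` parameters.
function buildSttUrl(options, baseUrl = BASE_URL) {
  const {
    transcription_engine,
    input_format,
    keyterms = [],
    keywords = [],
    ...optional // language, interim_results, model, endpointing, redact
  } = options;

  if (!transcription_engine || !input_format) {
    throw new Error('transcription_engine and input_format are required');
  }

  const params = new URLSearchParams({ transcription_engine, input_format });

  // Copy optional scalar parameters, skipping unset values.
  for (const [name, value] of Object.entries(optional)) {
    if (value !== undefined && value !== null) {
      params.set(name, String(value));
    }
  }

  // keyterm may be specified multiple times, once per term.
  for (const term of keyterms) {
    params.append('keyterm', term);
  }

  // keywords is a single comma-separated list.
  if (keywords.length > 0) {
    params.set('keywords', keywords.join(','));
  }

  return `${baseUrl}?${params.toString()}`;
}

const url = buildSttUrl({
  transcription_engine: 'Deepgram',
  input_format: 'wav',
  interim_results: true,
  endpointing: 500, // milliseconds of silence; not every engine supports it
  keyterms: ['Telnyx', 'WebSocket'],
  keywords: ['invoice', 'refund'],
});
console.log(url);
```

Because not all engines support every optional parameter, a caller would probably want to pass only the options that the chosen engine accepts.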

Body

application/octet-stream

The client sends binary audio frames (mp3 or wav) over the WebSocket. See the SttClientEvent schema.

Binary audio data in mp3 or wav format.
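One way to send the audio is to split an in-memory buffer into binary frames, as sketched below. The 4096-byte chunk size is an arbitrary illustrative value, not a documented requirement, and `audioChunks` and `sendAudio` are hypothetical helpers. The `socket` argument is assumed to be any object with a `send(data)` method, such as an open WebSocket.

```javascript
// Yield successive views over the audio buffer, each at most chunkSize bytes.
// The default chunk size is an assumption for illustration only.
function* audioChunks(audio, chunkSize = 4096) {
  if (!Number.isInteger(chunkSize) || chunkSize <= 0) {
    throw new RangeError('chunkSize must be a positive integer');
  }
  for (let offset = 0; offset < audio.length; offset += chunkSize) {
    yield audio.subarray(offset, offset + chunkSize);
  }
}

// Send every chunk as one binary frame and return the number of frames sent.
// `socket` is assumed to expose send(data), like a WebSocket instance.
function sendAudio(socket, audio, chunkSize = 4096) {
  let frames = 0;
  for (const chunk of audioChunks(audio, chunkSize)) {
    socket.send(chunk);
    frames += 1;
  }
  return frames;
}

// Usage with a stand-in socket that records frame sizes instead of sending.
const sentSizes = [];
const fakeSocket = { send: (data) => sentSizes.push(data.length) };
sendAudio(fakeSocket, new Uint8Array(10000));
console.log(sentSizes); // [ 4096, 4096, 1808 ]
```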

Response

WebSocket connection established. Communication proceeds via binary audio frames (client) and JSON transcript frames (server).

Client → Server: Binary audio data (mp3/wav).
Server → Client: See the TranscriptFrame and SttErrorFrame schemas.

Union of all server-to-client WebSocket events for STT streaming.

type
string
required

Frame type identifier.

Allowed value: "transcript"
transcript
string
required

The transcribed text from the audio.

is_final
boolean

Whether this is a final transcription result. When false, this is an interim result that may be refined.

confidence
number

Confidence score of the transcription, ranging from 0 to 1.
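The transcript frame fields above could be consumed as in this sketch, which keeps final results and replaces the pending interim result as new frames arrive. `createTranscriptCollector` is a hypothetical helper. Two assumptions are made here: a frame counts as interim only when `is_final` is explicitly `false`, and any frame whose `type` is not `"transcript"` (for example an SttErrorFrame, whose fields are not listed in this section) is skipped.

```javascript
// Collect transcript frames received from the server.
function createTranscriptCollector() {
  const finals = []; // finalized segments, in arrival order
  let interim = '';  // latest interim segment, replaced on each update

  return {
    // `raw` is a JSON string or binary data containing UTF-8 encoded JSON.
    handleMessage(raw) {
      const text = typeof raw === 'string' ? raw : new TextDecoder().decode(raw);
      const frame = JSON.parse(text);

      // Assumption: non-transcript frames (such as errors) are ignored here.
      if (frame.type !== 'transcript') {
        return null;
      }

      if (frame.is_final === false) {
        // Interim result: may still be refined by a later frame.
        interim = frame.transcript;
      } else {
        // Assumption: a missing is_final is treated as final.
        finals.push(frame.transcript);
        interim = '';
      }
      return frame;
    },

    // Current best text: all final segments plus any pending interim one.
    text() {
      return [...finals, interim].filter(Boolean).join(' ');
    },
  };
}

const collector = createTranscriptCollector();
collector.handleMessage('{"type":"transcript","transcript":"hello wor","is_final":false}');
collector.handleMessage('{"type":"transcript","transcript":"hello world","is_final":true,"confidence":0.93}');
console.log(collector.text()); // hello world
```

A caller that needs only high-quality segments could additionally compare `frame.confidence` against a threshold of its own choosing before accepting a final result.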