Open a WebSocket connection to stream audio and receive transcriptions in real time. Authentication uses the standard Authorization: Bearer <API_KEY> header.
Supported engines: Azure, Deepgram, Google, Telnyx.
Connection flow:
1. The client opens the WebSocket connection, authenticating with a Bearer header of the form Bearer <token>, where <token> is your auth token.
2. The client streams binary audio frames to the server.
3. The server responds with JSON frames carrying transcript, is_final, and confidence fields.
Parameters:
Transcription engine: The transcription engine to use for processing the audio stream. One of Azure, Deepgram, Google, Telnyx.
Audio format: The format of the input audio stream. One of mp3, wav.
Language: The language spoken in the audio stream.
Interim results: Whether to receive interim transcription results.
Model: The specific model to use within the selected transcription engine (for example, fast).
Silence threshold: Silence duration (in milliseconds) that triggers end-of-speech detection. When set, the engine uses this value to determine when a speaker has stopped talking. Not all engines support this parameter.
Redaction: Enables redaction of sensitive information (e.g., PCI data, SSN) from transcription results. Supported values depend on the transcription engine.
Boost term: A key term to boost in the transcription; the engine becomes more likely to recognize it. Can be specified multiple times for multiple terms.
Keywords: A comma-separated list of keywords to boost in the transcription. The engine will prioritize recognition of these words.
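As a minimal sketch of assembling a connection request from the parameters above: the endpoint URL and the query-parameter names (`transcription_engine`, `audio_format`, `language`, `interim_results`) are placeholders/assumptions for illustration, not confirmed names; substitute the values from your API reference.

```python
import urllib.parse

# Placeholder endpoint -- replace with the real WebSocket URL.
BASE_URL = "wss://api.example.com/v2/stt/streaming"

def build_connect_request(api_key: str, engine: str, audio_format: str,
                          language: str = "en", interim: bool = True):
    """Assemble the WebSocket URL and auth header for the STT stream.

    Query-parameter names here are hypothetical; the Authorization
    header form (Bearer <API_KEY>) follows the description above.
    """
    params = {
        "transcription_engine": engine,   # Azure | Deepgram | Google | Telnyx
        "audio_format": audio_format,     # mp3 | wav
        "language": language,
        "interim_results": str(interim).lower(),
    }
    url = BASE_URL + "?" + urllib.parse.urlencode(params)
    headers = {"Authorization": f"Bearer {api_key}"}
    return url, headers
```

The returned URL and header dict can then be passed to whichever WebSocket client library you use.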
Client → Server messages (SttClientEvent schema): binary audio data in mp3 or wav format, sent as binary WebSocket frames.
Once the WebSocket connection is established, communication proceeds via binary audio frames (client) and JSON transcript frames (server):
Client → Server: binary audio data (mp3/wav).
Server → Client: JSON frames conforming to the TranscriptFrame and SttErrorFrame schemas.
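A sketch of dispatching incoming server frames by type. The transcript-frame fields follow the TranscriptFrame schema described in this document; the error-frame shape (a `message` field) is an assumption, since the SttErrorFrame fields are not listed here.

```python
import json

def route_server_frame(raw: str):
    """Classify a server-to-client JSON frame.

    Returns ("transcript", text, is_final) for transcript frames.
    Anything else is treated as an error frame; its "message" field
    is an assumed name, as the SttErrorFrame schema is not shown here.
    """
    frame = json.loads(raw)
    if frame.get("type") == "transcript":
        return ("transcript", frame["transcript"], frame["is_final"])
    return ("error", frame.get("message"), None)
```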
Union of all server-to-client WebSocket events for STT streaming.
type: Frame type identifier ("transcript" for transcript frames).
transcript: The transcribed text from the audio.
is_final: Whether this is a final transcription result. When false, this is an interim result that may be refined.
confidence: Confidence score of the transcription, ranging from 0 to 1.
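A small example of consuming these frames on the client side: it skips interim results (is_final false) and drops low-confidence text, using only the transcript, is_final, and confidence fields defined above. The confidence threshold is an application choice, not part of the API.

```python
import json

def collect_final_text(raw_frames, min_confidence=0.0):
    """Join the text of final transcript frames.

    Interim results (is_final == false) are ignored, as are frames
    below the caller-chosen confidence threshold (0 to 1 per the schema).
    """
    parts = []
    for raw in raw_frames:
        frame = json.loads(raw)
        if (frame.get("type") == "transcript"
                and frame.get("is_final")
                and frame.get("confidence", 0.0) >= min_confidence):
            parts.append(frame["transcript"])
    return " ".join(parts)
```

Because interim frames may be refined later, accumulating only final frames avoids duplicated or revised text in the assembled transcript.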