Stream speech to text over WebSocket

AsyncAPI specification for the Telnyx Speech-to-Text WebSocket endpoint, which provides real-time speech transcription: you stream audio to the server and receive transcript frames back.

Supported Engines

Azure - Microsoft Azure Speech Services
Deepgram - Deepgram Nova models
Google - Google Cloud Speech-to-Text
Telnyx - Telnyx native transcription (OpenAI Whisper models)
xAI - xAI Grok STT
AssemblyAI - AssemblyAI Universal-Streaming
Speechmatics - Speechmatics real-time transcription
Soniox - Soniox real-time transcription
Parakeet - Self-hosted NVIDIA Parakeet multilingual transcription

Connection Flow

Open a WebSocket connection to wss://api.telnyx.com/v2/speech-to-text/transcription, passing configuration as query parameters.
Send the audio as binary frames in mp3 or wav format.
Receive JSON transcript frames containing the transcript, is_final, and confidence fields.
Close the connection when you are done.
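The first step can be sketched in Python with only the standard library by building the connection URL from query parameters. This page names only `input_format` and `interim_results`; the helper name `build_transcription_url` and the encoding of booleans as lowercase `"true"`/`"false"` are assumptions, so check the full query schema for the accepted parameters and value formats.

```python
from urllib.parse import urlencode

# Endpoint from the Connection Flow above.
BASE_URL = "wss://api.telnyx.com/v2/speech-to-text/transcription"


def build_transcription_url(input_format: str = "wav", interim_results: bool = True) -> str:
    """Return the WebSocket URL with configuration passed as query parameters.

    build_transcription_url is a hypothetical helper. Encoding booleans as
    lowercase "true"/"false" is an assumption; confirm against the query schema.
    """
    # Only the two formats named on this page are accepted here.
    if input_format not in ("mp3", "wav"):
        raise ValueError("input_format must be 'mp3' or 'wav'")
    params = {
        "input_format": input_format,
        "interim_results": "true" if interim_results else "false",
    }
    # urlencode keeps the dict's insertion order.
    return f"{BASE_URL}?{urlencode(params)}"


print(build_transcription_url("mp3", interim_results=False))
# wss://api.telnyx.com/v2/speech-to-text/transcription?input_format=mp3&interim_results=false
```

Building the URL in one place keeps the query string consistent with the audio you actually send in the binary frames.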

Authentication

Connections must authenticate with a Bearer token (your Telnyx API v2 key).
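Under the HTTP bearer scheme, the token is sent in the `Authorization` header of the WebSocket opening handshake as `Bearer <token>`. A minimal sketch, assuming the key may be read from a `TELNYX_API_KEY` environment variable (a hypothetical name chosen for illustration):

```python
import os
from typing import Dict, Optional


def auth_headers(api_key: Optional[str] = None) -> Dict[str, str]:
    """Build the handshake headers for bearer authentication.

    Falls back to the TELNYX_API_KEY environment variable (a hypothetical
    variable name) when no key is passed explicitly.
    """
    key = api_key or os.environ.get("TELNYX_API_KEY")
    if not key:
        raise RuntimeError("no Telnyx API v2 key provided")
    # HTTP bearer scheme: "Authorization: Bearer <token>"
    return {"Authorization": f"Bearer {key}"}
```

Pass the resulting dictionary to whichever WebSocket client you use; the argument that accepts extra handshake headers is named differently from library to library.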

WSS speech-to-text/transcription

Messages

bearerAuth (type: http): Telnyx API v2 Bearer token authentication.

query (type: object): Query parameters passed when opening the WebSocket connection.

Audio Frame (type: string): Client-to-server binary frame containing audio data to transcribe. The audio should be in mp3 or wav format, matching the value of the input_format query parameter.
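Audio is streamed as a sequence of binary frames. This page does not prescribe a frame size, so the sketch below assumes a caller-chosen chunk size and simply slices the encoded mp3 or wav bytes into consecutive frames. For uncompressed 16-bit PCM, it also shows how a frame duration maps to bytes (sample rate × channels × bytes per sample × seconds). Both helper names are hypothetical.

```python
from typing import Iterator


def pcm_frame_size(sample_rate: int, channels: int, frame_ms: int, sample_width: int = 2) -> int:
    """Bytes needed for frame_ms milliseconds of uncompressed PCM audio."""
    return sample_rate * channels * sample_width * frame_ms // 1000


def iter_audio_frames(audio: bytes, frame_size: int) -> Iterator[bytes]:
    """Yield consecutive binary frames of at most frame_size bytes.

    The chunk size is an assumption; the API page does not prescribe one.
    """
    if frame_size <= 0:
        raise ValueError("frame_size must be positive")
    for start in range(0, len(audio), frame_size):
        yield audio[start:start + frame_size]


# 100 ms of 16 kHz mono 16-bit PCM is 3200 bytes.
print(pcm_frame_size(16000, 1, 100))  # 3200
```

Each yielded chunk would be sent as one binary WebSocket message, in order, so that the server receives the original byte stream unchanged.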

Transcript Frame (type: object): Server-to-client frame containing a transcription result. When interim_results is enabled, you may receive multiple interim results (is_final=false) before the final result (is_final=true) for each utterance.
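A client that wants the finished text can track the latest interim hypothesis for display and append only results with `is_final=true`. A minimal sketch using the `transcript`, `is_final`, and `confidence` fields named on this page; the `TranscriptAssembler` class is hypothetical:

```python
import json
from dataclasses import dataclass, field
from typing import List


@dataclass
class TranscriptAssembler:
    """Collects final transcripts and tracks the current interim hypothesis."""

    finals: List[str] = field(default_factory=list)
    interim: str = ""

    def feed(self, raw: str) -> None:
        frame = json.loads(raw)
        text = frame.get("transcript", "")
        if frame.get("is_final"):
            # A final result supersedes any interim text for this utterance.
            self.finals.append(text)
            self.interim = ""
        else:
            # Interim results are revised hypotheses; keep only the latest one.
            self.interim = text

    def text(self) -> str:
        return " ".join(t for t in self.finals if t)


asm = TranscriptAssembler()
for raw in [
    '{"transcript": "hello", "is_final": false, "confidence": 0.62}',
    '{"transcript": "hello world", "is_final": true, "confidence": 0.94}',
]:
    asm.feed(raw)
print(asm.text())  # hello world
```

Replacing rather than appending interim text avoids duplicating words, since each interim frame restates the utterance so far.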

Error Frame (type: object): Server-to-client frame indicating an error during transcription. The connection may be closed shortly after sending this frame.
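This page does not list the error frame's fields, so a client should not depend on a particular error schema. The sketch below relies only on what is documented, namely that transcript frames carry a `transcript` field, and treats any JSON object without one as an error frame. The `classify_frame` name and the "no transcript field means error" rule are assumptions to verify against the full Error Frame schema.

```python
import json
from typing import Any, Dict, Tuple


def classify_frame(raw: str) -> Tuple[str, Dict[str, Any]]:
    """Return ("transcript", frame) or ("error", frame) for a server frame.

    Assumption: frames without a "transcript" key are error frames. Check the
    Error Frame schema for the actual fields before relying on this rule.
    """
    frame = json.loads(raw)
    if not isinstance(frame, dict):
        raise ValueError("expected a JSON object")
    kind = "transcript" if "transcript" in frame else "error"
    return kind, frame
```

When `classify_frame` returns `"error"`, log the payload and stop sending audio, because the server may close the connection shortly afterward.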
