Real-time speech-to-text over a persistent WebSocket connection. Send audio, receive transcripts.

Endpoint

wss://api.telnyx.com/v2/speech-to-text/transcription

Connection Lifecycle

1. Handshake

The connection starts as an HTTP GET request with an Upgrade: websocket header. The server responds with 101 Switching Protocols, after which both sides exchange WebSocket frames over the same connection.
Request:
GET /v2/speech-to-text/transcription?transcription_engine=Deepgram&model=nova-3&input_format=wav HTTP/1.1
Host: api.telnyx.com
Upgrade: websocket
Connection: Upgrade
Authorization: Bearer YOUR_API_KEY
Sec-WebSocket-Version: 13
Sec-WebSocket-Key: dGhlIHNhbXBsZSBub25jZQ==

Response:
HTTP/1.1 101 Switching Protocols
Upgrade: websocket
Connection: Upgrade
Sec-WebSocket-Accept: s3pPLMBiTxaQ9kYGzzhZRbK+xOo=
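The Sec-WebSocket-Accept value is not arbitrary: RFC 6455 derives it from the client's Sec-WebSocket-Key by appending a fixed GUID, hashing with SHA-1, and base64-encoding the digest. WebSocket client libraries verify this automatically, so the sketch below (the function name compute_accept is hypothetical) only illustrates how the two headers in the example relate.

```python
import base64
import hashlib

# Fixed GUID defined by RFC 6455; the server appends it to the client's key.
WEBSOCKET_GUID = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11"


def compute_accept(sec_websocket_key: str) -> str:
    """Return the Sec-WebSocket-Accept value a server sends for a given client key."""
    digest = hashlib.sha1((sec_websocket_key + WEBSOCKET_GUID).encode("ascii")).digest()
    return base64.b64encode(digest).decode("ascii")


if __name__ == "__main__":
    # Sample key from the request above; prints s3pPLMBiTxaQ9kYGzzhZRbK+xOo=
    print(compute_accept("dGhlIHNhbXBsZSBub25jZQ=="))
```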
You can also connect to the WebSocket endpoint directly with any WebSocket client, which performs the upgrade handshake itself:
wss://transcription.telnyx.com/public/speech-to-text/transcription?transcription_engine=Deepgram&model=nova-3&input_format=wav
The same query parameters apply, and once connected the message protocol is identical.

All configuration (engine, model, input format, language, and options) is set at connect time via query parameters and cannot be changed mid-session. See Parameters for the full list. If any parameter is invalid, the server returns a JSON error and closes the connection.
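Because every setting must be in the query string before connecting, it can help to build the URL in one place. This is a minimal sketch that uses only the three parameters visible in the example URLs (transcription_engine, model, input_format); the helper name build_url is hypothetical, and the remaining parameters are documented under Parameters and not reproduced here.

```python
from urllib.parse import urlencode

# Primary endpoint from the Endpoint section above.
BASE_URL = "wss://api.telnyx.com/v2/speech-to-text/transcription"


def build_url(params: dict, base_url: str = BASE_URL) -> str:
    """Encode all session configuration into the query string.

    Configuration cannot change mid-session, so everything the session
    needs must be included here before the connection is opened.
    """
    return f"{base_url}?{urlencode(params)}"


url = build_url({
    "transcription_engine": "Deepgram",
    "model": "nova-3",
    "input_format": "wav",
})
print(url)
```

urlencode percent-encodes values and keeps the dictionary's insertion order, so the result matches the example request line.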

2. Streaming

Once connected, audio and transcripts flow concurrently; there is no request/response pairing.

Client → Server
Frame type | Content
binary | Audio data: raw bytes, chunked. No base64 or JSON wrapping.
text | {"type": "Finalize"}: flush the buffer and force a final transcript (Deepgram only)
text | {"type": "CloseStream"}: flush remaining audio, send any remaining transcripts, and close the stream gracefully (Deepgram only)
text | {"type": "KeepAlive"}: reset the idle timeout (Deepgram only)
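The client-side frames above fall into two kinds: raw audio bytes and small JSON control messages. The sketch below prepares both; the helper names and the 3200-byte chunk size are assumptions for illustration only, since the table does not specify a chunk size.

```python
import json
from typing import Iterator

# Hypothetical chunk size; the docs only say audio is sent "chunked".
CHUNK_SIZE = 3200

# Control messages listed in the table above (all Deepgram only).
CONTROL_TYPES = {"Finalize", "CloseStream", "KeepAlive"}


def audio_chunks(audio: bytes, chunk_size: int = CHUNK_SIZE) -> Iterator[bytes]:
    """Split raw audio into pieces sent as binary frames, with no base64 or JSON wrapping."""
    for start in range(0, len(audio), chunk_size):
        yield audio[start:start + chunk_size]


def control_message(kind: str) -> str:
    """Build the JSON text-frame payload for one of the control messages."""
    if kind not in CONTROL_TYPES:
        raise ValueError(f"unknown control message: {kind}")
    return json.dumps({"type": kind})


chunks = list(audio_chunks(b"\x00" * 8000))
print([len(c) for c in chunks])      # [3200, 3200, 1600]
print(control_message("KeepAlive"))  # {"type": "KeepAlive"}
```

With a client such as the third-party websockets package, passing bytes to send() produces a binary frame and passing a str produces a text frame, which matches the two frame types in the table.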
Server → Client
Message | Description
Transcription result | {"transcript": "...", "is_final": true, "confidence": 0.98}
Utterance end (Deepgram) | {"transcript": "", "is_final": true, "utterance_end": true}
Error | {"errors": [...]}; the connection closes after this message
See Messages for the complete wire protocol reference.
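A receive loop typically needs to tell the three server message kinds apart. This sketch keys only on the field names shown in the table (errors, utterance_end, transcript); checking errors first is a design choice, and the function name classify is hypothetical. Consult Messages for any additional fields.

```python
import json


def classify(raw: str) -> tuple:
    """Sort a server text frame into one of the three documented message kinds."""
    msg = json.loads(raw)
    if "errors" in msg:
        return "error", msg          # the server closes the connection after this
    if msg.get("utterance_end"):
        return "utterance_end", msg  # Deepgram: empty transcript marking an utterance boundary
    return "transcript", msg


print(classify('{"transcript": "Hello", "is_final": false}')[0])                 # transcript
print(classify('{"transcript": "", "is_final": true, "utterance_end": true}')[0])  # utterance_end
print(classify('{"errors": []}')[0])                                              # error
```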
Client → Server  binary: audio chunk
Client → Server  binary: audio chunk
Client ← Server  {"transcript":"Hello","is_final":false}
Client → Server  binary: audio chunk
Client ← Server  {"transcript":"Hello, how are you?","is_final":true}
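In the exchange above, the interim result "Hello" is later replaced by the final "Hello, how are you?". A display buffer can mirror that by keeping committed final text separately from the latest interim guess. This is a sketch that assumes each interim result revises the segment not yet finalized, as the sequence suggests; TranscriptBuffer is a hypothetical helper, not part of the API.

```python
class TranscriptBuffer:
    """Keep committed (final) text plus the most recent interim result."""

    def __init__(self) -> None:
        self.finals = []
        self.interim = ""

    def update(self, msg: dict) -> None:
        text = msg.get("transcript", "")
        if msg.get("is_final"):
            if text:               # skip empty finals such as utterance-end markers
                self.finals.append(text)
            self.interim = ""      # a final result supersedes the pending interim
        else:
            self.interim = text    # each interim replaces the previous one

    def display(self) -> str:
        parts = self.finals + ([self.interim] if self.interim else [])
        return " ".join(parts)


buf = TranscriptBuffer()
buf.update({"transcript": "Hello", "is_final": False})
print(buf.display())  # Hello
buf.update({"transcript": "Hello, how are you?", "is_final": True})
print(buf.display())  # Hello, how are you?
```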

3. Teardown

Send {"type": "CloseStream"} (Deepgram only) to flush remaining audio and close gracefully. The server finishes processing, sends any remaining transcripts, then closes the WebSocket.
Client → Server  {"type":"CloseStream"}
Client ← Server  final transcript
Client ← Server  [connection closed]
For other engines, close the WebSocket connection directly. Closing a Deepgram connection without sending CloseStream also works, but any audio still buffered may not be transcribed. See Examples for complete code samples.
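The teardown rules above can be sketched as one async function. It assumes a connection object with async send(), async close(), and async iteration over incoming messages that ends when the server closes the socket; client connections from the third-party websockets package behave this way, but finish_session and the FakeConnection stand-in are hypothetical and exist only to make the sketch runnable offline.

```python
import asyncio
import json


async def finish_session(ws, engine: str) -> list:
    """End a session and return any transcripts that arrive during teardown."""
    if engine != "Deepgram":
        await ws.close()  # other engines: close the WebSocket directly
        return []
    await ws.send(json.dumps({"type": "CloseStream"}))
    leftovers = []
    async for raw in ws:  # drain until the server closes the connection
        if isinstance(raw, str):
            msg = json.loads(raw)
            if msg.get("transcript"):
                leftovers.append(msg["transcript"])
    return leftovers


class FakeConnection:
    """Offline stand-in that replays canned server messages, then 'closes'."""

    def __init__(self, replies):
        self.sent, self.closed, self._replies = [], False, list(replies)

    async def send(self, data):
        self.sent.append(data)

    async def close(self):
        self.closed = True

    def __aiter__(self):
        return self

    async def __anext__(self):
        if not self._replies:
            raise StopAsyncIteration  # models the server closing the WebSocket
        return self._replies.pop(0)


fake = FakeConnection(['{"transcript": "Goodbye.", "is_final": true}'])
print(asyncio.run(finish_session(fake, "Deepgram")))  # ['Goodbye.']
print(fake.sent)                                      # ['{"type": "CloseStream"}']
```

Waiting for the server to close, rather than closing immediately after CloseStream, is what gives the final transcripts time to arrive.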