Endpoint
Connection Lifecycle
1. Handshake
The connection starts as an HTTP GET with Upgrade: websocket. The server responds with 101 Switching Protocols, after which the connection carries WebSocket frames.
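
To make the handshake concrete, here is a minimal sketch of the upgrade request and of the Sec-WebSocket-Accept value a server returns with 101 Switching Protocols, following RFC 6455. The helper names are illustrative, and the host and path passed in are whatever your endpoint uses (none is assumed here). In practice a WebSocket client library performs this step for you.

```python
import base64
import hashlib
import os

# Fixed GUID defined by RFC 6455 for deriving Sec-WebSocket-Accept.
WS_GUID = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11"


def new_key() -> str:
    """Generate a random 16-byte nonce, base64-encoded, for Sec-WebSocket-Key."""
    return base64.b64encode(os.urandom(16)).decode("ascii")


def build_upgrade_request(host: str, path: str, key: str) -> str:
    """Build the HTTP GET request that asks the server to upgrade to WebSocket."""
    lines = [
        f"GET {path} HTTP/1.1",
        f"Host: {host}",
        "Upgrade: websocket",
        "Connection: Upgrade",
        f"Sec-WebSocket-Key: {key}",
        "Sec-WebSocket-Version: 13",
        "",  # blank line terminates the header block
        "",
    ]
    return "\r\n".join(lines)


def expected_accept(key: str) -> str:
    """Compute the Sec-WebSocket-Accept value the server should echo back."""
    digest = hashlib.sha1((key + WS_GUID).encode("ascii")).digest()
    return base64.b64encode(digest).decode("ascii")
```

A client can compare `expected_accept(key)` with the Sec-WebSocket-Accept header in the 101 response to confirm that the server actually understood the upgrade.
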
2. Streaming
Once connected, audio and transcription flow concurrently, with no request/response pairing.

Client → Server

| Frame type | Content |
|---|---|
| binary | Audio data — raw bytes, chunked. No base64 or JSON wrapping. |
| text | {"type": "Finalize"} — flush buffer, force final transcript (Deepgram only) |
| text | {"type": "CloseStream"} — flush remaining transcription and close the stream gracefully (Deepgram only) |
| text | {"type": "KeepAlive"} — reset idle timeout (Deepgram only) |
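
The client-side frames in the table above could be prepared along these lines. This is a sketch only: the function names and the 4096-byte chunk size are assumptions, not values from this API. Each audio chunk is meant to be sent as a binary frame, and each control message as a text frame.

```python
import json
from typing import Iterator

# Control messages listed in the table above (Deepgram only, per the table).
CONTROL_TYPES = {"Finalize", "CloseStream", "KeepAlive"}


def control_message(kind: str) -> str:
    """Serialize a control message for sending as a text frame."""
    if kind not in CONTROL_TYPES:
        raise ValueError(f"unknown control message: {kind}")
    return json.dumps({"type": kind})


def audio_chunks(audio: bytes, chunk_size: int = 4096) -> Iterator[bytes]:
    """Split raw audio into binary-frame payloads (no base64, no JSON).

    The default chunk_size is an arbitrary illustrative value.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    for start in range(0, len(audio), chunk_size):
        yield audio[start:start + chunk_size]
```

Keeping audio as raw bytes and control messages as JSON text means the frame type alone tells the server which kind of payload it received.
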

Server → Client

| Message | Description |
|---|---|
| Transcription result | {"transcript": "...", "is_final": true, "confidence": 0.98} |
| Utterance end | {"transcript": "", "is_final": true, "utterance_end": true} (Deepgram) |
| Error | {"errors": [...]} — connection closes after |
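
One way a client might tell the server messages above apart is sketched below. The `ServerEvent` type and `parse_server_message` function are hypothetical names introduced for illustration. The field names come from the table, and the sketch assumes no fields beyond those shown there.

```python
import json
from dataclasses import dataclass
from typing import Optional


@dataclass
class ServerEvent:
    kind: str  # "transcript", "utterance_end", or "error"
    transcript: str = ""
    is_final: bool = False
    confidence: Optional[float] = None
    errors: Optional[list] = None


def parse_server_message(raw: str) -> ServerEvent:
    """Classify one text frame received from the server."""
    msg = json.loads(raw)
    # Errors take priority: the connection closes after an error message.
    if "errors" in msg:
        return ServerEvent(kind="error", errors=msg["errors"])
    # Utterance-end markers carry an empty transcript and utterance_end: true.
    if msg.get("utterance_end"):
        return ServerEvent(kind="utterance_end", is_final=bool(msg.get("is_final", True)))
    return ServerEvent(
        kind="transcript",
        transcript=msg.get("transcript", ""),
        is_final=bool(msg.get("is_final", False)),
        confidence=msg.get("confidence"),
    )
```

Checking for errors first matters because an error message is the last thing the client receives before the connection closes.
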
3. Teardown
Send {"type": "CloseStream"} (Deepgram only) to flush remaining audio and close gracefully. The server finishes processing, sends any remaining transcripts, then closes the WebSocket.
Closing the WebSocket without sending CloseStream also works, but may lose buffered audio on Deepgram.
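
The graceful teardown could look roughly like this with asyncio. The sketch assumes `ws` is any connection object with an async `send` method that can be iterated with `async for` and stops iterating once the server closes the connection. The function name and the 5-second timeout are illustrative choices, not part of this API.

```python
import asyncio
import json


async def close_gracefully(ws, timeout: float = 5.0) -> list:
    """Send CloseStream, then collect messages until the server closes.

    Returns the decoded messages that arrived after CloseStream was sent.
    """
    await ws.send(json.dumps({"type": "CloseStream"}))

    remaining = []

    async def drain() -> None:
        # Iteration ends when the server closes the WebSocket.
        async for raw in ws:
            remaining.append(json.loads(raw))

    try:
        await asyncio.wait_for(drain(), timeout)
    except asyncio.TimeoutError:
        pass  # server did not close in time; return what arrived so far
    return remaining
```

Draining until the server closes, rather than closing immediately after sending CloseStream, gives the final transcripts a chance to arrive.
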
See Examples for complete code samples.