Client → Server
Audio Data
Binary WebSocket frames containing raw audio bytes, with no base64 encoding and no JSON wrapping. Recommended chunk size: 2048–8192 bytes. Smaller chunks reduce latency; larger chunks mean fewer frames to send.
Control Messages
JSON text frames with a `type` field, for example `{"type": "CloseStream"}`.
| Type | Effect | Engine support |
|---|---|---|
| `Finalize` | Flush the audio buffer and force a final transcript | Deepgram only |
| `CloseStream` | End the session and close the connection gracefully | All |
| `KeepAlive` | Reset the idle timeout | Deepgram only |
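A minimal Python sketch of the two client frame types, assuming the audio is already in memory as raw `bytes`. The names `iter_audio_chunks` and `control_message` are hypothetical helpers, not part of any SDK, and the 4096-byte default is an arbitrary pick inside the recommended range. Each yielded chunk would be sent as a binary frame and each control payload as a text frame through whatever WebSocket client you use.

```python
import json
from typing import Iterator

# Recommended chunk-size bounds from the Audio Data section (bytes).
MIN_CHUNK = 2048
MAX_CHUNK = 8192

# Control message types from the table above.
# Finalize and KeepAlive are Deepgram only; CloseStream works on all engines.
CONTROL_TYPES = {"Finalize", "CloseStream", "KeepAlive"}


def iter_audio_chunks(audio: bytes, chunk_size: int = 4096) -> Iterator[bytes]:
    """Yield raw audio slices, each meant to be sent as one binary frame."""
    if not MIN_CHUNK <= chunk_size <= MAX_CHUNK:
        raise ValueError(f"chunk_size should be {MIN_CHUNK}-{MAX_CHUNK} bytes")
    for start in range(0, len(audio), chunk_size):
        # Raw bytes only: no base64 encoding, no JSON envelope.
        yield audio[start:start + chunk_size]


def control_message(msg_type: str) -> str:
    """Build the JSON text payload for a control message."""
    if msg_type not in CONTROL_TYPES:
        raise ValueError(f"unknown control message type: {msg_type}")
    return json.dumps({"type": msg_type})
```

For a 10,000-byte buffer, `iter_audio_chunks(audio, 4096)` yields chunks of 4096, 4096, and 1808 bytes (the last chunk is whatever remains), and `control_message("CloseStream")` returns the text payload `{"type": "CloseStream"}`.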
Server → Client
All server messages are JSON text frames.
Transcription Result
Emitted for each recognized speech segment, whether partial or final.
| Field | Type | Present | Description |
|---|---|---|---|
| `transcript` | string | Always | Transcribed text |
| `is_final` | boolean | Always | `true`: finalized segment. `false`: interim result (may be revised). |
| `speech_final` | boolean | Deepgram only | `true` when the speaker has stopped talking |
| `confidence` | float | When available | Confidence score from 0.0 to 1.0 |
| `utterance_end` | boolean | Deepgram only | `true` for a silence-triggered utterance boundary |
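One way to read a transcription result in Python, assuming each result arrives as a single JSON object whose top-level keys match the field names in the table (the nesting is not specified above, so treat this as an assumption). `TranscriptionResult` and `parse_result` are hypothetical names. Engine-specific and optional fields are read with `.get()`, so they come back as `None` when the server omits them; frames carrying an `errors` array belong to the Error message instead and would raise `KeyError` here.

```python
import json
from dataclasses import dataclass
from typing import Optional


@dataclass
class TranscriptionResult:
    transcript: str
    is_final: bool
    # Optional / Deepgram-only fields default to None when absent.
    speech_final: Optional[bool] = None
    confidence: Optional[float] = None
    utterance_end: Optional[bool] = None


def parse_result(frame: str) -> TranscriptionResult:
    """Parse one JSON text frame into a TranscriptionResult."""
    data = json.loads(frame)
    return TranscriptionResult(
        transcript=data["transcript"],  # always present
        is_final=data["is_final"],      # always present
        speech_final=data.get("speech_final"),
        confidence=data.get("confidence"),
        utterance_end=data.get("utterance_end"),
    )
```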
Utterance End
Emitted when the speaker pauses (Deepgram only). The message has an empty `transcript` and `is_final: true`.
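A hypothetical helper for spotting the Utterance End message in an already-parsed frame (a `dict`). It checks the `utterance_end` flag from the Transcription Result table first and then falls back to the shape described here. Treat it as a heuristic: the fallback would also match any other final result that happens to have an empty transcript.

```python
def is_utterance_end(msg: dict) -> bool:
    """Heuristic check for an Utterance End message (Deepgram only)."""
    # Prefer the explicit flag when the server includes it.
    if msg.get("utterance_end") is True:
        return True
    # Otherwise fall back to the documented shape: empty transcript, is_final true.
    return msg.get("transcript") == "" and msg.get("is_final") is True
```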
Error
Emitted on validation or connection errors. The connection closes shortly after.
| Field | Type | Description |
|---|---|---|
| `errors` | array | One or more error objects |
| `errors[].code` | string | Error code (see Errors) |
| `errors[].title` | string | Short description |
| `errors[].detail` | string | Human-readable explanation |
| `errors[].source.parameter` | string | Query parameter that caused the error |
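A sketch for turning an Error message into log-friendly lines, assuming the JSON layout in the table above. `format_errors` is a hypothetical name, and any code values you feed it in testing are placeholders rather than real error codes. Because the connection closes shortly after an error, a client would likely log these lines and stop sending audio.

```python
import json
from typing import List


def format_errors(frame: str) -> List[str]:
    """Turn an error frame into one readable line per error object."""
    data = json.loads(frame)
    lines = []
    for err in data.get("errors", []):
        line = f"{err.get('code', '?')}: {err.get('title', '')}"
        detail = err.get("detail")
        if detail:
            line += f" ({detail})"
        # source.parameter names the offending query parameter, when present.
        param = (err.get("source") or {}).get("parameter")
        if param:
            line += f" [parameter: {param}]"
        lines.append(line)
    return lines
```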
Message Flow
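With `interim_results=false` (the default), the server sends only final transcripts (`is_final: true`).
With `interim_results=true`, the server sends interim results (`is_final: false`) for a segment, followed by the final result for that segment.
Results with `is_final: true` are stable and are not revised; interim results may still change.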
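The flow above suggests a simple display strategy: keep finalized text, and let each interim result replace the previous one until the final arrives. A minimal Python sketch, assuming parsed `dict` messages and that segments should be joined with single spaces; `TranscriptAssembler` is a hypothetical class. With `interim_results=false` it only ever sees final results, so the same code works in both modes, and empty final results (such as an Utterance End) add nothing to the text.

```python
from typing import List, Optional


class TranscriptAssembler:
    """Keep stable final text separate from the latest revisable interim."""

    def __init__(self) -> None:
        self.finals: List[str] = []
        self.interim: Optional[str] = None

    def on_message(self, msg: dict) -> None:
        text = msg.get("transcript", "")
        if msg.get("is_final"):
            # Final results are stable: commit non-empty text, drop the interim.
            if text:
                self.finals.append(text)
            self.interim = None
        else:
            # Interim results may be revised, so each one replaces the last.
            self.interim = text

    def display_text(self) -> str:
        parts = self.finals + ([self.interim] if self.interim else [])
        return " ".join(parts)
```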