Supported engines: Deepgram, xAI, Google, Speechmatics, and Soniox. Other engines ignore this parameter.
Default: 100 ms. This default does not apply to Soniox: Soniox endpointing is disabled unless you provide a value in the 500–3000 ms range.
Values
| Value | Behavior |
|---|---|
| Integer (ms) | Finalize after this many ms of silence. Lower = faster but more splits. |
| "false" | Disable endpointing entirely. No automatic utterance boundaries. |
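The rules above (supported engines, the 100 ms default, the "false" value, and the Soniox range) can be summarized in a small client-side helper. This is a minimal sketch, not part of any engine's API. The function name resolve_endpointing, the lowercase engine identifiers, and the choice to reject non-positive integers are assumptions. The helper returns the effective silence threshold in milliseconds, or None when endpointing is disabled or the engine ignores the parameter.

```python
from typing import Optional, Union

# Engines listed above as honoring `endpointing` (assumed lowercase identifiers).
SUPPORTED_ENGINES = {"deepgram", "xai", "google", "speechmatics", "soniox"}
DEFAULT_ENDPOINTING_MS = 100  # default for supported engines other than Soniox
SONIOX_MIN_MS, SONIOX_MAX_MS = 500, 3000


def resolve_endpointing(engine: str, value: Union[int, str, None] = None) -> Optional[int]:
    """Hypothetical helper: return the effective silence threshold in ms,
    or None if endpointing is disabled or the engine ignores the parameter."""
    engine = engine.lower()
    if engine not in SUPPORTED_ENGINES:
        return None  # other engines ignore the parameter
    if value == "false":
        return None  # endpointing explicitly disabled
    if value is None:
        # Soniox has no default: endpointing stays off unless a value is given.
        return None if engine == "soniox" else DEFAULT_ENDPOINTING_MS
    if isinstance(value, bool) or not isinstance(value, int) or value <= 0:
        # Assumption: anything other than a positive integer or "false" is rejected.
        raise ValueError(f"endpointing must be a positive integer (ms) or 'false', got {value!r}")
    if engine == "soniox" and not SONIOX_MIN_MS <= value <= SONIOX_MAX_MS:
        return None  # outside 500–3000 ms, Soniox endpointing stays disabled
    return value
```

For example, resolve_endpointing("soniox", 300) returns None because 300 ms is below the Soniox minimum, while resolve_endpointing("deepgram", 300) returns 300.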
Trade-offs
- Low values (50–100 ms): Fast response, but utterances may split mid-sentence on short pauses. Deepgram, xAI, Google, and Speechmatics only, since these values are below the Soniox minimum.
- High values (300–1000 ms): More complete sentences, at the cost of higher latency before finalization.
- Soniox range (500–3000 ms): Minimum 500 ms. Use 500–800 ms for responsive turn detection and 1000–3000 ms for longer utterances with natural pauses.
- Disabled ("false"): No automatic splits. Use Finalize control messages to trigger boundaries manually, or rely on CloseStream for a single final transcript.
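To make the trade-off concrete, here is a toy model that closes an utterance wherever the silence between two words reaches the endpointing threshold. It is a simplification: real engines detect silence from the audio signal rather than from word gaps, and the function split_on_silence and the word timings below are hypothetical illustration data.

```python
from typing import List, Optional, Tuple

Word = Tuple[str, float, float]  # (text, start in seconds, end in seconds)


def split_on_silence(words: List[Word], endpointing_ms: int) -> List[str]:
    """Toy model: start a new utterance whenever the gap before a word
    is at least `endpointing_ms`. Not any engine's actual algorithm."""
    utterances: List[str] = []
    current: List[str] = []
    prev_end: Optional[float] = None
    for text, start, end in words:
        if prev_end is not None and (start - prev_end) * 1000 >= endpointing_ms:
            utterances.append(" ".join(current))  # silence long enough: finalize
            current = []
        current.append(text)
        prev_end = end
    if current:
        utterances.append(" ".join(current))
    return utterances


# Hypothetical timings: a 150 ms pause before "book", a 900 ms pause before "thanks".
words = [
    ("I", 0.00, 0.10), ("want", 0.12, 0.40), ("to", 0.42, 0.50),
    ("book", 0.65, 0.90), ("a", 0.92, 1.00), ("flight", 1.02, 1.40),
    ("thanks", 2.30, 2.60),
]

print(split_on_silence(words, 100))   # ['I want to', 'book a flight', 'thanks']
print(split_on_silence(words, 500))   # ['I want to book a flight', 'thanks']
print(split_on_silence(words, 1000))  # ['I want to book a flight thanks']
```

A low threshold splits the sentence at the short pause, while a very high one merges two separate turns; in this model the utterance is also not finalized until the threshold has elapsed after the last word, which is the latency cost of larger values.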
Interaction With Utterance End
When endpointing triggers, Deepgram sends the final transcript followed by an utterance end event (if utterance_end_ms is configured server-side; it is currently set to 1000 ms).
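On the client, one way to use this ordering is to buffer final transcripts and treat the utterance end event as the end of a turn. The sketch below assumes hypothetical message dictionaries with a "type" of "final" or "utterance_end" and a "text" field; the actual wire format depends on the engine. Because it buffers, it behaves the same whether one or several finals precede an utterance end event.

```python
from typing import Dict, Iterable, List


def collect_turns(messages: Iterable[Dict[str, str]]) -> List[str]:
    """Join final transcripts into turns, closing a turn on each utterance end event.
    Message shapes ("type": "final" / "utterance_end", "text") are hypothetical."""
    turns: List[str] = []
    pending: List[str] = []
    for msg in messages:
        if msg["type"] == "final":
            pending.append(msg["text"])
        elif msg["type"] == "utterance_end" and pending:
            turns.append(" ".join(pending))
            pending = []
    if pending:  # stream closed before an utterance end event arrived
        turns.append(" ".join(pending))
    return turns


messages = [
    {"type": "final", "text": "book a flight"},
    {"type": "utterance_end"},
    {"type": "final", "text": "thanks"},
    {"type": "utterance_end"},
]
print(collect_turns(messages))  # ['book a flight', 'thanks']
```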