Supported engines: Deepgram, xAI, Google, Speechmatics, and Soniox. Other engines ignore this parameter.
Controls how long the engine waits after silence before finalizing an utterance.
Soniox has a different valid range. When transcription_engine=Soniox, this parameter maps to max_endpoint_delay_ms and must be between 500 and 3000 ms. Values outside that range are rejected. The default (100 ms) and the low-value examples below apply to Deepgram, xAI, Google, and Speechmatics only.
# Deepgram / xAI / Google / Speechmatics
wss://api.telnyx.com/v2/speech-to-text/transcription?endpointing=300

# Soniox (500–3000 ms)
wss://api.telnyx.com/v2/speech-to-text/transcription?transcription_engine=Soniox&endpointing=1000
Default: 100 ms (not applicable to Soniox — Soniox endpointing is disabled unless a value in the 500–3000 ms range is provided).
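The range rules above can be checked on the client before connecting. The following is a minimal sketch, not part of the API: `build_transcription_url` and `SONIOX_RANGE_MS` are hypothetical names, and the assumption that other engines accept any non-negative integer is ours. Only the URL, the parameter names, and the 500–3000 ms Soniox range come from this page.

```python
from urllib.parse import urlencode

BASE_URL = "wss://api.telnyx.com/v2/speech-to-text/transcription"
SONIOX_RANGE_MS = (500, 3000)  # valid range when transcription_engine=Soniox


def build_transcription_url(endpointing=None, engine=None):
    """Build the WebSocket URL, validating endpointing before it is sent.

    endpointing: int (ms), the string "false", or None to omit the parameter.
    engine: value for transcription_engine, or None to omit it.
    """
    params = {}
    if engine is not None:
        params["transcription_engine"] = engine

    if endpointing is not None:
        if endpointing == "false":
            # Passed through unchanged. This page does not say how Soniox
            # treats "false", so no engine-specific check is applied here.
            pass
        elif isinstance(endpointing, bool) or not isinstance(endpointing, int):
            raise ValueError('endpointing must be an integer (ms) or "false"')
        elif endpointing < 0:
            # Assumption: negative silence durations are never meaningful.
            raise ValueError("endpointing must not be negative")
        elif engine == "Soniox":
            low, high = SONIOX_RANGE_MS
            if not low <= endpointing <= high:
                raise ValueError(
                    f"Soniox endpointing must be between {low} and {high} ms"
                )
        params["endpointing"] = str(endpointing)

    return f"{BASE_URL}?{urlencode(params)}" if params else BASE_URL
```

Calling `build_transcription_url(1000, engine="Soniox")` reproduces the Soniox URL shown above, while `build_transcription_url(100, engine="Soniox")` raises `ValueError` locally instead of waiting for the server to reject the request.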

Values

| Value | Behavior |
| --- | --- |
| Integer (ms) | Finalize after this many ms of silence. Lower = faster but more splits. |
| "false" | Disable endpointing entirely. No automatic utterance boundaries. |

Trade-offs

Low values (50–100 ms): Fast response. Utterances may split mid-sentence on short pauses. Deepgram, xAI, Google, and Speechmatics only; these values are below the Soniox minimum.

High values (300–1000 ms): More complete sentences. Higher latency before finalization.

Soniox range (500–3000 ms): Minimum 500 ms. Use 500–800 ms for responsive turn detection, 1000–3000 ms for longer utterances with natural pauses.

Disabled ("false"): No automatic splits. Use Finalize control messages to manually trigger boundaries, or rely on CloseStream for a single final transcript.

Interaction With Utterance End

When endpointing triggers, Deepgram sends the final transcript followed by an utterance end event (if utterance_end_ms is configured server-side — currently 1000 ms).
{"transcript": "Hello, how are you?", "is_final": true}
{"transcript": "", "is_final": true, "utterance_end": true}
The utterance end marker signals “this speaker turn is done.” See Messages for details.
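On the client side, the two message shapes above can be used to group final transcripts into speaker turns. The sketch below is illustrative only: `group_turns` is a hypothetical helper, it assumes each message arrives as a JSON string with the `transcript`, `is_final`, and `utterance_end` fields shown above, and it ignores any other fields a real message may carry.

```python
import json


def group_turns(raw_messages):
    """Group final transcripts into speaker turns.

    Returns (turns, pending): completed turns closed by an utterance end
    marker, and final segments still waiting for one.
    """
    turns = []
    pending = []
    for raw in raw_messages:
        msg = json.loads(raw)
        if msg.get("utterance_end"):
            # The marker carries an empty transcript; it only closes the turn.
            if pending:
                turns.append(" ".join(pending))
                pending = []
        elif msg.get("is_final") and msg.get("transcript"):
            # A finalized segment produced by endpointing.
            pending.append(msg["transcript"])
    return turns, pending


messages = [
    '{"transcript": "Hello, how are you?", "is_final": true}',
    '{"transcript": "", "is_final": true, "utterance_end": true}',
]
turns, pending = group_turns(messages)
print(turns)    # ['Hello, how are you?']
print(pending)  # []
```

Keeping unfinished segments in `pending` rather than flushing them lets the caller decide what to do when the stream closes before an utterance end marker arrives.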