Available models
| Model | Engine | Best for |
|---|---|---|
| distil-whisper/distil-large-v2 | Telnyx | Default — good accuracy, runs on Telnyx infrastructure |
| openai/whisper-large-v3-turbo | Telnyx | High accuracy, larger model, multilingual support |
| deepgram/flux | Deepgram | Conversational AI — optimized for turn-taking, English only |
| deepgram/nova-3 | Deepgram | Multilingual with automatic language detection, slightly higher latency |
| deepgram/nova-2 | Deepgram | Legacy — stable, cost-effective |
| azure/fast | Azure | Fast transcription with multilingual support |
deepgram/flux is English-only. For multilingual use cases, use deepgram/nova-3, openai/whisper-large-v3-turbo, or azure/fast.
Selecting a model
Portal
In the AI Assistants tab, edit your assistant and navigate to the Voice tab. Select your preferred STT model from the Transcription Model dropdown.
API
Set the transcription.model field when creating or updating an assistant. When the language is set to auto, the model auto-detects the language:
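As a sketch, the relevant portion of an assistant create/update payload might look like the following. The transcription.model field comes from this page; the language field and its auto value are shown as an assumption about the payload shape:

```json
{
  "transcription": {
    "model": "deepgram/nova-3",
    "language": "auto"
  }
}
```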
Eager end-of-turn (Deepgram Flux)
Eager end-of-turn detection is a latency optimization available exclusively with deepgram/flux. It starts large language model (LLM) processing before the caller has fully stopped speaking, reducing perceived response time by ~150 ms on average.
Eager end-of-turn is enabled by default when using deepgram/flux. The eager_eot_threshold defaults to 0.3, meaning the system aggressively begins speculative processing at the earliest sign of a turn ending.
How it works
- EagerEndOfTurn — The transcription model detects a likely pause. The system begins generating an LLM response speculatively.
- TurnResumed — If the caller continues speaking, the speculative response is discarded and transcription continues.
- EndOfTurn — When the caller definitively finishes, the final (or already-prepared) response plays back.
Parameters
These settings are available under transcription.settings and apply only to deepgram/flux:
| Field | Type | Range | Default | Description |
|---|---|---|---|---|
| eager_eot_threshold | number | 0.3–0.9 | 0.3 | Confidence level to start speculative LLM processing. Lower values trigger earlier. |
| eot_threshold | number | — | — | Confidence level for final end-of-turn confirmation. Must be greater than or equal to eager_eot_threshold. |
| eot_timeout_ms | integer | — | — | Maximum silence duration (ms) before forcing an end-of-turn. |
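Putting the Flux-specific parameters together, a hedged example of a transcription.settings block (the threshold and timeout values here are illustrative, not recommendations):

```json
{
  "transcription": {
    "model": "deepgram/flux",
    "settings": {
      "eager_eot_threshold": 0.3,
      "eot_threshold": 0.7,
      "eot_timeout_ms": 5000
    }
  }
}
```

Note that eot_threshold (0.7) is greater than or equal to eager_eot_threshold (0.3), as the parameter table requires.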
| Field | Type | Default | Description |
|---|---|---|---|
| smart_format | boolean | — | Enables automatic punctuation, casing, and formatting. |
| numerals | boolean | — | Converts spoken numbers to digits (e.g., “five hundred” → “500”). |
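As a sketch, enabling both formatting options under transcription.settings might look like this (shown with deepgram/nova-3 for illustration; the field names come from the table above):

```json
{
  "transcription": {
    "model": "deepgram/nova-3",
    "settings": {
      "smart_format": true,
      "numerals": true
    }
  }
}
```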
Trade-offs
Lower eager_eot_threshold (e.g., 0.3)
- Faster perceived responses — the LLM starts earlier
- More speculative LLM calls (50–70% increase) — some will be discarded when the caller continues speaking
- Best for: latency-sensitive applications like customer service bots
Higher eager_eot_threshold (e.g., 0.7)
- Fewer wasted LLM calls — triggers only on high-confidence pauses
- Slightly higher response latency
- Best for: cost-sensitive applications or conversations with frequent mid-sentence pauses
To disable eager end-of-turn, set eager_eot_threshold equal to eot_threshold. When both thresholds match, the system waits for full end-of-turn confirmation before starting LLM processing.
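A minimal settings sketch that matches both thresholds, effectively disabling speculative processing (the value 0.7 is illustrative):

```json
{
  "transcription": {
    "model": "deepgram/flux",
    "settings": {
      "eager_eot_threshold": 0.7,
      "eot_threshold": 0.7
    }
  }
}
```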
Tuning recommendations
Latency-sensitive (customer service, IVR)
Use the defaults — eager_eot_threshold: 0.3 provides the fastest response times. Accept the higher LLM call volume as a trade-off for responsiveness.
Balanced (general-purpose assistants)
Raise the eager threshold slightly to reduce speculative calls while still getting a latency benefit.
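A hedged example of a balanced configuration — the eager threshold is raised above the 0.3 default but kept below the final threshold, so some speculative processing still occurs (the specific values 0.5 and 0.7 are illustrative, not documented recommendations):

```json
{
  "transcription": {
    "model": "deepgram/flux",
    "settings": {
      "eager_eot_threshold": 0.5,
      "eot_threshold": 0.7
    }
  }
}
```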
Accuracy-sensitive (dictation, long-form input)
Disable eager end-of-turn by matching both thresholds. This avoids any speculative processing.