Telnyx AI Assistants support multiple speech-to-text (STT) models for transcribing caller audio. The model you choose affects transcription accuracy, supported languages, and response latency. You can also fine-tune transcription behavior through settings like eager end-of-turn detection.

Available models

| Model | Engine | Best for |
| --- | --- | --- |
| distil-whisper/distil-large-v2 | Telnyx | Default: good accuracy, runs on Telnyx infrastructure |
| openai/whisper-large-v3-turbo | Telnyx | High accuracy, larger model, multilingual support |
| deepgram/flux | Deepgram | Conversational AI: optimized for turn-taking, English only |
| deepgram/nova-3 | Deepgram | Multilingual with automatic language detection, slightly higher latency |
| deepgram/nova-2 | Deepgram | Legacy: stable, cost-effective |
| azure/fast | Azure | Fast transcription with multilingual support |

Note: deepgram/flux is English-only. For multilingual use cases, use deepgram/nova-3, openai/whisper-large-v3-turbo, or azure/fast.
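The guidance in the table above can be summarized as a small selection rule. The sketch below is a hypothetical helper (not part of any Telnyx SDK) that maps two common requirements to a model name from the table:

```python
# Hypothetical helper, not part of the Telnyx SDK. It encodes the
# model-selection guidance from the table above.

def choose_stt_model(multilingual: bool, conversational: bool) -> str:
    """Return an STT model name following the table's guidance."""
    if conversational and not multilingual:
        # deepgram/flux is optimized for turn-taking but is English only
        return "deepgram/flux"
    if multilingual:
        # nova-3 adds automatic language detection at slightly higher latency
        return "deepgram/nova-3"
    # Default: good accuracy, runs on Telnyx infrastructure
    return "distil-whisper/distil-large-v2"

print(choose_stt_model(multilingual=False, conversational=True))   # deepgram/flux
print(choose_stt_model(multilingual=True, conversational=False))   # deepgram/nova-3
```

The flags and function name are illustrative; in practice you set the chosen model string in the transcription.model field as shown below.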

Selecting a model

Portal

In the AI Assistants tab, edit your assistant and navigate to the Voice tab. Select your preferred STT model from the Transcription Model dropdown.

(Screenshot: AI Assistant Transcription Model Selection)

API

Set the transcription.model field when creating or updating an assistant:
```shell
curl -X POST https://api.telnyx.com/v2/ai/assistants \
  -H "Authorization: Bearer $TELNYX_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "My Assistant",
    "model": "anthropic/claude-haiku-4-5",
    "instructions": "You are a helpful voice assistant.",
    "transcription": {
      "model": "deepgram/flux"
    }
  }'
```
You can also set the transcription language explicitly. If omitted or set to auto, the model auto-detects the language:
"transcription": {
  "model": "deepgram/nova-3",
  "language": "es"
}

Eager end-of-turn (Deepgram Flux)

Eager end-of-turn detection is a latency optimization available exclusively with deepgram/flux. It starts large language model (LLM) processing before the caller fully stops speaking, reducing perceived response time by ~150 ms on average. Eager end-of-turn is enabled by default when using deepgram/flux. The eager_eot_threshold defaults to 0.3, meaning the system aggressively begins speculative processing at the earliest sign of a turn ending.

How it works

  1. EagerEndOfTurn — The transcription model detects a likely pause. The system begins generating an LLM response speculatively.
  2. TurnResumed — If the caller continues speaking, the speculative response is discarded and transcription continues.
  3. EndOfTurn — When the caller definitively finishes, the final (or already-prepared) response plays back.
Because the LLM starts processing early, the response is often ready the moment the caller stops — eliminating the wait that normally occurs between end-of-speech and response playback.
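The three-event flow above can be sketched as a small event loop. This is an illustrative model of the behavior, not Telnyx code; the event names mirror the documentation, and the LLM call is faked so the flow is runnable:

```python
# Illustrative sketch of the eager end-of-turn event flow (not Telnyx code).
# Event names mirror the documentation; the LLM call is faked.

def fake_llm(text):
    """Stand-in for a real LLM call."""
    return f"reply to: {text}"

def handle_events(events):
    """Process transcription events, tracking a speculative LLM response."""
    speculative = None
    for event, transcript in events:
        if event == "EagerEndOfTurn":
            # Likely pause: start generating a response speculatively
            speculative = fake_llm(transcript)
        elif event == "TurnResumed":
            # Caller kept talking: discard the speculative response
            speculative = None
        elif event == "EndOfTurn":
            # Definitive end: reuse the prepared response if one survived
            return speculative if speculative is not None else fake_llm(transcript)

# Caller pauses mid-sentence, resumes, then finishes; only the final
# transcript's response is played back.
events = [
    ("EagerEndOfTurn", "I want to"),
    ("TurnResumed", None),
    ("EagerEndOfTurn", "I want to check my order"),
    ("EndOfTurn", "I want to check my order"),
]
print(handle_events(events))  # reply to: I want to check my order
```

Note how the response prepared at the second EagerEndOfTurn is already available when EndOfTurn arrives, which is the source of the latency savings.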

Parameters

These settings are available under transcription.settings and apply only to deepgram/flux:
| Field | Type | Range | Default | Description |
| --- | --- | --- | --- | --- |
| eager_eot_threshold | number | 0.3–0.9 | 0.3 | Confidence level to start speculative LLM processing. Lower values trigger earlier. |
| eot_threshold | number | | | Confidence level for final end-of-turn confirmation. Must be greater than or equal to eager_eot_threshold. |
| eot_timeout_ms | integer | | | Maximum silence duration (ms) before forcing an end-of-turn. |
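The constraints in the table above (the documented 0.3–0.9 range and the requirement that eot_threshold be at least eager_eot_threshold) can be checked client-side before sending a request. This is a hypothetical validation helper, not part of the Telnyx API:

```python
# Hypothetical client-side validation (not Telnyx code) for the
# flux-only settings documented above.

def validate_flux_settings(settings: dict) -> None:
    """Raise ValueError if the flux settings violate documented constraints."""
    eager = settings.get("eager_eot_threshold", 0.3)  # documented default
    if not 0.3 <= eager <= 0.9:
        raise ValueError("eager_eot_threshold must be in the range 0.3-0.9")
    eot = settings.get("eot_threshold")
    if eot is not None and eot < eager:
        raise ValueError("eot_threshold must be >= eager_eot_threshold")

# Passes: eot_threshold above the eager threshold, eager value in range
validate_flux_settings({"eager_eot_threshold": 0.3, "eot_threshold": 0.7})
```

Validating locally surfaces a clear error before the API rejects the request.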
The following settings apply to all Deepgram models:
| Field | Type | Default | Description |
| --- | --- | --- | --- |
| smart_format | boolean | | Enables automatic punctuation, casing, and formatting. |
| numerals | boolean | | Converts spoken numbers to digits (e.g., “five hundred” → “500”). |

Trade-offs

Lower eager_eot_threshold (e.g., 0.3)
  • Faster perceived responses — the LLM starts earlier
  • More speculative LLM calls (50–70% increase) — some will be discarded when the caller continues speaking
  • Best for: latency-sensitive applications like customer service bots
Higher eager_eot_threshold (e.g., 0.7)
  • Fewer wasted LLM calls — triggers only on high-confidence pauses
  • Slightly higher response latency
  • Best for: cost-sensitive applications or conversations with frequent mid-sentence pauses
Disabling eager end-of-turn: Set eager_eot_threshold equal to eot_threshold. When both thresholds match, the system waits for full end-of-turn confirmation before starting LLM processing.
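The disable rule above reduces to a one-line comparison. The following is an illustrative check, not an official API; it assumes the documented default of 0.3 when eager_eot_threshold is omitted:

```python
# Illustrative check (not an official Telnyx API): eager end-of-turn is
# effectively disabled when both thresholds match, because speculative
# processing can never fire before final confirmation.

def eager_enabled(settings: dict) -> bool:
    eager = settings.get("eager_eot_threshold", 0.3)  # documented default
    eot = settings.get("eot_threshold")
    # With no explicit eot_threshold, eager processing runs by default
    return eot is None or eager < eot

print(eager_enabled({"eager_eot_threshold": 0.3, "eot_threshold": 0.7}))  # True
print(eager_enabled({"eager_eot_threshold": 0.7, "eot_threshold": 0.7}))  # False
```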

Tuning recommendations

Use the defaults — eager_eot_threshold: 0.3 provides the fastest response times. Accept the higher LLM call volume as a trade-off for responsiveness.
"transcription": {
  "model": "deepgram/flux",
  "settings": {
    "eager_eot_threshold": 0.3
  }
}
Raise the eager threshold slightly to reduce speculative calls while still getting a latency benefit.
"transcription": {
  "model": "deepgram/flux",
  "settings": {
    "eager_eot_threshold": 0.5
  }
}
Disable eager end-of-turn by matching both thresholds. This avoids any speculative processing.
"transcription": {
  "model": "deepgram/flux",
  "settings": {
    "eager_eot_threshold": 0.7,
    "eot_threshold": 0.7
  }
}

Configure via API

Create an assistant with tuned transcription settings:
```shell
curl -X POST https://api.telnyx.com/v2/ai/assistants \
  -H "Authorization: Bearer $TELNYX_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "Low Latency Assistant",
    "model": "anthropic/claude-haiku-4-5",
    "instructions": "You are a helpful voice assistant.",
    "transcription": {
      "model": "deepgram/flux",
      "language": "en",
      "settings": {
        "eager_eot_threshold": 0.3,
        "eot_threshold": 0.7,
        "eot_timeout_ms": 5000,
        "smart_format": true,
        "numerals": true
      }
    }
  }'
```